Cloudflare: Perplexity uses stealth crawling techniques, like undeclared user agents and rotating IP addresses, to evade robots.txt rules and network blocks

Pro@mander.xyz · edited 5 months ago

CarbonatedPastaSauce@lemmy.world · 5 months ago

The only surprising thing to me from this article is that OpenAI actually follows the rules for bot crawlers.

0_o7@lemmy.dbzer0.com · 5 months ago

Or they haven’t been caught yet.

The article explains that PerplexityBot respects robots.txt, but Perplexity then sends a separate request from a different IP with a different user agent. OpenAI could very well be using some other method to get around it.
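To illustrate why a changed user agent matters: robots.txt rules are matched against the name a crawler declares, so a rule aimed at one bot says nothing about a request that identifies itself differently. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, the example URL, and the browser-style user-agent string are all made up for illustration and are not taken from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the declared crawler name.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page on a site that wants to keep the declared bot out.
URL = "https://example.com/page"

# The declared crawler name matches the Disallow rule.
declared = is_allowed("PerplexityBot", URL)

# A generic browser-style string (assumed here) only matches the "*" entry.
generic = is_allowed("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", URL)

print("declared bot allowed:", declared)
print("generic agent allowed:", generic)
```

Since robots.txt is advisory and keyed on the declared name, enforcing it against an undeclared agent takes network-level blocking, which is where rotating IP addresses come in.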

CarbonatedPastaSauce@lemmy.world · 5 months ago

The article explains how they tested for that, and as far as they could tell OpenAI is respecting the rules.

Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives