Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.
The article explains that PerplexityBot respects robots.txt, but a second request is then sent from a different IP with a different user agent. They could very well be using a different method to work around it.
The only surprising thing to me from this article is that OpenAI actually follows the rules for bot crawlers.
Or they haven’t been caught yet.
The article explains how they tested for that, and as far as they could tell OpenAI is respecting the rules.