1. Cloudflare’s new “crawl” API is a game‑changer for AI and scraping
“Cloudflare’s network now supports real‑time content conversion at the source … when AI systems request pages … they can express the preference for text/markdown … the network will automatically and efficiently convert the HTML to markdown, when possible, on the fly.” – selcuka
“Cloudflare’s /crawl endpoint respects robots.txt directives, including crawl‑delay.” – arjie
“Cloudflare is just skating to where the puck is going to be on this one.” – jppope
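The content negotiation selcuka describes can be sketched from the client side. This is a minimal illustration, not Cloudflare's documented client: it assumes the preference is expressed through a standard HTTP Accept header and that a converted response would carry a text/markdown Content-Type. The URL and both helper names (build_markdown_request, is_markdown) are hypothetical, and no request is actually sent.

```python
from urllib.request import Request


def build_markdown_request(url: str) -> Request:
    """Build a request that prefers markdown and falls back to HTML.

    Assumption: the markdown preference is signalled with a plain
    Accept header; the q-value marks HTML as the less preferred option.
    """
    return Request(url, headers={"Accept": "text/markdown, text/html;q=0.8"})


def is_markdown(content_type: str) -> bool:
    """Return True if a Content-Type header value names text/markdown.

    Parameters such as "; charset=utf-8" are ignored.
    """
    return content_type.split(";")[0].strip().lower() == "text/markdown"


# Hypothetical target URL; the request object is only constructed here.
req = build_markdown_request("https://example.com/article")
print(req.get_header("Accept"))
```

A caller would check is_markdown() on the response's Content-Type before treating the body as markdown, since the quote notes conversion happens only "when possible".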
2. The service raises serious privacy, copyright and abuse concerns
“Offering wholesale cache dumps blows up every assumption about origin privacy and copyright.” – hrmtst93837
“It is a short path from ‘helpful pre‑scraped JSON’ to handing an entire site to an AI scraper‑for‑hire with zero friction.” – hrmtst93837
“They are selling the solution to avoid their own content crawler.” – superkuh
3. Technical realities: cost, performance, and compliance
“Really hard to understand costs here. What is a reasonable pages per second?” – binarymax
“The practical gotcha for forum archival is pagination and authentication‑gated content.” – devnotes77
“The big question here: is this a verified bot on the Cloudflare WAF?” – coreq
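One rough way to reason about binarymax's "pages per second" question is to note that a respected crawl-delay puts a ceiling on the request rate for a single site. The sketch below uses Python's standard urllib.robotparser on an inline, made-up robots.txt; it shows how a polite client might read the directives, and says nothing about how Cloudflare's /crawl endpoint is actually implemented or billed. The user agent "ExampleBot" and the helper crawl_budget are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def crawl_budget(robots_txt: str, user_agent: str):
    """Parse robots.txt text and derive an upper bound on pages per second.

    Returns (parser, max_pages_per_second). The rate is None when no
    crawl-delay applies to the given user agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    max_pps = (1.0 / delay) if delay else None
    return parser, max_pps


parser, max_pps = crawl_budget(ROBOTS_TXT, "ExampleBot")
print(max_pps)
print(parser.can_fetch("ExampleBot", "https://example.com/private/page"))
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post"))
```

With a crawl-delay of 2 seconds, a single-site crawl is capped at one page every two seconds, so total crawl time grows linearly with page count regardless of how fast the crawler itself is.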
4. Cloudflare’s dual role as protector and provider creates a conflict of interest
“They are selling the wall and the ladder.” – allixsenos
“Cloudflare is a mafioso. They create the problem and then sell you the solution to themselves.” – superkuh
“Cloudflare’s /crawl respects robots.txt. It does not attempt to bypass any anti‑crawling measures.” – kentonv
“They are a monopoly that hurts for years to come.” – isodev