1. AI‑driven scrapers are flooding sites
- “The big nasty AI bots use 10s of thousands of IPs distributed all over China.” – JohnTHaller
- “They scrape everything they can find, indiscriminately, including endpoints that have to do quite a bit of work.” – Tharre
- “The attack is continuous.” – mzajc
2. Centralisation of protection services is eroding self‑hosting
- “Cloudflare seems to be taking over all of the last mile web traffic, and this extreme centralisation sounds really bad to me.” – Shorel
- “AWS / Azure / Cloudflare total centralisation means no one will be able to self‑host anything.” – Shorel
- “I can take all my self‑hosted stuff and stick it behind centralised enterprise tech to solve a problem caused by enterprise tech.” – denkmoon
3. Practical mitigation tactics are being shared
- “I added a robots.txt with explicit UAs for known scrapers.” – rozab
- “Rate limit read‑only access at the very least.” – PaulDavisThe1st
- “GeoIP blocking does wonders – just 5 countries are responsible for 50 % of all requests.” – mzajc
- “Fail2ban has decent jails for Apache httpd.” – GuestFAUniverse
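As a rough illustration of the rate-limiting suggestion above, here is a minimal per-client token bucket in Python. It is a sketch, not what any commenter actually deployed: the class name `TokenBucketLimiter`, its parameters, and the choice of client key are all hypothetical. Since the thread reports bots spread across tens of thousands of IPs, keying on a single IP may not be enough; the key could just as well be a subnet or an ASN, which is an assumption on our part.

```python
import time


class TokenBucketLimiter:
    """Hypothetical per-client token bucket.

    Each client may burst up to `capacity` requests; tokens refill at
    `rate` tokens per second. `clock` is injectable so the limiter can
    be tested without sleeping.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        # client key -> (tokens remaining, time of last update)
        self.buckets = {}

    def allow(self, client):
        """Return True if `client` may make a request right now."""
        now = self.clock()
        # Unknown clients start with a full bucket.
        tokens, last = self.buckets.get(client, (self.capacity, now))
        # Refill for the time elapsed, never exceeding capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client] = (tokens - 1, now)
            return True
        self.buckets[client] = (tokens, now)
        return False
```

In a real server the `client` argument would come from the request (for example the remote address or its network prefix), and a rejected request would typically get an HTTP 429 response. The dictionary here grows without bound, so a production version would also need to evict idle entries.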
4. Legal, ethical and business debates around AI scraping
- “Legislation is lax right now if you claim the purpose of scraping is for AI training even for copyrighted material.” – devsda
- “The point of the post was how something useless (AI) and its poorly implemented scrapers are wreaking havoc.” – isodev
- “Cloudflare trying to monetise ‘protection from AI’ is just another grift.” – isodev
- “Some argue that AI companies are using residential proxies.” – esseph
These four themes (AI‑driven scraper traffic, centralisation concerns, mitigation tactics, and the surrounding legal, ethical and business debate) capture the core of the discussion.