Project ideas from Hacker News discussions.

End of an era for me: no more self-hosted git

📝 Discussion Summary

1. AI‑driven scrapers are flooding sites
- “The big nasty AI bots use 10s of thousands of IPs distributed all over China.” – JohnTHaller
- “They scrape everything they can find, indiscriminately, including endpoints that have to do quite a bit of work.” – Tharre
- “The attack is continuous.” – mzajc

2. Centralisation of protection services is eroding self‑hosting
- “Cloudflare seems to be taking over all of the last mile web traffic, and this extreme centralisation sounds really bad to me.” – Shorel
- “AWS / Azure / Cloudflare total centralisation means no one will be able to self‑host anything.” – Shorel
- “I can take all my self‑hosted stuff and stick it behind centralised enterprise tech to solve a problem caused by enterprise tech.” – denkmoon

3. Practical mitigation tactics are being shared
- “I added a robots.txt with explicit UAs for known scrapers.” – rozab
- “Rate limit read‑only access at the very least.” – PaulDavisThe1st
- “GeoIP blocking does wonders – just 5 countries are responsible for 50% of all requests.” – mzajc
- “Fail2ban has decent jails for Apache httpd.” – GuestFAUniverse
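The robots.txt tactic can be sketched in Python: generate one explicit group per crawler user agent, then check the result with the standard-library `urllib.robotparser`. GPTBot, CCBot and ClaudeBot are published crawler tokens, but the list here is only an example and should be checked against each vendor's current documentation. robots.txt is advisory, so this only affects crawlers that choose to honour it.

```python
from urllib import robotparser

# Example AI-crawler tokens (an assumption; extend or verify for real use).
AI_CRAWLERS = ["GPTBot", "CCBot", "ClaudeBot"]

def build_robots_txt(blocked_agents, disallowed_path="/"):
    """Return robots.txt text with one explicit group per blocked agent,
    followed by a catch-all group that allows everyone else."""
    groups = [f"User-agent: {ua}\nDisallow: {disallowed_path}\n" for ua in blocked_agents]
    groups.append("User-agent: *\nDisallow:\n")  # empty Disallow means allow all
    return "\n".join(groups)  # blank line between groups

robots_txt = build_robots_txt(AI_CRAWLERS)

# Sanity-check the generated rules with the standard-library parser.
parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://example.org/repo/commit/abc123"))       # False
print(parser.can_fetch("Mozilla/5.0", "https://example.org/repo/commit/abc123"))  # True
```

Matching is done on the product token (`GPTBot`), which is how crawlers are expected to identify themselves to robots.txt parsers.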

4. Legal, ethical and business debates around AI scraping
- “Legislation is lax right now if you claim the purpose of scraping is for AI training even for copyrighted material.” – devsda
- “The point of the post was how something useless (AI) and its poorly implemented scrapers is wreaking havoc.” – isodev
- “Cloudflare trying to monetise ‘protection from AI’ is just another grift.” – isodev
- “Some argue that AI companies are using residential proxies.” – esseph

These four themes—AI‑driven traffic, centralisation concerns, mitigation strategies, and the surrounding legal/ethical debate—capture the core of the discussion.


🚀 Project Ideas

BotGuard Middleware

Summary

  • A lightweight Nginx/Envoy module that detects AI‑bot traffic patterns (user‑agent, request frequency, IP reputation) and serves a custom “poison” payload to deter training.
  • Provides configurable rate limits, IP blacklists, and a honeypot endpoint that logs suspicious activity.
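A minimal sketch of the per-request decision BotGuard might make, written in Python for readability even though the idea lists Go and Lua. The user-agent tokens, the honeypot path, and the thresholds are hypothetical placeholders; "poison" and "rate_limit" are labels that the proxy layer would map to a decoy response or an HTTP 429.

```python
import time
from collections import defaultdict, deque

# Hypothetical example values; tune per site.
SUSPECT_UA_TOKENS = ("gptbot", "ccbot", "claudebot", "bytespider")
HONEYPOT_PATH = "/.well-known/botguard-trap"  # hypothetical trap URL
WINDOW_SECONDS = 60.0
MAX_REQUESTS_PER_WINDOW = 120

class BotGuard:
    """Classify each request as 'allow', 'rate_limit' or 'poison'."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_REQUESTS_PER_WINDOW):
        self.window = window
        self.limit = limit
        self.hits = defaultdict(deque)  # ip -> timestamps inside the window
        self.flagged = set()            # ips that touched the honeypot

    def decide(self, ip, user_agent, path, now=None):
        now = time.monotonic() if now is None else now
        if path == HONEYPOT_PATH:
            self.flagged.add(ip)        # remember (and log) the crawler
            return "poison"
        if ip in self.flagged:
            return "poison"
        ua = user_agent.lower()
        if any(token in ua for token in SUSPECT_UA_TOKENS):
            return "poison"
        # Sliding-window rate limit: forget timestamps older than the window.
        q = self.hits[ip]
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return "rate_limit"         # rejected requests are not counted
        q.append(now)
        return "allow"

guard = BotGuard(window=60, limit=3)
print([guard.decide("198.51.100.7", "curl/8.0", "/", now=t) for t in (0, 1, 2, 3)])
# ['allow', 'allow', 'allow', 'rate_limit']
```

User-agent matching alone is weak, since the thread notes that scrapers rotate through many IPs and may spoof agents; the honeypot and the per-IP window are meant to catch what the token list misses.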

Details

Target Audience: Self‑hosted site owners, open‑source projects, small businesses
Core Feature: Real‑time AI‑bot detection + poisoning response + rate limiting
Tech Stack: Go (for module), Lua scripts for Nginx, Docker, Prometheus metrics
Difficulty: Medium
Monetization: Revenue‑ready (subscription + open‑source core)

Notes

  • HN commenters lament “AI bots ignoring robots.txt” and “scrapers hammering Git repos”. BotGuard gives them a cheap, self‑hosted defense.
  • The poison payload can be a short text that explains the site’s policy, discouraging data collection and sparking discussion on bot ethics.

GitShield

Summary

  • A Docker‑based wrapper for Git hosting (Gitea/Forgejo) that exposes only the HEAD of each branch, blocks per‑commit URLs, and enforces strict rate limits.
  • Includes a simple UI for configuring allowed endpoints and a honeypot for detecting malicious crawlers.
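The core of GitShield is a path filter in front of the forge. The Python sketch below assumes a URL layout loosely modelled on Gitea/Forgejo web routes (for example `/owner/repo/src/branch/main/...` versus `/owner/repo/commit/<sha>`); verify the exact routes on your own instance. Branch-tip views pass through, while per-commit, blame, compare, and archive views are refused.

```python
import re

# Assumption: Gitea/Forgejo-style routes of the form /owner/repo/<view>/...
REPO_PATH = re.compile(r"^/(?P<owner>[^/]+)/(?P<repo>[^/]+)(?P<rest>/.*)?$")

# Expensive views that crawlers enumerate commit by commit (hypothetical list).
BLOCKED_SEGMENTS = ("commit", "commits", "blame", "compare", "archive")

def is_allowed(path: str) -> bool:
    """Return True if a GitShield-style proxy should forward this request."""
    m = REPO_PATH.match(path.split("?", 1)[0])  # ignore the query string
    if m is None:
        return True  # not a repository URL (home page, login, ...)
    segments = (m.group("rest") or "").strip("/").split("/")
    # Block /commit/<sha> as well as /src/commit/<sha>/... style paths.
    return not any(seg in BLOCKED_SEGMENTS for seg in segments[:2])

print(is_allowed("/alice/tool/src/branch/main/README.md"))  # True
print(is_allowed("/alice/tool/commit/3f2a9c1"))             # False
print(is_allowed("/alice/tool/src/commit/3f2a9c1/main.go")) # False
```

In the listed stack this rule would live in the Nginx/Lua layer and return 403 or 404 for blocked paths; only the first two segments are checked so that a file named `commit` inside a branch is still served.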

Details

Target Audience: Open‑source maintainers, hobbyist developers, small teams
Core Feature: Minimal Git HTTP API + anti‑scraper rate limiting
Tech Stack: Docker, Go (Gitea), Nginx, Lua, SQLite for logs
Difficulty: Medium
Monetization: Hobby

Notes

  • Commenters such as “m95d” and “mzajc” reported “100k requests per day” hitting commit URLs. GitShield removes those endpoints, which should sharply cut that traffic.
  • The project encourages discussion on how to balance open access with protection, a hot topic on HN.

BotSight Analytics

Summary

  • A self‑hosted dashboard that aggregates access logs from multiple sites, identifies bot traffic patterns, and auto‑generates firewall rules (iptables/nftables, Cloudflare API).
  • Provides visual insights into IP ranges, user‑agents, request rates, and geographic distribution.
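The log-to-rule pipeline can be sketched in a few lines of Python: parse access logs in the common Apache/Nginx "combined" format, count requests per IPv4 /24, and render `nft` commands for networks above a threshold. The prefix length, the threshold, and the pre-existing `inet filter` table with an `input` chain are all assumptions; IPv6 is skipped to keep the sketch short.

```python
import ipaddress
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "ua"
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "[^"]*"')

def offending_networks(lines, prefix=24, threshold=1000):
    """Count requests per IPv4 network and return (network, count) above threshold."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip malformed lines
        try:
            addr = ipaddress.ip_address(m.group("ip"))
        except ValueError:
            continue
        if addr.version != 4:
            continue  # IPv6 omitted in this sketch
        counts[ipaddress.ip_network(f"{addr}/{prefix}", strict=False)] += 1
    return [(net, n) for net, n in counts.most_common() if n >= threshold]

def nft_rules(networks, table="inet filter", chain="input"):
    """Render nftables commands that drop traffic from each network."""
    return [f"nft add rule {table} {chain} ip saddr {net} drop" for net, _ in networks]

sample = [
    '203.0.113.5 - - [10/Oct/2025:13:55:36 +0000] "GET /repo/commit/abc HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.77 - - [10/Oct/2025:13:55:37 +0000] "GET /repo/commit/def HTTP/1.1" 200 498 "-" "Mozilla/5.0"',
    '198.51.100.9 - - [10/Oct/2025:13:55:38 +0000] "GET / HTTP/1.1" 200 1024 "-" "curl/8.0"',
]
for rule in nft_rules(offending_networks(sample, threshold=2)):
    print(rule)  # nft add rule inet filter input ip saddr 203.0.113.0/24 drop
```

Aggregating by network rather than by single address reflects the thread's point that scrapers spread requests across many IPs; a real deployment would review generated rules before applying them.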

Details

Target Audience: Sysadmins, DevOps, site owners dealing with bot floods
Core Feature: Log ingestion, bot detection, rule generation
Tech Stack: Python (Flask), Elasticsearch, Kibana, Docker
Difficulty: High
Monetization: Revenue‑ready (tiered SaaS with a free tier)

Notes

  • Commenters like “snorremd” and “mqus” already use CrowdSec; BotSight offers a unified, open‑source alternative that can feed into existing firewalls.
  • The auto‑rule feature sparks debate on the ethics of automated blocking and the balance between security and accessibility.

RequestPay

Summary

  • An open‑source, self‑hosted micro‑service that lets site owners monetize traffic by charging per HTTP request (similar to Cloudflare’s pay‑per‑crawl).
  • Supports per‑endpoint pricing, usage limits, and integrates with Stripe for payouts.
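The metering side of RequestPay can be sketched independently of the payment provider. This Python version (the idea lists Node.js; the logic carries over) tracks usage per API key and endpoint, enforces a request cap, and prices usage in integer cents per 1,000 requests. The prices, the endpoint names, and the `ApiKeyAccount` class are hypothetical, and the Stripe call that would bill the total is deliberately omitted.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical prices in integer cents per 1,000 requests (avoids float rounding).
PRICES_PER_1000 = {"/api/search": 50, "/api/repo": 10}
DEFAULT_PRICE_PER_1000 = 5

@dataclass
class ApiKeyAccount:
    key: str
    monthly_limit: int  # maximum billable requests per period
    usage: Counter = field(default_factory=Counter)

    def record(self, endpoint: str) -> bool:
        """Count one request; return False once the usage limit is reached."""
        if sum(self.usage.values()) >= self.monthly_limit:
            return False  # caller would answer with HTTP 402 or 429
        self.usage[endpoint] += 1
        return True

    def invoice_cents(self) -> int:
        """Total charge, rounding each endpoint's cost up to a whole cent."""
        total = 0
        for endpoint, count in self.usage.items():
            price = PRICES_PER_1000.get(endpoint, DEFAULT_PRICE_PER_1000)
            total += -(-count * price // 1000)  # ceiling division
        return total

acct = ApiKeyAccount(key="demo-key", monthly_limit=2500)
for _ in range(2000):
    acct.record("/api/search")
accepted = sum(acct.record("/api/repo") for _ in range(600))
print(accepted)               # 500 (the cap stops the last 100)
print(acct.invoice_cents())   # 105 = 2000 * 50/1000 + 500 * 10/1000
```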

Details

Target Audience: Content creators, API providers, self‑hosted sites
Core Feature: Request‑level billing, analytics, API key management
Tech Stack: Node.js, Express, PostgreSQL, Stripe SDK, Docker
Difficulty: Medium
Monetization: Revenue‑ready (subscription + transaction fees)

Notes

  • “simonw” mentioned Cloudflare’s pay‑per‑crawl; RequestPay gives the same model to anyone who wants to keep control of their infrastructure.
  • The idea invites discussion on whether monetizing every request is sustainable and how it affects user experience.
