Project ideas from Hacker News discussions.

Waiting for dawn in search: Search index, Google rulings and impact on Kagi

πŸ“ Discussion Summary (Click to expand)

Based on the Hacker News discussion, here are the four most prevalent themes expressed by users:

1. The Exorbitant Technical and Financial Difficulty of Building a Viable Search Index

Users widely acknowledge the immense barrier to entry for creating a search index. Several arguments are made against its feasibility for new competitors: the staggering cost (even Microsoft spent "$100 billion over 20 years" on Bing to limited effect), the technical complexity of not just crawling but ranking, and the challenge of keeping the index fresh.

"Building a comparable one from scratch is like building a parallel national railroad." β€” ghm2199

"Microsoft spent roughly $100 billion over 20 years on Bing and still holds single-digit share. If Microsoft cannot close the gap, no startup can do it alone." β€” monooso

2. Google’s Anti-Competitive "Ladder-Pulling" and Enforcement of Double Standards

There is a consensus that Google leverages its monopoly to stifle competition. Users note that Google built its index without strict adherence to robots.txt or consent, yet now enforces strict terms against others. This is framed as a "climbing the wall and pulling the ladder up" scenario.

"Google built its index by crawling the open web before robots.txt was a widespread norm... Google now enforces ToS and robots.txt against others from a position of monopoly power it accumulated without those constraints." β€” oh_fiddlesticks

"A classic case of climbing the wall, and pulling the ladder up afterward. Others try to build their own ladder, and Google uses their deep pockets and political influence to knock the ladder over before it reaches the top." β€” baggachipz

3. The Critique of Kagi’s Reliance on Third-Party Google Results

A significant portion of the discussion focuses on Kagi’s business model. Critics argue that Kagi relies on third-party APIs (like SerpAPI) to obtain Google results, contradicting claims of independence and raising privacy concerns. There is skepticism about Kagi’s "own small-web index" and whether they are genuinely offering an alternative or simply reselling Google results with a privacy layer.

"Crazy for a company to admit: 'Google won't let us whitelabel their core product so we steal it and resell it.'" β€” xnx

"As a customer, the major implication of this is that even if Kagi's privacy policy says they try to not log your queries, it is sent to Google and still subject to Google's consumer privacy policy." β€” whs

4. The Iron Grip of "Search" as a Monopoly and Verb

Users discuss the entrenched nature of Google not just as a service, but as a cultural default. Even when using alternatives like Kagi or DDG, users often default to saying "I'll Google it," highlighting the difficulty of displacing a brand that has become genericized. This serves as a metaphor for the technical and psychological monopoly Google holds.

"Google is a verb, nobody can compete with that level of mindshare." β€” hamdingers

"I'd hear people say 'I'll Google that', then use Yahoo when they were still a major search engine." β€” pixl97


πŸš€ Project Ideas

Search API Marketplace

Summary

  • Addresses the critical dependency on single-point-of-failure providers like SerpAPI by creating a decentralized marketplace for search result data.
  • Provides developers and companies with multiple, reliable sources for search results, reducing risk and increasing competition.
  • Core value proposition is resilience and cost optimization through diversified sourcing, rather than relying on one provider.

Details

  Target Audience: Developers building search features, small-to-medium search engines (like Kagi), SEO tools, and data aggregators.
  Core Feature: A unified API that aggregates results from multiple SERP providers (including SerpAPI, Nozzle, ScaleSERP, etc.) and from independent developers who sell direct access to their own crawl data.
  Tech Stack: Go/Python for the API layer, PostgreSQL for metadata, Redis for caching/rate limiting, Kubernetes for orchestration.
  Difficulty: Medium
  Monetization: Revenue-ready: transaction fee model (e.g., 10-15%) on data sales, or a SaaS subscription for managed aggregation services.

Notes

  • HN commenters highlighted the fragility of relying on SerpAPI (if SerpAPI shuts down, Kagi could lose its access to Google results).
  • Practical utility is high for any startup dependent on search data, offering a hedge against provider outages or price hikes.
  • Building a standardized adapter layer is a concrete technical problem that adds immediate value; a minimal sketch follows.
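
As a starting point for that adapter layer, here is a minimal sketch in Python (one of the suggested stack options). It assumes a hypothetical common interface; the class names, provider names, and response shapes are illustrative, not real API contracts.

```python
"""Sketch of the adapter layer: one interface, many SERP providers,
failover across them. Names and shapes are illustrative assumptions."""
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


class ProviderError(Exception):
    """Raised by an adapter when its upstream provider fails."""


class SerpProvider(Protocol):
    """Every marketplace provider is wrapped to expose this interface."""

    name: str

    def search(self, query: str) -> list[SearchResult]: ...


class Aggregator:
    """Tries providers in priority order, failing over on error.

    A production version would add caching, rate limiting, and billing
    hooks; this shows only the resilience core.
    """

    def __init__(self, providers: list[SerpProvider]) -> None:
        self.providers = providers

    def search(self, query: str) -> list[SearchResult]:
        errors: list[str] = []
        for provider in self.providers:
            try:
                return provider.search(query)
            except ProviderError as exc:
                errors.append(f"{provider.name}: {exc}")
        raise ProviderError("all providers failed: " + "; ".join(errors))
```

Each real provider (SerpAPI, Nozzle, ScaleSERP, an independent crawler) would get its own adapter class implementing SerpProvider, so new data sellers can join the marketplace without touching the aggregation logic.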

Small-Web Custom Indexing Tool

Summary

  • Addresses the frustration with the declining quality of mainstream search results dominated by SEO spam and AI-generated slop.
  • Allows users to build and maintain their own personal or niche search index of trusted sources (blogs, documentation, forums).
  • Core value proposition is user agency: you search what you choose to index, bypassing the noise of the open web.

Details

  Target Audience: Power users, researchers, and developers who want curated, high-signal results without noise.
  Core Feature: A local-first application that crawls a user-defined list of RSS feeds, blogrolls, or specific domains, indexes them, and provides a local search interface.
  Tech Stack: Rust or Go (for performance), SQLite (for local storage), headless browser or simple HTTP client for crawling.
  Difficulty: Low to Medium
  Monetization: Hobby (open source) or Revenue-ready: one-time purchase for a desktop app, or a managed cloud version for teams.

Notes

  • Commenters like renegat0x0 and SyneRyder mentioned building their own "personal internet" indexes using SQLite and bookmarks.
  • This tool formalizes that approach, making it accessible to non-engineers who want to escape the decline of Google's results.
  • Directly addresses the "small web" and "personal index" desires expressed in the discussion; a minimal indexing sketch follows.
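
To show how small the core of such a tool can be, here is a sketch using Python's built-in sqlite3 with the FTS5 full-text extension (present in standard CPython builds). The fetch-and-index step is deliberately naive; a real tool would parse HTML, strip boilerplate, and respect robots.txt.

```python
"""Minimal sketch of a local-first small-web index on SQLite FTS5."""
import sqlite3
import urllib.request


def open_index(path: str = "smallweb.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # FTS5 gives ranked full-text search with no external server.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(url, title, body)"
    )
    return conn


def index_page(conn: sqlite3.Connection, url: str) -> None:
    # Naive fetch: a real crawler would parse HTML, extract the <title>,
    # strip boilerplate, and honor robots.txt before indexing.
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    conn.execute(
        "INSERT INTO pages (url, title, body) VALUES (?, ?, ?)",
        (url, url, body),
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list:
    # bm25() ranks matches; in SQLite's convention lower scores are better.
    return conn.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ?"
        " ORDER BY bm25(pages) LIMIT ?",
        (query, limit),
    ).fetchall()
```

SQLite FTS5 keeps the whole tool local-first and dependency-free, which matches the "personal internet" approach commenters described building by hand.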

Monetized Crawl-for-Access Gateway

Summary

  • Solves the chicken-and-egg problem of new search engines struggling to crawl the web due to bot detection and restrictive robots.txt policies enforced by incumbents.
  • Provides a bridge where site owners can opt in to allow crawling in exchange for micropayments or revenue sharing, bypassing the need for complex legal agreements.
  • Core value proposition is creating a financial incentive for site owners to welcome new crawlers, rather than blocking them.

Details

  Target Audience: Web publishers, niche site owners, and new search engine startups needing fresh data.
  Core Feature: Middleware that sits between a crawler and a website, handling authentication, access control, and micropayments based on crawl volume or access requests.
  Tech Stack: Node.js/Python, Stripe/smart contracts for payments, Cloudflare Workers/serverless functions for the gateway.
  Difficulty: High (business and tech integration)
  Monetization: Revenue-ready: transaction fees on micropayments or subscription fees for the gateway service.

Notes

  • This addresses the argument that "If a site doesn't want me to crawl them, that's fine. I probably don't need them" by turning refusal into a transaction rather than a hard block.
  • HN users discussed the difficulty of crawling in a post-Google world where bot protection is aggressive; this provides a legitimate, paid route around it.
  • It turns the adversarial relationship between crawler and server into a cooperative one; a minimal gateway sketch follows.
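
Here is a minimal sketch of the gateway core, in Python for consistency with the other sketches. The API key, per-page price, and in-memory ledger are illustrative assumptions; a real gateway would persist usage and settle through a payment provider such as Stripe.

```python
"""Sketch of a crawl-for-access gateway: authenticate a crawler,
meter its requests for billing, then proxy the fetch on the site's behalf."""
import urllib.request
from collections import defaultdict

# api_key -> (crawler name, agreed price per fetched page, in USD)
ACCESS_AGREEMENTS = {"demo-key-123": ("example-crawler", 0.001)}

usage_ledger: dict[str, int] = defaultdict(int)  # api_key -> pages served


def handle_crawl(api_key: str, target_url: str) -> bytes:
    """Serve one page to an authenticated, metered crawler."""
    if api_key not in ACCESS_AGREEMENTS:
        raise PermissionError("unknown crawler: no access agreement on file")
    usage_ledger[api_key] += 1  # billed per page at settlement time
    # The site owner whitelists the gateway once and never has to
    # distinguish friendly crawlers from hostile bots again.
    with urllib.request.urlopen(target_url, timeout=10) as resp:
        return resp.read()


def settle(api_key: str) -> float:
    """Amount a crawler owes since the last settlement."""
    _name, price = ACCESS_AGREEMENTS[api_key]
    return usage_ledger.pop(api_key, 0) * price
```

The key design choice is that the gateway, not the site, holds the access agreement: the publisher delegates bot handling once and gets paid per page served.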

"Verified Human Content" Search Layer

Summary

  • Targets the unmet need for search results free from AI-generated slop and SEO-optimized spam.
  • A browser extension or service that filters search results (from any engine) based on community-maintained blocklists of low-quality domains and AI content markers.
  • Core value proposition is quality filtering: delivering only human-curated or human-written content by leveraging community crowdsourcing.

Details

  Target Audience: Users frustrated with the degrading quality of Google/Kagi/Bing results (e.g., users who block "userbenchmark" or AI-generated listicles).
  Core Feature: A plugin that intercepts search engine requests and responses, applying a dynamic filter list (similar to uBlock Origin, but for domain quality rather than ads).
  Tech Stack: Browser extension (JavaScript/TypeScript), local storage for preferences, optional backend for syncing blocklists.
  Difficulty: Low
  Monetization: Hobby (free/open source) or Revenue-ready: freemium model (free lists, paid for advanced customization or priority updates).

Notes

  • Multiple commenters expressed a desire to block specific types of sites (e.g., m-schuetz: "Blocking, pinning and the general quality").
  • This tool provides an immediate solution to the "slop" problem without waiting for search engines to improve their algorithms.
  • It leverages the community aspect of HN, where users often share lists of domains to avoid; the filtering core is sketched below.
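
The shipped product would be a browser extension in JavaScript/TypeScript; the sketch below shows only the blocklist-matching core, in Python for consistency with the other sketches, with made-up domains standing in for a community-maintained list.

```python
"""Sketch of the blocklist-matching core. The domains listed are made up;
a real deployment would sync a community-maintained list, in the spirit
of uBlock Origin filter subscriptions."""
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"example-seo-farm.com", "ai-slop-blog.example"}


def is_blocked(url: str) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def filter_results(results: list[dict]) -> list[dict]:
    """Drop search results whose URL matches the blocklist."""
    return [r for r in results if not is_blocked(r["url"])]


if __name__ == "__main__":
    demo = [
        {"url": "https://blog.example-seo-farm.com/top-10-anything"},
        {"url": "https://example.org/a-human-wrote-this"},
    ]
    print(filter_results(demo))  # only the second result survives
```

Matching on registrable domain plus subdomains keeps the list short while still catching blog.example-seo-farm.com and similar variants.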
