Project ideas from Hacker News discussions.

We replaced RAG with a virtual filesystem for our AI documentation assistant

📝 Discussion Summary

Three dominant themes from the discussion

  • Rediscovering non‑embedding, “library‑style” search
    Softwaredoug notes that people are returning to traditional, file‑system‑based semantic search that resembles how librarians organize shelves.

    "The real thing I think people are rediscovering with file system based search is that there’s a type of semantic search that’s not embedding based retrieval." – softwaredoug

  • Agents can drive any retrieval backend (Lucene, ontological NLP, etc.)
    Morkalork demonstrates that letting an LLM interact with a Lucene index yields strong results, showing retrieval is not limited to vector‑DB pipelines.

    "Doesn't have to be tho, I've had great success letting an agent loose on an Apache Lucene instance. Turns out LLMs are great at building queries." – morkalork

  • Practical and cost hurdles in real‑world deployments
    Mandeeepj highlights the steep cost of sandbox environments, questioning the viability of $70k‑plus annual expenses. Meanwhile, pboulos points out that messy organizational structures make RAG adoption especially hard.

    "even a minimal setup ... would put us north of $70,000 a year..." – mandeeepj
    "From personal experience, getting RAG to work well in places where the structure of the organisation ... is far from hierarchical ... is a very hard task." – pboulos


🚀 Project Ideas


Librarian‑Style Semantic File Search

Summary

  • A desktop‑style search UI that organizes files into domain‑based “shelves” and lets users query them with natural‑language terms, mimicking librarian intuition.
  • Provides deterministic, inspectable search results without relying on embeddings or opaque vector similarity.

Details

  • Target Audience: Knowledge workers, researchers, and developers managing large local collections of notes, docs, and code.
  • Core Feature: Hierarchical domain indexing with NL query parsing that maps to folder paths and optional Boolean filters.
  • Tech Stack: Node.js/TypeScript front‑end, SQLite/TinyDB for storage, Rust fuzzy‑matcher, optional Electron wrapper.
  • Difficulty: Medium
  • Monetization: Hobby

Notes

  • HN commenters highlighted “rediscovering semantic search that works like a librarian” and “LLMs are great at building queries,” showing clear community enthusiasm.

  • Addresses a concrete pain point: users want interpretable, editable search results rather than black‑box embedding matches.
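
The core feature above can be sketched in a few lines. This is a minimal illustration of "librarian-style" query routing, where natural-language terms map to domain "shelves" (folder paths) plus Boolean filters; the `SHELVES` mapping, `route_query`, and all terms are illustrative assumptions, not taken from the thread (a real version would use the project's Node.js/TypeScript stack).

```python
# Minimal sketch: map a free-text query to "shelves" (folders) plus
# Boolean must/must-not filters. All names here are illustrative.
from dataclasses import dataclass, field

# Hypothetical shelf catalog: query term -> candidate folders.
SHELVES = {
    "billing": ["docs/finance", "docs/invoices"],
    "deploy": ["ops/runbooks", "infra"],
    "api": ["docs/api", "src/handlers"],
}

@dataclass
class Query:
    paths: list = field(default_factory=list)     # folders to search
    must: list = field(default_factory=list)      # AND terms
    must_not: list = field(default_factory=list)  # NOT terms

def route_query(text: str) -> Query:
    """Translate free text into shelf paths and Boolean filters."""
    paths, must, must_not = [], [], []
    negate = False
    for tok in text.lower().split():
        if tok in ("not", "-"):
            negate = True           # next token is excluded
            continue
        if tok in SHELVES:
            paths.extend(SHELVES[tok])
        elif negate:
            must_not.append(tok)
        else:
            must.append(tok)
        negate = False
    return Query(paths or ["."], must, must_not)

q = route_query("deploy checklist not staging")
```

Because the routing is a plain lookup plus token filters, every result is explainable: the user can see exactly which shelf matched and edit the mapping by hand.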

AgentQueryEngine

Summary

  • A lightweight tool that lets LLM agents issue natural‑language queries against a local Lucene (or similar inverted‑index) database, auto‑generating the appropriate indexing and retrieval calls.
  • Enables agents to treat file‑system primitives as first‑class tools, reducing reliance on heavyweight VM sandboxes.

Details

  • Target Audience: Developers building agentic RAG pipelines, hobbyists experimenting with local LLMs, and teams needing low‑overhead knowledge search.
  • Core Feature: Natural‑language to index‑query translation that drives Lucene queries, with fallback to plain‑text file reads.
  • Tech Stack: Python backend (Whoosh or PyLucene), FastAPI for the API, Docker for an optional sandbox, JavaScript front‑end for agent integration.
  • Difficulty: Medium
  • Monetization: Revenue‑ready: $9/mo subscription for a cloud‑hosted index service, plus usage‑based compute.

Notes

  • Community remarks such as “LLMs are great at building queries” and “agents can call any retrieval backend” confirm demand for this capability.

  • Offers immediate utility for anyone wanting fast, executable search without spinning up full VMs.
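
The NL→index‑query step can be sketched with a toy inverted index standing in for Lucene/Whoosh; the translation shape is the same, only the backend is swapped. The documents, stop‑word list, and the `translate`/`search` helpers are assumptions for illustration (in practice an LLM agent would emit the query structure, or a Lucene query string, directly).

```python
# Toy inverted index in place of Lucene/Whoosh, to show the
# NL -> boolean-index-query translation an agent would perform.
from collections import defaultdict

DOCS = {
    "auth.md": "token refresh flow for the auth api",
    "deploy.md": "staging deploy checklist and rollback steps",
    "billing.md": "invoice export and billing api limits",
}

# Build posting lists: term -> set of doc ids containing it.
index = defaultdict(set)
for doc_id, text in DOCS.items():
    for term in text.split():
        index[term].add(doc_id)

def translate(nl_query: str) -> dict:
    """Naive NL->query translation: keep content words as MUST terms.
    An LLM agent would produce this structure (or a query string)."""
    stop = {"the", "a", "for", "and", "how", "do", "i"}
    return {"must": [t for t in nl_query.lower().split() if t not in stop]}

def search(query: dict) -> set:
    """AND-intersect posting lists, like a boolean Lucene query."""
    postings = [index[t] for t in query["must"] if t in index]
    return set.intersection(*postings) if postings else set()

hits = search(translate("how do I deploy staging"))
```

When the boolean query returns nothing, the tool would fall back to plain‑text file reads, as the core feature describes.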

SandboxCostSaver

Summary

  • A managed platform that provides ultra‑low‑cost, short‑lived sandbox environments for LLM agents (e.g., 1 vCPU, 2 GiB RAM for $0.005/h).

  • Dynamically scales resources and bills per second, dramatically lowering the barrier for experimentation.

Details

  • Target Audience: Start‑ups, indie hackers, and hobbyists developing agent‑based workflows who are constrained by high sandbox pricing.
  • Core Feature: Pay‑per‑second VM provisioning with pre‑configured LLM runtimes and isolated network storage.
  • Tech Stack: K3s + KubeVirt for lightweight VMs, Prometheus for usage metering, Stripe for billing.
  • Difficulty: High
  • Monetization: Revenue‑ready: usage‑based pricing with a free tier up to 100 hrs/month; $0.005 per vCPU‑hour and $0.001 per GiB‑hour thereafter.

Notes

  • Direct response to HN concerns about $70k/year sandbox costs (“how about if we round off one zero?”), indicating a sizable market for affordable alternatives.

  • Could spark discussion on sustainable pricing models for developer sandboxes while providing immediate, practical utility.
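
A back‑of‑envelope metering sketch makes the pricing model concrete, using the rates quoted above ($0.005/vCPU‑hour, $0.001/GiB‑hour, 100 free hours/month). The session shape and the assumption that the free tier applies to wall‑clock seconds are illustrative choices, not a spec.

```python
# Pay-per-second billing sketch for the quoted rates. The free tier is
# assumed (for illustration) to apply to wall-clock time across sessions.
VCPU_RATE_PER_HOUR = 0.005   # $ per vCPU-hour
GIB_RATE_PER_HOUR = 0.001    # $ per GiB-hour
FREE_TIER_HOURS = 100        # per month

def bill(sessions, free_hours=FREE_TIER_HOURS):
    """sessions: list of (seconds, vcpus, gib) tuples.
    Bills per second, deducting the free tier first."""
    total = 0.0
    remaining_free = free_hours * 3600  # free tier, in seconds
    for seconds, vcpus, gib in sessions:
        billable = max(0, seconds - remaining_free)
        remaining_free = max(0, remaining_free - seconds)
        hours = billable / 3600
        total += hours * (vcpus * VCPU_RATE_PER_HOUR
                          + gib * GIB_RATE_PER_HOUR)
    return round(total, 4)

# A month of 150 wall-clock hours on a 1 vCPU / 2 GiB sandbox:
# 50 billable hours x (0.005 + 2 x 0.001) = $0.35
cost = bill([(150 * 3600, 1, 2)])
```

At these rates even heavy hobbyist usage stays in cents per month, which is exactly the contrast with the $70k/year figure the thread objects to.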
