Project ideas from Hacker News discussions.

Ask HN: How are you doing RAG locally?

📝 Discussion Summary

Here are the 4 most prevalent themes from the Hacker News discussion on local RAG and document retrieval:

1. Hybrid Search is Superior to Pure Vector Search

Many users argue that combining traditional keyword search (BM25, FTS) with vector similarity search ("hybrid search") provides better results than using vector embeddings alone, especially for code and structured data.

"IME most documentation is coming from the web via web search. I like agentic RAG for this case, which you can achieve easily with a Claude Code subagent." — CuriouslyC

"keyword search is superior to embeddings based search. The moment google switched to bert based embeddings for search everyone agreed it was going down hill." — Der_Einzige

"Most forms of early enshittification were simply switching off BM25 to embeddings based search." — Der_Einzige

2. Simplicity and Using Existing Tools (SQLite, Grep, Ripgrep)

A significant portion of the discussion emphasizes the effectiveness of simple, existing tools like grep, ripgrep, and SQLite (with FTS5) over complex vector databases for local use cases.

"For all intents and purposes, running gpt-oss 20B in a while loop with access to ripgrep works pretty dang well." — postalcoder

"SQLite FTS5 works well." — porridgeraisin

"Well, I already had the index so I just wired it up to my mastra agents. This took about one hour to set up and works very well." — esperent

3. The Definitional Debate: What Actually Constitutes RAG?

There is confusion and debate over the definition of RAG itself, specifically whether retrieval requires a vector database or if simpler keyword-based retrieval qualifies.

"Well, that is what the acronym stands for. But every source I've ever seen quickly follows by noting it's retrieval backed by a vectordb." — esperent

"What you described is a perfect example of a RAG. An embedding-based search might be more common, but that's a detail." — dmos62

"Retrieval-augmented generation. What you described is a perfect example of a RAG." — dmos62

4. Code Search Is Unique (Embeddings Are Bad for Code)

Multiple users specifically called out that vector embeddings are often poor for searching code, suggesting alternatives like BM25, trigrams, or AST-based tools (Tree-sitter, ast-grep) instead.

"Don't use a vector database for code, embeddings are slow and bad for code. Code likes bm25+trigram, that gets better results while keeping search responses snappy." — CuriouslyC

"Code likes bm25+trigram, that gets better results while keeping search responses snappy." — CuriouslyC

"I don't see ast-grep as being very useful to an agent. What a coding agent needs is to be able to locate portions of source code relevant to what it has been tasked with..." — HarHarVeryFunny


🚀 Project Ideas

Local Document Search Engine with Hybrid Retrieval

Summary

  • A lightweight, single-binary local search engine that combines traditional keyword search (BM25) with modern vector search (embeddings) in one unified interface.
  • Solves the frustration of having to choose between speed (BM25/FTS) and semantic understanding (vectors), or of managing multiple complex systems like Elasticsearch, in a personal/local setup.
  • Core value proposition: "Best of both worlds" search on your local documents without the operational overhead of enterprise tools.

Details

  • Target Audience: Developers, researchers, and power users with personal knowledge bases who want enterprise-grade search on their own machine.
  • Core Feature: Single command to index a directory of documents (PDFs, markdown, text) and run hybrid queries combining keyword matching and semantic similarity with a unified relevance score.
  • Tech Stack: Rust or Go (for a single static binary), SQLite for metadata/FTS5, ONNX Runtime for embedding inference, and either Faiss (CPU) or sqlite-vec for vector storage.
  • Difficulty: Medium
  • Monetization: Revenue-ready. "Pro" version with GPU acceleration, advanced PDF table extraction, and pre-quantized embedding models for slower hardware. Free tier covers the core hybrid search.

Notes

  • HN users express a clear need for hybrid search without complexity ("Anybody know of a good service / docker that will do BM25 + vector lookup without spinning up half a dozen microservices?"), praise "SQLite with FTS5" for its simplicity, while others maintain that "keyword search is superior to embeddings based search" for specific cases.
  • This tool directly addresses the "operational overhead" complaint against Elasticsearch ("very complicated to operate compared to more modern alternatives") by providing a zero-config local alternative.
  • Practical utility is high as it allows users to experiment with hybrid search strategies (e.g., RRF fusion) on their own data without cloud costs or complex infrastructure setup.
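
As an alternative to rank fusion, the "unified relevance score" mentioned above could be a weighted sum of normalised backend scores. A minimal, purely illustrative sketch; the keyword weight is a made-up tuning knob, not something from the discussion:

```python
def unify(bm25_scores, cosine_scores, keyword_weight=0.5):
    """bm25_scores / cosine_scores: dicts mapping doc_id -> raw score."""
    def normalise(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero
        return {d: (s - lo) / span for d, s in scores.items()}

    kw, vec = normalise(bm25_scores), normalise(cosine_scores)
    docs = set(kw) | set(vec)
    return sorted(
        docs,
        key=lambda d: keyword_weight * kw.get(d, 0.0)
        + (1 - keyword_weight) * vec.get(d, 0.0),
        reverse=True,
    )

# "b" scores decently in both backends, so it ranks first overall.
print(unify({"a": 7.1, "b": 5.0, "c": 1.0}, {"b": 0.92, "c": 0.55}))
```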

Codebase-Aware Indexer for AI Agents

Summary

  • A CLI tool that generates a static, context-efficient index of a codebase specifically optimized for AI agents (Claude Code, Cursor), using AST parsing rather than naive chunking.
  • Solves the problem that vector embeddings are "slow and bad for code" and naive grep lacks semantic precision for code understanding and refactoring tasks.
  • Core value proposition: Enable AI agents to perform "jump to definition" and "find all usages" style queries at scale without loading entire files into context.

Details

  • Target Audience: Developers using AI coding assistants (Claude Code, Cursor, Copilot) on large, unfamiliar, or legacy codebases.
  • Core Feature: Parses source code into an AST (Abstract Syntax Tree) to index identifiers, function signatures, and import paths. Provides a query interface that returns precise code locations (file, line, function) rather than generic text chunks.
  • Tech Stack: Tree-sitter (for language-agnostic parsing), Rust (for speed), and a simple local file-based index format (e.g., JSON or SQLite).
  • Difficulty: High
  • Monetization: Hobby (open source). Potential for a SaaS that continuously indexes GitHub repositories for teams.

Notes

  • Users explicitly reject standard embeddings for code: "Don't use a vector database for code, embeddings are slow and bad for code. Code likes bm25+trigram" and "LLMs can be told that their RAG tool is using BM25+N-grams, and will search accordingly."
  • The lack of LSP utility in non-editor contexts ("LSP is not great for non-editor use cases. Everything is cursor position oriented") creates a gap for a tool that provides code intelligence via a CLI or API.
  • This provides a practical alternative to "grepping," which agents often struggle with due to low precision, improving the quality of code suggestions without heavy infrastructure.
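
A minimal sketch of the indexing idea, using Python's standard-library ast module as a stand-in for Tree-sitter (which the project would actually need for language-agnostic parsing); the names and output format here are illustrative:

```python
import ast
import json
import pathlib

def index_file(path):
    """Record every function/class definition in one Python file."""
    try:
        tree = ast.parse(path.read_text(), filename=str(path))
    except (SyntaxError, UnicodeDecodeError):
        return []
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            records.append({
                "name": node.name,
                "kind": type(node).__name__,
                "file": str(path),
                "line": node.lineno,
            })
    return records

def index_repo(root):
    # Flat JSON an agent can load or grep; a real tool would also record
    # signatures, imports, and cross-references.
    records = []
    for path in pathlib.Path(root).rglob("*.py"):
        records.extend(index_file(path))
    return records

print(json.dumps(index_repo("."), indent=2))
```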

Static Embedding Distiller for Local RAG

Summary

  • A utility that fine-tunes and compresses high-quality embedding models (like snowflake-arctic-embed) into tiny, static lookup tables for instant-on local retrieval.
  • Solves the hardware barrier to using modern embeddings locally; users want "static embedding models... 1ms on gpu" but lack the tools to create them for their specific domain.
  • Core value proposition: Enterprise-quality semantic search on a Raspberry Pi or older laptop with zero inference latency.

Details

  • Target Audience: Hobbyists and professionals running local LLMs on consumer hardware (M1/M2, older laptops, edge devices).
  • Core Feature: Converts a transformer-based embedding model into a static matrix (via distillation or quantization) stored in a single file. Search is performed via simple matrix multiplication (dot product), removing the need for a runtime like ONNX or PyTorch.
  • Tech Stack: Python (PyTorch, HuggingFace Transformers), ONNX (for initial export), and C++/Rust for the final runtime.
  • Difficulty: Medium
  • Monetization: Revenue-ready. Paid toolchain for custom model distillation plus hosting for pre-distilled domain-specific models (e.g., "Legal-Search-Static-7M").

Notes

  • There is strong interest in speed ("Static embedding models im finding quite fast... 1ms on gpu"), alongside a broader push to minimize resource usage in local setups.
  • The pain point of models being "dumb at the quant level I'm running to be relatively fast" applies to embeddings as well; static models solve the trade-off between speed and quality.
  • This bridges the gap between high-quality retrieval (usually reserved for large GPU clusters) and the reality of local user hardware, enabling "RAG for documentation retrieval" on limited machines.
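
On the retrieval side, a static embedding table reduces search to a lookup plus a dot product. A minimal sketch, in which the tiny random matrix stands in for a real distilled vocabulary-to-vector table, and documents and queries are embedded by mean-pooling token vectors:

```python
import numpy as np

# Stand-ins for the distilled artifact: a vocabulary and one vector per token.
vocab = {"hybrid": 0, "search": 1, "sqlite": 2, "embeddings": 3, "code": 4}
rng = np.random.default_rng(0)
table = rng.standard_normal((len(vocab), 64)).astype(np.float32)

def embed(text):
    # Static embedding: average the rows for known tokens, no model inference.
    rows = [table[vocab[t]] for t in text.lower().split() if t in vocab]
    if not rows:
        return np.zeros(table.shape[1], dtype=np.float32)
    v = np.mean(rows, axis=0)
    return v / (np.linalg.norm(v) + 1e-9)

docs = ["hybrid search", "sqlite embeddings", "code search"]
doc_matrix = np.stack([embed(d) for d in docs])

# Query embedding vs. all documents in one matrix-vector product.
scores = doc_matrix @ embed("hybrid code search")
print(docs[int(np.argmax(scores))])
```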

Unified Local RAG Orchestrator (The "Drop-in" Grepgrep)

Summary

  • An all-in-one local desktop app (or background daemon) that manages document ingestion, chunking, hybrid indexing, and provides a unified API (CLI/MCP) for other tools.
  • Solves the fragmentation frustration where users have to stitch together "Ollama, Streamlit, Chromadb, and Docling" manually.
  • Core value proposition: A "set it and forget it" local backend for personal knowledge management that works out of the box with existing AI tools via MCP.

Details

  • Target Audience: Non-dev power users (writers, researchers, students) and developers who want a "drop-in" solution rather than building pipelines.
  • Core Feature: GUI to drag-and-drop documents, automatic background re-indexing of changed files, and a standard MCP server interface that exposes a semantic search tool to Claude Code/Cursor.
  • Tech Stack: Electron/Tauri (for desktop UI), SQLite (for state), local embedding server (e.g., Ollama or Python), and a standardized MCP implementation.
  • Difficulty: High
  • Monetization: Revenue-ready. "Freemium" desktop app: free for basic file indexing; paid for advanced features like web-crawl integration, OCR, and collaborative sharing (local network).

Notes

  • Users express desire for simplicity: "I just use a web server and a search engine" and "SQLite works shockingly well." However, they lack a cohesive tool that ties these pieces together seamlessly.
  • The specific request for "SQLite with FTS5" combined with the discussion of local MCP servers (e.g., Nextcloud MCP) indicates a market for a unified local "backend-as-a-service" for personal AI.
  • This solves the "constant need to re-index files was annoying and a drag" complaint by handling file watching and incremental updates automatically, making the user's data always ready for retrieval.
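
The incremental-update part can be as small as an mtime check against a state table. A minimal sketch using only the standard library; reindex() is a hypothetical hook where chunking, embedding, and FTS updates would happen:

```python
import os
import pathlib
import sqlite3

con = sqlite3.connect("ragstate.db")
con.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, mtime REAL)")

def reindex(path):
    # Hypothetical hook: re-chunk, re-embed, and update FTS/vector indexes here.
    print("reindexing", path)

def sync(root):
    for path in pathlib.Path(root).rglob("*"):
        if not path.is_file():
            continue
        mtime = os.path.getmtime(path)
        row = con.execute(
            "SELECT mtime FROM files WHERE path = ?", (str(path),)).fetchone()
        if row is None or row[0] < mtime:  # new or modified since last run
            reindex(path)
            con.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?)", (str(path), mtime))
    con.commit()

sync("docs")  # run from a scheduler or a file-watcher callback
```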
