Project ideas from Hacker News discussions.

Show HN: Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc.

📝 Discussion Summary

1. Praise for SQL + Embeddings Approach

Users praise the tool for generating SQL rather than acting as a black-box chatbot, which gives them precise, controllable research queries.

"I like that this relies on generating SQL rather than just being a black-box chat bot. It feels like the right way to use LLMs for research" - barishnamazov
"people want precision and control sometimes. Also it's very hard to beat SQL query planners" - Xyra

2. Calls for Open-Sourcing Amid Funding Needs

Many urge open-sourcing for accessibility/self-hosting, while Xyra cites financial barriers like server costs.

"Seems very cool, but IMO you’d be better off doing an open source version and then hosted SAAS" - bugglebeetle
"I could make it open-source as soon as I have $5k to my name. I've been in survival mode frankly" - Xyra
"Nice, but would you consider open-sourcing it? I (and I assume others) are not keen on sharing my API keys" - mentalgear

3. Debate on AGI Definitions and Hype

A tangent critiques hyperbolic claims (e.g., "Claude... essentially AGI") and debates whether AGI means human-level generality or whether current tools already meet older definitions.

"'Claude Code and Codex are essentially AGI at this point' Okaaaaaaay...." - octoberfranklin
"If someone wrote a definition of AGI 20 years ago, we would probably have met that" - Closi
"Current AI is a transcript generator... It has no goals, it just responds with text" - andy99


🚀 Project Ideas

Self-Hosted Scry Core

Summary

  • Open-source version of the Scry RAG database (Postgres with embeddings) for self-hosting, allowing users to ingest HN/arXiv/LessWrong data or custom corpora and query via SQL + vectors without external APIs.
  • Core value: Enables free, private querying with local LLMs (e.g., Llama/Qwen), eliminating Claude credit costs and API key sharing.

Details

Key | Value
Target Audience | Indie devs, researchers building autonomous agents (e.g., 7777777phil), self-hosting enthusiasts (nineteen999, mentalgear)
Core Feature | Dockerized Postgres setup script + ingestion pipelines for public sources + text-to-SQL prompt templates for local LLMs
Tech Stack | Postgres/PGVector, Rust for ingestion, Ollama for local LLMs, sql.js-httpvfs for static serving
Difficulty | Medium
Monetization | Revenue-ready: Hosted version ($10/mo)
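
To make the "text-to-SQL prompt templates" feature concrete, here is a minimal Python sketch of what such a template could look like. The `documents` table, its columns, the 768-dimension embedding, and the `:query_embedding` placeholder are all assumptions for illustration; the actual Scry schema is not described in the discussion. The only real API detail relied on is pgvector's `<=>` cosine-distance operator.

```python
from textwrap import dedent

# Hypothetical schema: the real Scry tables are not described in the source.
SCHEMA = dedent("""\
    CREATE TABLE documents (
        id BIGINT PRIMARY KEY,
        source TEXT,           -- e.g. 'hn', 'arxiv', 'lesswrong'
        title TEXT,
        body TEXT,
        embedding VECTOR(768)  -- pgvector column; the dimension is an assumption
    );""")

# Prompt sent to the local LLM; it constrains output to one bounded SELECT.
PROMPT_TEMPLATE = dedent("""\
    You translate research questions into a single PostgreSQL SELECT statement.
    Rules:
    - Use only the tables and columns in the schema below.
    - For semantic similarity, order by `embedding <=> :query_embedding` (pgvector cosine distance).
    - Always include a LIMIT of at most {max_rows}.
    - Return only SQL, with no commentary.

    Schema:
    {schema}

    Question: {question}
    SQL:""")


def build_text_to_sql_prompt(question: str, schema: str = SCHEMA, max_rows: int = 50) -> str:
    """Fill the template with the schema and the user's question."""
    return PROMPT_TEMPLATE.format(
        schema=schema, question=question.strip(), max_rows=max_rows
    )


if __name__ == "__main__":
    print(build_text_to_sql_prompt("Which HN threads discuss pgvector performance?"))
```

The resulting string could be sent to any local model (for example through Ollama); keeping the schema and row limit in the prompt is one way to preserve the precision and control that commenters valued.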

Notes

  • "Really useful currently working on a autonomous academic research system... Any plans of making this open source?" (7777777phil); "not keen on sharing my API keys with a 3rd party" (mentalgear).
  • HN loves OSS + self-hosting; high utility for agentic workflows like gia-agentic-short.

Custom Corpus Embedder

Summary

  • CLI tool to scrape, chunk, and embed custom datasets (e.g., paper supplements, leaks like Panama Papers, PubMed) into a local Postgres vector DB, with cheap embedding via open models.
  • Core value: Democratizes "state-of-the-art" RAG for niche research without funding barriers, addressing partial embedding coverage and the difficulty of scaling to more sources.

Details

Key | Value
Target Audience | Researchers (bonsai_spool for biomed supplements, nathan_f77 for physics), data hoarders (fragmede on leaks)
Core Feature | Auto-chunking (~300 tokens), Voyage-lite/HuggingFace embeddings, schema export for SQL querying + alerts on new data
Tech Stack | Python (LangChain/Papers2Dataset fork), PGVector, HuggingFace Transformers, cron for alerts
Difficulty | Low
Monetization | Hobby
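
The auto-chunking step (~300 tokens) could be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual method: whitespace-separated words stand in for model tokens, and the `overlap` parameter and the `chunk_text` name are hypothetical. A real pipeline would count tokens with the embedding model's own tokenizer.

```python
def chunk_text(text: str, max_tokens: int = 300, overlap: int = 30) -> list[str]:
    """Split text into overlapping chunks of at most `max_tokens` words.

    Assumption: one whitespace-separated word approximates one token.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    words = text.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the final window already covers the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(700))
    for i, chunk in enumerate(chunk_text(sample)):
        print(i, len(chunk.split()), chunk.split()[0], chunk.split()[-1])
```

Each chunk would then be embedded and stored as one row in the vector table. The overlap is a common choice to avoid cutting a sentence's context at a chunk boundary, at the cost of some duplicated storage.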

Notes

  • "I'd like to find a way to query 'Supplementary Material' in biomedical research papers" (bonsai_spool); "How much do you need for... paradise papers, the panama papers" (fragmede).
  • Sparks HN discussions on data pipelines; practical for extending eamag/papers2dataset.

Sandboxed LLM SQL Agent CLI

Summary

  • Secure CLI wrapper for running Scry-like text-to-SQL + vector prompts with Claude/local LLMs in a Docker sandbox, limiting exposure to prompt injection, file access, and network risks.
  • Core value: Safe "curl | bash" research agent for non-engineers, with query preview/explain before execution.

Details

Key | Value
Target Audience | Casual users wary of security (theptip, dcreater), Claude CLI users (skybrian, bredren)
Core Feature | Docker sandbox (no mounts/egress by default), SQL preview + approval, AST parsing to block risky ops (e.g., massive joins)
Tech Stack | Node.js CLI, Docker, Anthropic/Ollama APIs, sql-parser-js
Difficulty | Medium
Monetization | Revenue-ready: Pro tier with cloud sandboxes ($5/mo)
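
The "preview + approval" gate could start from a check like the one below. Note the substitution: this Python sketch uses a keyword-level heuristic, not the real AST parsing the idea calls for, so it is only a conservative first filter. The `review_query` name, the forbidden-keyword list, and the `max_joins` threshold are hypothetical choices for illustration.

```python
import re

# Hypothetical deny-list of statement keywords that a read-only agent should never emit.
FORBIDDEN = {"insert", "update", "delete", "drop", "alter",
             "truncate", "copy", "grant", "create"}


def review_query(sql: str, max_joins: int = 3) -> list[str]:
    """Return a list of problems found in `sql`; an empty list means it may be shown for approval.

    Heuristic only: it blanks out string literals and inspects keywords,
    so it does not understand comments, quoted identifiers, or nesting.
    """
    problems = []
    # Blank out single-quoted literals so words inside them are not flagged.
    stripped = re.sub(r"'(?:[^']|'')*'", "''", sql)
    body = stripped.strip().rstrip(";").strip()

    if ";" in body:
        problems.append("multiple statements")

    tokens = re.findall(r"[a-z_]+", body.lower())
    if not tokens or tokens[0] not in {"select", "with"}:
        problems.append("not a read-only SELECT")

    bad = sorted(FORBIDDEN.intersection(tokens))
    if bad:
        problems.append("forbidden keywords: " + ", ".join(bad))

    joins = tokens.count("join")
    if joins > max_joins:
        problems.append(f"too many joins ({joins} > {max_joins})")

    if "limit" not in tokens:
        problems.append("missing LIMIT")
    return problems


if __name__ == "__main__":
    print(review_query("SELECT title FROM documents LIMIT 10;"))
    print(review_query("SELECT 1 LIMIT 1; DROP TABLE documents"))
```

A CLI built on this would print the query and any reported problems, then wait for explicit user approval before executing. A production version would likely replace the regexes with a proper SQL parser and pair the check with a read-only database role.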

Notes

  • "dangerously-skip-permissions... You need to sandbox Claude" (theptip); "allowing network egress a security risk" (dcreater).
  • Addresses HN security paranoia; utility for safe agent experiments like Claude Code histories.
