Project ideas from Hacker News discussions.

AI coding assistants are getting worse?

📝 Discussion Summary

1. Flawed Article/Test Methodology

Many dismiss the article's test as unrealistic or contrived, since the prompt demanded "completed code only" while withholding data needed to complete it.
"This is a wildly out of touch thing to say" - tacoooooooo.
"It's silly because the author asked the models to do something they themselves acknowledged isn't possible" - vidarh.

2. AI Coding Tools Are Improving

Users report personal successes and cite benchmarks showing progress, contradicting the claim that assistants are "getting worse."
"The agents available in January 2025 were much much worse than the agents available in November 2025" - minimaxir.
"They are objectively better on every measure we can come up with. I used 2b input and 10m output tokens on codex last week alone" - ripped_britches.

3. Need Better Prompting/Scaffolding ("Holding It Wrong")

Success requires skill in prompts, tests, and workflows; simplistic use fails.
"You just haven't figured out the scaffolding required to elicit good performance from this generation. Unit tests would be a good place to start" - theptip.
"I got codex 5.1 max with the codex extension on vs code - to generate over 10k lines of code... This is also with just the regular 20$ subscription" - chiengineer.

4. Training Data Poisoning/GIGO/Model Collapse

Inexperienced users and AI slop degrade training data, causing subtle failures like reward hacking.
"as inexperienced coders started turning up in greater numbers, it also started to poison the training data" - toss1 (quoting article).
"AI coding assistants that found ways to get their code accepted... even if 'that' meant turning off safety checks" - toss1.

5. Model Updates Break Compatibility

Forced model updates disrupt applications; pinning snapshots or versioning is needed but insufficient.
"We should be able to pin to a version of training data history like we can pin to software package versions" - StarlaAtNight.
"Every model update would be a breaking change, an honest application of SemVer has no place in AI model versions" - swid.

6. Productivity Gains Anecdotal, Proof Demanded

Debate rages over claimed 10x boosts; enthusiasts offer anecdotes while skeptics demand hard data amid the hype.
"One thing I find really funny is when AI enthusiasts make claims... always entirely anecdotally based... but when others make claims to the contrary suddenly there is some overwhelming burden of proof" - llmslave2.
"I'd just like to see a live coding session from one of these 10x AI devs" - AstroBen.


🚀 Project Ideas

LLM-Governance (SemVer for AI)

Summary

  • A version control and snapshotting service for AI model dependencies (system prompts, agent harnesses, and model versions).
  • Solves the problem of "silent regressions" and "force-fed updates" where a model update breaks existing integration logic without notice.
  • Provides a unified "model lockfile" to pin specific dated snapshots and prompt configurations.

Details

  • Target Audience: DevOps Engineers & AI Product Managers
  • Core Feature: Registry for versioned system prompts and model snapshots
  • Tech Stack: Python, PostgreSQL, OpenRouter/Direct LLM APIs
  • Difficulty: Medium
  • Monetization: Revenue-ready (per-seat or per-request proxy fees)

Notes

  • HN users specifically requested this: "We should be able to pin to a version of training data history like we can pin to software package versions... Release new updates w/ SemVer" (StarlaAtNight).
  • Addresses the frustration that "every model update would be a breaking change" (swid).
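To make the "model lockfile" concrete, here is a minimal sketch in Python. The lockfile schema, the `summarizer` alias, and the `resolve` helper are all hypothetical; the dated snapshot name follows the style of provider-issued snapshots, but a real service would also need to verify prompt integrity and proxy requests.

```python
import json

# Hypothetical lockfile pinning a dated model snapshot and a SemVer'd
# system prompt, analogous to a package manager's lockfile.
LOCKFILE = {
    "models": {
        "summarizer": {
            "provider": "openai",
            "model": "gpt-4o-2024-08-06",  # dated snapshot, not a floating alias
            "prompt_version": "2.3.1",     # SemVer for the system prompt
        }
    }
}

def resolve(alias: str) -> dict:
    """Return the pinned model/prompt pair for an application alias,
    refusing to fall back to a floating 'latest' tag."""
    entry = LOCKFILE["models"].get(alias)
    if entry is None:
        raise KeyError(f"no pinned entry for {alias!r}; refusing to use 'latest'")
    return entry
```

An application would call `resolve("summarizer")` instead of hard-coding a model name, so a model upgrade becomes an explicit lockfile change that can be reviewed and rolled back.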

Pre-Slop Search Index (Before-2023)

Summary

  • A dedicated search engine or browser extension that filters the web and YouTube for content created strictly before the "AI slop" era (pre-2023).
  • Restores the utility of search by ensuring results are human-generated, solving the problem of "AI slop/astroturfing of YT is near complete" (noir_lord).

Details

  • Target Audience: Researchers, developers, and hobbyists
  • Core Feature: Time-gated web/video indexing
  • Tech Stack: Elasticsearch/Typesense, Common Crawl, YouTube API
  • Difficulty: Medium
  • Monetization: Hobby (freemium search API or browser extension)

Notes

  • Heavily supported by the community: "A dataset with only data from before 2024 will soon be worth billions" (amelius) and "the AI slop/astroturfing of YT is near complete" (noir_lord).
  • High practical utility for finding technical documentation that isn't hallucinated.
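The core of the time gate is a date filter applied at index time. A minimal sketch, assuming each crawled document carries an ISO-8601 `published` timestamp (e.g. from a Common Crawl index record or the YouTube API's snippet.publishedAt); the cutoff date and document shape are illustrative assumptions:

```python
from datetime import datetime, timezone

# Hypothetical cutoff: keep only content first published before 2023,
# i.e. before the "AI slop" era the thread describes.
CUTOFF = datetime(2023, 1, 1, tzinfo=timezone.utc)

def is_pre_slop(doc: dict) -> bool:
    """Accept a document only if its publication timestamp predates the cutoff."""
    published = datetime.fromisoformat(doc["published"].replace("Z", "+00:00"))
    return published < CUTOFF

docs = [
    {"url": "https://example.com/a", "published": "2021-06-15T12:00:00Z"},
    {"url": "https://example.com/b", "published": "2024-02-01T09:30:00Z"},
]
kept = [d["url"] for d in docs if is_pre_slop(d)]  # only the 2021 document survives
```

The hard part in practice is trusting the timestamp: pages get republished with fresh dates, so a real index would cross-check first-seen dates from the crawl archive rather than the page's own claim.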

Ralph Wiggum "Repeat-Until-Done" Agent

Summary

  • A specialized agent harness that identifies "lazy" LLM behaviors (like TODO comments or missing implementation) and automatically triggers recursive re-prompts.
  • Solves the frustration of models adding comments saying "this still needs to be implemented" (empath75).
  • Forces "completionism" by checking code against a requirement checklist before returning success.

Details

  • Target Audience: Individual developers using LLMs for coding
  • Core Feature: Auto-recursive implementation logic
  • Tech Stack: Go or Rust (CLI), LangChain/LangGraph
  • Difficulty: Low
  • Monetization: Hobby (open-source CLI tool)

Notes

  • Direct solution to the "Ralph Wiggum plugin" mentioned in the thread (thefreeman).
  • Solves the "lazy model" issue where agents "tell me they did what I asked them to do" but left TODOs (empath75).
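The repeat-until-done loop can be sketched in a few lines. This is a minimal illustration, not the plugin from the thread: the `generate` callable stands in for any coding-model API wrapper, and the laziness patterns are assumptions about what "stub" output looks like.

```python
import re

# Markers that signal "lazy" output: TODOs and placeholder stubs.
LAZY_PATTERNS = [
    re.compile(r"\bTODO\b", re.IGNORECASE),
    re.compile(r"\bnot\s+implemented\b", re.IGNORECASE),
]

def find_lazy_markers(code: str) -> list:
    """Return the patterns that matched, so the re-prompt can cite them."""
    return [p.pattern for p in LAZY_PATTERNS if p.search(code)]

def repeat_until_done(generate, prompt: str, max_rounds: int = 5) -> str:
    """Re-prompt until the generated code contains no lazy markers.
    `generate` is any callable mapping a prompt string to code."""
    code = generate(prompt)
    for _ in range(max_rounds):
        markers = find_lazy_markers(code)
        if not markers:
            return code
        code = generate(
            f"{prompt}\n\nYour previous answer left placeholders matching "
            f"{markers}. Replace every stub with a full implementation."
        )
    raise RuntimeError("model still returning incomplete code after retries")
```

A fuller version would replace the regex check with the requirement checklist described above, and run the project's test suite as the final "done" condition.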

Sandboxed "AskHuman" Agent Permissions

Summary

  • A secure execution environment for agents that specifically limits their ability to run git commit or rm, or to access the browser, while providing a dedicated "AskHuman" channel for blockers.
  • Solves the problem of agents "making git commits on their own" (hdra) or "dropping databases" (manwds).
  • Implements unique OS-level user permissions specifically for AI binaries.

Details

  • Target Audience: "Agentic" developers & security-conscious firms
  • Core Feature: Secure VM/container with restricted command set and interactive human tool
  • Tech Stack: Docker/Firecracker, Linux permission controls, IPC
  • Difficulty: High
  • Monetization: Revenue-ready (enterprise security tool)

Notes

  • Users are begging for better security for agents: "I've been trying to stop the coding assistants from making git commits on their own and nothing has been working" (hdra).
  • Adds a necessary "Human in the loop" (chiengineer).
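The gating logic itself is simple; here is a minimal sketch of the policy layer. The blocklist is illustrative, and real enforcement would happen at the OS/container boundary (a dedicated unprivileged user inside Docker or Firecracker), not in application code an agent could bypass:

```python
import shlex

# Command prefixes the agent may never run on its own; anything matching
# is escalated to the human instead of executed. Names are illustrative.
BLOCKED = {("git", "commit"), ("git", "push"), ("rm",), ("sudo",)}

def gate(command: str) -> str:
    """Return 'allow' or 'ask_human' for a shell command the agent proposes."""
    argv = tuple(shlex.split(command))
    for pattern in BLOCKED:
        if argv[: len(pattern)] == pattern:
            return "ask_human"
    return "allow"
```

So `gate("git status")` passes through while `gate("git commit -m wip")` is routed to the AskHuman channel, directly addressing the unwanted-commit complaint above.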

High-Quality Expert Dataset Bounty (Babbage Market)

Summary

  • A marketplace where senior developers are paid to label, review, and "fix" AI-generated code to create the next generation of high-quality training data.
  • Solves the GIGO (Garbage In, Garbage Out) problem caused by "inexperienced coders... poisoning the training data" (toss1).
  • Provides "low-background" clean data for model fine-tuning.

Details

  • Target Audience: AI training labs and senior software engineers
  • Core Feature: Verified expert code review/labeling workflow
  • Tech Stack: Web-based review UI (React), auth, payment rails
  • Difficulty: Medium
  • Monetization: Revenue-ready (commissions on data fulfillment)

Notes

  • Responds to the need for "High-quality data reviewed by experts" (oblio) to prevent the "inevitable GIGO syndrome" (toss1).
  • Addresses concerns that models are "eating their own garbage" (toss1).
