How I write software with LLMs

📝 Discussion Summary

1. Splitting the agent into specialized “sub‑agents” (architect, developer, reviewer, etc.)
Many users argue that breaking a single LLM into role‑specific agents helps manage context, enforce permissions, and reduce hallucinations.

“If you want to separate capabilities, definitely.” – chriswarbo
“The orchestrator runs the whole thing… architect → developer → reviewer.” – marcus_holmes
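The architect → developer → reviewer hand-off described above can be sketched roughly as follows. This is a minimal illustration, not any commenter's actual setup: `Agent`, `orchestrate`, `fake_model`, and the tool names are all hypothetical, and the model call is a stub where a real LLM API call would go.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call: (system prompt, task) -> text.
ModelFn = Callable[[str, str], str]


@dataclass(frozen=True)
class Agent:
    name: str
    system_prompt: str
    allowed_tools: frozenset = frozenset()  # role-based permissions (assumed names)

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools

    def run(self, model: ModelFn, task: str) -> str:
        return model(self.system_prompt, task)


def orchestrate(model: ModelFn, request: str, agents: list) -> list:
    """Run agents in order; each one sees only the previous agent's output."""
    transcript = []
    current = request
    for agent in agents:
        current = agent.run(model, current)
        transcript.append((agent.name, current))
    return transcript


def fake_model(system_prompt: str, task: str) -> str:
    # Placeholder: a real implementation would call an LLM here.
    return f"[{system_prompt}] {task}"


PIPELINE = [
    Agent("architect", "plan", frozenset({"read"})),
    Agent("developer", "implement", frozenset({"read", "write"})),
    Agent("reviewer", "review", frozenset({"read"})),
]

if __name__ == "__main__":
    for name, output in orchestrate(fake_model, "add email support", PIPELINE):
        print(f"{name}: {output}")
```

Passing only the previous step's output forward is one way to keep each agent's context small; a real orchestrator would likely also persist the intermediate artifacts.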

2. Cost‑vs‑quality trade‑off and token budgeting
A recurring point is that multi‑agent pipelines can be expensive, but they can also reduce overall spend by delegating simple tasks to cheaper models.

“One tier, one model is cheaper, but the quality comes with the reviewers.” – stavros
“You spend your token & context budget in full in 3 phases.” – hakanderyal
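To make the trade-off concrete, here is a small sketch that compares a routed pipeline against a single expensive tier. Everything in it is an assumption for illustration: the tier names, the per-token prices, the role-to-tier mapping, and the token counts are made up and do not reflect real model pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                 # hypothetical label, not a real model ID
    usd_per_1k_tokens: float  # made-up price, for illustration only


CHEAP = ModelTier("small-model", 0.001)
STRONG = ModelTier("large-model", 0.015)

# Assumed routing: bulk implementation work goes to the cheap tier,
# planning and review go to the strong tier.
ROUTED = {"architect": STRONG, "developer": CHEAP, "reviewer": STRONG}
ALL_STRONG = {role: STRONG for role in ROUTED}

# (role, tokens used) per phase; the numbers are invented.
STEPS = [("architect", 2000), ("developer", 10000), ("reviewer", 3000)]


def estimate_cost(steps, routes) -> float:
    """Sum the cost of each (role, tokens) step under a routing table."""
    return sum(routes[role].usd_per_1k_tokens * tokens / 1000
               for role, tokens in steps)


if __name__ == "__main__":
    print(f"routed:      ${estimate_cost(STEPS, ROUTED):.3f}")
    print(f"single tier: ${estimate_cost(STEPS, ALL_STRONG):.3f}")
```

The sketch only models price. It says nothing about whether the cheaper tier produces output the reviewers will accept, which is the quality half of the trade-off raised in the thread.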

3. Human understanding and the “vibe‑coding” critique
Several comments caution that relying on LLMs without reading or reviewing the code turns developers into “no‑code” users and erodes architectural insight.

“If you fail to even read the code produced, then I might as well treat it like a no‑code system.” – ashwinsundar
“I can understand the high‑levels of how no‑code works, but as soon as it breaks, it might as well be a black box.” – ashwinsundar

4. Workflow & tooling integration (CLI vs IDE, markdown artifacts, harnesses)
Users discuss how to embed agents into existing toolchains, the value of markdown‑based plan files, and the pros/cons of terminal‑based vs. IDE‑based agents.

“I’m using a hierarchy of artifacts: requirements doc → design docs → code+tests.” – aix1
“All artifacts are version controlled.” – aix1
“I just want to talk to a model all day, but that’s not the same as writing code.” – lbreakjai

These four themes capture the main strands of opinion in the discussion.

🚀 Project Ideas

[Orchestrated Agent Studio]

Summary

Solves the fragmentation of LLM-driven code pipelines by providing a unified orchestrator that automatically creates architect, developer, and reviewer roles.
Core value: dramatically reduces context‑window usage and token waste while enforcing systematic review loops.

Details

Key	Value
Target Audience	Small dev teams, solo hackers building side projects
Core Feature	Multi‑agent orchestration with auto‑saved design docs and role‑based permissions
Tech Stack	React front‑end, Node.js backend, PostgreSQL, OpenAI GPT‑4o, Claude 3, Docker
Difficulty	High
Monetization	Revenue-ready: Tiered subscription ($12/mo basic, $45/mo pro)

Notes

Directly answers HN’s call for a “super‑powers” framework that separates concerns without manual prompt gymnastics.
Opens discussion on cost‑effective model routing (e.g., Sonnet for planner, Opus for reviewer).

[Blueprint Builder]

Summary

Addresses the pain of vague requirements by turning user intent into structured design artifacts (specs, diagrams, test plans).
Core value: guarantees clear, shareable blueprints that keep context small and enable reliable hand‑offs.

Details

Key	Value
Target Audience	Product managers, solo founders, freelancers
Core Feature	Generates markdown spec files, PlantUML architecture diagrams, and test matrices from natural‑language prompts
Tech Stack	Vue.js, Python backend, LangChain, DALL‑E 3 for diagram generation, GPT‑4 Turbo
Difficulty	Medium
Monetization	Hobby (free open‑source, optional paid support)

Notes

Mirrors Stavros’ “plan file” approach; HN loves concrete artifact‑driven workflows.
Potential integration with Notion or GitHub Issues for seamless tracking.

[Sub‑Agent Rental Hub]

Summary

Tackles the scarcity of specialized sub‑agents (DB, infra, security) by letting users rent pre‑trained tiny models for specific roles.
Core value: enables anyone to compose a custom agent fleet without paying for large‑model calls on every step.

Details

Key	Value
Target Audience	Developers who need cheap, focused tasks (e.g., database query writer, CI pipeline builder)
Core Feature	Marketplace of skill‑packaged Docker containers exposing a single‑purpose endpoint for the orchestrator
Tech Stack	FastAPI, Docker Compose, Hugging Face models (e.g., CodeLlama‑7B‑DB, TinyLlama‑Infra), Stripe for payments
Difficulty	Medium
Monetization	Revenue-ready: Pay‑per‑call ($0.001 per inference) + optional monthly quota

Notes

Echoes the discussion about using different models for planner vs developer; HN will debate open‑source vs proprietary trade‑offs.
Sparks conversation on token‑budget markets and fair model pricing.

[AutoCode Reviewer]

Summary

Solves the “review bottleneck” after LLM code generation by automatically running static analysis, security scans, and contextual unit tests.
Core value: guarantees higher‑quality output before developers see it, cutting debugging time dramatically.

Details

Key	Value
Target Audience	Teams adopting vibe‑coding, indie hackers, open‑source maintainers
Core Feature	Integrated reviewer agent that critiques generated files, suggests fixes, and enforces style guides
Tech Stack	Rust backend, Semgrep, SonarQube APIs, GPT‑4‑Vision for visual bug detection, GitHub Actions
Difficulty	High
Monetization	Revenue-ready: Enterprise license $30/user/mo (self‑hosted) + free community tier
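As a rough idea of what the static-analysis part of such a reviewer could look like, the sketch below uses Python's standard `ast` module to flag two patterns. The function name `review` and the choice of rules are assumptions for illustration. The stack listed above names Semgrep and SonarQube, which this stdlib-only example does not use.

```python
import ast


def review(source: str) -> list:
    """Return (line, message) findings for a few simple, hard-coded rules."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Rule 1: a bare "except:" swallows every exception.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "bare except hides errors"))
        # Rule 2: direct calls to eval() or exec().
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id in {"eval", "exec"}):
            findings.append((node.lineno, f"call to {node.func.id}()"))
    return sorted(findings)


SAMPLE = "try:\n    x = eval('1+1')\nexcept:\n    pass\n"

if __name__ == "__main__":
    for line, message in review(SAMPLE):
        print(f"line {line}: {message}")
```

Findings like these could be handed to a reviewer agent as extra context, so the model critiques concrete issues rather than the whole file.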

Notes

Directly addresses HN’s concern about needing multiple reviewers and “different hats”.
Likely to generate debate on false positives vs code‑ownership.

[SpecCrafter]

Summary

Provides a structured requirement‑articulation workflow that extracts business goals, produces clear acceptance criteria, and auto‑generates implementation tickets.
Core value: turns vague “add email support” prompts into concrete, testable specs, reducing scope creep.

Details

Key	Value
Target Audience	Product owners, solo SaaS founders, remote teams
Core Feature	Chat‑driven spec generator that outputs markdown requirement docs, priority tags, and linked GitHub Issues
Tech Stack	Next.js front‑end, Go microservice, GPT‑4‑Turbo for parsing, Markdown pipelines
Difficulty	Low
Monetization	Hobby (free, with optional paid hosted API)
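One possible shape for the generated requirement doc is sketched below. The `Spec` class, its fields, and the markdown layout are hypothetical. The step that turns a chat prompt into these fields is left out, since that is where the model would do its work.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    title: str
    goal: str
    acceptance_criteria: list = field(default_factory=list)
    priority: str = "medium"  # assumed tag values: low / medium / high

    def to_markdown(self) -> str:
        """Render the spec as a markdown doc with checkbox criteria."""
        lines = [
            f"# {self.title}",
            "",
            f"Priority: {self.priority}",
            "",
            "## Goal",
            "",
            self.goal,
            "",
            "## Acceptance criteria",
            "",
        ]
        lines += [f"- [ ] {item}" for item in self.acceptance_criteria]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spec = Spec(
        title="Add email support",
        goal="Users receive a confirmation email after signing up.",
        acceptance_criteria=[
            "A confirmation email is sent within one minute of signup",
            "The email contains a working verification link",
        ],
        priority="high",
    )
    print(spec.to_markdown())
```

Each acceptance criterion is written so it can be checked off or turned into a test, which is what makes the spec testable rather than a restatement of the prompt.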

Notes

Mirrors the “plan file” concept from Stavros and the orchestrator discussion; HN loves concrete docs that replace ambiguous prompts.
Opportunity for integration with project‑management tools like Linear.

[Self‑Healing Code Loop]

Summary

Addresses flakiness of LLM‑generated code by continuously running generated tests, feeding failures back, and auto‑repairing bugs without human intervention.
Core value: turns a one‑shot generation into a reliable, self‑correcting pipeline.

Details

Key	Value
Target Audience	Developers building internal tools, rapid‑prototype hackers, SaaS founders
Core Feature	Loop that executes unit/integration tests on output, triggers re‑prompt with failure context, and merges fixes automatically
Tech Stack	Python orchestration, pytest, LangChain, OpenAPI validator, GitHub PR automation
Difficulty	High
Monetization	Revenue-ready: Usage‑based $0.005 per loop iteration + optional enterprise SLA
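The generate, test, re-prompt cycle could look roughly like the sketch below. It is a simplified illustration under several assumptions: `repair_loop`, `run_tests`, and `fake_generate` are hypothetical names, the generator is a stub standing in for an LLM call, and tests run in-process via `exec` rather than through pytest. Automatic merging is not shown.

```python
import traceback
from typing import Callable, Optional

# Hypothetical generator: (task, last failure or None) -> Python source.
Generator = Callable[[str, Optional[str]], str]


def run_tests(source: str, tests: Callable[[dict], None]) -> Optional[str]:
    """Execute candidate source and its tests; return failure text or None."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # untrusted code belongs in a sandbox in practice
        tests(namespace)
    except Exception:
        return traceback.format_exc()
    return None


def repair_loop(task: str, generate: Generator,
                tests: Callable[[dict], None], max_iterations: int = 3):
    """Regenerate with failure context until tests pass or the budget runs out."""
    failure = None
    for attempt in range(1, max_iterations + 1):
        source = generate(task, failure)
        failure = run_tests(source, tests)
        if failure is None:
            return source, attempt
    raise RuntimeError(f"no passing candidate after {max_iterations} attempts")


def fake_generate(task: str, failure: Optional[str]) -> str:
    # Stand-in for an LLM: the first draft is buggy, the retry is fixed.
    if failure is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def check_add(namespace: dict) -> None:
    assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"


if __name__ == "__main__":
    source, attempts = repair_loop("write add(a, b)", fake_generate, check_add)
    print(f"passed after {attempts} attempt(s)")
```

The `max_iterations` cap matters for a usage-priced loop: without it, a task the model cannot solve would keep consuming iterations indefinitely.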

Notes

Resonates with HN’s frustration about LLM “hallucinations” and the need for guardrails.
Sparks debate on the limits of self‑repair vs human oversight.
