Project ideas from Hacker News discussions.

73% of AI startups are just prompt engineering

πŸ“ Discussion Summary (Click to expand)

The discussion centers on the nature, viability, and defensibility of current AI startups, particularly those that access foundational models via APIs.

Here are the three most prevalent themes:

1. The Lack of Moat and Startup Defensibility (Wrapper Status)

Many participants believe that the vast majority of current AI startups are merely "wrappers" around powerful, commoditized foundational models (like GPT-4). This lack of proprietary technology leaves them vulnerable to being easily replicated by the model providers themselves or by competitors.

Supporting Quotations:

  • "The difference is, if your company β€œmoat” is a β€œprompt” on a commodity engine, there is no moat." - tylerchilds
  • "73% of AI startups are building their castle in someone else's kingdom." - amelius
  • "As a wrapper you have no moat, as the foundational providers can just steal your lunch." - beAbU

2. The Debate Over "Prompt Engineering" as a True Skill/Discipline

There is significant skepticism about whether "prompt engineering" constitutes genuine engineering, with many viewing it as speculative, instinct-driven, or simply the low-effort skill the moment happens to reward. Others counter that achieving reliable results demands complex scaffolding and data engineering around the prompts.

Supporting Quotations:

  • "No, there's no such thing as prompt engineering. Engineering involves applying scientific principles to solve real world problems. There are no clear scientific principles to apply here. It's all instinct, hunches, educated guesses, and heuristics..." - nradov
  • "A long time ago a mentor of mine said, 'In tech, often an expert is someone that know one or two things more than everyone else. When things are new, sometimes that's all it takes.'" - indymike (contextually suggesting prompt engineering is this initial low bar)
  • "The orchestration layer is the moat, ask any LLM and they will give paragraphs explaining why this is..." - ojr

3. Concerns about AI Startup Economics and Capital Flow

A major theme is the reliance of many AI startups on continuous VC funding to run expensive, often unprofitable models. This business model is seen as unsustainable without significant differentiation, leading to fears that these companies are speculative bets sustained by "burning VC money."

Supporting Quotations:

  • "Burning VC money isn't a long term business model..." - parineum
  • "The only barrier between AI startups at this point is access to the best models, and that's dependent on being able to run unprofitable models that spend someone else's money." - parineum
  • "When people are desperate to invest, they often don't care what someone actually can do but more about what they claim they can do." - drivingmenuts


πŸš€ Project Ideas

RAG Evaluation & Quality Automation Service (REQuAS)

Summary

  • Addresses the pain point that high-quality RAG (Retrieval-Augmented Generation) implementation is complex and difficult to validate, requiring significant manual effort ("It took me 2 full-time weeks to properly implement a RAG-based system so that it found actually relevant data and did not hallucinate" - kgeist).
  • Core Value Proposition: Provide a standardized, automated pipeline for testing, tuning, and scoring the quality, relevance, and hallucination rate of RAG systems against custom test sets.

Details

  • Target Audience: AI/ML Engineers and Data Scientists building RAG applications, especially those using open-weight models or needing robust evaluation before production.
  • Core Feature: A managed service or CLI tool that accepts a RAG pipeline configuration (retriever, LLM, query rewrite module) and a gold-standard Q&A dataset, then runs automated quality checks using LLM-as-a-Judge methodologies and traditional metrics (e.g., relevance, faithfulness); a sketch of the judging loop follows this list.
  • Tech Stack: Python (FastAPI/Django), LangChain/LlamaIndex for pipeline abstraction, open-source RAG models (for self-hosted options), LLM-as-a-Judge orchestration (using APIs like GPT-4 or high-capability open models).
  • Difficulty: Medium
  • Monetization: Hobby
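A minimal sketch of the LLM-as-a-Judge loop, assuming an OpenAI-compatible client; the judge model name, the rubric, and the rag_pipeline callable are illustrative placeholders rather than a fixed design:

```python
# Minimal sketch: grade a RAG system against a gold-standard Q&A set
# using an LLM-as-a-Judge. Model name and rubric are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """\
You are grading a RAG system's answer. Return JSON with integer scores 1-5:
{{"relevance": ..., "faithfulness": ...}}
Faithfulness means the answer is fully supported by the retrieved context.

Question: {question}
Gold answer: {gold}
Retrieved context: {context}
System answer: {answer}
"""

def judge_one(question: str, gold: str, answer: str, context: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",  # any high-capability judge model works here
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, gold=gold, answer=answer, context=context)}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)

def run_eval(test_set: list[dict], rag_pipeline) -> dict:
    """rag_pipeline(question) -> (answer, retrieved_context).
    Returns mean relevance/faithfulness over the test set."""
    scores = [judge_one(case["question"], case["gold"],
                        *rag_pipeline(case["question"]))
              for case in test_set]
    return {k: sum(s[k] for s in scores) / len(scores)
            for k in ("relevance", "faithfulness")}
```

Re-running the same loop after every retriever or prompt change turns "did it get better?" into a tracked metric, which is exactly the forward-progress measurement the notes below call for.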

Notes

  • Solves the need for "write an evaluation pipeline to automate quality testing" (kgeist) and provides structure for what users call "prompt/context engineering" but struggle to standardize.
  • Directly addresses the need for measurable, repeatable evaluation processes, which many recognized were missing ("how do you build representative evals and measure forward progress?" - theptip).

Specialized Model Router & TCO Optimizer (MentorRoute)

Summary

  • Solves the economic and capability mismatch where many startups use expensive, general-purpose models (like GPT-4) for tasks that could be handled cheaply and efficiently by smaller, specialized models or open-weight alternatives.
  • Core Value Proposition: An intelligent routing layer that analyzes incoming requests (based on complexity, required modality, and user context) and routes them to the most cost/performance-efficient LLM available (e.g., routing simple classification to a small fine-tuned model, complex reasoning to a large proprietary model).

Details

  • Target Audience: Startups and mid-market companies worried about token profitability and the operational costs of running inference at scale.
  • Core Feature: A dynamic API proxy that implements fine-grained routing, caching of common results, and TCO (Total Cost of Ownership) reporting across multiple model providers (OpenAI, Anthropic, self-hosted Llama, etc.); the routing decision is sketched after this list.
  • Tech Stack: Go or Rust (for excellent proxy/networking performance), Redis for caching, extensive configuration/telemetry dashboards.
  • Difficulty: Medium
  • Monetization: Hobby
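The proposed stack is Go or Rust for the proxy itself; to keep the examples in one language, here is a Python sketch of the routing-and-caching decision at its core. The model tiers, complexity heuristic, cost figures, and in-memory cache (standing in for Redis) are all illustrative assumptions:

```python
# Minimal sketch of the routing + caching decision at the proxy's core.
# Model tiers, heuristic, and cost figures are illustrative; production
# would use Redis for the cache and real per-token pricing.
import hashlib

MODEL_TIERS = {
    "small": {"model": "llama-3.1-8b", "usd_per_1k_tokens": 0.0002},
    "large": {"model": "gpt-4o",       "usd_per_1k_tokens": 0.0050},
}

_cache: dict[str, str] = {}  # in-memory stand-in for Redis

def classify(prompt: str) -> str:
    """Cheap heuristic: short, extraction-style prompts go to the small
    tier. A real router might use a tiny fine-tuned classifier here."""
    needs_reasoning = any(w in prompt.lower()
                          for w in ("why", "plan", "prove", "compare"))
    return "large" if needs_reasoning or len(prompt.split()) > 200 else "small"

def route(prompt: str, call_model) -> tuple[str, dict]:
    """call_model(model_name, prompt) -> completion text.
    Returns (answer, tco_record) for the cost dashboard."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:  # cache hit: zero marginal cost
        return _cache[key], {"tier": "cache", "usd": 0.0}
    tier = classify(prompt)
    spec = MODEL_TIERS[tier]
    answer = call_model(spec["model"], prompt)
    _cache[key] = answer
    est_tokens = (len(prompt) + len(answer)) / 4  # rough ~4 chars/token
    return answer, {"tier": tier,
                    "usd": est_tokens / 1000 * spec["usd_per_1k_tokens"]}
```

The per-request tco_record is what feeds the TCO dashboard: aggregated by tier, it shows exactly how much spend the small-model tier and the cache are absorbing.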

Notes

  • Directly supports the idea that specializing models is good economics and necessary for long-term viability ("Specialized models are cheaper... train a specialized model to increase your profit" - lukeschlather; "If tokens aren't profitable then prices per token are likely to go up" - parineum).
  • Appeals to users interested in simplifying their stack or optimizing economics, aligning with the "less is more" strategy pitched by some commenters.

Deterministic Prompt Specification Language (PromptSpec/PSL)

Summary

  • Addresses the fundamental tension between the non-deterministic nature of natural language prompts and the industry's need for reliable, production-grade automation.
  • Core Value Proposition: A structured domain-specific language (DSL) that allows users to define prompts as declarative specifications, ensuring greater determinism, version control, and enabling automated conversion/translation between underlying LLMs.

Details

  • Target Audience: Engineers frustrated by the ambiguity of natural language prompts in production systems; teams migrating between foundational model providers.
  • Core Feature: A YAML/JSON-based specification language requiring explicit definition of input schemas, expected output structure (e.g., JSON schema enforcement), guardrails, and mandatory context retrieval steps, moving the practice from "prompt alchemy" to "specification"; a sketch follows this list.
  • Tech Stack: TypeScript/Node.js for parsing and specification validation, leveraging existing JSON schema validation tools, with potential compilation targets for various underlying LLM API calls.
  • Difficulty: High
  • Monetization: Hobby
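As a rough illustration (PromptSpec is not a published standard, and every field name below is an assumption; the proposed stack is TypeScript, but the sketch uses Python to keep the examples in one language), here is a hypothetical spec embedded in a script that enforces its input and output contracts with PyYAML and jsonschema:

```python
# Hypothetical PromptSpec document plus a sketch of its enforcement.
# Field names and structure are assumptions, not a published standard.
import json
import yaml                      # pip install pyyaml
from jsonschema import validate  # pip install jsonschema

SPEC = yaml.safe_load("""
name: summarize_ticket
version: 1.2.0
inputs:
  ticket_text: {type: string, required: true}
template: |
  Summarize the support ticket below in one sentence and classify its
  sentiment as positive, neutral, or negative. Ticket: {ticket_text}
output_schema:
  type: object
  required: [summary, sentiment]
  properties:
    summary: {type: string}
    sentiment: {enum: [positive, neutral, negative]}
""")

def render(spec: dict, **inputs) -> str:
    """Fail fast on missing required inputs, then fill the template."""
    for name, rule in spec["inputs"].items():
        if rule.get("required") and name not in inputs:
            raise ValueError(f"missing required input: {name}")
    return spec["template"].format(**inputs)

def check_output(spec: dict, raw_llm_output: str) -> dict:
    """Reject any model response that violates the declared output schema."""
    data = json.loads(raw_llm_output)
    validate(instance=data, schema=spec["output_schema"])
    return data

prompt = render(SPEC, ticket_text="App crashes on login since v2.3.")
# ...send `prompt` to any LLM, then gate the response:
# result = check_output(SPEC, llm_response_text)
```

Because the spec is declarative data rather than free text, it can be diffed, versioned, and recompiled against a different provider's API without touching the guarantees it encodes.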

Notes

  • Directly confronts the philosophical debate: is it code or specification? ("Prompt is specification, not code" - amelius; "determinism has been holding the field back" - add-sub-mul-div).
  • Appeals to the desire for engineering rigor over "artisanal prompt crafting" (nradov) by providing a formal layer above the natural language instruction.