Project ideas from Hacker News discussions.

Computer Use is 45x more expensive than structured APIs

📝 Discussion Summary

Top 4 Themes in the Discussion

Theme | Summary & Key Quote
1. Token cost & inefficiency of vision‑based agents | Many users point out that “computer use” is dramatically more expensive than calling an API. “The vision agent took almost 20 minutes… the API approach… 0.5 s – 2.8 s … a 17‑minute (!!) total time for the vision agent vs. 0.5s‑2.8s for the API approach.” (palashawas)
2. Preference for structured APIs / MCPs over raw UI navigation | Several commenters stress that “structured APIs (or MCP) beat vision agents in speed and cost.” “Structured APIs… are 40× cheaper, more deterministic, and pay the server bills.” (jacktu)
3. Real‑world constraints & skepticism about AI handling sensitive tasks | Users highlight practical barriers: security, privacy, and the need for reliable APIs. “I don’t see AI ‘agents’ being trusted with taxes, background checks, or creating an LLC – those require human oversight.” (overgard)
4. Future OS/API redesign to expose functionality to agents | There’s a recurring call to rethink operating‑system design so that every app’s functionality is “exposed via an API while remaining human‑friendly.” “In an agentic world, the OS needs to be completely rethought… every single app functionality should be exposable via an API.” (aurareturn)

The themes above capture the most common points raised, each backed by a direct quotation from the participants.
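As a rough worked check on the timing quote in theme 1, the figures can be turned into a ratio. This covers latency only, not token cost, and it assumes the "17‑minute" figure rather than the "almost 20 minutes" one; the variable names are illustrative.

```python
# Latency figures taken from the quote in theme 1.
vision_seconds = 17 * 60          # 17-minute vision-agent run
api_fast, api_slow = 0.5, 2.8     # reported API latency range, in seconds

# Compare against the slowest and fastest API timings to bound the ratio.
low_ratio = vision_seconds / api_slow
high_ratio = vision_seconds / api_fast

print(f"Vision agent is roughly {low_ratio:.0f}x to {high_ratio:.0f}x slower")
```

On these numbers the latency gap is in the hundreds to thousands, which is a separate measure from the cost multipliers quoted elsewhere in the thread.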


🚀 Project Ideas

Accessibility‑First Agent API Layer

Summary

  • Turns any desktop or web UI that exposes an accessibility tree into a typed, schema‑driven API that AI agents can call directly.
  • Eliminates token‑heavy vision loops by exposing native accessibility tree nodes, actions, and state.

Details

Key | Value
Target Audience | AI agent developers, SaaS teams needing automation without vendor‑specific APIs
Core Feature | Auto‑generate a REST‑style JSON spec from OS accessibility APIs (Windows UI Automation, AT‑SPI, macOS Accessibility API) and expose it via a lightweight server
Tech Stack | Python backend, FastAPI, Electron wrapper for cross‑platform packaging, SQLite for spec cache
Difficulty | Medium
Monetization | Revenue-ready: Subscription $19/mo per API consumer
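The core feature could be sketched roughly as follows. This is a minimal illustration using only the standard library, not a real binding to any OS accessibility API: `AXNode`, `slug`, and `build_spec` are hypothetical names, and the tree is hand-built rather than read from a live application.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AXNode:
    """Hypothetical, platform-neutral view of one accessibility tree node."""
    role: str
    name: str
    actions: list[str] = field(default_factory=list)
    children: list["AXNode"] = field(default_factory=list)


def slug(text: str) -> str:
    """Lowercase the label and join its alphanumeric words with underscores."""
    words = "".join(c if c.isalnum() else " " for c in text.lower()).split()
    return "_".join(words)


def build_spec(node: AXNode, prefix: str = "") -> list[dict]:
    """Walk the tree depth-first and emit one endpoint per (node, action)."""
    path = f"{prefix}/{slug(node.name) or node.role}"
    endpoints = [
        {"method": "POST", "path": f"{path}/{action}", "role": node.role}
        for action in node.actions
    ]
    for child in node.children:
        endpoints.extend(build_spec(child, path))
    return endpoints


# Hand-built example tree (an assumption, standing in for a real app window).
window = AXNode("window", "Invoice Editor", children=[
    AXNode("textbox", "Customer Name", actions=["set_value"]),
    AXNode("button", "Save", actions=["press"]),
])
spec = build_spec(window)
print(json.dumps(spec, indent=2))
```

An agent would then call an endpoint such as `/invoice_editor/save/press` instead of locating the Save button in a screenshot. A real implementation would also need to handle duplicate labels and trees that change at runtime.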

Notes

  • HN commenters repeatedly cite “vision agents are too slow/expensive” – this solves that directly.
  • Provides a reusable bridge between existing GUIs and LLMs, turning a pain point into a marketable service.

AgentWorkflow Marketplace

Summary

  • A platform to record, version, and share reusable agent navigation scripts.
  • Enables monetization of proven workflows and reduces duplicate token usage across users.

Details

Key | Value
Target Audience | Power users, AI‑automation consultants, SaaS founders
Core Feature | UI interaction recorder that outputs a declarative workflow JSON; marketplace for buying/selling scripts
Tech Stack | Node.js + Express, React frontend, PostgreSQL, Docker containers
Difficulty | High
Monetization | Revenue-ready: Pay‑per‑execution $0.005, plus 20% platform fee on sales
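One possible shape for the declarative workflow JSON, with a validator and a replay loop, is sketched below. The field names (`steps`, `action`, `target`, `value`), the two supported actions, and the `driver` callback are all assumptions made for illustration; the source does not define a schema.

```python
import json

# Hypothetical recorded workflow; the schema is an assumption.
WORKFLOW_JSON = """
{
  "name": "export_monthly_report",
  "version": "1.0.0",
  "steps": [
    {"action": "click", "target": "menu:Reports"},
    {"action": "type", "target": "field:Month", "value": "2024-01"},
    {"action": "click", "target": "button:Export"}
  ]
}
"""

# Fields each supported action must carry.
REQUIRED = {"click": {"target"}, "type": {"target", "value"}}


def validate(workflow: dict) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for i, step in enumerate(workflow.get("steps", [])):
        action = step.get("action")
        if action not in REQUIRED:
            errors.append(f"step {i}: unknown action {action!r}")
            continue
        missing = REQUIRED[action] - set(step)
        if missing:
            errors.append(f"step {i}: missing {sorted(missing)}")
    return errors


def replay(workflow: dict, driver) -> int:
    """Run each step through a driver callable; return the number executed."""
    problems = validate(workflow)
    if problems:
        raise ValueError("; ".join(problems))
    for step in workflow["steps"]:
        driver(step["action"], step["target"], step.get("value"))
    return len(workflow["steps"])


# A list-appending lambda stands in for a real UI driver.
log = []
workflow = json.loads(WORKFLOW_JSON)
executed = replay(workflow, lambda *args: log.append(args))
print(executed, log[1])
```

Validating before replay means a malformed script purchased from the marketplace fails up front instead of halfway through a UI session.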

Notes

  • Directly addresses “high token cost” complaints by reusing verified scripts.
  • Appeals to the community’s desire for shared, efficient agentic actions (mentioned by several commenters).

PromptCache – Token‑Efficient GUI Agent Cloud

Summary

  • Serverless cloud that caches UI interaction traces and suggests optimized prompts, cutting token waste for vision‑based agents.

Details

Key | Value
Target Audience | AI startups, hobbyist agents, researchers
Core Feature | Persistent session storage, auto‑generated prompt compression, fallback to structured APIs when available
Tech Stack | Go micro‑services, Redis caching, AWS Lambda, Cloudflare Workers
Difficulty | Medium
Monetization | Hobby
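The caching and fallback behaviour might look something like the sketch below. It is written in Python for illustration even though the listed stack is Go, an in-memory dict stands in for Redis, and `TraceCache`, `trace_key`, and `plan` are hypothetical names. The ordering (structured API first, cached trace second, vision agent last) is one reading of "fallback to structured APIs when available".

```python
import hashlib
from typing import Callable, Optional


def trace_key(app: str, goal: str) -> str:
    """Hash a normalized (app, goal) pair so trivial variations share a key."""
    normalized = f"{app.strip().lower()}|{' '.join(goal.lower().split())}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TraceCache:
    """In-memory stand-in for a shared trace store."""

    def __init__(self) -> None:
        self._traces: dict[str, list[str]] = {}

    def get(self, app: str, goal: str) -> Optional[list[str]]:
        return self._traces.get(trace_key(app, goal))

    def put(self, app: str, goal: str, steps: list[str]) -> None:
        self._traces[trace_key(app, goal)] = list(steps)


def plan(
    app: str,
    goal: str,
    cache: TraceCache,
    structured_apis: dict[str, Callable[[str], list[str]]],
    vision_agent: Callable[[str, str], list[str]],
) -> tuple[str, list[str]]:
    """Pick the cheapest available source of steps and say which was used."""
    if app in structured_apis:           # cheapest: no UI interaction at all
        return "api", structured_apis[app](goal)
    cached = cache.get(app, goal)
    if cached is not None:               # reuse a previously recorded trace
        return "cache", cached
    steps = vision_agent(app, goal)      # most expensive path; cache result
    cache.put(app, goal, steps)
    return "vision", steps


calls = []

def fake_vision_agent(app: str, goal: str) -> list[str]:
    calls.append((app, goal))
    return ["screenshot", "click Export"]

cache = TraceCache()
first = plan("LegacyCRM", "Export  report", cache, {}, fake_vision_agent)
second = plan("legacycrm", "export report", cache, {}, fake_vision_agent)
print(first[0], second[0], len(calls))  # prints: vision cache 1
```

The second request differs only in case and spacing, so it hits the cache and the vision agent runs once. A real service would also need expiry, since cached traces go stale when the target UI changes.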

Notes

  • Aligns with discussions about a “200x token penalty” for vision agents; offers a cheap shared pool of cached interaction traces.
  • Would be a natural companion to existing benchmarks and could generate early‑adopter excitement.

LegacyUI SDK – Declarative Automation for Legacy Apps

Summary

  • SDK that wraps legacy graphical applications (mainframes, old ERP, proprietary desktop tools) and exposes a stable, programmatic UI map for agents.

Details

Key | Value
Target Audience | Enterprises with legacy software, RPA teams, compliance officers
Core Feature | Generates a persistent UI element index from OCR + accessibility hints, produces stable action identifiers for repeatable scripts
Tech Stack | C++ core, OpenCV for pixel analysis, TensorFlow Lite for element classification, Rust bindings
Difficulty | High
Monetization | Revenue-ready: Enterprise license $2,000/mo per server
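One way to get stable action identifiers is to derive them from what an element is rather than where it is drawn. The sketch below is in Python for illustration (the listed stack is C++ and Rust), and `UIElement` and `stable_id` are hypothetical; it assumes OCR and accessibility hints have already produced a role, label, and ancestor chain for each element.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    """Hypothetical record produced by the OCR + accessibility pass."""
    role: str
    label: str
    ancestors: tuple[str, ...]
    bbox: tuple[int, int, int, int]  # x, y, w, h; deliberately not hashed


def stable_id(el: UIElement) -> str:
    """Build an ID from role, normalized label, and ancestor chain only."""
    label = " ".join(el.label.lower().split())   # absorb OCR case/spacing noise
    basis = "/".join((*el.ancestors, el.role, label))
    digest = hashlib.sha1(basis.encode("utf-8")).hexdigest()[:10]
    return f"{el.role}-{digest}"


# The same button before and after a layout shift, with noisy OCR text.
before = UIElement("button", "Post Invoice", ("main", "toolbar"), (10, 20, 80, 24))
after = UIElement("button", "post  invoice", ("main", "toolbar"), (310, 48, 96, 30))
other = UIElement("button", "Void Invoice", ("main", "toolbar"), (10, 20, 80, 24))

print(stable_id(before) == stable_id(after))   # True: position is ignored
print(stable_id(before) == stable_id(other))   # False: label differs
```

Because pixel coordinates are excluded, a script that refers to the ID keeps working when the window is resized or the toolbar moves. The trade-off is that two elements with the same role, label, and ancestors collide, so a real index would need a tiebreaker such as sibling order.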

Notes

  • Many commenters mention “legacy SaaS without APIs” as a key use‑case for computer‑use; this SDK targets exactly that gap.
  • Promises a deterministic alternative to fragile vision‑only approaches, likely to attract early enterprise interest.
