Gemini 3.5 Flash

📝 Discussion Summary (Click to expand)

1. Pricing shock &cost increase
The price jump is a major pain point.

“$9 vs $12 for output.” – swe_dima
“3× price increase of the last Flash model ($3 → $9 per 1M output).” – bakugo
“Gemini 3.5 Flash: $0.75 input / $4.50 output per 1 M tokens, with output price explicitly ‘including thinking tokens’.” – GodelNumbering

2. Confusing model naming & tier structure
Users are frustrated by the new naming conventions (Flash‑Lite, Flash, Pro, etc.).

“Flash‑Lite is a different product from Flash, which is more expensive. They couldn’t be more confusing with their naming.” – naman
“Gemini 3.5 Flash is priced like a Pro model while still being called ‘Flash’.” – Alfon

3. Performance vs. benchmark claims
There is debate over how the new model stacks up against rivals.

“Arena.ai: Gemini 3.5 Flash shifts the Pareto frontier in Text. 8 models from Google dominate the price‑performance curve.” – Arena.ai (tweet)
“Gemini 3.5 Flash uses ~7,500 tokens for a complex SVG task while 3.1 Pro uses ~28 k tokens, yet only the latter animates correctly.” – sxx

4. UI / access reliability problems
The Gemini web UI, Antigravity quota, and CLI are frequently cited as flaky.

“In our experience, caching is not very reliable with Google. We always get random cache misses that don’t happen with other providers.” – henryah
“Google’s AI Studio is shockingly bad – sessions refresh, disappear, and error out indefinitely.” – veselin

5. Hallucinations & search reliability
Many note that the models still fabricate links and citations.

“People complain about them incessantly, but I can almost never get people to actually post receipts.” – WarmWash > “Gemini will confidently give me wrong or outdated information unless forced to search, and even then it often hallucinates the source.” – krupan

These five themes capture the dominant concerns across the discussion.

🚀 Project Ideas

Generating project ideas…

Gemini Cache‑Optimized Relay#Summary

Problem: Google’s Gemini pricing and flaky cache reliability make it costly and unpredictable for developers building agentic workflows.
Value Prop: A lightweight proxy that caches Gemini responses, tracks hit‑rates, and automatically falls back to cheaper models when cache misses occur, reducing token spend by 30‑50%.

Details

Key	Value
Target Audience	Mid‑size AI teams, indie hackers, and SaaS founders building multi‑model agents.
Core Feature	Automatic Gemini caching with hit‑rate analytics, fallback to DeepSeek/Qwen on miss, real‑time cost estimator.
Tech Stack	Backend: Node.js + Redis (TTL‑based cache); API: OpenAPI‑compatible wrapper; Front‑end: minimal CLI & dashboard.
Difficulty	Medium
Monetization	Revenue-ready: $0.01 per 1 k input tokens + $0.02 per 1 k output tokens (flexible tier).

Notes- HN commenters repeatedly lamented “price hikes” and “cache flakiness”; a solution that guarantees cheaper reuse of cached tokens would be immediately attractive.

Could integrate Google’s explicit caching API while hiding its quirks, turning Gemini into a reliable, cost‑effective building block for agentic coding.

Cross‑Provider Token Router

Summary

Problem: Users must manually switch between Gemini, DeepSeek, Qwen, etc., to balance performance and cost; no automated tool exists.
Value Prop: An autonomous router that selects the cheapest model meeting a predefined quality threshold for each request, continuously learning from token usage patterns.

Details

Key	Value
Target Audience	DevOps engineers, API platform builders, and AI product managers.
Core Feature	Dynamic model selection + cost‑aware request routing; fallback logic for flaky services.
Tech Stack	Backend: Python FastAPI; Decision engine: lightweight RL bandit; Storage: SQLite; Deploy: Docker/K8s.
Difficulty	Medium
Monetization	Revenue-ready: usage‑based fee 0.5% of total token spend saved.

Notes

Discussions on HN about “price is just a tax” and “pricing noose tightening” indicate appetite for a tool that reduces spend without manual overhead.
The router could surface real‑time price comparisons and alert users to emerging cheaper models (e.g., upcoming DeepSeek Flash).

Local‑First LLM Proxy (Open‑Source Alternative) #Summary

Problem: Cloud API costs are prohibitive; developers want a self‑hosted endpoint that mimics Gemini/DeepSeek behavior for low‑latency, low‑cost usage.
Value Prop: A Docker‑compose‑ready proxy that serves Qwen 3.6‑35B, DeepSeek‑V4‑Flash, and other open models with an OpenAI‑compatible API, auto‑updating weights.

Details| Key | Value |

|-----|-------| | Target Audience | Hobbyists, small startups, privacy‑focused enterprises. | | Core Feature | Unified API endpoint, automatic model model‑swap based on load, built‑in token‑count monitoring. | | Tech Stack | Backend: FastAPI + vLLM; Model loader: Hugging Face Transformers; Auth: JWT; UI: Streamlit dashboard. | | Difficulty | High | | Monetization | Hobby |

Notes

HN users regularly cite “price hikes” and “China’s cheaper models” as reasons to self‑host; a ready‑to‑run solution would capture that market.
Could be positioned as a “Gemini‑Free” drop‑in replace for any Gemini‑compatible codebase.

Cost‑Aware Benchmark Dashboard for LLM APIs

Summary

Problem: Teams lack visibility into per‑token costs, cache hits, and latency across multiple LLMs, leading to unexpected bills.
Value Prop: A SaaS dashboard that ingests API logs, calculates effective cost per task, visualizes cache efficiency, and sends alerts for anomalous price spikes.

Details

Key	Value
Target Audience	Engineering managers, AI product owners, finance ops for AI‑heavy firms.
Core Feature	Multi‑provider cost breakdown, cache‑hit ratio reporting, predictive spend modeling.
Tech Stack	Frontend: React + D3; Backend: Go (log parser); DB: TimescaleDB; Hosting: Vercel/Render.
Difficulty	Medium
Monetization	Revenue-ready: tiered subscription ($19/mo basic, $99/mo pro).

Notes

Frequent HN complaints about “unexpected price jumps” and “price war” signal demand for proactive cost monitoring.
Could integrate with the Cross‑Provider Token Router to provide actionable insights directly to users.

Agentic Workflow Scheduler with Adaptive Model Switching

Summary

Problem: Agentic coding pipelines waste tokens by using premium models for every step; users need a strategy that uses cheap models for most operations and only upgrades when necessary.
Value Prop: A workflow engine that parses an AGENTS.md‑style plan, executes low‑cost model steps, validates output with a cheaper verifier, and escalates to a premium model only on failure, cutting token costs by up to 70%.

Details

Key	Value
Target Audience	AI‑centric dev teams, automation engineers, SaaS founders building AI‑augmented products.
Core Feature	Plan parsing, multi‑stage execution, on‑the‑fly cost‑budgeting, fail‑over to high‑quality model.
Tech Stack	Backend: Python + Celery; Scheduler: Airflow‑lite; Model API wrapper; Pricing engine.
Difficulty	High
Monetization	Revenue-ready: $0.005 per 1 k input tokens processed, $0.01 per 1 k output tokens generated (pay‑as‑you‑go).

Notes

Discussions about “Flash is now premium” and “need for caching” highlight the need for intelligent token budgeting.
Providing a concrete scheduler that integrates with existing CLI tools (e.g., Gemini CLI, Claude Code) would resonate strongly with HN developers.

Gemini 3.5 Flash

🚀 Project Ideas

Gemini Cache‑Optimized Relay#Summary

Details

Notes- HN commenters repeatedly lamented “price hikes” and “cache flakiness”; a solution that guarantees cheaper reuse of cached tokens would be immediately attractive.

Cross‑Provider Token Router

Summary

Details

Notes

Local‑First LLM Proxy (Open‑Source Alternative) #Summary

Details| Key | Value |

Notes

Cost‑Aware Benchmark Dashboard for LLM APIs

Summary

Details

Notes

Agentic Workflow Scheduler with Adaptive Model Switching

Summary

Details

Notes

Read Later