Caveman: Why use many token when few token do trick

📝 Discussion Summary

Key Themes from the HN discussion

Caveman mode trades performance for brevity
Shortening output can make the model “dumber.”

“More concise is dumber. Got it.” – taneq
Tokens are the currency of reasoning
Models “think” by emitting tokens; low‑entropy tokens convey little new information.

“tokens are units of thinking.” – TeMPOraL
“The LLM has no accessible state beyond its own output tokens; each pass generates a single token and does not otherwise communicate with subsequent passes.” – dTal
Concise communication is valued by users
Many participants appreciate fewer fluff words, which saves context and speeds reading.

“It makes my day not to have to read through entire essays about some trivial solution.” – bhwoo48
The ~75 % token‑saving claim needs proper validation
The author acknowledges the figure is preliminary and calls for rigorous evaluation.

“The real eval is end‑to‑end: total input tokens, total output tokens, latency, quality/task success.” – author of the skill
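A minimal sketch of that end-to-end comparison follows. It assumes you have already logged per-task runs for a baseline prompt and for the caveman skill. The `Run` record and the `compare` function are hypothetical names for illustration and are not part of the skill. Input tokens are counted because any instructions the skill adds are sent as input, so a large cut in output tokens can become a much smaller cut in total tokens.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One task attempt, as logged by a hypothetical eval harness."""
    input_tokens: int
    output_tokens: int
    latency_s: float
    success: bool


def summarize(runs: list[Run]) -> dict[str, float]:
    """Aggregate the four end-to-end metrics over a list of runs."""
    return {
        "input_tokens": sum(r.input_tokens for r in runs),
        "output_tokens": sum(r.output_tokens for r in runs),
        "mean_latency_s": mean(r.latency_s for r in runs),
        "success_rate": sum(r.success for r in runs) / len(runs),
    }


def compare(baseline: list[Run], caveman: list[Run]) -> dict[str, float]:
    """Savings are fractions of the baseline; deltas are caveman minus baseline."""
    b, c = summarize(baseline), summarize(caveman)
    total_b = b["input_tokens"] + b["output_tokens"]
    total_c = c["input_tokens"] + c["output_tokens"]
    return {
        "output_savings": 1 - c["output_tokens"] / b["output_tokens"],
        "total_savings": 1 - total_c / total_b,
        "latency_delta_s": c["mean_latency_s"] - b["mean_latency_s"],
        "success_delta": c["success_rate"] - b["success_rate"],
    }
```

Report `output_savings` alongside `total_savings` and `success_delta`. That way a headline figure like ~75% cannot hide a rise in input tokens or a drop in task success.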

🚀 Project Ideas

[CavemanSkill Optimizer]

Summary

Automatically compresses LLM output to caveman‑style prose while preserving factual accuracy via confidence‑based token pruning.
Provides real‑time benchmarking and fallback to full output when quality drops below a threshold.
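The guardrail half of this idea can be sketched as three steps: compress, score, and fall back. This is a minimal sketch built on stated assumptions:

- `caveman_compress` only drops words from a hypothetical filler list. It is a crude stand-in for confidence-based pruning, which would need token-level probabilities from the model.
- `content_recall` is a made-up quality proxy. It measures the share of non-filler words that survive.
- The 0.9 threshold is arbitrary.

```python
import re
from typing import Callable

# Hypothetical filler list; a real tool would need a carefully tested one.
FILLER = {"the", "a", "an", "is", "are", "just", "really", "basically",
          "simply", "very", "actually", "certainly"}


def caveman_compress(text: str) -> str:
    """Drop filler words, keeping punctuation attached to the words that remain."""
    kept = [w for w in text.split() if w.lower().strip(".,!?") not in FILLER]
    return " ".join(kept)


def content_words(text: str) -> set[str]:
    """Lower-cased words that are not on the filler list."""
    return {w.lower() for w in re.findall(r"\w+", text)} - FILLER


def content_recall(original: str, compressed: str) -> float:
    """Fraction of the original's content words still present after compression."""
    orig = content_words(original)
    if not orig:
        return 1.0
    return len(orig & content_words(compressed)) / len(orig)


def guarded_compress(
    text: str,
    compress: Callable[[str], str] = caveman_compress,
    score: Callable[[str, str], float] = content_recall,
    threshold: float = 0.9,
) -> tuple[str, bool]:
    """Return (output, was_compressed); fall back to the full text if quality drops."""
    short = compress(text)
    if score(text, short) < threshold:
        return text, False
    return short, True
```

The default filler compressor only removes non-content words, so it always passes its own check. The guardrail matters when `compress` is something lossier, such as a model-generated rewrite.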

Details

Key	Value
Target Audience	Developers using Claude, Claude Code, or other Anthropic APIs who pay per token
Core Feature	Caveman‑style output generator with dynamic quality guardrails
Tech Stack	Python (FastAPI), React, Anthropic API wrapper
Difficulty	Medium
Monetization	Revenue-ready: per‑token‑saved pricing (e.g., $0.0001 per token reduced)

Notes

HN commenters repeatedly lamented “verbose LLM slop” and asked for token savings (“makes my day not to have to read through entire essays”) – this directly addresses that pain.

Offers a discussion‑worthy hybrid: token reduction without sacrificing reliability, tackling the “dumbing‑down” concern.

[Neuralese Prompt Compiler]

Summary

Converts natural‑language prompts into a compact “neuralese” token format that maximizes information density for LLMs.
Includes a preview mode that shows token savings and an API to switch back to plain text when needed.
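The idea does not specify what the encoding is. As a much simpler stand-in, the sketch below uses a reversible phrase codebook: known boilerplate phrases become short codes, and decoding restores the exact text. Several parts are assumptions:

- `PHRASES`, the `§` delimiter, and the four-characters-per-token estimate are all hypothetical.
- Real savings must be measured with the target model's tokenizer, since a character like `§` may cost more than one token.
- The model only understands the codes if it also receives the codebook, and that costs tokens too.

```python
import math
import re

# Hypothetical codebook of recurring prompt boilerplate; entry i encodes as "§i§".
PHRASES = [
    "You are a helpful assistant.",
    "Respond in JSON.",
    "Think step by step.",
]
MARK = "§"
# Codes are delimited on both sides so a following digit can't be misread.
CODE_RE = re.compile(re.escape(MARK) + r"(\d+)" + re.escape(MARK))


def encode(prompt: str) -> str:
    """Replace codebook phrases with codes; refuse input that already uses the marker."""
    if MARK in prompt:
        raise ValueError(f"prompt contains reserved marker {MARK!r}")
    for i, phrase in enumerate(PHRASES):
        prompt = prompt.replace(phrase, f"{MARK}{i}{MARK}")
    return prompt


def decode(packed: str) -> str:
    """Expand every code back to its phrase, restoring the original prompt exactly."""
    return CODE_RE.sub(lambda m: PHRASES[int(m.group(1))], packed)


def estimate_tokens(text: str) -> int:
    """Rough ~4 characters per token heuristic; use a real tokenizer in practice."""
    return math.ceil(len(text) / 4)


def preview(prompt: str) -> tuple[str, int, int]:
    """Return the packed prompt with estimated token counts before and after."""
    packed = encode(prompt)
    return packed, estimate_tokens(prompt), estimate_tokens(packed)
```

Rejecting any input that already contains the marker is what guarantees `decode(encode(p)) == p`.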

Details

Key	Value
Target Audience	Engineers building AI‑heavy applications that face high token costs (e.g., SaaS, research tools)
Core Feature	Prompt serialization to high‑density token sequences (neuralese) with reversible decoding
Tech Stack	Node.js microservice, JSON Schema validation, OpenAPI spec, Docker
Difficulty	High
Monetization	Revenue-ready: tiered SaaS subscription (free tier up to 10k tokens, $0.001 per additional 1k tokens)

Notes

Users noted that “Chinese is more concise” and that “tokens are units of thinking” – neuralese leverages that insight to cut input tokens.
Sparks conversation about a new language layer for LLMs, aligning with ideas of “languages of the machine”.

[Token‑Conscious LLM Orchestrator]

Summary

Monitors token consumption across multiple LLM calls, automatically toggling between full and concise modes based on cost thresholds.
Stores compacted, caveman‑styled responses in a cache to avoid re‑generating verbose output.
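A minimal sketch of the budget-driven switching and cache follows. It assumes a hypothetical `llm(prompt, mode)` callable that returns the reply text and the number of tokens it consumed. The class name, the 80% switch point, and keying the cache on `(prompt, mode)` are illustrative choices. The idea lists a Go backend; Python is used here only to keep the examples in one language.

```python
from typing import Callable

# Hypothetical model interface: (prompt, mode) -> (reply text, tokens consumed).
LLM = Callable[[str, str], tuple[str, int]]


class TokenBudgetOrchestrator:
    def __init__(self, llm: LLM, budget: int, concise_at: float = 0.8) -> None:
        self.llm = llm
        self.budget = budget
        self.concise_at = concise_at  # fraction of budget after which replies go concise
        self.spent = 0
        self.cache: dict[tuple[str, str], str] = {}

    def mode(self) -> str:
        """Full answers until spending crosses the threshold, caveman-style after."""
        return "concise" if self.spent >= self.concise_at * self.budget else "full"

    def ask(self, prompt: str) -> tuple[str, str]:
        """Return (reply, mode), serving repeats from the cache at zero token cost."""
        mode = self.mode()
        key = (prompt, mode)
        if key in self.cache:
            return self.cache[key], mode
        reply, used = self.llm(prompt, mode)
        self.spent += used
        self.cache[key] = reply
        return reply, mode
```

The budget only flips the mode and is not a hard cap. A real service would also need a stop rule and cache expiry.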

Details

Key	Value
Target Audience	Teams managing large‑scale AI workflows where token budget directly impacts cost (e.g., product teams, data pipelines)
Core Feature	Auto‑mode switching, token‑budget dashboard, cache‑aware response pruning
Tech Stack	Go backend, Redis cache, Grafana dashboard, Kubernetes
Difficulty	High
Monetization	Revenue-ready: usage‑based pricing (e.g., $0.02 per 1k tokens saved)

Notes

Commenters asked “why waste time say lot word when few word do fine” and expressed concern about “LLM slop” taking up context – this service directly reduces that noise.
Generates discussion on balancing cost, latency, and answer quality, addressing the “saving tokens” motivation.

[Caveman Translator Browser Extension]

Summary

Intercepts LLM chat UI responses and rewrites them in minimal caveman style, with an optional Expand button to view the original verbose text.
Works on popular AI chat platforms (Claude, ChatGPT, etc.) and adds a toggle for token‑saving mode.
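The rewrite-plus-Expand behaviour can be sketched by keeping both versions of each reply. This sketch assumes a hypothetical list of stock pleasantries to strip and a rough four-characters-per-token estimate. An actual extension would do this in JavaScript against each platform's page, which is not shown here.

```python
import math
import re
from dataclasses import dataclass, field

# Hypothetical patterns for stock openers and closers; a real list would need curation.
SLOP = [
    r"^(great question|certainly|sure|absolutely)[!.,]\s*",
    r"\s*i hope this helps[.!]?$",
]


def strip_slop(text: str) -> str:
    """Remove leading and trailing pleasantries, leaving the substance untouched."""
    out = text.strip()
    for pat in SLOP:
        out = re.sub(pat, "", out, flags=re.IGNORECASE)
    return out.strip()


def approx_tokens(text: str) -> int:
    """Rough ~4 characters per token heuristic for the on-page indicator."""
    return math.ceil(len(text) / 4)


@dataclass
class Reply:
    original: str
    short: str = field(init=False)
    expanded: bool = False

    def __post_init__(self) -> None:
        self.short = strip_slop(self.original)

    def toggle(self) -> None:
        """The Expand button: switch between the short and original text."""
        self.expanded = not self.expanded

    def shown(self) -> str:
        return self.original if self.expanded else self.short

    def tokens_saved(self) -> int:
        return approx_tokens(self.original) - approx_tokens(self.short)
```

The rewrite happens after the reply has been generated. So the indicator measures reading saved, not billed output tokens; cutting those requires changing the prompt, as the caveman skill does.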

Details

Key	Value
Target Audience	End‑users of AI chat interfaces who want quicker, clearer reads without losing access to full answers
Core Feature	Real‑time response compression to caveman syntax, expand‑to‑original, token‑count indicator
Tech Stack	JavaScript (React), browser extension API, WebAssembly for token counting
Difficulty	Low
Monetization	Hobby

Notes

Several HN remarks praised the “short output, ++good” idea and described it as “the best thing since I asked Claude to address me in third person.” – this extension delivers that experience.
Could spark conversation about user‑side token optimization and the trade‑off between readability and full context preservation.