Claude mixes up who said what

📝 Discussion Summary

1. Mis‑attribution / “harness” bug

“It’s somehow labelling internal reasoning messages as coming from the user, which is why the model is so confident that ‘No, you said that.’” — sixhobbits

2. No clear data vs control boundary (prompt‑injection risk)

“The principal security problem of LLMs is that there is no architectural boundary between data and control paths.” — fzeindl

3. Context‑window instability & non‑determinism

“After just a handful of prompts everything breaks down.” — wildrhythms

4. Over‑reliance & abdication of responsibility

“The general problem what I have with LLMs … is that people that tend to overuse the technology try to absolve themselves from responsibilities.” — cookiengineer

5. Calls for better controls (defaults, sandboxing, speaker delimiters)

“The best solution … are the aforementioned better defaults, stricter controls, and sandboxing (and less snake‑oil marketing).” — perching_aix

🚀 Project Ideas

[PromptGuard]

Summary

Adds explicit speaker tags to every token stream, preventing LLMs from mislabeling internal reasoning as user input.
Core value: eliminates “who said what?” confusion that leads to self‑recommending dangerous actions.

Details

Key	Value
Target Audience	LLM tool developers, AI‑ops teams, Claude Code users
Core Feature	Automatic insertion of / delimiters around all messages before inference
Tech Stack	Python backend, OpenTelemetry for tracing, React UI, Redis for session state
Difficulty	Medium
Monetization	Revenue-ready: usage‑based pricing ($0.001 per 1k tokens processed)
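
The core feature in the table above can be sketched roughly as follows. This is a minimal illustration, not the project's actual design: the `<<role>>` delimiter format, the `Message` type, the role names, and `tag_messages` are all hypothetical, since the original delimiter syntax did not survive in the text. Message content is escaped so that a message body cannot forge a speaker tag.

```python
import html
from dataclasses import dataclass

# Hypothetical delimiter format, chosen only for this sketch.
OPEN = "<<{role}>>"
CLOSE = "<</{role}>>"
# Hypothetical role names; a real harness would define its own set.
ALLOWED_ROLES = {"system", "user", "assistant", "reasoning"}


@dataclass(frozen=True)
class Message:
    role: str
    text: str


def tag_messages(messages):
    """Wrap each message in speaker delimiters before it is sent for inference."""
    parts = []
    for message in messages:
        if message.role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {message.role!r}")
        # Escape &, < and > so content can never contain a literal delimiter.
        body = html.escape(message.text, quote=False)
        parts.append(
            OPEN.format(role=message.role) + body + CLOSE.format(role=message.role)
        )
    return "\n".join(parts)
```

Escaping the body is the design choice that matters here: without it, reasoning text containing a fake user tag would be indistinguishable from a real user turn.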

Notes

HN commenters repeatedly lamented the “hallucinated user messages” problem; this directly fixes it.
Could be packaged as a plug‑in for VS Code, Claude‑Code, and any chat UI.

[StructuredAgent]

Summary

Provides a deterministic query language that maps natural language intents to discrete tool‑call syntax, removing ambiguous prompting.
Core value: guarantees that the model cannot misinterpret its own output as user instruction.

Details

Key	Value
Target Audience	Developers building agent frameworks, enterprise workflow automators
Core Feature	Parse‑and‑execute a custom DSL (e.g., `DEPLOY service=:auth API_KEY=xxx`) that the model must follow verbatim
Tech Stack	Rust microservice, OpenAPI spec, PostgreSQL for command store, Docker
Difficulty	High
Monetization	Hobby
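
A strict parser for a command like the `DEPLOY` example in the table might look like the sketch below. The grammar (one verb followed by `key=value` pairs), the `SCHEMA` mapping, and `parse_command` are assumptions made for illustration; the source does not define the DSL beyond that single example.

```python
import shlex

# Hypothetical schema: each verb maps to the exact set of arguments it requires.
SCHEMA = {"DEPLOY": {"service", "API_KEY"}}


def parse_command(line):
    """Parse 'VERB key=value ...' and reject anything that deviates from SCHEMA."""
    tokens = shlex.split(line)
    if not tokens:
        raise ValueError("empty command")
    verb, *pairs = tokens
    if verb not in SCHEMA:
        raise ValueError(f"unknown verb: {verb!r}")
    args = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed argument: {pair!r}")
        if key in args:
            raise ValueError(f"duplicate argument: {key!r}")
        args[key] = value
    if set(args) != SCHEMA[verb]:
        raise ValueError(f"{verb} requires exactly {sorted(SCHEMA[verb])}")
    return verb, args
```

Free-form text such as `please deploy the auth service` fails at the verb check, so only output that follows the syntax verbatim reaches the execution step.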

Notes

Discussions around “structured LLM queries” show demand for a safe alternative to raw prompts.
Aligns with IETF draft on cryptographic argument attenuation—offers a marketable security layer.

[SandboxedCommander]

Summary

A permission‑gated execution layer that only allows LLM‑generated commands that match a whitelist of approved actions.
Core value: prevents accidental destructive operations even if the model tries to authorize itself.

Details

Key	Value
Target Audience	DevOps engineers, SaaS platforms offering AI‑driven automation
Core Feature	Command approval engine using policy JSON; each approved command returns a unique signed token before execution
Tech Stack	Go, gRPC, SQLite, AWS Lambda authorizer
Difficulty	Medium
Monetization	Revenue-ready: subscription tier ($19/mo per 10k commands)
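
The approval engine described in the table could be sketched as below, assuming a policy file that lists allowed commands as argument vectors and an HMAC over a random nonce plus the command as the signed token. The policy layout, `SECRET_KEY`, `approve`, and `verify` are hypothetical names for this sketch, and the sketch is written in Python for consistency with the other examples rather than in the Go stack listed above.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical policy format: a JSON object with a list of allowed argument vectors.
POLICY_JSON = '{"allowed": [["git", "status"], ["ls", "-la"]]}'
# Demo-only key; a real deployment would load this from a secret store.
SECRET_KEY = b"demo-only-secret"


def _sign(nonce, command):
    """Compute an HMAC-SHA256 over the nonce and the exact command."""
    payload = json.dumps([nonce, command]).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def approve(command, policy):
    """Return a fresh signed token if the command is allowed, else None."""
    if command not in policy["allowed"]:
        return None
    nonce = secrets.token_hex(8)  # makes each approval token unique
    return f"{nonce}.{_sign(nonce, command)}"


def verify(command, token):
    """Check that the token was issued for exactly this command."""
    nonce, sep, mac = token.partition(".")
    if not sep:
        return False
    return hmac.compare_digest(mac.encode(), _sign(nonce, command).encode())
```

The executor would call `verify` immediately before running a command. This sketch does not track used nonces, so a token could be replayed for the same command; single-use enforcement would need extra state.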

Notes

Commenters highlighted agents “thinking they have permission” to run destructive tool calls; this directly mitigates that risk.
Can integrate with existing CI/CD pipelines as a gatekeeper.

[TraceLabeler]

Summary

Automatically annotates chat histories with speaker provenance tags and flags when the model references its own prior output as user input.
Core value: gives users visibility into misattributed messages for debugging and compliance.

Details

Key	Value
Target Audience	AI researchers, product managers, compliance officers
Core Feature	Post‑processing API that adds <author:user
Tech Stack	Node.js, Elasticsearch, Grafana dashboard
Difficulty	Low
Monetization	Hobby
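
One simple way to flag the misattribution described in the summary is a heuristic like the following. It assumes the transcript is already a list of `(author, text)` pairs and only catches the narrow case where the assistant writes `you said "..."` about text that appears in its own earlier output but in no user message. The `QUOTE` pattern and `flag_misattributions` are hypothetical, and the sketch uses Python for consistency with the other examples rather than the Node.js stack listed above.

```python
import re

# Matches phrases such as: you said "drop the table"
QUOTE = re.compile(r'you said[,:]?\s*["“](.+?)["”]', re.IGNORECASE)


def flag_misattributions(transcript):
    """Return (message_index, quoted_text) for quotes the user never wrote.

    transcript: list of (author, text) pairs, author being "user" or "assistant".
    """
    flags = []
    seen = {"user": [], "assistant": []}  # lowercased earlier messages per author
    for index, (author, text) in enumerate(transcript):
        if author == "assistant":
            for quoted in QUOTE.findall(text):
                needle = quoted.lower()
                in_user = any(needle in earlier for earlier in seen["user"])
                in_assistant = any(needle in earlier for earlier in seen["assistant"])
                # Flag only when the quote traces back to the assistant alone.
                if in_assistant and not in_user:
                    flags.append((index, quoted))
        seen[author].append(text.lower())
    return flags
```

Exact substring matching keeps the sketch short; paraphrased misattributions would need fuzzy matching and would produce more false positives.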

Notes

Multiple HN users asked for a way to “see who said what” to avoid the confusion described in the article.
Open‑source version can be self‑hosted; premium SaaS can provide alerting on high‑risk mislabeling.

[DeterministicReplay]

Summary

Provides versioned conversation snapshots with cryptographic hashes, enabling reproducible re‑execution of LLM sessions.
Core value: ensures that the same input always yields the same downstream actions, eliminating stochastic drift.

Details

Key	Value
Target Audience	Enterprise AI teams, auditors, regulated industries
Core Feature	Store hashed context blocks; on replay, restore exact token order and enforce deterministic inference (temp = 0, fixed seed)
Tech Stack	Python, Redis, Kubernetes, OpenTelemetry
Difficulty	High
Monetization	Revenue-ready: enterprise licensing ($2,500 per month per workspace)
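
The "hashed context blocks" idea can be illustrated with a small hash chain, where each block's hash covers the previous hash, so any edit or reordering is detectable on replay. The block layout and the `block_hash`, `snapshot`, and `verify_snapshot` functions are assumptions for this sketch, not a described implementation.

```python
import hashlib
import json


def block_hash(prev_hash, role, text):
    """Hash one message together with the hash of the block before it."""
    payload = json.dumps(
        {"prev": prev_hash, "role": role, "text": text}, sort_keys=True
    ).encode()
    return hashlib.sha256(payload).hexdigest()


def snapshot(messages):
    """Turn (role, text) pairs into a chain of hashed context blocks."""
    blocks = []
    prev = ""  # the first block chains from an empty hash
    for role, text in messages:
        prev = block_hash(prev, role, text)
        blocks.append({"role": role, "text": text, "hash": prev})
    return blocks


def verify_snapshot(blocks):
    """Return True only if every block is unmodified and in its original order."""
    prev = ""
    for block in blocks:
        if block_hash(prev, block["role"], block["text"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True
```

This only establishes that the replayed context is identical to the stored one. Whether the model then produces identical output depends on the inference backend; temperature 0 and a fixed seed reduce variation but may not remove it on every provider.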

Notes

Commenters frequently discussed the pain of “context rot” and unpredictable model behavior when revisiting old chats; this addresses it.
Could be marketed as a compliance feature for industries that must audit AI decisions.
