Project ideas from Hacker News discussions.

Don't trust AI agents

📝 Discussion Summary

1. Security is the biggest headache
Agents run inside containers, but the community keeps reminding us that “Docker is not a security boundary” (smallpipe). Even with Podman or runc, recent CVEs show that container escape is still a real risk, though, as jrpear cautions, “the attacker needs already to have the capability to spawn containers.” The consensus is that the only safe approach is to give agents least‑privilege access and to sandbox them behind a hardened proxy or VM (buremba, alexhans); a least‑privilege container sketch follows the quotes below.

“Docker is not a security boundary. You’re one prompt injection away from handing over your gmail cookie.” – smallpipe
“The attacker needs already to have the capability to spawn containers …” – jrpear
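
To make the least‑privilege advice concrete, here is a minimal sketch of launching an agent in a deliberately hobbled container. The image name, command, and resource limits are illustrative assumptions, and, as the thread stresses, Docker flags alone are not a security boundary; they only shrink the blast radius.

```python
import subprocess

def run_agent_sandboxed(command: list[str]) -> subprocess.CompletedProcess:
    """Launch an agent command in a deliberately hobbled container.

    Illustrative only: the image name and limits are placeholders, and
    (as the thread stresses) Docker alone is not a security boundary.
    """
    docker_args = [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                         # no Linux capabilities
        "--network=none",                         # no network egress at all
        "--read-only",                            # immutable root filesystem
        "--security-opt=no-new-privileges:true",  # block setuid escalation
        "--memory=512m", "--pids-limit=128",      # resource ceilings
        "agent-sandbox:latest",                   # hypothetical agent image
        *command,
    ]
    return subprocess.run(docker_args, capture_output=True, text=True, timeout=300)

if __name__ == "__main__":
    result = run_agent_sandboxed(["python", "agent.py"])
    print(result.stdout)
```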

2. Trust, accountability, and the human‑in‑the‑loop
Because LLMs can hallucinate or be prompt‑injected, the discussion repeatedly stresses that agents must be auditable and that humans must stay in control. “You can’t trust an LLM” (bigstrat2003) and “the only secure way to use any of these tools is to give them very limited access” (jeremyjh) are echoed by many who insist on manual review, snapshots, or a “harness” that limits what an agent can do (buremba, alexhans, daveguy); a sketch of such a harness follows the quotes below.

“The only secure way to use any of these tools is to give them very limited access” – jeremyjh
“You can’t trust an LLM” – bigstrat2003
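
In practice, “very limited access” usually means a harness that records every tool call and asks a human before anything sensitive runs. A minimal sketch of that approval‑gate pattern, with the action names and the SAFE_ACTIONS allow‑list invented purely for illustration:

```python
from dataclasses import dataclass, field

# Tool calls the harness runs without asking; everything else needs a
# human. Both the names and the split are invented for this sketch.
SAFE_ACTIONS = {"read_file", "list_directory", "search_notes"}

@dataclass
class ApprovalHarness:
    audit_log: list = field(default_factory=list)

    def execute(self, action: str, args: dict) -> str:
        self.audit_log.append((action, args))  # every call is auditable
        if action not in SAFE_ACTIONS:
            answer = input(f"Agent wants {action}({args}). Allow? [y/N] ")
            if answer.strip().lower() != "y":
                return "DENIED: human rejected the action"
        return self._dispatch(action, args)

    def _dispatch(self, action: str, args: dict) -> str:
        # Real tool implementations would live here; stubbed for the sketch.
        return f"executed {action}"

harness = ApprovalHarness()
harness.execute("read_file", {"path": "notes.md"})          # runs silently
harness.execute("send_email", {"to": "boss@example.com"})   # prompts first
```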

3. Extensibility vs. code bloat
The “skills” model of NanoClaw and the self‑modifying code of OpenClaw spark debate over how much code an agent should generate. Some praise the ability to add features on the fly (“skills are a markdown file written in English”) while others warn that “the codebase will eventually be un‑reviewable” (sanex, jimminyx). The LOC‑vs‑quality argument is a recurring theme, with many pointing out that “lines of code is a bad metric” (ninkendo, bee_rider); a skill‑loader sketch follows the quotes below.

“Skills are a markdown file written in English to provide a step by step guide to an AI agent” – sanex
“Lines of code is a bad metric” – ninkendo
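
To make the skills idea concrete: if a skill is just an English markdown file, extending the agent means loading prose into its prompt rather than generating code. A sketch of such a loader, assuming a skills/ directory layout that is purely illustrative (this is not NanoClaw’s actual implementation):

```python
from pathlib import Path

# Hypothetical layout: one markdown file per skill, e.g. skills/triage.md
SKILLS_DIR = Path("skills")

def load_skills() -> str:
    """Concatenate every skill file into one prompt section.

    Because a skill is English prose rather than code, adding a feature
    means writing a new .md file; there is no generated code to review.
    """
    sections = [
        f"## Skill: {f.stem}\n{f.read_text()}"
        for f in sorted(SKILLS_DIR.glob("*.md"))
    ]
    return "\n\n".join(sections)

def build_system_prompt(base_prompt: str) -> str:
    return base_prompt + "\n\n# Available skills\n\n" + load_skills()
```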

4. Real‑world use cases and friction reduction
Despite the concerns, many users are already deploying agents for everyday tasks: scheduling, email drafting, note‑taking, or even complex workflows like GitHub/Jira triage (rubslopes, nkzd). The discussion shows that the promise of “personal assistants” is real, but it also highlights the trade‑offs between convenience and risk (“I want to use it for reminders, but not for email” – himata4113).

“I want to try one to be a bit of a personal coach. Remind me to do things and check in on goals.” – medi8r
“I’m only using my own ‘agent’ … to retrieve information about the audio I upload to it” – vitto_gioda

These four themes—security, trust, extensibility, and practical use—capture the core of the conversation.


🚀 Project Ideas

CapabilityGuard

Summary

  • Solves container‑escape fears by running each agent in an immutable Firecracker microVM with capability‑based permissions and automatic rollback, so a compromised agent cannot leak host secrets.
  • Core value proposition: Zero‑trust isolation that guarantees agents can only perform pre‑approved actions, preventing secret exfiltration and privilege escalation.

Details

  • Target Audience: Security‑focused developers, AI agent operators, SaaS platforms running autonomous agents
  • Core Feature: Isolated microVM execution with capability APIs, immutable filesystem, enforced network egress allow‑list, automatic snapshot revert on anomaly
  • Tech Stack: Firecracker,
  • Monetization: Hobby
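
A sketch of the capability check at the heart of this idea: every outbound request must match a pre‑approved grant, and anything else is refused and logged as an anomaly, which would trigger the snapshot revert. Class and method names are hypothetical, and the Firecracker plumbing is omitted.

```python
from urllib.parse import urlparse

class CapabilityError(Exception):
    """Raised when an agent attempts an action outside its grant."""

class EgressGuard:
    """Enforces the network egress allow-list described above."""

    def __init__(self, allowed_hosts: set[str]):
        self.allowed_hosts = allowed_hosts
        self.violations: list[str] = []

    def check(self, url: str) -> None:
        host = urlparse(url).hostname or ""
        if host not in self.allowed_hosts:
            self.violations.append(url)  # anomaly: would trigger snapshot revert
            raise CapabilityError(f"egress to {host!r} is not pre-approved")

# Grant only the endpoints the agent genuinely needs.
guard = EgressGuard(allowed_hosts={"api.github.com"})
guard.check("https://api.github.com/repos")   # allowed
# guard.check("https://mail.google.com/")     # raises CapabilityError
```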
