Claude Cowork exfiltrates files

📝 Discussion Summary (Click to expand)

Here are the 4 most prevalent themes of the opinions expressed in this Hacker News discussion:

1. Prompt Injection is an Inherent, Unsolvable Vulnerability

Many users argued that prompt injection is not a bug but an intrinsic feature of how current Large Language Models (LLMs) work. Unlike SQL injection, there is no way to separate data from instructions (code) because both are processed as a single stream of tokens. This makes complete prevention theoretically impossible with the current architecture.

NitpickLawyer: "In SQL world we had ways to separate control channels from data channels. In LLMs we don't."
alienbaby: "The control and data streams are woven together (context is all just one big prompt) and there is currently no way to tell for certain which is which."
phyzome: "Prompt injection is not solvable in the general case. So it will just keep happening."

2. AI Companies Externalize Risk onto Users

Users expressed frustration that AI companies acknowledge risks but suggest "unreasonable precautions" that shift the burden of security to the end-user. This was compared to the gun industry and criticized as a byproduct of "unfettered capitalism," where companies profit while users bear the consequences of inevitable exploits.

AmbroseBierce: "It's exactly like guns, we know they will be used in school shootings but that doesn't stop their selling in the slightest, the businesses just externalize all the risks claiming it's all up fault of the end users."
rsynnott: "It largely seems to amount to 'to use this product safely, simply don't use it'."
ronbenton: "Telling uses to 'watch out for prompt injections' is insane. Less than 1% of the population knows what that even means... This is more than unreasonable, it’s negligent."

3. The "Phishing" Analogy is More Accurate than "Injection"

Several commenters argued that comparing prompt injection to SQL injection is misleading. Instead, they view these attacks as closer to social engineering or phishing. This implies that the solution requires behavioral changes and skepticism from users, not just technical patches, because the AI is being tricked rather than exploited via a code flaw.

NitpickLawyer: "What I don't think is accurate is to then compare it to sql injection... I think it's better to think of the aftermath as phishing, and communicate that as the threat model."
chasd00: "'llm phishing' is a much better way to think about this than prompt injection."
hakanderyal: "People fall for phishing every day. Even highly trained people."

4. Fundamental Architectural Changes are Needed

Because prompt injection is inherent to LLMs, users believe the only real solution is a new architectural breakthrough. Ideas floated included "authority bits" for tokens, separating instruction channels from data channels at the model level, or a "prepared statement" pattern for agents. However, many are skeptical this will happen soon, as it might limit the model's usefulness.

wcoenen: "I wonder if might be possible by introducing a concept of 'authority'. Tokens are mapped to vectors... one of the dimensions of that space could be reserved to represent authority."
NitpickLawyer: "This is what oAI are doing. System prompt is 'ring0'... They do train the models to follow this prompt hierarchy. But it's never full-proof."
tempaccsoz5: "Honestly, I'm a bit surprised the main AI labs aren't doing this... You could just include an extra single bit with each token that represents trusted or untrusted."

🚀 Project Ideas

Prompt Injection Defense Simulator

Summary

[A local, interactive sandbox that demonstrates how prompt injection attacks on AI agents (like the one described in the HN discussion) actually work, using real-world examples.]
[The core value is demystifying the "lethal trifecta" and providing a safe, educational environment for developers and security teams to test and understand these new vulnerabilities without risking real data.]

Details

Key	Value
Target Audience	Developers, security engineers, and product managers building or using AI agents.
Core Feature	A downloadable application that simulates an AI agent (like Claude Cowork) and allows users to safely execute pre-loaded prompt injection attacks. It visualizes the attack chain (e.g., hidden text in a file leading to exfiltration) in real-time.
Tech Stack	Python, Electron/Tauri for the desktop app, Ollama for running a local, small LLM to demonstrate the principle.
Difficulty	Medium
Monetization	Hobby (Open source)

Notes

[Directly addresses the frustration that "Less than 1% of the population knows what that even means" by making the threat tangible and educational.]
[Provides practical utility for security training, moving beyond abstract warnings to hands-on demonstration, which would spark significant discussion on best practices.]

Unstructured Data Firewall

Summary

[A proxy service that sits between AI agents and any unstructured data sources (like email, cloud drives, or scraped web content), sanitizing inputs before they reach the LLM's context window.]
[The core value is breaking the "single channel" problem by creating a trusted intermediary that rewrites or flags potentially malicious content, aiming to separate the "control" channel from the "data" channel.]

Details

Key	Value
Target Audience	Enterprises deploying AI agents on sensitive internal documents and communications.
Core Feature	An API gateway that ingests documents, uses a series of smaller, specialized models to detect and neutralize hidden instructions or unusual patterns, and outputs a "safe" version for the agent.
Tech Stack	Go/Python for the proxy, smaller transformer models (like BERT/DeBERTa) for classification/rewriting, vector databases for similarity checks against known attack patterns.
Difficulty	High
Monetization	Revenue-ready: SaaS subscription based on document volume.

Notes

[Addresses the core technical argument from NitpickLawyer: "we don't have ways to separate control channels from data channels." This is an architectural attempt to create one.]
[Sparks debate on whether this is a viable "fix" or just another mitigation that clever attackers will bypass, referencing the "whack-a-mole" nature of security.]

Summary

[An open-source, community-driven repository for verified, version-controlled, and signed "skills" or "prompts" for AI agents. It includes a GitHub Action that automatically runs a suite of prompt injection tests against any submitted skill.]
[The core value is creating a trusted ecosystem for sharing agent capabilities, directly addressing the "trust" problem of uploading random files from the internet that burkaman highlighted.]

Details

Key	Value
Target Audience	AI agent developers, "vibe coders," and companies building agent marketplaces.
Core Feature	A central registry with cryptographically signed skills, automated adversarial testing (red-teaming) in CI/CD, and a strict review process to prevent malicious submissions.
Tech Stack	GitHub Actions, Sigstore for signing, a public database (Postgres/SQLite), and a simple web frontend.
Difficulty	Medium
Monetization	Hobby (Open source, with potential for sponsored builds for private registries).

Notes

[Directly counters the idea that you can't trust random .docx or .md files by creating a "skill store" with security guarantees.]
[Would generate high-quality discussion around software supply chain security in the AI era and the responsibility of platforms to vet AI components.]

Account-Bound API Gateway

Summary

[A middleware service that enforces a strict 1:1 mapping between a user's session and the API keys/credentials used by an agent, preventing cross-account data exfiltration.]
[The core value is fixing the specific architectural flaw in the Anthropic Cowork exploit where the agent could be tricked into using an attacker's API key, effectively acting as a secure proxy that insulates the agent from direct credential access.]

Details

Key	Value
Target Audience	API platform providers and developers building custom agentic workflows.
Core Feature	Intercepts all outbound tool calls (e.g., `curl`, API requests) from the agent. It validates the call against the user's session policy and rewrites the request to use the correct, session-bound credentials, blocking any calls with unauthorized keys.
Tech Stack	Rust/Go for a high-performance proxy, Redis for session management, Envoy as a sidecar proxy in containerized environments.
Difficulty	Medium
Monetization	Revenue-ready: Enterprise deployment license or managed service for secure agent hosting.

Notes

[Directly solves the caminanteblanco critique: "they took measures against people using other agents... while being this fucking sloppy." It enforces credential binding at the infrastructure level.]
[A practical, engineer-focused solution that would be praised by HN users for being an architectural fix rather than just a policy recommendation.]

Claude Cowork exfiltrates files

1. Prompt Injection is an Inherent, Unsolvable Vulnerability

2. AI Companies Externalize Risk onto Users

3. The "Phishing" Analogy is More Accurate than "Injection"

4. Fundamental Architectural Changes are Needed

🚀 Project Ideas

Prompt Injection Defense Simulator

Summary

Details

Notes

Unstructured Data Firewall

Summary

Details

Notes

Summary

Details

Notes

Account-Bound API Gateway

Summary

Details

Notes

Read Later