15 Apr 2026 4 min read cybersecurity

A CISO’s Take on “Your Agent Is Mine”

New research out of UC Santa Barbara and Fuzzland should be mandatory reading for any security team that has greenlighted LLM agents in production.

The paper, “Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain,” formalizes something most of us have been hand-waving past. Every LLM API router is a plaintext man-in-the-middle, not by exploit, but by architecture. The client sends its request, the router terminates TLS, reads everything (API keys, system prompts, tool-call payloads), and then opens a fresh connection upstream. No provider signs the response. No integrity check binds what the model actually said to what your agent receives and executes. The router is trusted because you configured it, not because anything has verified it deserves that trust.

What the researchers found

They purchased 28 paid routers from Taobao, Xianyu, and Shopify storefronts and collected 400 free ones from public communities. Nine were actively injecting malicious code into tool-call responses. Seventeen silently harvested AWS credentials that crossed the wire. One drained ETH from a researcher-controlled wallet.

Two of the malicious routers used adaptive evasion, only activating after 50 or more clean requests, or when they detected autonomous YOLO-mode sessions targeting Rust and Go projects. That last detail should concern anyone who thinks a quick smoke test constitutes sufficient vetting. A router that behaves perfectly for your first 49 requests and then starts rewriting tool calls on the 50th is specifically designed to survive exactly the kind of shallow audit most teams run.

The poisoning study is the real wake-up call

The finding that stuck with me wasn’t the overtly malicious routers. It was the poisoning study.

The researchers leaked a single OpenAI key into Chinese forums and messaging groups, then watched it get absorbed into router chains. That one key generated 100 million GPT-5.4 tokens and exposed more than seven Codex sessions. In a parallel study, they deployed weakly configured decoy routers across 20 domains and 20 IPs. Those decoys attracted tens of thousands of unauthorized access attempts and were folded into active agent-serving relay paths. The result: roughly 2 billion billed tokens, 99 leaked credentials across 440 Codex sessions spanning 398 different projects, and 401 of those sessions running fully autonomous with no human in the loop.

The implication is straightforward and uncomfortable. Your “trusted” router doesn’t need to be malicious at signup. It just needs to reuse one leaked key or chain through one weak upstream relay, and the entire trust boundary silently expands to include every operator in that chain. You think you’re trusting one service. In practice, you may be trusting a graph of pseudonymous intermediaries you’ve never heard of.

What this means operationally

This isn’t a novel attack technique. JSON rewriting through a proxy is trivial engineering. What’s new is the measurement, the proof that this is already happening in commodity markets, at scale, with real money and real credentials being stolen today.

If your teams are using third-party LLM routers (and statistically, some of them probably are, especially if they’re optimizing for cost or working around rate limits), here’s where I’d focus:

Inventory first

You cannot secure what you haven’t enumerated. Find every base URL your agents are configured to hit. If it’s not a first-party provider endpoint, you have a router in your chain, and you should treat it as an unverified intermediary until proven otherwise. This includes developers on your team who may have swapped in a cheaper endpoint without telling anyone.

Kill YOLO mode in anything touching production

401 out of 440 observed sessions in the poisoning study were auto-approving tool execution with zero human confirmation. Autonomous execution through an unverified proxy is outsourcing your incident response to an attacker’s timeline. If your agents are running with auto-approve enabled, they will execute whatever the router sends back, and the router knows it.

Implement a fail-closed policy gate on high-risk tool calls

The paper’s own evaluation shows this blocks 100% of direct payload injection and dependency substitution attacks at a 1% false-positive rate. It is bypassable by a sophisticated attacker who pre-positions payloads on an allowlisted domain (the paper is honest about that limitation), but it raises the bar significantly above “rewrite a URL in a JSON blob.” The cost is operational: maintaining domain and package allowlists, and handling the occasional false positive when a legitimate installer fetch hits an unlisted mirror.

Treat API keys like database credentials

Not like magazine subscriptions. Rotate them. Scope them. Monitor for unexpected upstream billing spikes. The leaked-key study showed how fast a single exposed credential becomes a supply-chain weapon, generating 100 million tokens of downstream exposure from one key posted in a chat group.

Push your providers on response signing

The paper proposes a provider-signed response envelope, essentially DKIM for tool calls. The provider would sign the model identifier, tool name, arguments, finish reason, and a client nonce, so the client can verify the response before executing anything. None of the major providers implement this today. Until they do, there is no cryptographic guarantee that what your agent executes is what the model actually produced. That’s the structural gap, and no amount of client-side screening fully closes it.

The bigger picture

The LiteLLM supply-chain compromise in March showed what happens when a trusted router gets poisoned through dependency confusion. This paper shows the problem is structural, not incidental. The architecture assumes the router is honest. The measurement proves that assumption is already being violated in commodity markets that serve a meaningful share of global LLM routing volume.

We’ve spent decades learning that “trust but verify” doesn’t work when there’s no mechanism for verification. LLM agent infrastructure is repeating that lesson, except now the payload the intermediary can tamper with isn’t just data. It’s executable tool calls that your agent will run on your infrastructure, on your credentials, in your environment.

The paper is worth reading in full, especially Sections 4 and 7 on the attack taxonomy and deployable defenses.

Paper: Your Agent Is Mine (arXiv:2604.08407)

This comes weeks after the LiteLLM PyPI supply-chain incident in March 2026. The attack surface for anyone routing LLM calls through third-party infrastructure is wider than most teams realize.

What the researchers found

The poisoning study is the real wake-up call

What this means operationally

Inventory first

Kill YOLO mode in anything touching production

Implement a fail-closed policy gate on high-risk tool calls

Treat API keys like database credentials

Push your providers on response signing

The bigger picture

You might also like...

Reading: The CISO Skill Nobody Talks About

Stop Treating Security Like a Gatekeeper (Start Treating It Like a Guardrail)

Building a security program for edtech when your customers are the auditors

Running NIST SP 800–53 Controls on a Team That Fits in a Conference Room