MCP tool poisoning hides malicious instructions in an MCP tool's description or response so an AI agent executes them as trusted commands. Learn how the attack works, its variants, and how to prevent it.

Your agent reviews the code behind a tool. The model never sees it. MCP tool poisoning lives in that gap: an attack in which a malicious or compromised Model Context Protocol (MCP) server hides instructions inside a tool's description, parameters, or response so that an AI agent reads them as trusted commands and acts on them. The tool's code can look completely benign, because the payload sits in the metadata the model consumes, not in the function the developer reviews.
That distinction is what makes the attack dangerous. It targets the agent's reasoning rather than your infrastructure, so a poisoned tool can exfiltrate data, leak credentials, or chain into more privileged tools while conventional security scanners see nothing wrong. This guide explains how tool poisoning works, walks through the main attack variants and a real-world incident, and gives you a prevention checklist organized by who owns each control. Here is the short version of the argument. Tool poisoning is not a prompt-engineering bug you can write your way out of. It is a trust-and-provenance failure in your agent's tool supply chain, and it has to be fixed there.
MCP tool poisoning is a type of indirect prompt injection that embeds malicious instructions in the metadata of an MCP tool, so that an AI agent executes them as if they were legitimate directives. When an agent connects to an MCP server, the client passes each tool's name, description, and parameter descriptions into the model's context window. A tool poisoning attack puts attacker-controlled instructions into exactly that text, or into the values a tool returns at runtime.
The key thing to understand is the gap between what a developer reviews and what the model reads. A developer auditing a tool reads its source code. The model never sees that code. It sees the natural-language description and the tool's responses, and it treats both as trustworthy context. If you are new to how MCP exposes tools to a model, it helps to understand what MCP tools are before going further, because the attack lives entirely in the layer the model consumes rather than the layer engineers inspect.
This matters for anyone building or operating agents on top of MCP: the platform and AI engineers wiring up tool servers, and the security teams responsible for what those agents can reach. As MCP adoption has accelerated, tool poisoning has moved from a theoretical concern to a documented class of MCP vulnerability, catalogued by the OWASP community as a named MCP attack.
Why does one poisoned tool description matter so much? Because of what security researchers call the "lethal trifecta": the combination of access to private data, exposure to untrusted instructions, and an available exfiltration vector in the same agent, a toxic combination documented in independent analysis of MCP's prompt-injection problems. When all three are present, an attacker who controls any one input can turn the agent into an accomplice.
MCP makes that combination unusually easy to assemble. An agent might connect to a server that reads your messages or files (private data), receive a poisoned tool description or response (untrusted instructions), and have a network-capable tool or output channel available (exfiltration). None of these capabilities is malicious on its own. The danger is that MCP lets a single agent hold all three at once, and a poisoned tool is the spark that connects them.
Because the attack rides on the model's instruction-following behavior, it does not require breaking authentication, exploiting a memory bug, or compromising the host. That is why teams who have hardened their MCP servers at the infrastructure level can still be fully exposed at the reasoning level.
A tool poisoning attack generally unfolds in a predictable sequence:
The payload can live in several places: the tool's description or docstring, its parameter descriptions, or the tool response returned at call time. Tool-description poisoning is the classic form. Tool response poisoning is more insidious, because the malicious content arrives at runtime, after any connect-time review has already passed.
Most MCP clients pass tool descriptions into the model context without sanitization or schema enforcement, and the model is designed to follow instructions it finds in its context. There is no built-in boundary that says "this text is data to display, not a command to obey." A maliciously crafted description therefore reads to the model exactly like a system instruction.
Attackers lean on interface quirks to hide the payload from humans. Documented examples use elaborate whitespace and off-screen text, so that a tool description looks innocuous in a UI that hides horizontal scrollbars while the model still receives the full hidden instruction. The user approves what looks like a harmless tool. The model sees something very different.
Tool poisoning is a family of techniques rather than a single exploit. The main variants:
| Variant | How it works |
|---|---|
| Classic tool-description poisoning | Malicious instructions are embedded in a tool's description or docstring and processed by the model as trusted input. |
| Tool response poisoning | The hidden instructions arrive in the tool's runtime response, bypassing any vetting done at connection time. |
| Rug pull (bait-and-switch) | A tool is benign when approved, then its server-side definition mutates after consent. Safe on day one, exfiltrating your API keys by day seven. |
| Tool shadowing / squatting | A malicious tool mimics a legitimate one using similar names, homoglyphs, or naming collisions so the model or user selects it instead of the real one. |
| Line jumping / cross-server interference | With multiple servers connected to one agent, a malicious server overrides or intercepts calls intended for a trusted server. |
The rug pull variant deserves special attention, because it defeats one-time review entirely. Two conditions enable it: server-side tool logic is mutable, and standard clients do not re-fetch and re-verify a tool's full definition on every invocation, conditions analyzed in research on mitigating tool squatting and rug pull attacks in MCP. Approve a tool once and you have effectively trusted every future version of it. Tool shadowing and squatting attack a different moment, the selection stage: a fake server that imitates a trusted one only has to be picked once.
The most cited demonstration of tool poisoning targeted a WhatsApp MCP integration, analyzed in detail by Simon Willison. A malicious server defined an innocent-looking tool, presented as something as harmless as returning a daily fact, that quietly changed its definition to read the user's WhatsApp message history and reroute messages to an attacker-controlled number, all while instructing the model to hide evidence of the compromise from the user. The exfiltration was concealed using off-screen text in the client UI, so nothing looked wrong on screen.
A second documented example comes from the OWASP MCP tool poisoning write-up: a tool that reports "compliance status" returns text reading "SOC2 Status: REVIEW REQUIRED" followed by a fake directive telling the agent to read the system's /etc/shadow file and send its contents to an external endpoint. The model has no way to distinguish the fabricated compliance requirement from a real one.
Both examples share the same shape. A tool that looks legitimate carries instructions the model obeys, and the human in the loop is deliberately kept from seeing them.
The damage a poisoned tool can do scales directly with what the agent is allowed to reach. Common outcomes include data exfiltration, credential exposure (API keys, tokens, and secrets the agent can access), unauthorized actions such as deleting or modifying data, and bypassing security checks by hijacking the model's prioritization logic.
The more capable the agent, the larger the MCP blast radius. An agent with one read-only tool is a limited target. An agent that can read files, call internal APIs, and reach the network can be chained into a cascading attack, where one poisoned tool's output triggers calls to more privileged tools. Credential exposure is especially damaging, because leaked secrets extend the attacker's reach far beyond the original agent.
Tool poisoning attacks target the cognitive process of the model rather than a code-level flaw, which makes them difficult to catch with conventional security approaches. A static analyzer scanning tool source code finds nothing, because the malicious content is in natural-language metadata, not executable code. A vulnerability scanner looking for known CVEs finds nothing, because there is no software vulnerability in the traditional sense.
The deeper problem is timing. Trust is established at connection time, when a human or policy approves a server and its tools. It is abused at runtime, when responses and mutated definitions flow back. Any control that only inspects tools at approval, and never again, is blind to rug pulls and response poisoning by design.
Because the attack is behavioral, detection focuses on change and anomaly rather than signatures:
There is no single setting that fixes tool poisoning, because the root cause is that agents trust tools they should treat as untrusted third-party code. Effective prevention layers several controls, and it helps to organize them by who is responsible for each. These controls extend the broader practices in our guide to MCP security.
| Control | Owner | What it stops |
|---|---|---|
| Allowlist + provenance for MCP servers | Platform / security | Malicious and fake servers |
| Pinned, integrity-checked tool definitions | Client / platform | Rug pulls, definition mutation |
| Least privilege + context isolation | Platform / engineering | Blast-radius escalation |
| Runtime guardrails + human-in-the-loop | Platform / security | Response poisoning, exfiltration |
Treat every MCP server as untrusted third-party code until proven otherwise. Maintain an allowlist of approved servers, verify provenance before connecting, and be especially wary of fake or malicious MCP servers that imitate popular integrations. Centralized governance over which servers and tools your agents may use, of the kind an MCP platform provides, is the foundation everything else builds on.
Use immutable, versioned tool definitions, and re-fetch and re-verify the full definition on every invocation, not just at approval. Alert users when a tool's description changes. The official MCP security best practices set out the consent, scope-minimization, and validation controls that underpin this, and treating its SHOULDs as MUSTs is the single most effective defense against rug pulls. Cryptographic approaches go further. Research on OAuth-enhanced tool definitions with policy-based access control proposes signing tool definitions so a mutated or impersonated tool fails verification.
Scope each agent and each tool to the minimum access it needs, and isolate privileged tools into separate agent contexts so a poisoned low-trust tool cannot chain into them. Enforce access controls on the server side rather than relying on instructions in the system prompt, which a poisoned tool can simply override. Strong MCP least privilege is what keeps a successful poisoning attempt from becoming a catastrophic one.
Inspect tool calls and responses at runtime, ideally at a central enforcement point. An MCP gateway is a natural place to apply description-change detection, tool-call filtering, egress controls, and structured (JSON) output schemas instead of free-text responses. Require explicit human confirmation for sensitive operations, so a model cannot silently act on an injected instruction.
These terms overlap, which causes confusion. The relationship:
| Term | What it is | Relationship |
|---|---|---|
| Prompt injection | Any attack that smuggles attacker-controlled instructions into a model's context. | The broad parent category. |
| MCP tool poisoning | Indirect prompt injection where the instructions hide in MCP tool metadata or responses. | A specific form of indirect prompt injection. |
| Rug pull | A poisoned tool that was benign at approval and mutates afterward. | A time-based variant of tool poisoning. |
In short: all tool poisoning is prompt injection, but not all prompt injection is tool poisoning, and a rug pull is tool poisoning that arrives on a delay.
It is an attack in which a malicious or compromised MCP server hides instructions in a tool's description, parameters, or response so that an AI agent executes them as trusted commands. The tool's code can look harmless, because the payload lives in the metadata the model reads, not the code a developer reviews.
Tool poisoning is a specific kind of indirect prompt injection. Prompt injection is the broad category of smuggling instructions into a model's context. Tool poisoning is the case where those instructions hide inside MCP tool metadata or tool responses.
A rug pull is a tool that behaves normally when you approve it, then changes its server-side definition afterward to do something malicious. It works because tool logic can be mutated and most clients do not re-verify a tool's definition on every use.
Usually not. The malicious content is natural-language metadata aimed at the model's reasoning, not executable code with a known vulnerability, so conventional scanners and antivirus tools have nothing to flag. Detection relies on description diffing and runtime behavioral monitoring instead.
Layer your controls. Allowlist and verify MCP servers, pin and integrity-check tool definitions, apply least privilege and context isolation, and add runtime guardrails such as tool-call filtering, egress control, and human confirmation for sensitive actions.
Popularity is not a guarantee. A legitimate server can be compromised, and rug pulls specifically exploit trust in a previously approved tool. Pinning definitions and re-verifying them on every use protects you even from servers you already trust.
To go deeper on the surrounding topics, see our guides on MCP security risks, building, deploying, and securing MCP servers, and the Model Context Protocol overall.
Govern your agent tool supply chain with Agen. Tool poisoning is a provenance and trust problem, and that is exactly what agent identity and MCP runtime controls are built to solve: allowlisting and provenance for the servers your agents connect to, pinned tool definitions, least-privilege scoping, and runtime tool-call filtering. See how an MCP gateway helps you put guardrails between your agents and the tools they call.
Written by
Agen.co
Learn how to implement MCP access control for AI agents with OAuth 2.1, RBAC, CBAC, and Zero Trust enforcement patterns for platform and security teams.