MCPGuide

What Is MCP Tool Poisoning? How the Attack Works and How to Prevent It

MCP tool poisoning hides malicious instructions in an MCP tool's description or response so an AI agent executes them as trusted commands. Learn how the attack works, its variants, and how to prevent it.

Agen.co

12 min read

What Is MCP Tool Poisoning? How the Attack Works and How to Prevent It

Your agent reviews the code behind a tool. The model never sees it. MCP tool poisoning lives in that gap: an attack in which a malicious or compromised Model Context Protocol (MCP) server hides instructions inside a tool's description, parameters, or response so that an AI agent reads them as trusted commands and acts on them. The tool's code can look completely benign, because the payload sits in the metadata the model consumes, not in the function the developer reviews.

That distinction is what makes the attack dangerous. It targets the agent's reasoning rather than your infrastructure, so a poisoned tool can exfiltrate data, leak credentials, or chain into more privileged tools while conventional security scanners see nothing wrong. This guide explains how tool poisoning works, walks through the main attack variants and a real-world incident, and gives you a prevention checklist organized by who owns each control. Here is the short version of the argument. Tool poisoning is not a prompt-engineering bug you can write your way out of. It is a trust-and-provenance failure in your agent's tool supply chain, and it has to be fixed there.

What is MCP tool poisoning?

MCP tool poisoning is a type of indirect prompt injection that embeds malicious instructions in the metadata of an MCP tool, so that an AI agent executes them as if they were legitimate directives. When an agent connects to an MCP server, the client passes each tool's name, description, and parameter descriptions into the model's context window. A tool poisoning attack puts attacker-controlled instructions into exactly that text, or into the values a tool returns at runtime.

The key thing to understand is the gap between what a developer reviews and what the model reads. A developer auditing a tool reads its source code. The model never sees that code. It sees the natural-language description and the tool's responses, and it treats both as trustworthy context. If you are new to how MCP exposes tools to a model, it helps to understand what MCP tools are before going further, because the attack lives entirely in the layer the model consumes rather than the layer engineers inspect.

This matters for anyone building or operating agents on top of MCP: the platform and AI engineers wiring up tool servers, and the security teams responsible for what those agents can reach. As MCP adoption has accelerated, tool poisoning has moved from a theoretical concern to a documented class of MCP vulnerability, catalogued by the OWASP community as a named MCP attack.

Why MCP tool poisoning matters

Why does one poisoned tool description matter so much? Because of what security researchers call the "lethal trifecta": the combination of access to private data, exposure to untrusted instructions, and an available exfiltration vector in the same agent, a toxic combination documented in independent analysis of MCP's prompt-injection problems. When all three are present, an attacker who controls any one input can turn the agent into an accomplice.

MCP makes that combination unusually easy to assemble. An agent might connect to a server that reads your messages or files (private data), receive a poisoned tool description or response (untrusted instructions), and have a network-capable tool or output channel available (exfiltration). None of these capabilities is malicious on its own. The danger is that MCP lets a single agent hold all three at once, and a poisoned tool is the spark that connects them.

Because the attack rides on the model's instruction-following behavior, it does not require breaking authentication, exploiting a memory bug, or compromising the host. That is why teams who have hardened their MCP servers at the infrastructure level can still be fully exposed at the reasoning level.

How an MCP tool poisoning attack works

A tool poisoning attack generally unfolds in a predictable sequence:

The attacker operates a malicious or compromised MCP server. The server exposes tools with normal, helpful-looking names. It may be a server the attacker published, or a legitimate one whose definitions were tampered with.
The victim's agent connects and ingests the tool definitions. The MCP client injects every tool's description and parameters into the model's context, without validating their content.
Hidden instructions reach the model as trusted text. The description or a tool response contains directives, often visually obfuscated, telling the model to do something harmful.
The model acts on the injected instructions. It may call a more privileged tool, read a secret, or send data to an attacker-controlled destination, believing it is following legitimate guidance.

Where the malicious payload hides

The payload can live in several places: the tool's description or docstring, its parameter descriptions, or the tool response returned at call time. Tool-description poisoning is the classic form. Tool response poisoning is more insidious, because the malicious content arrives at runtime, after any connect-time review has already passed.

Why the client trusts it

Most MCP clients pass tool descriptions into the model context without sanitization or schema enforcement, and the model is designed to follow instructions it finds in its context. There is no built-in boundary that says "this text is data to display, not a command to obey." A maliciously crafted description therefore reads to the model exactly like a system instruction.

How the user is kept in the dark

Attackers lean on interface quirks to hide the payload from humans. Documented examples use elaborate whitespace and off-screen text, so that a tool description looks innocuous in a UI that hides horizontal scrollbars while the model still receives the full hidden instruction. The user approves what looks like a harmless tool. The model sees something very different.

MCP tool poisoning attack variants

Tool poisoning is a family of techniques rather than a single exploit. The main variants:

Variant	How it works
Classic tool-description poisoning	Malicious instructions are embedded in a tool's description or docstring and processed by the model as trusted input.
Tool response poisoning	The hidden instructions arrive in the tool's runtime response, bypassing any vetting done at connection time.
Rug pull (bait-and-switch)	A tool is benign when approved, then its server-side definition mutates after consent. Safe on day one, exfiltrating your API keys by day seven.
Tool shadowing / squatting	A malicious tool mimics a legitimate one using similar names, homoglyphs, or naming collisions so the model or user selects it instead of the real one.
Line jumping / cross-server interference	With multiple servers connected to one agent, a malicious server overrides or intercepts calls intended for a trusted server.

The rug pull variant deserves special attention, because it defeats one-time review entirely. Two conditions enable it: server-side tool logic is mutable, and standard clients do not re-fetch and re-verify a tool's full definition on every invocation, conditions analyzed in research on mitigating tool squatting and rug pull attacks in MCP. Approve a tool once and you have effectively trusted every future version of it. Tool shadowing and squatting attack a different moment, the selection stage: a fake server that imitates a trusted one only has to be picked once.

A real-world example: the WhatsApp-MCP attack

The most cited demonstration of tool poisoning targeted a WhatsApp MCP integration, analyzed in detail by Simon Willison. A malicious server defined an innocent-looking tool, presented as something as harmless as returning a daily fact, that quietly changed its definition to read the user's WhatsApp message history and reroute messages to an attacker-controlled number, all while instructing the model to hide evidence of the compromise from the user. The exfiltration was concealed using off-screen text in the client UI, so nothing looked wrong on screen.

A second documented example comes from the OWASP MCP tool poisoning write-up: a tool that reports "compliance status" returns text reading "SOC2 Status: REVIEW REQUIRED" followed by a fake directive telling the agent to read the system's /etc/shadow file and send its contents to an external endpoint. The model has no way to distinguish the fabricated compliance requirement from a real one.

Both examples share the same shape. A tool that looks legitimate carries instructions the model obeys, and the human in the loop is deliberately kept from seeing them.

Impact and blast radius

The damage a poisoned tool can do scales directly with what the agent is allowed to reach. Common outcomes include data exfiltration, credential exposure (API keys, tokens, and secrets the agent can access), unauthorized actions such as deleting or modifying data, and bypassing security checks by hijacking the model's prioritization logic.

The more capable the agent, the larger the MCP blast radius. An agent with one read-only tool is a limited target. An agent that can read files, call internal APIs, and reach the network can be chained into a cascading attack, where one poisoned tool's output triggers calls to more privileged tools. Credential exposure is especially damaging, because leaked secrets extend the attacker's reach far beyond the original agent.

Why tool poisoning is hard to detect

Tool poisoning attacks target the cognitive process of the model rather than a code-level flaw, which makes them difficult to catch with conventional security approaches. A static analyzer scanning tool source code finds nothing, because the malicious content is in natural-language metadata, not executable code. A vulnerability scanner looking for known CVEs finds nothing, because there is no software vulnerability in the traditional sense.

The deeper problem is timing. Trust is established at connection time, when a human or policy approves a server and its tools. It is abused at runtime, when responses and mutated definitions flow back. Any control that only inspects tools at approval, and never again, is blind to rug pulls and response poisoning by design.

How to detect MCP tool poisoning

Because the attack is behavioral, detection focuses on change and anomaly rather than signatures:

Tool-description diffing. Record a tool's definition at approval and compare it on every use. A description that changes is the clearest rug-pull tell.
Anomalous tool-call chains. A tool requesting access to a secret, a file, or a privileged action that is unrelated to its stated purpose.
Unexpected egress. Outbound network calls to destinations a tool has no legitimate reason to contact.
Near-duplicate tool names. Two tools with confusingly similar names, or homoglyph variants, are a shadowing or squatting tell.
Out-of-context instruction patterns. Tool descriptions or responses that read like commands to the model rather than data for the user.

How to prevent MCP tool poisoning

There is no single setting that fixes tool poisoning, because the root cause is that agents trust tools they should treat as untrusted third-party code. Effective prevention layers several controls, and it helps to organize them by who is responsible for each. These controls extend the broader practices in our guide to MCP security.

Control	Owner	What it stops
Allowlist + provenance for MCP servers	Platform / security	Malicious and fake servers
Pinned, integrity-checked tool definitions	Client / platform	Rug pulls, definition mutation
Least privilege + context isolation	Platform / engineering	Blast-radius escalation
Runtime guardrails + human-in-the-loop	Platform / security	Response poisoning, exfiltration

Govern the tool supply chain

Treat every MCP server as untrusted third-party code until proven otherwise. Maintain an allowlist of approved servers, verify provenance before connecting, and be especially wary of fake or malicious MCP servers that imitate popular integrations. Centralized governance over which servers and tools your agents may use, of the kind an MCP platform provides, is the foundation everything else builds on.

Pin and integrity-check tool definitions

Use immutable, versioned tool definitions, and re-fetch and re-verify the full definition on every invocation, not just at approval. Alert users when a tool's description changes. The official MCP security best practices set out the consent, scope-minimization, and validation controls that underpin this, and treating its SHOULDs as MUSTs is the single most effective defense against rug pulls. Cryptographic approaches go further. Research on OAuth-enhanced tool definitions with policy-based access control proposes signing tool definitions so a mutated or impersonated tool fails verification.

Enforce least privilege and isolation

Scope each agent and each tool to the minimum access it needs, and isolate privileged tools into separate agent contexts so a poisoned low-trust tool cannot chain into them. Enforce access controls on the server side rather than relying on instructions in the system prompt, which a poisoned tool can simply override. Strong MCP least privilege is what keeps a successful poisoning attempt from becoming a catastrophic one.

Add runtime guardrails

Inspect tool calls and responses at runtime, ideally at a central enforcement point. An MCP gateway is a natural place to apply description-change detection, tool-call filtering, egress controls, and structured (JSON) output schemas instead of free-text responses. Require explicit human confirmation for sensitive operations, so a model cannot silently act on an injected instruction.

Tool poisoning vs prompt injection vs rug pull

These terms overlap, which causes confusion. The relationship:

Term	What it is	Relationship
Prompt injection	Any attack that smuggles attacker-controlled instructions into a model's context.	The broad parent category.
MCP tool poisoning	Indirect prompt injection where the instructions hide in MCP tool metadata or responses.	A specific form of indirect prompt injection.
Rug pull	A poisoned tool that was benign at approval and mutates afterward.	A time-based variant of tool poisoning.

In short: all tool poisoning is prompt injection, but not all prompt injection is tool poisoning, and a rug pull is tool poisoning that arrives on a delay.

MCP tool poisoning prevention checklist

Maintain an allowlist of approved MCP servers and verify provenance before connecting.
Pin tool definitions and re-verify them on every invocation; alert on any change.
Prefer cryptographically signed or OAuth-enhanced tool definitions where available.
Apply least privilege to every agent and tool; isolate privileged tools.
Enforce access control server-side, never via system-prompt instructions alone.
Filter tool calls and control network egress at a runtime enforcement point.
Use structured output schemas instead of free-text tool responses.
Require human confirmation for sensitive or irreversible actions.
Monitor for near-duplicate tool names and anomalous tool-call chains.

Frequently asked questions

What is MCP tool poisoning?

It is an attack in which a malicious or compromised MCP server hides instructions in a tool's description, parameters, or response so that an AI agent executes them as trusted commands. The tool's code can look harmless, because the payload lives in the metadata the model reads, not the code a developer reviews.

How is tool poisoning different from prompt injection?

Tool poisoning is a specific kind of indirect prompt injection. Prompt injection is the broad category of smuggling instructions into a model's context. Tool poisoning is the case where those instructions hide inside MCP tool metadata or tool responses.

What is a rug pull attack in MCP?

A rug pull is a tool that behaves normally when you approve it, then changes its server-side definition afterward to do something malicious. It works because tool logic can be mutated and most clients do not re-verify a tool's definition on every use.

Can antivirus or code scanners detect tool poisoning?

Usually not. The malicious content is natural-language metadata aimed at the model's reasoning, not executable code with a known vulnerability, so conventional scanners and antivirus tools have nothing to flag. Detection relies on description diffing and runtime behavioral monitoring instead.

How do I prevent MCP tool poisoning?

Layer your controls. Allowlist and verify MCP servers, pin and integrity-check tool definitions, apply least privilege and context isolation, and add runtime guardrails such as tool-call filtering, egress control, and human confirmation for sensitive actions.

Are official or popular MCP servers safe from tool poisoning?

Popularity is not a guarantee. A legitimate server can be compromised, and rug pulls specifically exploit trust in a previously approved tool. Pinning definitions and re-verifying them on every use protects you even from servers you already trust.

To go deeper on the surrounding topics, see our guides on MCP security risks, building, deploying, and securing MCP servers, and the Model Context Protocol overall.

Govern your agent tool supply chain with Agen. Tool poisoning is a provenance and trust problem, and that is exactly what agent identity and MCP runtime controls are built to solve: allowlisting and provenance for the servers your agents connect to, pinned tool definitions, least-privilege scoping, and runtime tool-call filtering. See how an MCP gateway helps you put guardrails between your agents and the tools they call.

Keep reading

What Is MCP Tool Poisoning? How the Attack Works and How to Prevent It

Agen.co

12 min read

What is MCP tool poisoning?

Why MCP tool poisoning matters

How an MCP tool poisoning attack works

A tool poisoning attack generally unfolds in a predictable sequence:

The attacker operates a malicious or compromised MCP server. The server exposes tools with normal, helpful-looking names. It may be a server the attacker published, or a legitimate one whose definitions were tampered with.
The victim's agent connects and ingests the tool definitions. The MCP client injects every tool's description and parameters into the model's context, without validating their content.
Hidden instructions reach the model as trusted text. The description or a tool response contains directives, often visually obfuscated, telling the model to do something harmful.
The model acts on the injected instructions. It may call a more privileged tool, read a secret, or send data to an attacker-controlled destination, believing it is following legitimate guidance.

Where the malicious payload hides

Why the client trusts it

How the user is kept in the dark

MCP tool poisoning attack variants

Tool poisoning is a family of techniques rather than a single exploit. The main variants:

Variant	How it works
Classic tool-description poisoning	Malicious instructions are embedded in a tool's description or docstring and processed by the model as trusted input.
Tool response poisoning	The hidden instructions arrive in the tool's runtime response, bypassing any vetting done at connection time.
Rug pull (bait-and-switch)	A tool is benign when approved, then its server-side definition mutates after consent. Safe on day one, exfiltrating your API keys by day seven.
Tool shadowing / squatting	A malicious tool mimics a legitimate one using similar names, homoglyphs, or naming collisions so the model or user selects it instead of the real one.
Line jumping / cross-server interference	With multiple servers connected to one agent, a malicious server overrides or intercepts calls intended for a trusted server.

A real-world example: the WhatsApp-MCP attack

Both examples share the same shape. A tool that looks legitimate carries instructions the model obeys, and the human in the loop is deliberately kept from seeing them.

Impact and blast radius

Why tool poisoning is hard to detect

How to detect MCP tool poisoning

Because the attack is behavioral, detection focuses on change and anomaly rather than signatures:

Tool-description diffing. Record a tool's definition at approval and compare it on every use. A description that changes is the clearest rug-pull tell.
Anomalous tool-call chains. A tool requesting access to a secret, a file, or a privileged action that is unrelated to its stated purpose.
Unexpected egress. Outbound network calls to destinations a tool has no legitimate reason to contact.
Near-duplicate tool names. Two tools with confusingly similar names, or homoglyph variants, are a shadowing or squatting tell.
Out-of-context instruction patterns. Tool descriptions or responses that read like commands to the model rather than data for the user.

How to prevent MCP tool poisoning

Control	Owner	What it stops
Allowlist + provenance for MCP servers	Platform / security	Malicious and fake servers
Pinned, integrity-checked tool definitions	Client / platform	Rug pulls, definition mutation
Least privilege + context isolation	Platform / engineering	Blast-radius escalation
Runtime guardrails + human-in-the-loop	Platform / security	Response poisoning, exfiltration

Govern the tool supply chain

Pin and integrity-check tool definitions

Enforce least privilege and isolation

Add runtime guardrails

Tool poisoning vs prompt injection vs rug pull

These terms overlap, which causes confusion. The relationship:

Term	What it is	Relationship
Prompt injection	Any attack that smuggles attacker-controlled instructions into a model's context.	The broad parent category.
MCP tool poisoning	Indirect prompt injection where the instructions hide in MCP tool metadata or responses.	A specific form of indirect prompt injection.
Rug pull	A poisoned tool that was benign at approval and mutates afterward.	A time-based variant of tool poisoning.

In short: all tool poisoning is prompt injection, but not all prompt injection is tool poisoning, and a rug pull is tool poisoning that arrives on a delay.

MCP tool poisoning prevention checklist

Maintain an allowlist of approved MCP servers and verify provenance before connecting.
Pin tool definitions and re-verify them on every invocation; alert on any change.
Prefer cryptographically signed or OAuth-enhanced tool definitions where available.
Apply least privilege to every agent and tool; isolate privileged tools.
Enforce access control server-side, never via system-prompt instructions alone.
Filter tool calls and control network egress at a runtime enforcement point.
Use structured output schemas instead of free-text tool responses.
Require human confirmation for sensitive or irreversible actions.
Monitor for near-duplicate tool names and anomalous tool-call chains.

Frequently asked questions

What is MCP tool poisoning?

How is tool poisoning different from prompt injection?

What is a rug pull attack in MCP?

Can antivirus or code scanners detect tool poisoning?

How do I prevent MCP tool poisoning?

Are official or popular MCP servers safe from tool poisoning?

To go deeper on the surrounding topics, see our guides on MCP security risks, building, deploying, and securing MCP servers, and the Model Context Protocol overall.

Keep reading

What Is MCP Tool Poisoning? How the Attack Works and How to Prevent It

What is MCP tool poisoning?

Why MCP tool poisoning matters

How an MCP tool poisoning attack works

Where the malicious payload hides

Why the client trusts it

How the user is kept in the dark

MCP tool poisoning attack variants

A real-world example: the WhatsApp-MCP attack

Impact and blast radius

Why tool poisoning is hard to detect

How to detect MCP tool poisoning

How to prevent MCP tool poisoning

Govern the tool supply chain

Pin and integrity-check tool definitions

Enforce least privilege and isolation

Add runtime guardrails

Tool poisoning vs prompt injection vs rug pull

MCP tool poisoning prevention checklist

Frequently asked questions

What is MCP tool poisoning?

How is tool poisoning different from prompt injection?

What is a rug pull attack in MCP?

Can antivirus or code scanners detect tool poisoning?

How do I prevent MCP tool poisoning?

Are official or popular MCP servers safe from tool poisoning?

Related MCP security resources

More from MCP

What is MCP (Model Context Protocol)? A Complete Guide

What Is MCP Tool Poisoning? How the Attack Works and How to Prevent It

What is MCP tool poisoning?

Why MCP tool poisoning matters

How an MCP tool poisoning attack works

Where the malicious payload hides

Why the client trusts it

How the user is kept in the dark

MCP tool poisoning attack variants

A real-world example: the WhatsApp-MCP attack

Impact and blast radius

Why tool poisoning is hard to detect

How to detect MCP tool poisoning

How to prevent MCP tool poisoning

Govern the tool supply chain

Pin and integrity-check tool definitions

Enforce least privilege and isolation

Add runtime guardrails

Tool poisoning vs prompt injection vs rug pull

MCP tool poisoning prevention checklist

Frequently asked questions

What is MCP tool poisoning?

How is tool poisoning different from prompt injection?

What is a rug pull attack in MCP?

Can antivirus or code scanners detect tool poisoning?

How do I prevent MCP tool poisoning?

Are official or popular MCP servers safe from tool poisoning?

Related MCP security resources

More from MCP

What is MCP (Model Context Protocol)? A Complete Guide

MCP Access Control: Secure AI Agent Gateways

What are MCP Tools? How They Work & How to Use Them