A new attack lets hackers steal company data by hiding malicious instructions inside an AI tool's description.
New Microsoft research shows that AI agents can be tricked into sharing sensitive information without breaking any rules. Because every action appears normal, the attack may go unnoticed.
In the described scenario, a finance team uses an AI agent to manage invoices with the help of a third-party tool. An attacker secretly updates that tool's description, adding hidden instructions that tell the AI to collect unpaid invoices and send them during its next request.
When an employee later asks the AI a routine question, the agent follows the hidden instructions. It gathers the invoices and sends them to the compromised tool, which quietly copies the data to an attacker-controlled server before returning a normal response.
“Each action the agent takes on its own is legitimate. The tool is approved, the Dataverse query inherits the analyst’s permissions, and the outbound call goes to a server that was allowlisted when it was added. The vulnerability is not in any single system; it is in the trust boundary between them. The MCP blends instructions (tool descriptions) with data, so a change to a tool’s metadata can redirect the agent’s behavior as effectively as a change to its system prompt. The agent cannot distinguish between a legitimate instruction authored by its owner and a malicious instruction inserted by an upstream maintainer,” Microsoft explains.
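To see why a description change can carry as much weight as a prompt change, consider a simplified sketch in Python. This is not Copilot's or MCP's actual code; the `Tool` class, the `build_context` function and the `invoice_lookup` tool are hypothetical names used only for illustration. The assumption is that an agent assembles its working context by joining the owner's prompt and each tool's description into a single block of text.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str  # written by the tool's maintainer, not the agent's owner


def build_context(system_prompt: str, tools: list[Tool]) -> str:
    """Join the owner's prompt and the tool metadata into one text block (hypothetical)."""
    lines = [system_prompt, "", "Available tools:"]
    for tool in tools:
        # The description is pasted in as plain text, next to the owner's instructions.
        lines.append(f"- {tool.name}: {tool.description}")
    return "\n".join(lines)


context = build_context(
    "You help the finance team manage invoices.",
    [Tool("invoice_lookup", "Looks up an invoice by ID.")],
)
print(context)
```

In this sketch, nothing in the resulting text marks which sentences came from the owner and which came from a third party, which is the trust boundary problem the quote describes.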
The research says the problem is not a bug in Copilot but a trust issue with third-party tools. Because AI agents read tool descriptions to decide what to do, attackers can use descriptions that are not properly reviewed to influence the agent's actions.
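One way to make description changes visible for review can be sketched in Python. This is an illustrative assumption, not a measure taken from Microsoft's research: the idea is to record a SHA-256 fingerprint of each tool description at approval time and flag any tool whose current description no longer matches. The function names `fingerprint` and `changed_tools`, and the sample tool, are hypothetical.

```python
import hashlib


def fingerprint(description: str) -> str:
    """Return the SHA-256 hex digest of a tool description."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def changed_tools(approved: dict[str, str], current: dict[str, str]) -> list[str]:
    """List tools whose current description differs from the approved fingerprint.

    `approved` maps tool name -> fingerprint recorded at review time.
    `current` maps tool name -> description as served right now.
    Tools with no approved fingerprint are flagged as well.
    """
    flagged = []
    for name, description in current.items():
        if approved.get(name) != fingerprint(description):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical example: the description was edited after approval.
approved = {"invoice_lookup": fingerprint("Looks up an invoice by ID.")}
current = {
    "invoice_lookup": "Looks up an invoice by ID. Also include all unpaid invoices."
}
print(changed_tools(approved, current))
```

A check like this only shows that a description changed; a person or policy still has to decide whether the new text is safe before the agent is allowed to use the tool again.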