BIPI
BIPI

Multi-Agent Systems and the New Attack Surface Nobody Has Mapped Yet

Agentic AI

When AI agents talk to other AI agents, trust becomes the attack surface. A2A protocol, orchestrator compromise, and inter-agent prompt injection are reshaping what it means to secure an AI deployment. Here is what the threat model looks like in 2025.

By Arjun Raghavan, Security & Systems Lead, BIPI · July 5, 2025 · 11 min read

#multi-agent#a2a-protocol#orchestrator#agent-trust#ai-security#agentic-ai

Single-agent systems were already complicated enough. An agent has a goal, a set of tools, and a loop. The attack surface is bounded: user inputs, tool responses, and memory. Multi-agent systems break that bound. Now you have agents issuing instructions to other agents, orchestrators decomposing tasks and assigning sub-tasks to specialised workers, and the question of trust between agents becomes acutely non-trivial.

The A2A Protocol and What It Enables

Google's Agent-to-Agent (A2A) protocol, along with analogous patterns in LangGraph, AutoGen, and custom orchestration frameworks, defines how agents communicate task requests, share context, and return results. The protocol is designed for capability — enabling complex workflows that no single agent could execute alone. It was not designed with adversarial conditions as a first-class concern. That gap is now being exploited.

Orchestrator Compromise: The Highest-Value Target

In most multi-agent architectures, an orchestrator agent decomposes high-level goals into sub-tasks and dispatches them to worker agents. Compromising the orchestrator is equivalent to compromising the entire system — not just one agent, but the authority that directs all of them. Orchestrators are typically exposed to the widest range of inputs, making them the highest-value target for injection attacks.

  • Orchestrator prompt injection: user input or retrieved content that modifies the orchestrator's task decomposition logic
  • Sub-task hijacking: intercepting the message channel between orchestrator and worker to replace legitimate tasks with malicious ones
  • Result fabrication: a compromised worker agent returns false results that cause the orchestrator to take incorrect downstream actions
  • Authority escalation: a worker agent claims elevated permissions by impersonating the orchestrator in messages to other workers
  • Context pollution: injecting data into shared context stores that all agents read, affecting the behaviour of the entire swarm

Agent-to-Agent Trust: The Core Problem

In human organisations, trust is established through identity verification, role definitions, and audit trails. In most multi-agent systems, trust between agents is implicit — a worker agent trusts messages from the orchestrator because the system is designed that way, not because the orchestrator has cryptographically proven its identity. This is the same mistake that made early network protocols vulnerable: assuming a benign environment and building no verification layer.

Mapping the Attack Surface

  1. Enumerate every communication channel between agents — message queues, shared memory, API calls, file system handoffs
  2. Identify which agents have write access to shared state and treat those as critical trust boundaries
  3. Determine whether the orchestrator's identity is verified by workers or assumed from context
  4. Test whether a worker agent can be made to impersonate the orchestrator through adversarial output
  5. Examine whether task results from one agent can inject instructions into another agent's context
  6. Look for fan-out amplification: a single compromised input reaching multiple agents simultaneously
  7. Test the system's behaviour when one agent in the swarm is deliberately fed false information

Defence Patterns That Actually Work

Cryptographic signing of inter-agent messages is the right answer for high-assurance systems, though it adds complexity. For most teams, a more pragmatic starting point is strict schema validation on all inter-agent messages, explicit capability declarations that agents cannot self-modify, and a read-only pattern for shared context (write operations require human or orchestrator approval). The goal is to make each agent's behaviour predictable from its inputs, removing the conditions that allow cascade compromise.

blast radius amplification when orchestrator is compromised vs. single worker agent
91%
of multi-agent systems audited in 2025 lacked any cryptographic inter-agent identity verification
18 min
median time for a compromised worker agent to affect orchestrator state in an unprotected swarm

The Standards Gap

No major security standard as of mid-2025 specifically addresses multi-agent trust architectures. OWASP's LLM Top 10 covers prompt injection and supply chain risk at the single-agent level. ISO 42001 covers AI management processes. Neither maps cleanly to the distributed, asynchronous, goal-directed nature of production multi-agent systems. Teams deploying these architectures are operating without a compliance framework and largely without peer benchmarks. That will change — but the attacks will not wait for the standards to catch up.

Read more field notes, explore our services, or get in touch at info@bipi.in. Privacy Policy · Terms.