A Field Catalog of Agent Failure Modes
Agentic AI
Agents fail in patterns. We have seen them all in production: tool param hallucination, infinite plan revision, premature completion, context truncation. Here is the catalog, with detection signals, prevention patterns, and recovery moves for each.
By Arjun Raghavan, Security & Systems Lead, BIPI · May 23, 2024 · 8 min read
After 30-plus agent deployments, the failures rhyme. The same eight or nine patterns show up across industries, model families, and frameworks. Knowing the catalog turns a frantic two-day investigation into a 30-minute pattern match. This is the field guide.
Tool parameter hallucination
The agent invents a parameter the tool does not have, or fills a real parameter with a plausible-looking but wrong value (a UUID it made up, an account ID from training data). Detection: tool returns 400-class errors with parameter validation messages. Prevention: tight JSON schema with enum constraints and pattern validation, system prompt reminders that include known IDs the agent may reference. Recovery: do not let the model retry the same call. Reflect the validation error and force a clarifying read first.
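The schema-validation guard above can be sketched as a small pre-flight check that rejects both invented parameters and plausible-but-invalid values. This is a minimal stdlib-only sketch; the tool name, parameter names, and pattern are illustrative, not a real API.

```python
import re

# Hypothetical schema for a "get_account" tool: enum-constrained region,
# pattern-validated account ID. Names and formats are illustrative.
SCHEMA = {
    "account_id": {"pattern": r"^acct_[0-9a-f]{8}$"},
    "region": {"enum": ["us-east", "eu-west"]},
}

def validate_args(args: dict) -> list[str]:
    """Return a list of validation errors; empty means the call may proceed."""
    errors = []
    for name in args:
        if name not in SCHEMA:
            # A parameter the tool does not have: hallucinated.
            errors.append(f"unknown parameter: {name}")
    for name, rule in SCHEMA.items():
        if name not in args:
            errors.append(f"missing parameter: {name}")
            continue
        value = args[name]
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: {value!r} not in {rule['enum']}")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], str(value)):
            errors.append(f"{name}: {value!r} fails pattern check")
    return errors
```

On failure, the error list is reflected back to the model verbatim rather than allowing a blind retry of the same call.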
Infinite plan revision
The agent rewrites its plan every step, never executing. Often happens after a tool returns an unexpected result or when the system prompt encourages too much reflection. Detection: plan node fires more than three times in a single trace. Prevention: bound plan revisions explicitly in the prompt, require the agent to commit to the next step after each revision. Recovery: cap plan revisions at a hard threshold, force a human escalation past it.
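The hard threshold can live in a tiny orchestrator-side counter; a sketch, with the cap of three taken from the detection signal above:

```python
class PlanRevisionGuard:
    """Orchestrator-side cap on plan revisions within one trace (illustrative)."""

    def __init__(self, max_revisions: int = 3):
        self.max_revisions = max_revisions
        self.revisions = 0

    def on_plan_node(self) -> str:
        """Called each time the plan node fires in a trace."""
        self.revisions += 1
        if self.revisions > self.max_revisions:
            return "escalate"   # past the threshold: hand the trace to a human
        return "proceed"        # commit to the next executable step
```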
Premature completion
The agent declares success before the actual goal is met, often after a partial tool result that looks complete on its face. Detection: completion claimed but downstream success criteria not satisfied (the order did not actually ship, the ticket was not actually closed). Prevention: require the agent to call a verification tool before declaring done, and have that tool be unforgiving. Recovery: a verifier loop that catches premature completion and reopens the task with the verification failure as context.
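The verifier loop looks roughly like this. `run_agent` and `verify` are hypothetical hooks standing in for your agent runner and your unforgiving verification tool:

```python
def run_with_verification(run_agent, verify, task: str, max_rounds: int = 3):
    """Accept the agent's 'done' only after verification passes; otherwise
    reopen the task with the verification failure as added context."""
    context = task
    for _ in range(max_rounds):
        result = run_agent(context)
        ok, reason = verify(result)
        if ok:
            return result
        # Reopen with the verification failure appended as context.
        context = f"{task}\nPrevious attempt rejected: {reason}"
    raise RuntimeError("verification failed after max rounds; escalate")
```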
Silent context truncation
The agent has been running long enough that early context (the original goal, key constraints) has been silently dropped. The agent now thinks it is solving a different problem. Detection: behavior diverges from the stated goal in the trace, often subtly. Prevention: explicit goal preservation in the system prompt, summarise-then-pin the original objective, monitor effective context length. Recovery: re-inject the original goal at fixed step intervals, cap step count, fail closed when context pressure crosses a threshold.
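Goal pinning can be a one-liner in the orchestrator loop. A sketch assuming a simple list-of-messages context window (real frameworks vary); the interval and message shape are illustrative:

```python
PIN_INTERVAL = 5  # re-inject every N steps; tune per deployment

def maybe_pin_goal(messages: list[dict], goal: str, step: int) -> list[dict]:
    """Re-inject the original objective at fixed step intervals so silent
    truncation cannot drop it from the effective context."""
    if step % PIN_INTERVAL == 0:
        messages = messages + [
            {"role": "system", "content": f"REMINDER - original goal: {goal}"}
        ]
    return messages
```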
Tool spam loops
Same tool, same args, called repeatedly because the agent does not believe the result. Often follows a tool description that hedges ("may require retry", "sometimes returns stale data"). Detection: same tool plus same args twice in a row, or three times within a trace. Prevention: precise tool descriptions, no hedging language, return structured errors not natural language. Recovery: deduplicate at the orchestrator layer, return cached result with a warning to break the loop.
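The orchestrator-layer dedupe is a small cache keyed on tool name plus canonicalised args. A sketch; the tool call shape is illustrative:

```python
import json

class ToolDedupeGuard:
    """Dedupe identical tool+args within a trace: the repeat call gets the
    cached result plus an explicit warning, which breaks the loop."""

    def __init__(self):
        self.cache = {}

    def call(self, tool_fn, tool_name: str, args: dict):
        # sort_keys makes the cache key order-insensitive across arg dicts.
        key = (tool_name, json.dumps(args, sort_keys=True))
        if key in self.cache:
            return {"warning": "duplicate call; cached result returned",
                    "result": self.cache[key]}
        result = tool_fn(**args)
        self.cache[key] = result
        return {"result": result}
```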
Authority drift
The agent escalates beyond its remit because the model believes it is helping. The customer service agent issues a refund it was not authorised to issue. The procurement agent approves a purchase. Detection: side-effect logs show actions outside the scope established at session start. Prevention: hard-coded authority limits enforced at the tool wrapper, not the prompt. Recovery: every state-changing tool call goes through a wrapper that re-checks authorisation against the session, regardless of what the model decided.
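Enforcing authority at the wrapper rather than the prompt can be sketched like this. The action names, limit shape, and tool signature are illustrative assumptions:

```python
class AuthorityError(Exception):
    """Raised when a tool call exceeds the session's established scope."""

def wrap_with_authority(tool_fn, session_limits: dict):
    """Wrap a state-changing tool so every call re-checks authorisation
    against the session, regardless of what the model decided."""
    def wrapped(action: str, amount: float = 0.0, **kwargs):
        limit = session_limits.get(action)
        if limit is None:
            raise AuthorityError(f"action {action!r} outside session scope")
        if amount > limit:
            raise AuthorityError(f"{action} amount {amount} exceeds limit {limit}")
        return tool_fn(action=action, amount=amount, **kwargs)
    return wrapped
```

The key design choice: `session_limits` is set at session start by the platform, and the model never sees or touches it.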
Hallucinated state
The agent acts as if a previous tool call produced a result it actually did not. Often happens when a tool errored out and the failure was summarised back to the model in soft language. Detection: agent references data that does not exist in the trace's tool results. Prevention: structured failure responses, never let a tool error get summarised into something that reads like success. Recovery: hard-fail the trace and surface the contradiction to the user or supervisor.
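The structured-failure rule can be enforced with a thin wrapper around every tool call: an error becomes an unambiguous machine-readable status, never prose that might read like success. A minimal sketch:

```python
def call_tool_structured(tool_fn, **args) -> dict:
    """Return {"status": "ok", "data": ...} or an explicit structured error.
    No soft summaries: on failure, status is unambiguous and data is absent."""
    try:
        return {"status": "ok", "data": tool_fn(**args)}
    except Exception as exc:
        return {"status": "error",
                "error_type": type(exc).__name__,
                "message": str(exc)}
```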
Detection at the platform layer
Most of these are caught at the orchestrator or telemetry layer, not in the model. We instrument every agent deployment with the following invariants enforced outside the model.
- Step count cap per trace, hard fail past it.
- Tool repetition guard, dedupe identical args within a window.
- Goal pinning, re-inject every N steps.
- Tool wrapper authority check, independent of prompt-stated scope.
- Verifier tool for state-changing tasks, agent must call before completion.
These five guards prevent or detect every failure mode in the catalog. They take about a sprint to implement on top of an existing agent. The first incident they prevent pays for the work.
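The simplest of the five, the step count cap, illustrates the shape all of them share: an invariant enforced in the orchestrator loop, outside the model. A sketch with an illustrative cap:

```python
class StepCapExceeded(Exception):
    """Trace exceeded its step budget; fail closed."""

def run_trace(step_fn, max_steps: int = 25):
    """Run agent steps until step_fn signals done; hard-fail past the cap.
    step_fn is a hypothetical hook that runs one agent step and returns
    True when the task is complete."""
    for step in range(1, max_steps + 1):
        if step_fn(step):
            return step
    raise StepCapExceeded(f"trace exceeded {max_steps} steps; fail closed")
```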
Recovery is a design choice, not a reaction
When a guard fires, you can fail closed (kill the trace, surface to a human), retry with modified context, or downgrade to a simpler path. The right answer depends on the cost of the failure and the cost of the recovery. Decide ahead of time per task type. Do not let the model decide. The model is the thing that failed.
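Deciding ahead of time can be as literal as a static lookup table keyed by task type, consulted when a guard fires. The task types and policy assignments here are illustrative:

```python
# Recovery policy per task type, fixed at design time, never chosen by the
# model. Fail closed for anything unrecognised.
RECOVERY_POLICY = {
    "refund": "fail_closed",      # high cost of failure: kill trace, escalate
    "search": "retry_modified",   # cheap to retry with adjusted context
    "summarise": "downgrade",     # fall back to a simpler non-agentic path
}

def on_guard_fired(task_type: str) -> str:
    """Return the predetermined recovery move for this task type."""
    return RECOVERY_POLICY.get(task_type, "fail_closed")
```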