BIPI

Prompt Injection in Production: Real Attack Patterns Against Claude and GPT-Based Agents

AI Security

Prompt injection is no longer a research curiosity. In 2025 it is the most-exploited vulnerability class in deployed AI agents. These are the attack patterns we see consistently in the wild — and what they reveal about the gap between demo safety and production security.

By Arjun Raghavan, Security & Systems Lead, BIPI · July 7, 2025 · 10 min read

#prompt-injection#ai-agents#llm-security#claude#gpt#production-security

Prompt injection was first described as a theoretical vulnerability in 2022. By 2025 it is the leading vulnerability class in deployed AI agent systems — ahead of insecure tool permissions, ahead of data leakage, ahead of authentication failures. The reason is structural: AI agents are designed to follow instructions, and prompt injection exploits exactly that property. You cannot patch it out without changing what makes agents useful.

Direct vs. Indirect Injection: The Field Split

Direct prompt injection is what most people picture: a user crafts a malicious input that overrides the system prompt. It is the easier class to defend against because you control the input channel. Indirect prompt injection is the more dangerous variant: the attack payload is embedded in content that the agent retrieves from the environment — a web page, a document, an email, a database row. The agent processes it as data but it is interpreted as an instruction.

Attack Patterns Observed in Production (2025)

Hidden instruction injection: white-on-white text or zero-width characters in documents retrieved by browsing agents, containing instruction overrides
Markdown escape: closing a code block or table in the injected content so the parser treats subsequent text as running prose instructions
Persona override: injected text claiming to be a system message or claiming the previous instructions have been superseded
Tool parameter injection: embedding instruction text in fields that get passed as tool call parameters, exploiting agents that do not sanitise interpolated values
Memory write injection: crafting input that causes the agent to write adversarial content to its own long-term memory store
Exfiltration via metadata: encoding stolen data in image alt text, URL parameters, or webhook payloads that the agent generates as part of normal operation

Why Claude and GPT Agents Respond Differently

Claude's Constitutional AI training makes it more resistant to direct jailbreaks but does not fundamentally solve indirect injection — if the retrieved content plausibly looks like an operator instruction, Claude will often comply. GPT-4 class models show different patterns: stronger resistance to some persona override attacks but higher susceptibility to tool parameter injection when system prompts do not explicitly constrain tool call construction. Neither is 'more secure' — they have different vulnerability profiles that require different defences.

Measuring Your Exposure

Audit every data source your agent retrieves content from — treat each as a potential injection vector
Test your agent against the OWASP LLM Top 10 prompt injection test suite for your specific use case
Measure the agent's instruction-following fidelity: does it distinguish between operator instructions and data-layer content?
Check whether tool calls can be triggered by content in retrieved documents with no human approval step
Test exfiltration paths: can an injected instruction cause the agent to send data to an external endpoint?
Run injection tests at the parameter level, not just the message level — many defences miss this vector

Effective Mitigations in Production

No single mitigation eliminates prompt injection risk. The most effective production posture combines multiple layers: strict privilege separation so that retrieved content cannot trigger write operations, explicit tool call approval for high-impact actions, output filtering that inspects agent responses for signs of injection (unusual API calls, unexpected data in payloads), and continuous red-teaming integrated into the deployment pipeline. Organisations that treat prompt injection as a one-time fix will be breached. Those that treat it as an ongoing operational discipline will not.

most-exploited vulnerability class in AI agent deployments per OWASP LLM Top 10 2025 update

76%

of indirect injection attacks succeeded in agents without explicit data/instruction separation

4 min

median time for a skilled attacker to find a working injection path in an unprotected agent

The Uncomfortable Truth

Prompt injection is not going to be solved at the model level in the near term. The property that makes LLMs powerful — their ability to follow nuanced natural-language instructions — is the same property that makes them vulnerable to injection. The defence has to be architectural, and it has to be operational. Teams that accept this and build accordingly will be in a fundamentally different security posture than those waiting for model providers to patch the problem away.

Read more field notes, explore our services, or get in touch at info@bipi.in. Privacy Policy · Terms.