AI Agent Memory Poisoning: How Attackers Corrupt Long-Term Context and What It Means for RAG-Backed Agents
AI Security
Agent memory is the new persistence layer for AI attacks. Whether it is a vector store, a key-value cache, or a structured conversation history, if an agent trusts what it remembers, an attacker who controls what gets written has a persistent foothold. Here is what memory poisoning looks like in practice.
By Arjun Raghavan, Security & Systems Lead, BIPI · July 9, 2025 · 11 min read
The AI safety conversation has spent enormous energy on what goes into the model at training time. Comparatively little attention has been paid to what goes into the agent at runtime — specifically, into the memory systems that agents use to maintain context across sessions, accumulate knowledge, and make decisions. Memory poisoning attacks target this layer, and in 2025 they represent one of the most underdefended surfaces in production AI deployments.
The Anatomy of Agent Memory
Modern AI agents rely on several distinct memory types. In-context memory is the conversation window — temporary and session-scoped. External memory is the persistent layer: vector databases for semantic retrieval, key-value stores for structured facts, conversation histories, and workflow state. Episodic memory captures past interactions and outcomes. Semantic memory stores facts about the world. Procedural memory encodes learned behaviours. An attacker who can write to any of these can influence all future agent behaviour that reads from it.
RAG Poisoning: The Most Common Attack Vector
Retrieval-Augmented Generation is now the standard architecture for agents that need domain knowledge. The agent embeds queries, retrieves semantically similar documents from a vector store, and incorporates them into the prompt. If an attacker can inject documents into that vector store — through a compromised data pipeline, a poisoned external data source, or a direct write operation — those documents will surface as 'relevant context' in future queries.
- Direct store injection: attacker with write access to the vector database inserts documents containing adversarial instructions
- Pipeline poisoning: compromise of the data ingestion pipeline that feeds the vector store, allowing bulk poisoning at scale
- Semantic neighbourhood attack: crafting documents that cluster near high-value query embeddings to ensure they are always retrieved
- Temporal poisoning: injecting documents with false timestamps to make them appear as recent authoritative sources
- Cross-tenant contamination: in multi-tenant RAG systems, exploiting namespace isolation failures to poison another tenant's context
Long-Term Memory Manipulation in Agentic Systems
Some agent frameworks implement explicit memory writing — the agent decides what to commit to long-term memory based on the significance of an interaction. This introduces a meta-attack path: if an attacker can cause the agent to form a false memory during one session, that memory persists and influences all future sessions. The agent does not distinguish between memories formed from legitimate experience and memories formed from adversarial interactions.
Detection Patterns
Memory poisoning is difficult to detect because the injected content often looks legitimate. The most effective detection approaches operate at the retrieval layer: monitor for unusually high retrieval frequency of specific documents, flag documents that contain instruction-like language and are being retrieved in non-instruction contexts, implement cosine similarity thresholds that alert when retrieved content is anomalously close to a query that should not require that content.
- Audit who can write to your vector stores — treat write access as equivalent to code execution access
- Hash and sign documents at ingestion time; verify signatures at retrieval time to detect post-ingestion tampering
- Implement content classifiers that flag documents containing instruction patterns before they enter the vector store
- Monitor retrieval patterns for semantic neighbourhood concentration — legitimate use rarely retrieves the same documents repeatedly
- Implement memory versioning so that poisoned writes can be identified and rolled back
- Run periodic red-team exercises that attempt to poison your production vector stores through available input channels
The Fundamental Fix
Memory poisoning is ultimately an access control problem wearing an AI costume. The fix is the same as for any data integrity problem: control who can write, validate what gets written, sign what you store, and monitor what gets read. The AI-specific layer is the content validation — distinguishing documents that contain instructions from documents that contain information, and applying appropriate scrutiny to each. Teams that treat their vector stores with the same security posture as their databases will be in a far stronger position than those that treat them as a convenient append-only cache.
Read more field notes, explore our services, or get in touch at info@bipi.in. Privacy Policy · Terms.