BIPI

Agent Memory Without the Bloat: Episodic, Semantic, Procedural

Agentic AI

Most agent memory architectures collapse into one giant vector store and call it good. We separate episodic, semantic, and procedural memory and show how each maps to a different storage tier.

By Arjun Raghavan, Security & Systems Lead, BIPI · May 8, 2024 · 8 min read

#ai-agents · #memory · #architecture

A SaaS team called us six months into building their assistant. The vector store had grown to 92 million chunks. Retrieval latency had climbed past two seconds. Quality of recall had dropped. Their bill from the vector vendor exceeded their LLM bill. The fix was not a bigger index. It was sorting what they were storing.

Most production agent memory problems start the same way. Engineers reach for a vector database, stuff everything in, and watch quality degrade as the index grows. The fix begins with naming what kind of memory you are storing.

Three memory types, three storage choices

Borrowing from cognitive science, separate memory into episodic (what happened in this session or thread), semantic (facts about the user, the world, the org), and procedural (how to do things, learned patterns, skills).

  • Episodic: short-lived, conversation-scoped, ordered. Best stored in a SQL or document store keyed by thread ID. Vector search is overkill.
  • Semantic: facts that need fuzzy retrieval and cross-session recall. This is where vector stores earn their keep, but only with strict ingest discipline.
  • Procedural: workflows, tool sequences, error recovery patterns. Best stored as structured records keyed by task type, retrievable by symbolic match first, vectors as fallback.
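The split above can be sketched as a single router that sends each memory type to its own backend. This is a minimal illustration, not the architecture from the engagement; the in-memory containers stand in for SQL, vector, and structured stores, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class MemoryRouter:
    """Routes reads and writes to the right tier instead of one vector index.
    The backends here are in-memory stand-ins for real storage."""
    episodic: dict = field(default_factory=dict)    # thread_id -> ordered turns (SQL/KV in prod)
    semantic: list = field(default_factory=list)    # durable facts (vector store in prod)
    procedural: dict = field(default_factory=dict)  # task signature -> known-good plan

    def remember_turn(self, thread_id: str, turn: Any) -> None:
        # Episodic: ordered and conversation-scoped; no similarity search needed.
        self.episodic.setdefault(thread_id, []).append(turn)

    def recent_turns(self, thread_id: str, n: int = 5) -> list:
        # "The last five things", not the most semantically similar.
        return self.episodic.get(thread_id, [])[-n:]

    def lookup_plan(self, task_signature: str) -> Optional[list]:
        # Procedural: symbolic exact match first; a vector fallback would go here.
        return self.procedural.get(task_signature)
```

The point of the router is that each tier gets the access pattern it actually needs: ordered reads for episodic, fuzzy recall for semantic, keyed lookup for procedural.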

When we split the SaaS team's memory across these three tiers, the vector index dropped from 92M to 4M entries. Retrieval latency went under 200ms. Recall quality went up because the noise was gone.

What belongs in episodic memory

Per-thread state: the user's current intent, recent tool results, partial plans. Time-ordered access patterns. You almost always want the last five things, not the most semantically similar of all things. Postgres or DynamoDB is the right answer here, not a vector store. Index by thread ID and timestamp. Set a TTL.

We see teams shove every assistant turn into the vector index because the framework defaults that way. A six-message conversation does not need similarity search. It needs an array.
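The episodic pattern fits in a plain SQL table. A minimal sketch with SQLite as a stand-in for Postgres or DynamoDB; the schema and TTL mechanism are illustrative (DynamoDB has native TTL, while in Postgres expiry is typically a scheduled delete):

```python
import sqlite3
import time

# Hypothetical episodic store: a plain table keyed by thread and time.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE episodic (
        thread_id TEXT NOT NULL,
        ts        REAL NOT NULL,
        turn      TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_thread_ts ON episodic (thread_id, ts)")

def append_turn(thread_id: str, turn: str) -> None:
    conn.execute("INSERT INTO episodic VALUES (?, ?, ?)",
                 (thread_id, time.time(), turn))

def last_n(thread_id: str, n: int = 5) -> list:
    # Ordered access, newest first, then flipped to reading order.
    rows = conn.execute(
        "SELECT turn FROM episodic WHERE thread_id = ? "
        "ORDER BY ts DESC, rowid DESC LIMIT ?",
        (thread_id, n),
    ).fetchall()
    return [r[0] for r in reversed(rows)]

def expire(ttl_seconds: float) -> None:
    # Poor man's TTL; use the store's native expiry in production.
    conn.execute("DELETE FROM episodic WHERE ts < ?",
                 (time.time() - ttl_seconds,))
```

No embeddings, no index growth, and the dominant query is an indexed range scan.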

What belongs in semantic memory

User preferences, organisational facts, learned entities, summarised history that should outlast a single thread. This is the only tier where a vector store is the right primary tool, and even here the discipline that matters is what you do not store.

Our heuristic on every engagement: do not write to semantic memory from a single observation. Require either an explicit user signal (they confirmed it), a derived fact with a confidence threshold, or a summarisation pass that aggregates across N episodes. Without this, the index fills with noise within weeks.
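That write rule can be encoded as a gate in front of the semantic store. A sketch under stated assumptions: the field names, confidence threshold, and episode count N are illustrative, not values from the engagement.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    fact: str
    user_confirmed: bool = False   # explicit user signal
    confidence: float = 0.0        # model-derived confidence in the fact
    supporting_episodes: int = 0   # episodes that independently agree

CONFIDENCE_THRESHOLD = 0.9  # illustrative
MIN_EPISODES = 3            # illustrative N

def admit_to_semantic(c: Candidate) -> bool:
    """Reject single-observation writes: require a confirmation,
    a high-confidence derived fact, or cross-episode aggregation."""
    return (
        c.user_confirmed
        or c.confidence >= CONFIDENCE_THRESHOLD
        or c.supporting_episodes >= MIN_EPISODES
    )
```

Everything that fails the gate stays in episodic memory, where the TTL eventually disposes of it.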

Procedural memory as a competitive edge

This is the tier teams skip. Procedural memory captures how the agent solved similar tasks in the past, ideally with success metrics. When a new task arrives, look up by task signature first. If a known good plan exists, hand it to the model as a hint. Cheaper, faster, more reliable than re-deriving from scratch.
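A sketch of the symbolic-first lookup. The signature scheme, success-rate field, and threshold are assumptions for illustration; in production the fallback on a miss would be a vector search over plan descriptions.

```python
import hashlib

# Hypothetical procedural store: task signature -> (plan, success_rate).
PLANS = {}

def task_signature(task_type: str, entities: tuple) -> str:
    # Normalise the task into a stable key; this scheme is illustrative.
    key = f"{task_type}:{','.join(sorted(entities))}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]

def record_plan(sig: str, plan: list, success_rate: float) -> None:
    PLANS[sig] = (plan, success_rate)

def lookup_plan(sig: str, min_success: float = 0.8):
    """Symbolic exact match first; return a known-good plan to hand
    to the model as a hint, or None to re-derive from scratch."""
    hit = PLANS.get(sig)
    if hit and hit[1] >= min_success:
        return hit[0]
    return None
```

The success-rate threshold matters: a remembered plan that fails half the time is worse than no hint at all.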

On a logistics engagement, adding procedural memory for the top 40 task templates cut average plan length by 35 percent and reduced tool calls per task by half. The agent learned in a structured way that we could inspect and control.

  • 92M to 4M: vector index size after sorting memory tiers
  • 35%: reduction in plan length with procedural memory
  • Sub-200ms: retrieval latency after splitting tiers

Where to start if you are inheriting a mess

  1. Audit what is in your vector store. Categorise the top 10 sources by row count.
  2. Move conversation-scoped data out. SQL or KV, with a TTL.
  3. Define one clear ingest rule for semantic memory and reject the rest.
  4. Add a procedural memory table keyed by task signature, even if you start with five entries.
  5. Track recall precision per tier. If a tier cannot show value, delete it.
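The audit in step 1 can be as simple as grouping the index by source metadata. A sketch, assuming each row carries a `source` field (the field name is an assumption about your ingest pipeline):

```python
from collections import Counter

def top_sources(metadata_rows: list, k: int = 10) -> list:
    """Categorise vector-store rows by their 'source' metadata field
    and return the k largest contributors by row count."""
    counts = Counter(row.get("source", "unknown") for row in metadata_rows)
    return counts.most_common(k)
```

In practice the top two or three sources usually account for most of the bloat, and most of them turn out to be conversation-scoped data that belongs in step 2.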

Memory is the one part of your agent stack that compounds, for good or ill. Build it deliberately or it will become the slowest, most expensive, least useful part of your system within a year.

Read more field notes, explore our services, or get in touch at info@bipi.in.