RAG Data Leakage Between Tenants Is the Multi-Tenant LLM Bug Nobody Tests For.
AI Security
Build a RAG system on a shared vector index without rigid tenant filtering and you have shipped a cross-tenant data leak. The bug is silent, the test coverage is usually missing, and we have found it on most multi-tenant LLM platforms we audit.
By Arjun Raghavan, Security & Systems Lead, BIPI · March 6, 2026 · 7 min read
Multi-tenant SaaS that adds an LLM-based search or assistant feature usually does it the same way: chunk all customer documents, embed them, write to a shared vector database, query at runtime with tenant filter. The architecture is correct. The implementation usually has a leak.
We audited 9 such systems in the last year. 7 of them had at least one path where a query from tenant A could retrieve content from tenant B. Most of them had no test that would have caught it.
How the leak happens
The intended flow: store {tenant_id, document_id, chunk_text, embedding} in the vector DB. Queries include a metadata filter where tenant_id = X. The vector search returns only chunks owned by tenant X. The LLM answers using only those chunks.
Common failure modes that break tenant isolation:
- The metadata filter is set in the query function, but a bug or refactor lets a different code path skip it. We have seen this when the developer added a 'global search' debug feature and left it accessible.
- The filter is applied post-hoc: top-k retrieved without a filter, then filtered. If k=10 and the top 10 are all from another tenant, the filtered result is empty but the leak happened (the chunks were transmitted to the application layer).
- The vector DB allows the filter but does not enforce it as a security boundary. Some vector DBs treat metadata filters as performance hints; if the index is rebuilding, the filter may be skipped.
- Embedding similarity itself can be informative: an attacker submits a query and infers from latency or response patterns whether content matching their query exists in any tenant.
- Inverted index leakage on text-based search alongside vector search. The team filters the vector path but forgets the BM25 path.
Why tests miss it
Standard test cases set up tenant A with documents and verify retrieval works. They rarely set up tenant A and tenant B simultaneously and verify cross-tenant isolation under every query path. The tests pass because they only exercise one tenant at a time.
The test pattern that catches this: every retrieval test creates two tenants with overlapping content, queries from one, asserts no chunks from the other appear in the result. Adversarial: query terms specifically tuned to match the other tenant's content.
If your RAG tests do not include cross-tenant assertions on every query path, you have not tested the security property that matters.
What strong isolation looks like
- Per-tenant vector namespaces (Pinecone supports this, Weaviate via classes, Qdrant via collections). Cross-namespace queries are physically not possible, not just logically filtered.
- If you must use a shared index, treat the metadata filter as a security control: enforced at the vector DB level, audited on every code path, never optional.
- All retrieval calls go through one client wrapper that injects tenant filter from session context. Direct vector DB access from application code is forbidden.
- Test suite includes adversarial cross-tenant queries on every retrieval path. Static analysis flags any vector DB call without tenant context.
- Embedding model fine-tuning, if you do it, never trains on cross-tenant data. The model itself can leak training content via inversion attacks.
The harder question: prompt injection through retrieved content
Even with perfect tenant isolation on retrieval, an attacker who plants prompt-injection content in their own documents can manipulate the LLM into revealing internal system context (system prompt, instructions, sometimes other users' data within the same tenant). Tenant isolation does not solve this. It does not make it worse, but it does not solve it.
Mitigations are at the LLM-call layer: structured output formats, content-safety filters on retrieved chunks, separation of system and retrieved content via clear delimiters that the model is trained to respect. None are perfect. All are required.
Closing
RAG cross-tenant leakage is one of those bugs that is invisible until it is not. By the time a customer reports seeing another customer's content in an LLM response, the breach is already a public incident. The defensive test is cheap to add and catches the bug class. Add it before you find out the hard way.
Read more field notes, explore our services, or get in touch at info@bipi.in. Privacy Policy · Terms.