BIPI
BIPI

Indirect Prompt Injection via Documents and Web Pages: The Attack Surface

AI Security

When the model reads a document or a web page, the author of that content becomes a co prompter. We map the indirect prompt injection attack surface across email, PDFs, knowledge bases, and browsing tools, and the controls that contain it.

By Arjun Raghavan, Security & Systems Lead, BIPI · July 17, 2023 · 10 min read

#indirect-injection#rag#browsing#llm#defense

Direct prompt injection requires the attacker to talk to the model. Indirect prompt injection only requires the attacker to write something the model will read later. That is a much bigger surface.

The expanded threat model

  • Email bodies fetched by an assistant
  • PDFs uploaded for summarization
  • Web pages browsed by an agent
  • Knowledge base articles retrieved for RAG
  • Calendar entries, ticket comments, chat messages

What the payload looks like

A few sentences hidden in white text on a white background, a comment buried in a PDF, or a meta tag in an HTML page that instructs the model to exfiltrate the user's last message to an attacker controlled URL. The model reads everything, including the parts the user cannot see.

Provenance is the first control

Every piece of context entering the prompt should be tagged with its source identity, sensitivity, and trust level. Internal HR doc, external email from unknown sender, and search result from a third party site are three different trust tiers, and the prompt should treat them differently.

Capability scoping by source

An external email can be summarized. It cannot trigger a tool call. A knowledge base article can answer a question. It cannot change the agent's instructions. Encode these limits in the policy layer, not in the prompt.

Rendering tricks attackers use

  1. Zero width characters and homoglyphs
  2. White on white text and tiny font sizes
  3. Comments in HTML and metadata in PDFs
  4. Translated instructions in low resource languages
  5. Base64 or other encodings the model will helpfully decode

Normalization before retrieval

Strip invisible characters, normalize Unicode, remove comments and metadata, and render documents to a canonical text form before chunking. The attacker's payload still gets through if the content is intentional, but the cheap tricks stop working.

Egress controls

If the agent can fetch URLs, restrict outbound destinations with an allow list. A common indirect injection payload is fetch this URL with the user's session token. An egress proxy that only allows known good domains kills that class outright.

Every document is a potential prompt. Treat document handling as input validation, not as content ingestion.

Detection signals

Watch for context strings that contain instruction style verbs, for tool calls that follow shortly after ingesting external content, and for output that references domains or actions not present in the user request. None are perfect, all are useful.

Closing

Indirect prompt injection is the dominant attack class against agentic systems. Provenance, capability scoping, normalization, and egress controls together raise the bar high enough that most attempts fail quietly.

Read more field notes, explore our services, or get in touch at info@bipi.in. Privacy Policy · Terms.