BIPI
BIPI

Testing AI Features for Bug Bounty: Chatbots, RAG, and Tool Use

AI Security

AI features are a new attack surface and most programs are still figuring out scope. Here is the practical hunter's guide to chatbots, retrieval-augmented generation, and tool-use agents, with the OWASP LLM Top 10 patterns that map to real payouts.

By Arjun Raghavan, Security & Systems Lead, BIPI · June 20, 2023 · 11 min read

#bug-bounty#ai-security#llm#rag#tool-use#owasp-llm

Three classes of AI feature, three threat models

  • Chatbot, user input plus system prompt plus model, output back to user
  • RAG, user input plus retrieved documents plus system prompt, output back to user
  • Tool use, model can call functions that read or write real data

Each class has its own bug patterns. Treat them differently in your hunting notes.

Chatbot bug patterns

  1. System prompt extraction, recover the hidden instructions
  2. Jailbreak that produces policy-violating output the brand will not want screenshotted
  3. PII leakage from training data or session state
  4. Cross-tenant leakage if the chatbot is multi-tenant

RAG bug patterns

RAG retrieves documents and stuffs them into the prompt. The documents themselves can carry instructions. This is indirect prompt injection and it is the single richest class of AI bug for bounty hunters today.

  • Inject instructions into a document the model will retrieve, change the model's behaviour for the next user
  • Cross-tenant document leakage, ask the model to summarise documents it should not have access to
  • Prompt-poisoned answer that exfiltrates session data via a markdown image to attacker.com

Tool use bug patterns

When the model can call functions, prompt injection becomes remote action. Look at every tool the agent has.

  • Read tools that can be tricked into reading other users' data, classic IDOR via natural language
  • Write tools that can be tricked into sending email, posting to chat, or updating records
  • Code execution tools, the holy grail, ranges from leaking environment to full RCE
  • Browser tools, SSRF via natural-language fetch instructions

OWASP LLM Top 10, mapped to payouts

  • LLM01 Prompt Injection, high value when chained to tool use
  • LLM02 Insecure Output Handling, XSS or SSRF via model output
  • LLM06 Sensitive Information Disclosure, system prompt or training data
  • LLM07 Insecure Plugin Design, the tool use IDORs
  • LLM08 Excessive Agency, when the agent can act outside user scope
  • LLM09 Overreliance, less a bug, more a design concern

Reporting AI bugs

Programs are still calibrating. Lead with concrete impact, not theoretical risk. A jailbreak that gets the chatbot to recommend a competitor is not a bug. A jailbreak that exfiltrates another user's chat history is.

The PoC for an AI bug

  1. Exact prompt or document content
  2. Exact model output, screenshotted and copied as text
  3. What the attacker gains, named in money, data, or access
  4. Reproducibility note, models vary, run the test five times and report the success rate

Scope traps to read carefully

  • Some programs exclude prompt injection unless it has secondary impact
  • Hallucination is usually out of scope
  • Bias and policy violations are usually out of scope unless tied to data leak
  • Tool-use abuse is almost always in scope, focus there
The bugs that pay in AI features are the ones where the model touched real data or took real action. Hunt where the model has hands, not where it has opinions.

Read more field notes, explore our services, or get in touch at info@bipi.in. Privacy Policy · Terms.