BIPI

Content Discovery That Actually Works: Wordlists, Recursion, 200-OK Soup

Cybersecurity

ffuf and feroxbuster only pay off when you tune wordlists, recursion, and filters. Here is how to find real paths without drowning in 200-OK noise.

By Arjun Raghavan, Security & Systems Lead, BIPI · January 12, 2023 · 9 min read

#bug-bounty #ffuf #content-discovery #wordlists #recursion

Brute forcing paths is not a science fair

Pointing ffuf at a target with raft-large-words and walking away is how you get 40,000 results, zero of them useful. Real content discovery is about picking the right list for the right stack and filtering out the soup the server returns by default.

Pick the wordlist for the stack

  • SecLists Discovery/Web-Content for general-purpose coverage
  • raft-medium-directories for a first pass, raft-large for deep dives
  • Language-specific lists (php.txt, aspnet.txt, jsp.txt) when you know the stack
  • API-specific lists like api-endpoints.txt for JSON backends
  • Custom lists built from your own JS file analysis

A 5,000-word list tuned to the stack usually beats a generic 200,000-word list. It runs faster, hits more real paths per request, and produces fewer false positives.
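For the custom list, here is a rough sketch of pulling path-like strings out of JS files you have already downloaded. The regex, the function name words_from_js, and the idea of keeping only quoted strings that start with a single slash are assumptions for illustration. A regex is not a JS parser, so expect misses and some junk.

```python
import re

# Assumption: an endpoint shows up as a quoted string starting with one "/".
# The (?!/) lookahead skips protocol-relative URLs such as "//cdn.example.com".
PATH_RE = re.compile(r"""["'](/(?!/)[A-Za-z0-9_\-./]+)["']""")


def words_from_js(js_source: str) -> list[str]:
    """Return unique path-like strings from JS source, in first-seen order."""
    words = {}
    for match in PATH_RE.finditer(js_source):
        path = match.group(1).strip("/")
        if path:
            words.setdefault(path, None)
    return list(words)
```

Write the returned words one per line and you have a list tuned to the app instead of to the internet at large.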

Kill the 200-OK soup

  1. Request a known-bogus path first and note the response size and word count
  2. Use -fs or -fw in ffuf to filter that exact size or word count
  3. Filter noisy status codes when needed, for example -fc 404,403,401
  4. Sort what remains by size; the outliers are the likeliest real paths
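The steps above can be sketched in Python. This assumes you have already parsed your results into records with a path, status, size, and word count. The Response type and the helper names are hypothetical; -fs and -fw are the ffuf flags named above. "Outlier" is read here as "least common response size", which is one reasonable interpretation, not the only one.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    path: str
    status: int
    size: int   # response body size in bytes
    words: int  # word count of the response body


def ffuf_filter_args(baseline: Response, by: str = "size") -> list[str]:
    """Turn the bogus-path baseline into an ffuf filter: -fs or -fw."""
    if by == "size":
        return ["-fs", str(baseline.size)]
    return ["-fw", str(baseline.words)]


def drop_soup(results, baseline, filtered_codes=(404,)):
    """Drop responses matching the baseline size or a filtered status code."""
    return [
        r for r in results
        if r.size != baseline.size and r.status not in filtered_codes
    ]


def rarest_sizes_first(results):
    """Order results so the least common response sizes come first."""
    counts = Counter(r.size for r in results)
    return sorted(results, key=lambda r: (counts[r.size], r.path))
```

Whatever lands at the top of rarest_sizes_first is what you open in a browser first.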

Recursion, but carefully

feroxbuster does recursion well. Set a depth limit; 2 or 3 is usually enough. Recursing without limits on a marketing site turns your terminal into a slideshow and finds nothing new. Recurse only into paths that returned a 200 with an interesting size, meaning one that differs from the bogus-path baseline.
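feroxbuster's --depth option covers the depth limit. If you drive recursion from your own script instead, the rule might look like the minimal sketch below. The function name and parameters are hypothetical, and "interesting size" is assumed to mean "different from the size of a known-bogus path".

```python
def should_recurse(status: int, size: int, depth: int,
                   baseline_size: int, max_depth: int = 2) -> bool:
    """Decide whether a discovered directory is worth fuzzing one level deeper."""
    if depth >= max_depth:
        # Depth limit reached: stop regardless of how the response looks.
        return False
    # Only a 200 whose size differs from the bogus-path baseline qualifies.
    return status == 200 and size != baseline_size
```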

Extensions matter

Pass -e .bak,.old,.zip,.tar.gz,.swp,.env to ffuf on hosts where backups are plausible. On a Java backend, swap to .jsp and .do. On .NET, use .aspx and .asmx. Tailor the extension list to the fingerprint you already gathered.
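A small sketch of turning a fingerprint into the -e argument. The stack keys ("java", "dotnet") and the function name are assumptions; the extension lists are the ones from the paragraph above. A known stack replaces the backup list rather than adding to it, which mirrors the "swap" wording.

```python
BACKUP_EXTS = [".bak", ".old", ".zip", ".tar.gz", ".swp", ".env"]

# Hypothetical fingerprint labels mapped to stack-specific extensions.
STACK_EXTS = {
    "java": [".jsp", ".do"],
    "dotnet": [".aspx", ".asmx"],
}


def extension_args(stack=None, backups_plausible=False):
    """Build the -e argument for ffuf from what you know about the host."""
    if stack in STACK_EXTS:
        exts = STACK_EXTS[stack]
    elif backups_plausible:
        exts = BACKUP_EXTS
    else:
        exts = []
    # No extensions means no flag at all, not an empty -e.
    return ["-e", ",".join(exts)] if exts else []
```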

Where the real wins hide

  • /admin: still works on tiny apps
  • /.git/HEAD: the gift that keeps giving
  • /api/v1: where bugs live
  • /swagger.json: free roadmap

Mix in passive paths

  • gau and waybackurls give you historical paths the app once exposed
  • katana and gospider crawl current JS for live links
  • Merge both into your wordlist, then dedupe against your active runs
  • Feed paths discovered on one host into the wordlist for sibling hosts

Content discovery is a feedback loop. Every interesting path you find should feed the next run, on the next host, on the next program.
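The merge step can be sketched as follows, assuming you have saved the URL output of tools like gau or waybackurls as one URL per line. The function names are hypothetical, and stripping the query string is a choice made here, not something the tools require.

```python
from urllib.parse import urlsplit


def paths_from_urls(urls):
    """Extract unique URL paths (query string dropped), in first-seen order."""
    seen = {}
    for url in urls:
        path = urlsplit(url.strip()).path.lstrip("/")
        if path:
            seen.setdefault(path, None)
    return list(seen)


def merge_wordlist(base, passive, already_tried=()):
    """Merge passive paths into a base list, skipping anything already run."""
    tried = set(already_tried)
    # dict.fromkeys dedupes while keeping the original order.
    merged = dict.fromkeys([*base, *passive])
    return [word for word in merged if word not in tried]
```

Run the same merge for each sibling host, passing in the paths that paid off on the previous one.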

When to stop

Stop when the marginal new path is no longer interesting. If your last 500 results were all marketing pages, your wordlist is wrong, not the host. Switch lists, narrow the focus, or move on. Discovery without a stopping rule is just procrastination.
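One way to make the stopping rule explicit, as a sketch only. The StoppingRule class is hypothetical, the window of 500 comes from the paragraph above, and deciding what counts as "interesting" is still your call.

```python
from collections import deque


class StoppingRule:
    """Signal a stop once a full window of results held nothing interesting."""

    def __init__(self, window: int = 500):
        # Only the most recent `window` verdicts are kept.
        self.recent = deque(maxlen=window)

    def record(self, interesting: bool) -> None:
        self.recent.append(interesting)

    def should_stop(self) -> bool:
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and not any(self.recent)
```

When should_stop returns True, treat it as a prompt to switch lists or move on, not as proof the host is clean.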
