Content Discovery That Actually Works: Wordlists, Recursion, 200-OK Soup

ffuf and feroxbuster only pay off when you tune wordlists, recursion, and filters. Here is how to find real paths without drowning in 200-OK noise.

By Arjun Raghavan, Security & Systems Lead, BIPI · January 12, 2023 · 9 min read

Brute forcing paths is not a science fair

Pointing ffuf at a target with raft-large-words and walking away is how you get 40,000 results, zero of them useful. Real content discovery is about picking the right list for the right stack and filtering out the soup the server returns by default.

Pick the wordlist for the stack

SecLists Discovery/Web-Content for general purpose
raft-medium-directories for first pass, raft-large for deep dives
Language-specific lists (php.txt, aspnet.txt, jsp.txt) when you know the stack
API-specific lists like api-endpoints.txt for JSON backends
Custom lists built from your own JS file analysis

A 5,000-word list tuned to the stack usually beats a generic 200,000-word list. It runs faster, hits more real paths, and produces fewer false positives.

Kill the 200-OK soup

Hit a known bogus path first, note the response size and word count
Use -fs or -fw in ffuf to filter that exact size or word count
Filter common status codes when needed, -fc 404,403,401 as appropriate
Sort by size to spot outliers; those are usually the real paths
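
The steps above can be sketched in a few lines of Python for post-processing saved results. This is a minimal sketch, not ffuf's own logic: the record keys (`url`, `status`, `length`, `words`) are assumed to mirror what a scanner's JSON output contains, and `target.example` plus the sample numbers are made up for illustration.

```python
from collections import Counter


def drop_baseline(results, baseline_size=None, baseline_words=None):
    """Drop responses that match the bogus-path baseline.

    This mimics the effect of filtering on one exact size (-fs) or
    one exact word count (-fw).
    """
    kept = []
    for r in results:
        if baseline_size is not None and r["length"] == baseline_size:
            continue
        if baseline_words is not None and r["words"] == baseline_words:
            continue
        kept.append(r)
    return kept


def rank_outliers(results):
    """Order results so the rarest response sizes come first."""
    size_counts = Counter(r["length"] for r in results)
    return sorted(results, key=lambda r: (size_counts[r["length"]], r["url"]))


# Hypothetical sample data: the bogus path returned 1532 bytes / 210 words.
results = [
    {"url": "https://target.example/aaa", "status": 200, "length": 1532, "words": 210},
    {"url": "https://target.example/bbb", "status": 200, "length": 1532, "words": 210},
    {"url": "https://target.example/admin", "status": 200, "length": 4821, "words": 655},
    {"url": "https://target.example/img", "status": 200, "length": 980, "words": 120},
    {"url": "https://target.example/css", "status": 200, "length": 980, "words": 120},
]

kept = drop_baseline(results, baseline_size=1532)
ranked = rank_outliers(kept)
for r in ranked:
    print(r["length"], r["url"])
```

Ranking by how rare a size is, rather than by raw size, keeps a single odd response at the top even when it is not the largest one.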

Recursion, but carefully

feroxbuster does recursion well. Set a depth limit with --depth; 2 or 3 is usually enough. Recursing without limits on a marketing site turns your terminal into a slideshow and finds nothing new. Limit recursion to paths that returned a 200 with an interesting size.
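
One way to apply that rule by hand is to pick recursion targets from a first-pass result set. The sketch below is an illustration, not how feroxbuster decides internally. It assumes "interesting size" means "differs from the bogus-path baseline", and the record keys and `target.example` URLs are hypothetical.

```python
from urllib.parse import urlparse


def path_depth(url):
    """Count non-empty path segments: /api/v1 has depth 2."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def recursion_candidates(results, baseline_size, max_depth=2):
    """Pick directories worth a second pass.

    Keeps only 200 responses whose size differs from the baseline and
    that still sit above the depth limit.
    """
    picked = []
    for r in results:
        if r["status"] != 200:
            continue
        if r["length"] == baseline_size:
            continue  # same size as the soup, not interesting
        if path_depth(r["url"]) >= max_depth:
            continue  # already at the depth limit
        picked.append(r["url"].rstrip("/") + "/")
    return picked


# Hypothetical first-pass results; the baseline soup is 1532 bytes.
first_pass = [
    {"url": "https://target.example/api", "status": 200, "length": 300},
    {"url": "https://target.example/blog", "status": 200, "length": 1532},
    {"url": "https://target.example/old", "status": 403, "length": 150},
    {"url": "https://target.example/api/v1", "status": 200, "length": 410},
]

candidates = recursion_candidates(first_pass, baseline_size=1532, max_depth=2)
print(candidates)
```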

Extensions matter

With ffuf, pass -e .bak,.old,.zip,.tar.gz,.swp,.env on hosts where backups are plausible (feroxbuster's equivalent flag is -x). On a Java backend, swap to .jsp and .do. On .NET, use .aspx and .asmx. Tailor the extension list to the fingerprint you already gathered.
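
To keep that tailoring repeatable, the extension choice can live in a small helper that assembles the ffuf command line. This is a rough sketch: the extension sets come from the paragraph above, while the stack labels (`java`, `dotnet`), the helper name, the `target.example` URL, and the wordlist filename are placeholders.

```python
import shlex

# Extension sets taken from the text above.
BACKUP_EXTS = [".bak", ".old", ".zip", ".tar.gz", ".swp", ".env"]
STACK_EXTS = {
    "java": [".jsp", ".do"],
    "dotnet": [".aspx", ".asmx"],
}


def ffuf_args(url, wordlist, stack=None, backups_plausible=False):
    """Build an ffuf argument list with extensions matched to the fingerprint."""
    exts = list(STACK_EXTS.get(stack, []))
    if backups_plausible:
        exts += BACKUP_EXTS
    args = ["ffuf", "-u", url, "-w", wordlist]
    if exts:
        # -e takes a comma-separated extension list
        args += ["-e", ",".join(exts)]
    return args


# Hypothetical target and wordlist path; FUZZ marks the injection point.
java_cmd = ffuf_args(
    "https://target.example/FUZZ", "raft-medium-directories.txt", stack="java"
)
print(shlex.join(java_cmd))
```

Returning a list rather than a string means the result can go straight to `subprocess.run` without shell quoting problems.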

Where the real wins hide

/admin: still works on tiny apps
/.git/HEAD: the gift that keeps giving
/api/v1: where bugs live
/swagger.json: free roadmap

Mix in passive paths

gau and waybackurls give you historical paths the app once exposed
katana and gospider crawl current JS for live links
Merge both into your wordlist, then dedupe against your active runs
Feed paths discovered on one host into the wordlist for sibling hosts
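
The merge-and-dedupe step might look like the sketch below. It assumes the passive tools have already written one URL per line, which is read here from an in-memory list. The function names, the `target.example` URLs, and the choice to split paths into single segments are illustrative, not a fixed recipe.

```python
from urllib.parse import urlparse


def words_from_urls(urls):
    """Split each URL path into segments and collect the unique ones."""
    words = set()
    for u in urls:
        path = urlparse(u.strip()).path  # query string and fragment are dropped
        for seg in path.split("/"):
            if seg:
                words.add(seg)
    return words


def merge_wordlist(base_words, passive_urls, already_tested=()):
    """Merge passive findings into the base list, minus words already run."""
    merged = set(base_words) | words_from_urls(passive_urls)
    return sorted(merged - set(already_tested))


# Hypothetical inputs: a tiny base list and two historical URLs.
base = ["admin", "login"]
passive = [
    "https://target.example/api/v1/users?id=1",
    "https://target.example/old/backup.zip",
]

merged = merge_wordlist(base, passive, already_tested=["admin"])
print(merged)
```

The same merged list can be reused as the starting wordlist for sibling hosts.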

Content discovery is a feedback loop. Every interesting path you find should feed the next run, on the next host, on the next program.

When to stop

Stop when the marginal new path is no longer interesting. If your last 500 results were all marketing pages, the problem is your wordlist, not the host. Switch lists, narrow the focus, or move on. Discovery without a stopping rule is just procrastination.
