Content Discovery That Actually Works: Wordlists, Recursion, 200-OK Soup
ffuf and feroxbuster only pay off when you tune wordlists, recursion, and filters. Here is how to find real paths without drowning in 200-OK noise.
By Arjun Raghavan, Security & Systems Lead, BIPI · January 12, 2023 · 9 min read
Brute forcing paths is not a science fair
Pointing ffuf at a target with raft-large-words and walking away is how you get 40,000 results, zero of them useful. Real content discovery is about picking the right list for the right stack and filtering out the soup the server returns by default.
Pick the wordlist for the stack
- SecLists Discovery/Web-Content for general-purpose runs
- raft-medium-directories for a first pass, the raft-large lists for deep dives
- Language-specific lists (php.txt, aspnet.txt, jsp.txt) when you know the stack
- API-specific lists like api-endpoints.txt for JSON backends
- Custom lists built from your own JS file analysis
A 5,000-word list tuned to the stack usually beats a generic 200,000-word list. It runs faster, wastes fewer requests on paths the stack could never serve, and produces fewer false positives.
Kill the 200-OK soup
- Request a path you know does not exist first, and note the response size and word count
- Use -fs or -fw in ffuf to filter that exact size or word count
- Filter noisy status codes when needed, for example -fc 404,403,401, as appropriate
- Sort by size to spot outliers. Those are usually the real paths
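The calibrate-then-filter steps above can be sketched in Python. This is a minimal illustration, not ffuf's implementation: the `Hit` record, `filter_soup`, and `rank_by_rarity` are hypothetical names, and the sketch assumes you have already parsed scan results into status, size, and word count.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One scan result (hypothetical record, not an ffuf type)."""
    path: str
    status: int
    size: int   # response body size in bytes
    words: int  # word count of the response body


def filter_soup(hits, baseline_size, baseline_words, drop_status=(404,)):
    """Drop hits that look like the server's default response.

    A hit is dropped if its status is in drop_status, or if its size or
    word count equals the baseline measured on a known-bogus path.
    """
    return [
        h for h in hits
        if h.status not in drop_status
        and h.size != baseline_size
        and h.words != baseline_words
    ]


def rank_by_rarity(hits):
    """Sort hits so the rarest response sizes come first (outliers on top)."""
    counts = Counter(h.size for h in hits)
    return sorted(hits, key=lambda h: (counts[h.size], h.path))
```

For example, if the bogus path came back as 1234 bytes and 56 words, `filter_soup(hits, 1234, 56)` keeps only responses that differ from that baseline, and `rank_by_rarity` puts one-off sizes at the top of an unfiltered list.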
Recursion, but carefully
feroxbuster does recursion well. Set a depth limit with --depth (-d); 2 or 3 is usually enough. Recursing without limits on a marketing site turns your terminal into a slideshow and finds nothing new. Recurse only into paths that returned a 200 with a response size that stands out.
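To make the depth limit concrete, here is a minimal sketch of depth-limited, breadth-first discovery. It is not how feroxbuster is implemented. The `probe` callable is a hypothetical stand-in for "scan this directory and return the child paths worth recursing into", and `discover` is an illustrative name.

```python
from collections import deque


def discover(root, probe, max_depth=2):
    """Breadth-first discovery with a hard depth limit.

    probe(path) is a hypothetical callable that scans one directory and
    returns the child directories considered worth recursing into.
    The root is scanned at depth 0; paths at max_depth are recorded
    but never scanned themselves.
    """
    found = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        path, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: do not scan below this path
        for child in probe(path):
            if child in seen:
                continue  # avoid rescanning paths reached twice
            seen.add(child)
            found.append(child)
            queue.append((child, depth + 1))
    return found
```

The `seen` set matters as much as the depth limit: without it, redirects and symlinked directories can make the same subtree show up repeatedly.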
Extensions matter
In ffuf, pass -e .bak,.old,.zip,.tar.gz,.swp,.env on hosts where backups are plausible (feroxbuster takes extensions through -x). On a Java backend, swap to .jsp and .do. On .NET, use .aspx and .asmx. Tailor the extension list to the fingerprint you already gathered.
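The same idea can be sketched as a small lookup plus expansion step, for cases where you pre-build the wordlist instead of letting the tool append extensions. The stack labels, `EXTENSIONS_BY_STACK`, and `expand` are hypothetical names for illustration; the extension values are the ones listed above.

```python
# Extension sets keyed by a fingerprint label (labels are assumptions).
EXTENSIONS_BY_STACK = {
    "backup": [".bak", ".old", ".zip", ".tar.gz", ".swp", ".env"],
    "java": [".jsp", ".do"],
    "dotnet": [".aspx", ".asmx"],
}


def expand(words, stack):
    """Return each word followed by its variants for the given stack.

    Unknown stacks get no extensions, so the words pass through unchanged.
    """
    extensions = EXTENSIONS_BY_STACK.get(stack, [])
    expanded = []
    for word in words:
        expanded.append(word)
        expanded.extend(word + ext for ext in extensions)
    return expanded
```

Keeping the sets small is the point: every extension multiplies the request count by the size of the wordlist.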
Where the real wins hide
Mix in passive paths
- gau and waybackurls give you historical paths the app once exposed
- katana and gospider crawl current JS for live links
- Merge both into your wordlist, then dedupe against your active runs
- Feed paths discovered on one host into the wordlist for sibling hosts
Content discovery is a feedback loop. Every interesting path you find should feed the next run, on the next host, on the next program.
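One way to close that loop is to turn collected URLs into wordlist entries. The sketch below assumes you have already saved the output of tools like gau or waybackurls as a list of URL strings. `words_from_urls` and `merge_wordlists` are hypothetical helper names, and example.com is a placeholder host.

```python
from urllib.parse import urlsplit


def words_from_urls(urls):
    """Extract unique path segments from URLs, keeping first-seen order."""
    seen = set()
    words = []
    for url in urls:
        path = urlsplit(url).path  # drops scheme, host, query, fragment
        for segment in path.split("/"):
            if segment and segment not in seen:
                seen.add(segment)
                words.append(segment)
    return words


def merge_wordlists(base, extra):
    """Append entries from extra that are not already present in base."""
    seen = set(base)
    merged = list(base)
    for word in extra:
        if word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


if __name__ == "__main__":
    historical = [
        "https://example.com/api/v1/users?id=1",
        "https://example.com/api/v2/",
    ]
    print(merge_wordlists(["admin", "api"], words_from_urls(historical)))
```

Splitting paths into segments rather than keeping whole paths is a design choice: segments transfer better to sibling hosts, where the same directory names often appear under a different parent.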
When to stop
Stop when the marginal new path is no longer interesting. If your last 500 results were all marketing pages, the problem is your wordlist, not the host. Switch lists, narrow the focus, or move on. Discovery without a stopping rule is just procrastination.