BIPI

Hypothesis-Driven Threat Hunting: A Repeatable Hunt Loop

Threat Intelligence

Hunting is not browsing dashboards until something looks weird. A repeatable hunt is hypothesis-driven, time-boxed, documented, and either becomes a detection or gets retired. Here is the loop that turns hunting from art to engineering.

By Arjun Raghavan, Security & Systems Lead, BIPI · September 14, 2023 · 10 min read

#threat-hunting #mitre-att&ck #soc #detection-engineering #velociraptor

Threat hunting as practiced in many SOCs is indistinguishable from analyst curiosity. Someone has a quiet afternoon, opens the SIEM, runs queries based on intuition, finds nothing, and closes the window. There is no record, no hypothesis, no measurement, and no path from hunt to detection. The team has spent four hours and produced no asset.

The Hunt Has a Shape

A repeatable hunt has six stages. Hypothesis, data scoping, query development, execution, finding triage, and outcome. Each stage produces an artifact. The artifacts together form a hunt report that is reviewed, archived, and feeds either a new detection rule or a documented coverage gap.

  1. Hypothesis: a testable statement such as "an adversary using T1078.004 (valid cloud accounts) would log in from a country we do not operate in within 30 minutes of a successful MFA challenge from a normal country"
  2. Data scoping: identify which log sources contain the evidence, the time range, and the entities to scope the query against
  3. Query development: write the SPL, KQL, or YARA-L that operationalizes the hypothesis (a sketch for the example above follows this list)
  4. Execution: run the query, capture results, sample for review
  5. Finding triage: false positive, true positive, inconclusive, with reasoning recorded for each
  6. Outcome: promote to detection, document as coverage gap, or retire the hypothesis
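
To make stages one through three concrete, here is a minimal sketch of a query for the stage-one hypothesis, assuming Azure AD sign-in telemetry in Microsoft Sentinel's SigninLogs table. The operating-country list is a placeholder, and "successful MFA challenge" is simplified to any successful sign-in; both are assumptions to adapt locally.

```kql
// Sketch: sign-in from a non-operating country within 30 minutes of a
// successful sign-in from a normal country (T1078.004 hypothesis).
// The country list is hypothetical; replace with where you operate.
let normal_countries = dynamic(["IN", "US"]);
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"                      // successful sign-in
| extend Country = tostring(LocationDetails.countryOrRegion)
| where isnotempty(Country)
| sort by UserPrincipalName asc, TimeGenerated asc
| extend PrevUser = prev(UserPrincipalName),
         PrevCountry = prev(Country),
         PrevTime = prev(TimeGenerated)
| where UserPrincipalName == PrevUser
| where PrevCountry in (normal_countries) and Country !in (normal_countries)
| where datetime_diff("minute", TimeGenerated, PrevTime) <= 30
| project TimeGenerated, UserPrincipalName, PrevCountry, Country, IPAddress
```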

Where Hypotheses Come From

Good hypotheses come from three sources. Threat intelligence reports describing a TTP used against a peer organization. Internal incident postmortems revealing techniques that were caught late. ATT&CK coverage gaps where telemetry exists but no rule is deployed. Bad hypotheses come from a single anomalous event in yesterday's dashboard.

Tools for the Stages

  • Hypothesis tracking: a simple Markdown file per hunt in a Git repo, or Jira tickets with a hunt template (a skeleton follows this list)
  • Query development: the SIEM's own search interface, with queries saved to the hunt artifact
  • Endpoint forensics on demand: Velociraptor or KAPE for collecting artifacts the SIEM does not have
  • Pivoting and timeline: Timesketch or the SIEM's own timeline view for multi-event reconstruction
  • Documentation: ATT&CK technique IDs in every hunt for cross-referencing
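
The hunt record itself can be that lightweight. A skeleton sketch, with section names mirroring the six stages (illustrative, not a standard):

```markdown
# Hunt: <short title> (<ATT&CK technique ID>)

- Hypothesis: <one falsifiable sentence>
- Data scope: <log sources, time range, entities>
- Time box: <e.g. 4 hours>
- Owner: <one name, not a team>

## Query
<the exact SPL/KQL/YARA-L that was run>

## Execution
<run date, hit count, sampled results>

## Triage
| Finding | Verdict (FP/TP/inconclusive) | Reasoning |
|---------|------------------------------|-----------|
|         |                              |           |

## Outcome
<promoted rule link / documented gap / retired, with reasoning>
```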

An Example Hunt

  • Hypothesis: an adversary using T1003.001 (LSASS memory dumping) would create a minidump file with a non-standard extension on an endpoint where the parent process is not Task Manager.
  • Data scope: Sysmon event ID 11 (file create) events over 14 days, joined to Sysmon event ID 10 (process access) events targeting lsass.exe.
  • Query: in KQL, DeviceFileEvents where the file name matches dmp or carries a non-standard extension and the parent process is not taskmgr.exe, joined to DeviceEvents on DeviceId within five minutes (sketched below).
  • Execution: 47 hits across 12 endpoints. 44 were a backup agent legitimately accessing LSASS for credential vaulting; 3 were a legitimate forensic tool run by IR.
  • Outcome: promote to detection with the backup agent exclusion documented; archive the forensic tool finding as expected.
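
A sketch of roughly what that query looks like, assuming the Sysmon telemetry lands in the Microsoft Defender advanced hunting tables. Column names follow the public schema; the extension pattern approximates "dmp or non-standard extension" and should be tuned locally.

```kql
// Minidump creation on hosts with LSASS access not parented by Task Manager.
let lookback = 14d;
let lsass_access =
    DeviceEvents
    | where Timestamp > ago(lookback)
    | where ActionType == "OpenProcessApiCall" and FileName =~ "lsass.exe"
    | where InitiatingProcessParentFileName !~ "taskmgr.exe"
    | project DeviceId, AccessTime = Timestamp;
DeviceFileEvents
| where Timestamp > ago(lookback)
| where ActionType == "FileCreated"
| where FileName endswith ".dmp"
    or FileName matches regex @"(?i)lsass.*\.(txt|log|tmp|bin)$"  // non-standard extensions
| join kind=inner (lsass_access) on DeviceId
| where abs(datetime_diff("minute", Timestamp, AccessTime)) <= 5
| project Timestamp, DeviceId, FileName, FolderPath, InitiatingProcessFileName
```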

The Hunt to Detection Bridge

When a hunt finds something real, the next step is not a Slack message to the team. It is a Sigma rule with the hunt's query as the basis, the hunt's exclusions baked in, and the hunt's findings as the test fixtures. The hunt artifact becomes the test corpus, which means the rule's intent is preserved in code rather than in the analyst's head.
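A hedged sketch of what the promoted rule from the example hunt might look like in Sigma. The backup agent image name is a placeholder for the real exclusion, the logsource maps to Sysmon event ID 11, and the parent-process condition is approximated by filtering on the creating image.

```yaml
title: LSASS Minidump Created Outside Task Manager
status: experimental
description: Dump file creation not attributable to Task Manager; promoted from a hypothesis-driven hunt on T1003.001.
references:
  - https://attack.mitre.org/techniques/T1003/001/
logsource:
  product: windows
  category: file_event        # Sysmon event ID 11
detection:
  selection:
    TargetFilename|endswith: '.dmp'
  filter_taskmgr:
    Image|endswith: '\taskmgr.exe'
  filter_backup_agent:
    # placeholder for the hunt's documented exclusion
    Image|endswith: '\backupagent.exe'
  condition: selection and not 1 of filter_*
falsepositives:
  - Backup agents that legitimately access LSASS for credential vaulting
  - Forensic tooling run by IR (expected, documented in the hunt record)
level: high
tags:
  - attack.credential_access
  - attack.t1003.001
```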

Measuring Hunting

  • Hunts per quarter: a healthy team runs 8 to 15 hypothesis-driven hunts per quarter, not 200
  • Hunt to detection conversion: 30 to 50 percent of hunts should produce a new rule or tune an existing one
  • Gaps documented: hunts that reveal telemetry gaps are as valuable as those that produce rules
  • Time to hypothesis: from intelligence input to first query should be under four hours for known TTPs

A hunt that produces no artifact is not a hunt; it is recreation. Every hypothesis must end in a rule, a documented gap, or a retired hypothesis with reasoning.

Common Failure Modes

The most common failure is the eternal hunt, where a curious analyst keeps digging because the data is interesting rather than because the hypothesis is testable. The second is the unfalsifiable hunt, where the hypothesis is so vague that no result could disprove it. The third is the unowned hunt, where five people contribute queries and nobody is responsible for the outcome. Time boxing, hypothesis-writing standards, and assigned ownership solve all three.

Read more field notes, explore our services, or get in touch at info@bipi.in.