BIPI

Tuning Down Alert Noise: The 80/20 of False Positive Reduction

Cybersecurity

Most SOCs do not have a detection problem; they have a tuning problem. A small number of rules generate the majority of false positives, and a small number of well-placed exclusions cut analyst workload in half. Here is the practical playbook.

By Arjun Raghavan, Security & Systems Lead, BIPI · September 5, 2023 · 9 min read

#soc-operations #alert-tuning #false-positives #siem #detection-engineering

Walk into any mid-sized SOC and pull the alert volume by rule for the last 30 days. Sort descending. You will find that roughly 20 percent of the rules generate 80 percent of the alerts, and within that 20 percent, a handful of rules account for the analyst burnout problem the CISO keeps hearing about.

Start With the Histogram

Before you tune anything, build a rule volume histogram for the last 30 and 90 days. In Splunk this is a one-line search: index=notable | stats count by rule_name | sort -count. In Sentinel it is a KQL query against SecurityAlert summarized by AlertName. The shape of that histogram tells you where to spend the next two weeks.

  • Top 5 rules: usually account for 60 to 75 percent of total alert volume
  • Top 20 rules: typically 85 to 95 percent of volume
  • The long tail: hundreds of rules that fire once a month and probably need either deletion or a hunt promotion
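The Pareto math above is easy to check against your own export. A minimal sketch, assuming you have already pulled a rule-to-count mapping out of your SIEM (the rule names and numbers below are invented for illustration):

```python
from collections import Counter

def volume_share(alert_counts: dict, top_n: int) -> float:
    """Fraction of total alert volume produced by the top_n noisiest rules."""
    total = sum(alert_counts.values())
    top = sum(c for _, c in Counter(alert_counts).most_common(top_n))
    return top / total

# Toy 30-day export: rule name -> alert count (illustrative numbers).
counts = {
    "PS-EncodedCommand": 4200,
    "BruteForce-T1110": 3100,
    "SvcAcct-FailedLogin": 2500,
    "DNS-Tunnel-Heuristic": 900,
    "LSASS-Access-T1003": 60,
    "RareParentProcess": 40,
}

# Three of six rules carry roughly 91 percent of the volume here.
print(f"{volume_share(counts, 3):.0%}")
```

If your real histogram is flatter than this, you have an unusually well-tuned estate; most are not.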

Triage Each High Volume Rule

For every rule in the top 20, ask four questions. What is the true positive rate over the last 90 days? What benign source generates most of the noise? Is the rule still mapped to a threat that matters? Who owns this rule? If you cannot answer any of these in five minutes, the rule is unowned and you are about to inherit it.
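The first and last questions are mechanical enough to script. A sketch, assuming you can export per-rule alert and disposition counts; the RuleStats shape, the 5 percent floor, and the rule names are assumptions, not doctrine:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RuleStats:
    name: str
    alerts_90d: int
    true_positives_90d: int
    owner: Optional[str]  # None means nobody claimed it in five minutes

def triage(rule: RuleStats, tp_floor: float = 0.05) -> list:
    """Return triage flags for one high-volume rule; thresholds are illustrative."""
    flags = []
    tp_rate = rule.true_positives_90d / rule.alerts_90d if rule.alerts_90d else 0.0
    if tp_rate < tp_floor:
        flags.append(f"low TP rate ({tp_rate:.1%})")
    if rule.owner is None:
        flags.append("unowned")
    return flags

print(triage(RuleStats("PS-EncodedCommand", 4200, 12, None)))
```

Run this over the top 20 and you have the agenda for your first tuning meeting.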

Exclusion Patterns That Actually Work

  1. Allowlist by parent process and signing certificate together, never by parent process alone
  2. Exclude scheduled task creators only when the task name matches a known good pattern and the creator is SYSTEM or a specific service account
  3. For PowerShell encoded command alerts, exclude command lines that resolve to known SCCM, Intune, or backup software hashes
  4. Never exclude by user account alone, because credential theft makes that exclusion the attacker's gift
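Pattern 1 is worth spelling out, because single-field allowlists are the most common tuning mistake. A minimal sketch of a compound exclusion predicate; the paths and signer strings are hypothetical examples, not a recommended allowlist:

```python
# Exclusion matches only when BOTH fields agree; a renamed binary at the
# same path, or an unsigned one, still alerts. Pairs below are illustrative.
TRUSTED_PAIRS = {
    ("C:\\Windows\\CCM\\CcmExec.exe", "Microsoft Corporation"),
    ("C:\\Program Files\\Veeam\\Backup\\Veeam.Backup.Manager.exe",
     "Veeam Software Group GmbH"),
}

def is_excluded(parent_process: str, signer) -> bool:
    """True only for an allowlisted parent AND its matching signing cert."""
    return signer is not None and (parent_process, signer) in TRUSTED_PAIRS

# Unsigned copy at a trusted path: NOT excluded, the alert is kept.
print(is_excluded("C:\\Windows\\CCM\\CcmExec.exe", None))
```

The same two-factor shape applies to pattern 2: task name pattern AND creator account, never either alone.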

Aggregate Before You Alert

A single failed login from a service account is not interesting. Forty failed logins from that service account in five minutes is. Many rules that fire constantly are missing an aggregation window. In Sigma you can express this with timeframe and count conditions. In KQL it is a summarize over a bin of time. The change from per event to per window typically cuts alert volume by 70 percent on rules covering T1110 brute force and T1078 valid accounts.
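The per-event to per-window change can be sketched outside any query language. Assuming a sorted stream of (timestamp, account) failed-login events, the 40-in-5-minutes rule above becomes:

```python
from collections import defaultdict

WINDOW_SECONDS = 300   # five-minute bin
THRESHOLD = 40         # fire only at 40+ failures per account per window

def windowed_alerts(events):
    """events: iterable of (timestamp_seconds, account) failed logins,
    assumed sorted by time. Yields one alert per (account, window) that
    crosses the threshold, instead of one alert per event."""
    buckets = defaultdict(int)
    for ts, account in events:
        key = (account, ts // WINDOW_SECONDS)
        buckets[key] += 1
        if buckets[key] == THRESHOLD:   # fires exactly once per window
            yield account, key[1]

# 45 failures in one window -> a single alert, not 45.
events = [(i, "svc_backup") for i in range(45)]
print(list(windowed_alerts(events)))
```

The Sigma timeframe/count condition and the KQL summarize-over-bin compile down to exactly this shape; the fixed bin here is a simplification of what a real engine's sliding window does.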

Risk Scoring Instead of Binary Alerts

Microsoft Sentinel Fusion, Elastic Detection Rules with risk scoring, and Splunk Enterprise Security's risk based alerting all express the same idea. A single weak signal does not page anyone, but ten weak signals on the same entity within an hour do. Moving rules from alert mode to risk contribution mode lets you keep coverage without paying for it in analyst attention.
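The idea reduces to score accumulation per entity per window. A toy sketch, assuming (timestamp, entity, score) tuples from rules switched to risk-contribution mode; real RBA engines also decay and de-duplicate risk, which this deliberately omits:

```python
from collections import defaultdict

RISK_THRESHOLD = 100     # illustrative: ten weak 10-point signals
WINDOW_SECONDS = 3600    # one hour

def page_on_risk(signals):
    """signals: iterable of (timestamp_seconds, entity, score) weak
    detections. Pages an entity once when its accumulated score within
    a one-hour window crosses the threshold."""
    totals = defaultdict(int)
    paged = set()
    pages = []
    for ts, entity, score in sorted(signals):
        key = (entity, ts // WINDOW_SECONDS)
        totals[key] += score
        if totals[key] >= RISK_THRESHOLD and key not in paged:
            paged.add(key)
            pages.append(entity)
    return pages

# Ten 10-point signals on one host inside an hour -> one page.
signals = [(i * 60, "host-042", 10) for i in range(10)]
print(page_on_risk(signals))
```

One of those signals alone never pages anyone, which is the whole point.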

Kill the Zombies

  • Any rule that has not fired in 180 days is either broken or covering a threat that no longer exists; investigate before keeping it
  • Any rule with 99 percent false positive rate over 90 days is doing harm, not coverage
  • Any rule owned by someone who left the company two years ago is unowned
  • Any rule that duplicates another rule with slightly different logic is technical debt

Cutting alert volume by half rarely requires new tooling. It requires reading the histogram, owning the top 20 rules, and accepting that some detections are doing harm.
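The zombie checklist above is a filter you can run quarterly. A sketch, assuming a per-rule export with the fields below (field names are hypothetical; thresholds mirror the checklist):

```python
def zombie_flags(rule: dict) -> list:
    """rule: dict with days_since_last_fire, alerts_90d, fp_90d,
    owner_active. Returns the kill-list reasons from the checklist."""
    reasons = []
    if rule["days_since_last_fire"] > 180:
        reasons.append("silent 180+ days: broken or obsolete")
    if rule["alerts_90d"] and rule["fp_90d"] / rule["alerts_90d"] >= 0.99:
        reasons.append("99%+ false positives: doing harm")
    if not rule["owner_active"]:
        reasons.append("unowned")
    return reasons

zombie = {"days_since_last_fire": 400, "alerts_90d": 1000,
          "fp_90d": 995, "owner_active": False}
print(zombie_flags(zombie))
```

Duplicate-logic detection is harder to automate and stays a manual review item.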

Measuring the Result

Track three metrics weekly: alert volume per analyst per shift, true positive rate by rule, and mean time from alert to disposition. A successful tuning program drops volume by 40 to 60 percent in the first 30 days without lowering true positive count. If TP count drops alongside volume, you cut too deep and the exclusions need review.
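The success criterion above is a simple before/after comparison. A sketch of the weekly check, with invented numbers; the dict shape is an assumption:

```python
def tuning_verdict(before: dict, after: dict) -> str:
    """before/after: weekly dicts with 'alerts' and 'true_positives'.
    Healthy tuning cuts volume 40-60 percent with TP count intact."""
    drop = 1 - after["alerts"] / before["alerts"]
    if after["true_positives"] < before["true_positives"]:
        return f"cut too deep: volume -{drop:.0%} but TPs fell, review exclusions"
    if 0.40 <= drop <= 0.60:
        return f"on target: volume -{drop:.0%}, TPs intact"
    return f"volume -{drop:.0%}: keep tuning"

print(tuning_verdict({"alerts": 10000, "true_positives": 55},
                     {"alerts": 5200, "true_positives": 55}))
```

Mean time to disposition needs your case-management timestamps and is left out here.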

What Not to Tune Away

Some detections look noisy but are doing critical work. Authentication anomalies, lateral movement signals like T1021 remote services, and credential access patterns like T1003 LSASS access should never be tuned to zero. If those rules are loud, the answer is risk based aggregation, not exclusion. Tune the delivery, not the visibility.

Read more field notes, explore our services, or get in touch at info@bipi.in.