
Your SOAR Is Stuck on Enrichment. The High-Leverage Automations Are on the Other Side

Cybersecurity

Most SOAR deployments automate VirusTotal lookups and call it a day. The 10x value is in response actions: auto-isolate, auto-disable, auto-block. Here is how to ship them without scaring leadership into rolling everything back.

By Arjun Raghavan, Security & Systems Lead, BIPI · January 9, 2024 · 7 min read

#soar #automation #incident-response #soc

A logistics company we worked with had Splunk SOAR in production for two years. They had built 47 playbooks. Every single one stopped at enrichment. VirusTotal, AbuseIPDB, internal asset lookup, ServiceNow ticket creation. Not one playbook did anything that changed the state of the environment. Their dwell time on commodity malware was still 6 hours from alert to containment because every action required a human to click.

Why teams stall at enrichment

Enrichment is safe. It calls APIs that return data. Worst case, it burns some API quota. The moment you suggest auto-isolating an endpoint or auto-disabling a user, the conversation gets political. Operations worries about availability. HR worries about wrongly disabled accounts. The CISO worries about a 3 a.m. page from the CEO whose laptop got auto-isolated mid-board-deck. So nothing ships.

The pattern that breaks the deadlock is approval-gated automation with fast paths for high-confidence cases. Not all-or-nothing.

The three response actions worth shipping first

Auto-isolate endpoint on confirmed malware (EDR high-confidence detection + matching IOC). Most EDR vendors expose this as a single API call.
Auto-disable user on confirmed credential compromise (impossible travel + suspicious sign-in + risky token). Tied to your IdP via the Microsoft Graph API or the Okta API.
Auto-block IP/domain at the perimeter on confirmed C2 (EDR or NDR detection of beaconing). Push to firewall, proxy, and DNS sinkhole.
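
The "confirmed" conditions in the three actions above are conjunctions of signals, which makes them easy to express as data. The sketch below is one way to do that, not a vendor integration: the signal names and action names are hypothetical placeholders, and you would map them to whatever your EDR, NDR, and IdP actually emit.

```python
from typing import List, Set, Tuple

# Hypothetical signal and action names, chosen for illustration only.
# Each rule fires when ALL of its required signals are present.
RULES: List[Tuple[str, Set[str]]] = [
    ("isolate_endpoint", {"edr_high_confidence", "ioc_match"}),
    ("disable_user", {"impossible_travel", "suspicious_sign_in", "risky_token"}),
    ("block_indicator", {"beaconing_detected"}),
]


def matching_actions(signals: Set[str]) -> List[str]:
    """Return every response action whose required signals are all present."""
    return [action for action, required in RULES if required <= signals]
```

Keeping the rules in a table rather than in branching logic means the conditions can be reviewed by people who do not read playbook code.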

How to make it palatable

We use a confidence-tiered model. Three lanes:

Lane 1 (auto-execute): Detection has multiple corroborating signals, asset is in a pre-approved scope (no executives, no production servers, no medical devices). Action runs immediately, ticket gets created with 'auto-contained' tag, analyst reviews within 15 minutes.
Lane 2 (one-click approve): Single high-confidence signal. SOAR posts a Slack/Teams message with action button. Tier-2 analyst clicks approve, action fires. SLA is 5 minutes.
Lane 3 (manual): Anything outside the above. Standard playbook, no automation.
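
The three lanes reduce to a small routing function. This is a minimal sketch under stated assumptions: the `Detection` fields, the tier names, and the "two or more signals" reading of "multiple corroborating signals" are ours for illustration, not part of any SOAR product.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    AUTO_EXECUTE = 1
    ONE_CLICK_APPROVE = 2
    MANUAL = 3


# Hypothetical tier labels for assets outside the pre-approved scope.
EXCLUDED_TIERS = {"executive", "production_server", "medical_device"}


@dataclass(frozen=True)
class Detection:
    host: str
    asset_tier: str             # e.g. "workstation" or "executive"
    corroborating_signals: int  # independent signals agreeing on the verdict
    high_confidence: bool       # at least one signal is rated high-confidence


def route(detection: Detection) -> Lane:
    """Pick a lane; out-of-scope assets can never reach Lane 1."""
    in_scope = detection.asset_tier not in EXCLUDED_TIERS
    if detection.high_confidence and detection.corroborating_signals >= 2 and in_scope:
        return Lane.AUTO_EXECUTE
    if detection.high_confidence:
        return Lane.ONE_CLICK_APPROVE
    return Lane.MANUAL
```

For example, `route(Detection("ceo-laptop", "executive", 3, True))` lands in Lane 2 despite three corroborating signals, because the asset is out of scope for auto-execution.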

For that logistics customer, Lane 1 covered 22% of incidents. Lane 2 covered 48%. Manual dropped from 100% to 30%. Mean time to contain went from 6 hours to 14 minutes for malware cases.

The controls that keep you safe

Automation without guardrails is how you take down production. We always implement these before shipping any auto-action playbook:

Asset-tier exclusion list. Domain controllers, jump hosts, and named executive endpoints are never auto-isolated. They go to Lane 2 or 3.
Rate limiting. No more than 50 endpoints isolated per hour without human approval. Catches false-positive storms.
Reversibility. Every auto-action has a one-click undo button posted to the same Slack channel. If you isolate the wrong host, recovery is 30 seconds.
Audit log to a write-once store. Every automated action gets logged outside the SOAR for incident review.
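
The four controls above can sit in a single gate in front of the isolate action. The sketch below assumes hypothetical host names and a hypothetical `release:<host>` undo token. The `audit_sink` callable stands in for a write-once store outside the SOAR, and the injectable clock exists only to make the rate limit testable.

```python
import json
import time
from collections import deque
from typing import Callable, Deque

# Hypothetical exclusion list; in practice, load it from the asset inventory.
NEVER_AUTO_ISOLATE = {"dc01", "jump01", "ceo-laptop"}
MAX_ISOLATIONS_PER_HOUR = 50
WINDOW_SECONDS = 3600


class IsolationGuard:
    """Gate auto-isolation behind an exclusion list and a sliding-window rate limit."""

    def __init__(self, audit_sink: Callable[[str], None],
                 clock: Callable[[], float] = time.time) -> None:
        self._recent: Deque[float] = deque()  # timestamps of recent auto-isolations
        self._audit = audit_sink
        self._clock = clock

    def _under_rate_limit(self, now: float) -> bool:
        # Drop timestamps that have aged out of the one-hour window.
        while self._recent and now - self._recent[0] >= WINDOW_SECONDS:
            self._recent.popleft()
        return len(self._recent) < MAX_ISOLATIONS_PER_HOUR

    def request(self, host: str) -> str:
        now = self._clock()
        if host in NEVER_AUTO_ISOLATE:
            decision = "needs_approval:excluded_asset"
        elif not self._under_rate_limit(now):
            decision = "needs_approval:rate_limited"
        else:
            self._recent.append(now)
            decision = "auto_isolate"
        # Every decision is logged; only executed actions carry an undo token.
        undo = f"release:{host}" if decision == "auto_isolate" else None
        self._audit(json.dumps(
            {"ts": now, "host": host, "decision": decision, "undo": undo}
        ))
        return decision
```

A sliding window is used rather than a fixed hourly counter so that a false-positive storm straddling the top of the hour cannot isolate close to 100 hosts in a few minutes.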

Measure what changes

Metrics we track for every SOAR engagement:

70%: reduction in MTTD on malware after Lane 1 ships
14 min: median time to contain (was 6 hrs)
0.3%: false positive rate on auto-isolate after tuning

If your SOAR program is stuck, the unblocker is usually political, not technical. Run a tabletop with operations leadership where you walk through three real incidents and show how long manual response took. Then show what Lane 1 would have done. The conversation shifts when leaders see the dwell-time cost of their caution.

Enrichment automation saves analyst seconds. Response automation saves business hours. The leverage is on the response side.

Pick one playbook. Ship it to Lane 2 (approval-gated) for 30 days. Measure approval rate and false positive rate. If both look clean, promote to Lane 1 for the safe asset tiers. Repeat. Within a quarter you will have shipped meaningful response automation, and the political resistance fades because the data shows it works.
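
The promotion decision is easier to defend if the thresholds are agreed and written down before the trial starts. The sketch below shows one way to encode it. The threshold values are assumptions for illustration, since "look clean" will mean different numbers in different organizations, and the field names are hypothetical.

```python
from dataclasses import dataclass

# Assumed thresholds; agree on your own values before the trial starts.
MIN_APPROVAL_RATE = 0.95
MAX_FALSE_POSITIVE_RATE = 0.01
MIN_TRIAL_DAYS = 30


@dataclass(frozen=True)
class TrialStats:
    days_in_lane2: int
    proposed: int         # actions the playbook proposed to analysts
    approved: int         # proposals an analyst approved
    false_positives: int  # approved actions later judged to be wrong


def ready_for_lane1(stats: TrialStats) -> bool:
    """True when the Lane 2 trial is long enough and both rates are within bounds."""
    if stats.days_in_lane2 < MIN_TRIAL_DAYS or stats.proposed == 0 or stats.approved == 0:
        return False
    approval_rate = stats.approved / stats.proposed
    false_positive_rate = stats.false_positives / stats.approved
    return (approval_rate >= MIN_APPROVAL_RATE
            and false_positive_rate <= MAX_FALSE_POSITIVE_RATE)
```

A low approval rate is treated as a blocker on its own: if analysts keep declining what the playbook proposes, the detection logic needs tuning before anything runs unattended.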
