Detection Engineering as Code: Sigma, YARA-L, and Test-Driven Detections
Cybersecurity
Detections are software. They deserve version control, code review, unit tests, and a CI pipeline. We walk through a test-driven workflow that uses Sigma as the portable rule language, YARA-L for Chronicle, and Atomic Red Team for telemetry generation.
By Arjun Raghavan, Security & Systems Lead, BIPI · September 2, 2023 · 11 min read
Most SOCs still treat detection rules as configuration. A rule is clicked together in the Splunk or Sentinel console, saved, and immediately deployed to production. There is no diff, no peer review, no test, and no rollback path when it fires 4,000 times overnight. Detection engineering as code rejects that model and treats every rule the same way a developer treats application code.
Why Detections Belong in Git
When a rule lives in a Git repository, four things become possible. Reviewers can read a diff before deployment. CI can run the rule against a corpus of known-good and known-bad logs. The rule has an immutable history, including who changed the regex on T1059.001 PowerShell encoded commands and why. And when an analyst writes a postmortem saying a detection misfired, the team can git blame to find the exact commit that introduced the noise.
- Every rule file should carry metadata: author, MITRE technique IDs, data source, false positive rate, last reviewed date
- Pull requests must include a test event that the rule matches and a test event the rule should not match
- CI should reject rules without test coverage, the same way you reject untested application code
- Production deployment is a merge to main, not a click in the SIEM UI
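The CI gate in the list above can be sketched in a few lines of Python. The repo layout assumed here (rules/**/*.yml with sibling tests/<rule>/positive.json and negative.json fixtures) is illustrative, not prescriptive:

```python
# a minimal sketch of the CI gate: reject any rule that lacks a positive
# and a negative test fixture; the layout is an assumption for illustration
from pathlib import Path
import sys

def missing_fixtures(repo: Path) -> list[str]:
    """Return the names of rule files that lack a positive or negative fixture."""
    rules_dir = repo / "rules"
    if not rules_dir.is_dir():
        return []  # nothing to check in this sketch
    failures = []
    for rule in sorted(rules_dir.glob("**/*.yml")):
        fixtures = repo / "tests" / rule.stem
        if not (fixtures / "positive.json").is_file() or \
           not (fixtures / "negative.json").is_file():
            failures.append(rule.stem)
    return failures

if __name__ == "__main__":
    failures = missing_fixtures(Path("."))
    if failures:
        print("rules without test coverage:", ", ".join(failures))
        sys.exit(1)  # fail the CI job, same as untested application code
```

Wire this into the pipeline as a required check and an untested rule can never reach main.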
Sigma as the Portable Source of Truth
Sigma is the closest thing the industry has to a vendor-neutral detection language. A Sigma rule expresses logsource and detection logic in YAML, and tools like sigma-cli (successor to the now-deprecated sigmac) and uncoder.io convert that YAML into Splunk SPL, Microsoft Sentinel KQL, Elastic EQL, Chronicle YARA-L, or Wazuh rules. The portability matters because most enterprises run more than one SIEM, and rewriting every rule by hand when you adopt a new platform is how detection coverage rots.
A simple Sigma rule for suspicious PowerShell looks like this in plain prose: logsource is windows process creation, detection requires Image ending in powershell.exe and CommandLine containing -enc or -EncodedCommand, condition is selection. That single YAML file compiles to a working KQL query against DeviceProcessEvents in Sentinel and a working YARA-L rule against udm.principal.process in Chronicle.
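Rendered as actual Sigma YAML, the rule described above looks roughly like this (the id, author, falsepositives, and level fields are placeholders):

```yaml
title: Suspicious Encoded PowerShell Command
id: 00000000-0000-0000-0000-000000000000   # placeholder, generate a real UUID
status: experimental
author: detections-team                     # placeholder
tags:
  - attack.execution
  - attack.t1059.001
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    Image|endswith: '\powershell.exe'
    CommandLine|contains:
      - '-enc'
      - '-EncodedCommand'
  condition: selection
falsepositives:
  - Administrators running legitimate encoded commands
level: medium
```

Note the field names in the detection block are the generic Sysmon-style names; the backend mapping, not the rule, translates them to DeviceProcessEvents or UDM fields.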
The Test-Driven Loop
- Write the test first: capture a malicious event using Atomic Red Team test T1059.001-1, store the raw log as a fixture
- Capture a benign event: an admin running a legitimate encoded command, store as a negative fixture
- Write the Sigma rule that matches the malicious fixture and not the benign one
- Run the Sigma converter (sigmac or sigma-cli) in CI to generate the SIEM-specific query, then execute it against both fixtures
- Merge only when the rule passes both assertions
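The loop above can be sketched as a plain assertion-style test. The fixtures, the parent-process allowlist, and the hand-translated matches() predicate (standing in for the compiled SIEM query) are all illustrative:

```python
# a minimal sketch of the CI assertion step: matches() hand-translates the
# Sigma logic; the fixtures and the allowlist are assumptions for illustration
ALLOWED_PARENTS = {r"c:\program files\configmgr\ccmexec.exe"}  # hypothetical benign launcher

def matches(event: dict) -> bool:
    """Fire on powershell.exe with an encoded-command flag from an unknown parent."""
    image = event.get("Image", "").lower()
    cmdline = event.get("CommandLine", "").lower()
    parent = event.get("ParentImage", "").lower()
    encoded = "-enc" in cmdline  # also covers -EncodedCommand as a prefix
    return image.endswith("powershell.exe") and encoded and parent not in ALLOWED_PARENTS

# positive fixture: captured from Atomic Red Team T1059.001-1
malicious = {
    "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    "CommandLine": "powershell.exe -enc SQBFAFgA...",
    "ParentImage": r"C:\Users\victim\AppData\Local\Temp\dropper.exe",
}
# negative fixture: an admin's legitimate encoded command from a known launcher
benign = {
    "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    "CommandLine": "powershell.exe -EncodedCommand VwByAGkA...",
    "ParentImage": r"C:\Program Files\ConfigMgr\ccmexec.exe",
}

assert matches(malicious), "rule must fire on the malicious fixture"
assert not matches(benign), "rule must stay quiet on the benign fixture"
```

A rule merges only when both assertions pass; a change that breaks either one is rejected in review, not discovered in production.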
YARA-L Specifics for Chronicle Shops
Chronicle uses YARA-L 2.0, which is more expressive than Sigma but also more verbose. The conversion from Sigma to YARA-L is mechanical for simple selection logic but breaks down for stateful detections that correlate events across time windows. For those, write the YARA-L by hand and keep the Sigma rule as documentation. The match section, events block, and condition block in YARA-L map cleanly to the multi-event hunt patterns analysts already think in.
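For illustration, a hand-written correlation of the kind Sigma cannot express cleanly might look like the sketch below. The rule name, field paths, and ten-minute window are assumptions, not a drop-in rule:

```
rule encoded_powershell_then_outbound {
  meta:
    author = "detections-team"
    description = "Encoded PowerShell followed by network activity from the same host"
    severity = "MEDIUM"

  events:
    $proc.metadata.event_type = "PROCESS_LAUNCH"
    $proc.principal.process.file.full_path = /powershell\.exe$/ nocase
    $proc.principal.process.command_line = /-enc/ nocase
    $proc.principal.hostname = $host

    $net.metadata.event_type = "NETWORK_CONNECTION"
    $net.principal.hostname = $host

  match:
    $host over 10m

  condition:
    $proc and $net
}
```

The match block groups both events by hostname inside the window, which is exactly the stateful join that a single Sigma selection cannot express.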
Coverage Tracking Without the Spreadsheet
Once rules live in Git and carry MITRE technique metadata in their frontmatter, a small script can read every rule file and emit a coverage report keyed by technique ID. Pipe that into the MITRE ATT&CK Navigator layer format and you get a live heatmap that updates on every commit. No more quarterly coverage spreadsheets that are wrong the moment they are circulated.
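A sketch of that script, assuming Sigma-style rules whose tags include attack.tXXXX entries; the layer JSON shown is a pared-down subset of the Navigator layer format, not the full schema:

```python
# a minimal sketch of the coverage report: count rules per ATT&CK technique
# and emit a pared-down Navigator layer; the rules layout is an assumption
import json
import re
from pathlib import Path

TECHNIQUE = re.compile(r"attack\.(t\d{4}(?:\.\d{3})?)", re.IGNORECASE)

def coverage(rules_dir: Path) -> dict:
    """Map technique ID -> number of rules that tag it."""
    counts: dict[str, int] = {}
    for rule in rules_dir.glob("**/*.yml"):
        for tid in TECHNIQUE.findall(rule.read_text()):
            counts[tid.upper()] = counts.get(tid.upper(), 0) + 1
    return counts

def navigator_layer(counts: dict) -> str:
    """Emit a minimal Navigator-style layer keyed by technique score."""
    layer = {
        "name": "detection coverage",
        "domain": "enterprise-attack",
        "techniques": [{"techniqueID": t, "score": n} for t, n in sorted(counts.items())],
    }
    return json.dumps(layer, indent=2)
```

Run it on every commit and publish the JSON as a CI artifact; the heatmap is always as current as main.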
Common Failure Modes
- Rules without negative test cases: they match the attack and also match every admin running a scheduled task
- Sigma rules that use vendor-specific field names in the detection block, breaking portability
- Detections written against parsed fields without checking that the parser is deployed in production
- No ownership: a rule fires 200 times a day and nobody knows whose pager it should reach
Rollout Discipline
New rules go to shadow mode for at least seven days. Shadow mode means the rule runs and writes its hits to a separate index but does not create an analyst alert. After seven days you read the hit volume, sample a dozen events, and either tune, promote, or kill the rule. Detections that never leave shadow mode are still better than detections that wake an analyst every twelve minutes.
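The promote/tune/kill decision can itself be made mechanical. A toy policy is sketched below; the thresholds are illustrative assumptions, not recommendations:

```python
def shadow_verdict(hits_per_day: float, sampled_precision: float) -> str:
    """Toy post-shadow decision; all thresholds are illustrative assumptions."""
    if hits_per_day == 0:
        return "kill_or_extend"   # no signal after seven days; re-check the fixture
    if sampled_precision < 0.2:
        return "tune"             # the sampled events are mostly noise
    if hits_per_day > 120:        # roughly one alert every twelve minutes
        return "tune"             # too loud to promote, even if accurate
    return "promote"
```

Encoding the policy, even a crude one, keeps the seven-day review from quietly becoming a seven-month backlog.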
A detection without a test is a hope, not a control. Treat every Sigma file the same way you treat a function in production code: reviewed, tested, owned, and revertible.
Where This Pays Off
Six months into a detection-as-code program, teams typically see a 40 to 60 percent reduction in alert volume, faster onboarding for new analysts who can read the rule history to understand intent, and dramatically faster recovery from SIEM migrations. The coverage map becomes a credible artifact for leadership and auditors rather than a fiction. Most importantly, the team stops being afraid to delete bad rules, because the rule history is preserved and the test fixtures remain.