The Postmortem Template That Actually Produces Action (and the Process Around It)
Most incident postmortems become Confluence pages nobody revisits. The ones that change behavior have specific structure, owner accountability, and a direct line into the detection engineering backlog.
By Arjun Raghavan, Security & Systems Lead, BIPI · January 30, 2024 · 7 min read
I reviewed a year of postmortems for a tech company last quarter: 38 documents from real incidents. All 38 had timelines, 36 had root cause sections, 31 had action items, 12 had action item owners, and 4 had action items marked complete with dates. Of those 4, two amounted to 'updated the runbook'; only two had genuinely changed behavior. The other 34 documents were institutional theater. The team felt good writing them. Nothing changed.
What separates useful from theater
Postmortems that produce action share five traits:
- Specific named owners on every action item, not 'the team' or 'security'
- Action items tied to existing backlogs (Jira, Linear, detection engineering) so they cannot be lost
- A scheduled review cadence (30, 60, 90 days) where progress gets reported back to the same group that did the postmortem
- Distinction between contributing factors (process, training, tooling) and the immediate technical cause
- A 'what we got right' section, because reinforcing successful patterns is as valuable as fixing failures
The template we use
Eight sections, in this order:
- Summary (3-4 sentences). What happened, when, who detected it, and what the impact was. Anyone reading this should understand the incident in 30 seconds.
- Timeline. Every relevant event with timestamp in UTC. Detection, escalation, containment, eradication, recovery. Annotate where decisions were made and why.
- Impact. Quantified: how many users, what data, how long, what financial cost, what regulatory implications.
- What went well. Specific things the response did right: name the detection that fired, the decisions that were correct, the automation that helped.
- What did not go well. Honest assessment. Detection delays, communication gaps, decision points where information was missing.
- Root cause analysis. Use 5-whys or fishbone. Distinguish proximate cause (the technical failure) from contributing factors (process, training, design).
- Action items. Each with owner, target date, success criteria, linked backlog ticket, and category (preventive, detective, responsive).
- Lessons. One paragraph distilled into something that survives the document. Goes into the org's collective lessons learned database.
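The section list and the required action-item fields are easy to enforce mechanically before a draft is accepted. A minimal sketch of such a check; the section and field names here are our illustrative assumptions, not a standard schema:

```python
# Sketch of a postmortem completeness check. Section keys and
# action-item field names are illustrative assumptions.
REQUIRED_SECTIONS = [
    "summary", "timeline", "impact", "what_went_well",
    "what_did_not_go_well", "root_cause", "action_items", "lessons",
]
ACTION_ITEM_FIELDS = {"owner", "target_date", "success_criteria",
                      "ticket", "category"}
ACTION_CATEGORIES = {"preventive", "detective", "responsive"}

def validate(postmortem: dict) -> list[str]:
    """Return a list of problems; an empty list means the doc is complete."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS
                if s not in postmortem]
    for i, item in enumerate(postmortem.get("action_items", [])):
        missing = ACTION_ITEM_FIELDS - item.keys()
        if missing:
            problems.append(f"action item {i}: missing {sorted(missing)}")
        # Reject the vague owners called out earlier in the article.
        if item.get("owner") in (None, "", "the team", "security"):
            problems.append(f"action item {i}: needs a named owner")
        if item.get("category") not in ACTION_CATEGORIES:
            problems.append(f"action item {i}: bad category")
    return problems
```

Running this in CI against the parsed document is one way to make the "named owner on every item" rule non-optional rather than a style preference.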
Who writes, who reviews
The lead incident commander writes the first draft within 5 business days of resolution. Memory fades fast. Beyond two weeks the timeline reconstruction becomes guesswork.
Review process has three stages:
- Technical review: senior analyst or detection engineer fact-checks the timeline and root cause. 2 days.
- Stakeholder review: anyone named in the timeline reads it for accuracy. 3 days.
- Leadership review: CISO or designate signs off on action items and resourcing. 1 week.
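The staged deadlines above can be derived mechanically from the resolution date. A sketch, assuming the stage durations are business days, '1 week' means five of them, and holidays are ignored:

```python
from datetime import date, timedelta

def add_business_days(start: date, days: int) -> date:
    """Advance by N business days, skipping weekends (holidays ignored)."""
    d = start
    while days > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Monday-Friday
            days -= 1
    return d

def review_deadlines(resolved: date) -> dict[str, date]:
    """Draft and review-stage deadlines counted from incident resolution."""
    draft = add_business_days(resolved, 5)        # first draft due
    technical = add_business_days(draft, 2)       # fact-check timeline, root cause
    stakeholder = add_business_days(technical, 3) # accuracy review by those named
    leadership = add_business_days(stakeholder, 5)  # '1 week' as 5 business days
    return {"draft": draft, "technical": technical,
            "stakeholder": stakeholder, "leadership": leadership}
```

Dropping these four dates into the incident channel on resolution day removes the most common failure mode: the draft that quietly never gets written.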
The detection engineering pipeline
The single highest-leverage habit: every postmortem produces at least one detection engineering ticket. A new rule, tuning of an existing rule, a new playbook, a new enrichment, a dashboard panel. The connection is mandatory. If a postmortem produces zero detection improvements, the question is: did we genuinely understand what we missed?
Tickets go into a labeled backlog (we tag them 'postmortem-derived' in Jira). The detection engineering team has SLAs based on incident severity:
- P1 incident: detection improvements ship within 14 days
- P2: ship within 30 days
- P3: ship within 60 days, batched with other improvements
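The SLA windows above are simple to compute when the ticket is filed. A sketch, assuming calendar days; the ticket field names are illustrative, not any particular tracker's API schema:

```python
from datetime import date, timedelta

# SLA windows in calendar days per incident severity, as listed above.
DETECTION_SLA_DAYS = {"P1": 14, "P2": 30, "P3": 60}

def detection_due_date(resolved: date, severity: str) -> date:
    """Due date for the postmortem-derived detection improvement."""
    return resolved + timedelta(days=DETECTION_SLA_DAYS[severity])

def ticket_fields(summary: str, severity: str, resolved: date) -> dict:
    """Fields for the backlog ticket. The exact field names your tracker
    expects will differ; these are illustrative."""
    return {
        "summary": summary,
        "labels": ["postmortem-derived"],  # the tag we filter the backlog by
        "due": detection_due_date(resolved, severity).isoformat(),
    }
```

With the label and due date set at creation time, a saved filter on `postmortem-derived` tickets past due is all the SLA reporting the review meeting needs.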
Blameless does not mean consequence-free
Blameless postmortems mean we do not punish individuals for systemic failures. It does not mean we ignore patterns. If the same person makes the same kind of mistake across three incidents, that is a coaching conversation, not a postmortem topic. Keep the postmortem focused on systems and the management conversation focused on individuals. Conflating them either chills honest reporting or lets patterns slide.
Make it findable
The lessons section gets indexed. We use a simple lessons-learned wiki tagged by incident type (phishing, ransomware, insider, cloud misconfiguration, supply chain). When a new incident starts, the IC searches the wiki for similar incidents in 60 seconds and pulls relevant runbooks and prior decisions. This compounds over years. The 50th phishing incident benefits from lessons of the previous 49.
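The tag-and-search habit needs nothing more elaborate than an index keyed by incident type. A minimal in-memory sketch, with tag names taken from the list above and identifiers invented for illustration:

```python
from collections import defaultdict

class LessonsIndex:
    """Minimal lessons-learned index keyed by incident-type tag."""

    def __init__(self) -> None:
        self._by_tag: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, incident_id: str, tags: list[str], lesson: str) -> None:
        # A lesson tagged with multiple incident types is findable under each.
        for tag in tags:
            self._by_tag[tag].append((incident_id, lesson))

    def search(self, tag: str) -> list[tuple[str, str]]:
        """All prior lessons for this incident type, oldest first."""
        return self._by_tag.get(tag, [])
```

A wiki with consistent tags gives the same result; the point is that the lookup at incident start costs seconds, so the IC actually does it.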
A postmortem that does not produce a backlog ticket and a calendar reminder is a journal entry. Useful for the writer, invisible to the org. Make every postmortem operationally consequential.
The discipline is harder than the template. Templates are easy. The hard part is the 90-day review actually happening, the action items actually being closed, and leadership actually funding the work that came out of last quarter's incidents. Without that follow-through, the template is just a nicer-looking journal. Build the follow-through first, then the template helps.
Read more field notes, explore our services, or get in touch at info@bipi.in.