
AWS Incident Response Runbook: From CloudTrail Alert to Contained Account

Cybersecurity

A working AWS IR playbook covering CloudTrail triage, IAM compromise scoping, GuardDuty correlation, EBS snapshot forensics, and Detective pivots with the exact commands responders run.

By Arjun Raghavan, Security & Systems Lead, BIPI · June 2, 2024 · 9 min read

#aws #ir #runbook #cloud

An AWS incident starts loud or quiet. Loud is a GuardDuty UnauthorizedAccess finding pinging your SOC channel at 2 a.m. Quiet is a developer noticing an unfamiliar IAM user the next morning. The runbook below assumes both can happen and gives you the order in which to move.

We have shipped this playbook into mid-size AWS estates running across two to fifteen accounts. It is opinionated. Where AWS gives you three buttons, we tell you which one to press first.

1. Detection and initial signal triage

The reliable signals are GuardDuty findings, CloudTrail anomalies surfaced through Athena, and IAM Access Analyzer external-access findings. Treat any GuardDuty finding with severity >= 4.0 (Medium or higher) that involves credentials, CloudTrail, or S3 as an incident until disproven.

GuardDuty: filter to finding types that contain 'CredentialAccess', 'UnauthorizedAccess:IAMUser', or 'Exfiltration'.
CloudTrail: look for ConsoleLogin from new geographies and CreateAccessKey on principals that already had keys.
Access Analyzer: any new external-access finding on a bucket or KMS key raised during or after the suspected window.
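The GuardDuty filter above can be applied offline to exported findings. This is a minimal sketch, assuming `findings` is the "Findings" array from `aws guardduty get-findings` JSON output; the function name and sample findings are illustrative, not part of any AWS tooling.

```python
from typing import Any

# Substrings from the runbook's GuardDuty filter. Finding types follow the
# pattern "ThreatPurpose:ResourceType/ThreatFamily.Detail", so a substring
# match on these markers catches every variant under them.
INCIDENT_TYPE_MARKERS = ("CredentialAccess", "UnauthorizedAccess:IAMUser", "Exfiltration")
SEVERITY_THRESHOLD = 4.0  # GuardDuty's Medium band starts at 4.0


def incident_findings(findings: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return findings at or above the threshold whose Type matches a marker.

    `findings` is assumed to be the "Findings" array from
    `aws guardduty get-findings` JSON output.
    """
    return [
        f
        for f in findings
        if float(f.get("Severity", 0)) >= SEVERITY_THRESHOLD
        and any(marker in f.get("Type", "") for marker in INCIDENT_TYPE_MARKERS)
    ]


if __name__ == "__main__":
    # Made-up sample findings, shaped like get-findings output.
    sample = [
        {"Id": "a1", "Type": "UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS", "Severity": 8.0},
        {"Id": "b2", "Type": "Recon:EC2/PortProbeUnprotectedPort", "Severity": 2.0},
        {"Id": "c3", "Type": "CredentialAccess:IAMUser/AnomalousBehavior", "Severity": 5.0},
    ]
    for f in incident_findings(sample):
        print(f["Id"], f["Type"])
```

A substring match is deliberately loose: it keeps every sub-variant under the three threat purposes, at the cost of occasionally pulling in a finding you then close as benign.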

2. Scope the compromised identity

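Before you touch the principal, capture what it did. Run an Athena query against your CloudTrail table. Replace the eventTime range with your actual window, and if the table is partitioned by date, add a predicate on the partition column as well so Athena scans only the relevant days.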

SELECT eventTime, eventName, sourceIPAddress, userAgent, requestParameters
FROM cloudtrail_logs
WHERE useridentity.arn = 'arn:aws:iam::111122223333:user/svc-build'
  AND eventTime BETWEEN '2024-05-30T00:00:00Z' AND '2024-06-01T00:00:00Z'
ORDER BY eventTime

Export the result to a spreadsheet and sort by eventName. The interesting rows tend to cluster around CreateAccessKey, AttachUserPolicy, AssumeRole, GetSecretValue, ListBuckets, and GetObject; each usually marks a step in the attacker's plan. GetObject only appears if S3 data events are enabled on the trail.
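If you would rather not eyeball a spreadsheet, the same triage fits in a few lines. This sketch assumes the Athena result was downloaded as CSV with eventTime and eventName columns; it lowercases header names because Athena may return them lowercased. The function name is hypothetical.

```python
import csv
import io
from collections import Counter

# Event names the runbook calls out as typical attack-path steps.
INTERESTING_EVENTS = {
    "CreateAccessKey", "AttachUserPolicy", "AssumeRole",
    "GetSecretValue", "ListBuckets", "GetObject",
}


def summarize_events(csv_text: str) -> tuple[Counter, list[dict[str, str]]]:
    """Count every eventName and return the interesting rows in time order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Normalise header case so "eventName" and "eventname" both work.
    rows = [{k.lower(): v for k, v in row.items()} for row in reader]
    counts = Counter(row["eventname"] for row in rows)
    interesting = sorted(
        (row for row in rows if row["eventname"] in INTERESTING_EVENTS),
        # CloudTrail eventTime is UTC ISO-8601 ("...Z"), so string order is time order.
        key=lambda row: row["eventtime"],
    )
    return counts, interesting
```

The counts give a quick sense of volume (hundreds of GetObject calls reads very differently from one), while the time-ordered interesting rows are what you paste into the incident timeline.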

Then pivot in Amazon Detective: open the principal's profile and walk the timeline. Detective is slower than Athena but answers the 'what else did this principal touch' question in two clicks.

3. Contain without destroying evidence

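Do not delete the user. Quarantine it instead: attach the AWS managed AWSCompromisedKeyQuarantineV2 policy and deactivate all of its access keys. Deactivated keys can no longer sign requests, and the quarantine policy denies high-risk actions for any credential that still works, while the principal itself stays intact for forensics.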

aws iam attach-user-policy --user-name svc-build --policy-arn arn:aws:iam::aws:policy/AWSCompromisedKeyQuarantineV2
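aws iam list-access-keys --user-name svc-build
aws iam update-access-key --user-name svc-build --access-key-id AKIA... --status Inactive (once per active key; without --user-name, IAM assumes the key belongs to the caller)
For assumed roles, revoke active sessions: aws iam put-role-policy --role-name <role-name> --policy-name AWSRevokeOlderSessions --policy-document file://revoke-older-sessions.json, where the document denies all actions for tokens whose aws:TokenIssueTime is earlier than the revocation time (the same inline policy the console's Revoke active sessions button attaches).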
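Both containment steps can be generated rather than typed under pressure. This is a sketch under two assumptions: the key listing is the JSON output of `aws iam list-access-keys`, and the revocation document mirrors the deny-all, DateLessThan aws:TokenIssueTime shape of the console's AWSRevokeOlderSessions policy. The helper names and sample key IDs are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any


def active_key_ids(list_access_keys_output: dict[str, Any]) -> list[str]:
    """Return IDs of keys still Active in `aws iam list-access-keys` JSON output."""
    return [
        k["AccessKeyId"]
        for k in list_access_keys_output.get("AccessKeyMetadata", [])
        if k.get("Status") == "Active"
    ]


def revoke_older_sessions_policy(cutoff: datetime) -> str:
    """Deny every action for role sessions whose token was issued before `cutoff`.

    A naive `cutoff` is interpreted as local time before conversion to UTC.
    """
    cutoff_utc = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": ["*"],
                "Resource": ["*"],
                "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff_utc}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    listing = {"AccessKeyMetadata": [
        {"UserName": "svc-build", "AccessKeyId": "AKIAEXAMPLEACTIVE", "Status": "Active"},
        {"UserName": "svc-build", "AccessKeyId": "AKIAEXAMPLEOLD", "Status": "Inactive"},
    ]}
    # Print the deactivation commands for review before running them.
    for key_id in active_key_ids(listing):
        print(f"aws iam update-access-key --user-name svc-build "
              f"--access-key-id {key_id} --status Inactive")
    # Save this output as the file passed to put-role-policy --policy-document.
    print(revoke_older_sessions_policy(datetime.now(timezone.utc)))
```

Printing commands instead of calling the API keeps a human in the loop and leaves a copy-pasteable record for the incident log.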

4. EBS snapshot and instance forensics

If an EC2 instance is implicated, isolate it before imaging. Move it into a forensics security group whose only rules allow your responder bastion; security groups are allow-only, so everything else is dropped, but delete the default allow-all outbound rule when you create the group. Then snapshot every attached volume, share the snapshots with a dedicated forensics account, and copy them there, re-encrypting under a KMS key that the production account cannot use.

aws ec2 modify-instance-attribute --instance-id i-0abc --groups sg-forensics-quarantine
aws ec2 create-snapshot --volume-id vol-0xyz --description 'IR-2024-0602 instance i-0abc root'

Tag every snapshot with the incident ID. We use a tag schema of ir:incident, ir:source, ir:collected-by, ir:hash. The hash is the SHA-256 of the EBS direct API block list, captured by aws ebs list-snapshot-blocks then hashed locally.
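A sketch of the hashing and tagging step follows. It deviates from hashing the raw CLI output on purpose: we assume BlockToken and ExpiryTime in `aws ebs list-snapshot-blocks` responses are per-request values, so this version hashes only VolumeSize, BlockSize, and the sorted BlockIndex values, which should stay stable across calls. The function names are hypothetical; the tag list is shaped for the JSON form of `aws ec2 create-tags --tags`.

```python
import hashlib
import json
from typing import Any


def block_list_digest(pages: list[dict[str, Any]]) -> str:
    """SHA-256 over a snapshot's block map from list-snapshot-blocks pages.

    BlockToken and ExpiryTime are left out (assumed per-request), so
    repeated listings of the same snapshot should hash identically.
    """
    indexes = sorted(b["BlockIndex"] for page in pages for b in page.get("Blocks", []))
    first = pages[0] if pages else {}
    canonical = json.dumps(
        {
            "VolumeSize": first.get("VolumeSize"),
            "BlockSize": first.get("BlockSize"),
            "BlockIndexes": indexes,
        },
        sort_keys=True,
        separators=(",", ":"),  # compact, deterministic serialisation
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ir_tags(incident: str, source: str, collected_by: str, digest: str) -> list[dict[str, str]]:
    """Tags in the runbook's ir:* schema, as a list of Key/Value pairs."""
    return [
        {"Key": "ir:incident", "Value": incident},
        {"Key": "ir:source", "Value": source},
        {"Key": "ir:collected-by", "Value": collected_by},
        {"Key": "ir:hash", "Value": digest},
    ]
```

Note that this fingerprints which blocks exist, not their contents; if you need a content hash, you would have to read the blocks through the EBS direct APIs and hash the data itself.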

5. Recovery and key rotation

Rotate in this order: long-lived IAM user keys, role trust policies that reference compromised principals, KMS key policies and grants, then any application secrets the principal could read from Secrets Manager or Parameter Store. Do not skip the KMS step. Grants live outside the key policy, and we have seen attackers leave a kms:Decrypt grant behind that survives access key rotation.
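Leftover grants are easiest to spot by comparing each key's grants against the grantees you expect. This sketch assumes the input is the JSON from `aws kms list-grants --key-id <key>`, where Operations use bare names such as "Decrypt", and that you maintain the expected-grantee set yourself; the helper names are hypothetical.

```python
from typing import Any


def unexpected_decrypt_grants(
    list_grants_output: dict[str, Any], expected_grantees: set[str]
) -> list[dict[str, Any]]:
    """Return grants that allow Decrypt to a grantee outside the expected set."""
    return [
        g
        for g in list_grants_output.get("Grants", [])
        if "Decrypt" in g.get("Operations", [])
        and g.get("GranteePrincipal") not in expected_grantees
    ]


def revoke_grant_commands(grants: list[dict[str, Any]]) -> list[str]:
    """One `aws kms revoke-grant` command per flagged grant, for human review."""
    return [
        f"aws kms revoke-grant --key-id {g['KeyId']} --grant-id {g['GrantId']}"
        for g in grants
    ]
```

An allowlist is used instead of matching on the compromised principal because an attacker who creates a grant will usually name a principal they control, not the identity you are already watching.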

6. After-action: hardening the next incident

Every AWS incident teaches the same lessons: service accounts had long-lived keys, MFA was not enforced on the root user of a member account, and GuardDuty was on but its findings went to an inbox nobody read. Pick three controls and ship them within two weeks: an SCP that denies CreateAccessKey for human IAM users, IAM Access Analyzer policy validation in CI, and a daily Athena query for ConsoleLogin events from unexpected ASNs (CloudTrail records only the source IP, so this needs an IP-to-ASN lookup).
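One way to express "deny CreateAccessKey on humans" is to key the SCP on the IAM user path. This sketch assumes a hypothetical convention that human IAM users are created under the /human/ path; IAM user ARNs include the path, so the Resource pattern scopes the deny to those users only. The resulting JSON could then be registered with `aws organizations create-policy --type SERVICE_CONTROL_POLICY`.

```python
import json


def deny_human_access_keys_scp(human_user_path: str = "/human/") -> dict:
    """Build an SCP denying iam:CreateAccessKey on IAM users under a path.

    Assumes (hypothetically) that human IAM users live under
    `human_user_path`, so service users elsewhere keep working.
    """
    path = human_user_path.strip("/")
    if not path:
        raise ValueError("use a dedicated path; '/' would match every IAM user")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessKeysForHumanUsers",
                "Effect": "Deny",
                "Action": "iam:CreateAccessKey",
                # User ARNs embed the path: arn:aws:iam::<account>:user/<path>/<name>
                "Resource": f"arn:aws:iam::*:user/{path}/*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_human_access_keys_scp(), indent=2))
```

Path-based scoping only works if the convention is enforced at user creation; a human user created at the root path would slip past this deny.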

2.5 hrs: median AWS IR scoping time with Athena queries ready
11 hrs: without prepped queries

Snapshots retained per instance

An AWS runbook is only useful if the queries are pre-saved, the forensics account exists before you need it, and the on-call has the AWSCompromisedKeyQuarantineV2 policy ARN in muscle memory. Build that quietly, on a Tuesday afternoon, not during the incident.
