GCP Incident Response Runbook: Audit Logs, Chronicle, and Service Account Forensics
A working GCP IR playbook spanning Cloud Audit Logs, Chronicle SecOps hunts, service account compromise scoping, organization policy containment, and VPC Service Controls during active response.
By Arjun Raghavan, Security & Systems Lead, BIPI · June 5, 2024
GCP IR centers on Cloud Audit Logs (Admin Activity, Data Access, System Event, Policy Denied) and Chronicle SecOps for retention and correlation. Admin Activity logs are always on, but Data Access logs are disabled by default for most services (BigQuery is the exception). If you are reading this before an incident, turn on Data Access logging for IAM, Cloud Storage, and Secret Manager today.
1. The opening pivot
When a service account is suspected compromised, scope its activity with a single gcloud query. The protoPayload structure is where the truth lives.
gcloud logging read 'protoPayload.authenticationInfo.principalEmail="sa-build@proj.iam.gserviceaccount.com" AND timestamp>="2024-05-30T00:00:00Z"' --project=proj --format='value(timestamp, protoPayload.methodName, protoPayload.requestMetadata.callerIp, resource.type)' --limit=500
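Sort by methodName. The operations that signal attacker action are service account key creation (logged as google.iam.admin.v1.CreateServiceAccountKey; iam.serviceAccountKeys.create is the permission name), SetIamPolicy, storage.objects.list, storage.objects.get, Secret Manager version access (logged as google.cloud.secretmanager.v1.SecretManagerService.AccessSecretVersion; secretmanager.versions.access is the permission name), and compute.instances.setMetadata (logged with an API version prefix such as v1.). The last one is how attackers persist on Compute Engine via ssh-keys metadata. The Cloud Storage and Secret Manager reads are Data Access events, so they appear only if that logging was enabled before the incident.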
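The sorting step can be sketched in Python, assuming the query above was re-run with --format=json and the entries loaded as a list of dicts. The function name triage, the HIGH_SIGNAL list, and the sample entries are illustrative, not part of any Google tooling. It matches on substrings because logged methodName values are often fully qualified.

```python
from collections import Counter

# Substrings of methodName values worth a closer look. This list is an
# assumption based on the operations named above; extend it as needed.
HIGH_SIGNAL = (
    "CreateServiceAccountKey",
    "SetIamPolicy",
    "storage.objects.list",
    "storage.objects.get",
    "AccessSecretVersion",
    "compute.instances.setMetadata",
)


def triage(entries):
    """Count high-signal calls per (methodName, callerIp) pair."""
    hits = Counter()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        method = payload.get("methodName", "")
        ip = payload.get("requestMetadata", {}).get("callerIp", "unknown")
        if any(marker in method for marker in HIGH_SIGNAL):
            hits[(method, ip)] += 1
    return hits


# Made-up entries in the shape gcloud emits with --format=json.
SAMPLE = [
    {"protoPayload": {"methodName": "google.iam.admin.v1.CreateServiceAccountKey",
                      "requestMetadata": {"callerIp": "203.0.113.7"}}},
    {"protoPayload": {"methodName": "storage.objects.get",
                      "requestMetadata": {"callerIp": "203.0.113.7"}}},
    {"protoPayload": {"methodName": "storage.objects.get",
                      "requestMetadata": {"callerIp": "203.0.113.7"}}},
    {"protoPayload": {"methodName": "storage.buckets.get",
                      "requestMetadata": {"callerIp": "10.0.0.5"}}},
]

for (method, ip), count in triage(SAMPLE).most_common():
    print(f"{count:>4}  {ip:<15}  {method}")
```

Grouping by caller IP as well as method makes it easier to separate the build system's normal address from an unfamiliar one using the same identity.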
2. Chronicle for retention and pivot
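Cloud Logging keeps the _Default bucket, which holds Data Access and Policy Denied logs, for 30 days by default; Admin Activity and System Event logs sit in the _Required bucket for 400 days. Chronicle retains a year by default. When the incident window is older than 30 days and you need more than Admin Activity, query Chronicle UDM directly.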
principal.user.email_addresses = "sa-build@proj.iam.gserviceaccount.com" AND metadata.event_type = "USER_RESOURCE_UPDATE_PERMISSIONS"
Chronicle's entity graph builds a 14-day pattern for the principal. Anything outside that pattern shows up in red on the entity timeline. Use it to spot the 'first weird thing' that started the chain.
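To decide which store to query for a given incident window, the retention defaults from this section reduce to a small date check. This is a minimal sketch: the function name where_to_look and its return labels are hypothetical, and both retention periods are configurable, so substitute your own values.

```python
from datetime import datetime, timedelta, timezone

# Default retention periods. Both are configurable per environment, so
# treat these numbers as assumptions and confirm them before relying on them.
DEFAULT_BUCKET_DAYS = 30   # Cloud Logging _Default bucket
CHRONICLE_DAYS = 365       # Chronicle default retention
# Admin Activity logs in the _Required bucket are kept for 400 days and
# are not modeled here.


def where_to_look(window_start, now):
    """Return which store should still hold logs from window_start."""
    age = now - window_start
    if age <= timedelta(days=DEFAULT_BUCKET_DAYS):
        return "cloud-logging"
    if age <= timedelta(days=CHRONICLE_DAYS):
        return "chronicle"
    return "expired"


now = datetime(2024, 6, 5, tzinfo=timezone.utc)
print(where_to_look(datetime(2024, 5, 30, tzinfo=timezone.utc), now))
print(where_to_look(datetime(2024, 3, 1, tzinfo=timezone.utc), now))
```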
3. Containing a compromised service account
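Do not delete the service account. Disable it. Disabling stops authentication immediately and is reversible, while deletion leaves role bindings pointing at a deleted: principal and makes it harder to reconstruct who granted what to it.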
- gcloud iam service-accounts disable sa-build@proj.iam.gserviceaccount.com
- List and delete keys: gcloud iam service-accounts keys list --iam-account=sa-build@proj.iam.gserviceaccount.com then gcloud iam service-accounts keys delete <KEY_ID> --iam-account=sa-build@proj.iam.gserviceaccount.com
- Close impersonation paths: short-lived tokens for the account are minted by any principal holding roles/iam.serviceAccountTokenCreator on it. Find those grants in the IAM policy, audit which humans hold the role, and remove any that are not needed.
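If the account was bound to a GKE workload through Workload Identity, treat the pods as compromised too. Kubernetes cordons nodes, not namespaces, so scale the affected workloads to zero or delete the pods instead; cordon the nodes they ran on only if you need to hold them for forensics. Any replacement pods still mapped to the disabled account cannot acquire fresh tokens.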
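The impersonation audit from the list above can be sketched as follows, assuming the policy was exported with gcloud iam service-accounts get-iam-policy and --format=json. The function name impersonators and the sample members are hypothetical.

```python
def impersonators(policy, role="roles/iam.serviceAccountTokenCreator"):
    """Return members bound to `role`, split into humans and everything else."""
    humans, others = [], []
    for binding in policy.get("bindings", []):
        if binding.get("role") != role:
            continue
        for member in binding.get("members", []):
            # IAM prefixes human accounts with "user:"; groups, domains and
            # service accounts carry other prefixes and need separate review.
            (humans if member.startswith("user:") else others).append(member)
    return sorted(humans), sorted(others)


# Made-up policy in the shape of an IAM policy JSON document.
SAMPLE_POLICY = {
    "bindings": [
        {"role": "roles/iam.serviceAccountTokenCreator",
         "members": ["user:alice@example.com",
                     "serviceAccount:ci@proj.iam.gserviceaccount.com",
                     "group:platform@example.com"]},
        {"role": "roles/iam.serviceAccountUser",
         "members": ["user:bob@example.com"]},
    ]
}

humans, others = impersonators(SAMPLE_POLICY)
print("humans:", humans)
print("non-human:", others)
```

This only covers grants on the service account itself. The same role granted at project, folder, or organization level is inherited, so those policies need the same check.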
4. Organization policy as containment lever
Organization policies can be deployed in minutes and used as scalpel-precise containment. During an active incident, two are particularly useful.
- constraints/iam.disableServiceAccountKeyCreation applied at folder level prevents the attacker from generating fresh keys on adjacent service accounts.
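- constraints/iam.allowedPolicyMemberDomains restricted to your own organization blocks the attacker from granting access to external principals. The constraint takes Google Workspace or Cloud Identity customer IDs, not domain names.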
Both can be lifted after the incident. Both have caught lateral movement attempts in our IR engagements.
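Before applying the domain restriction, it helps to know whether external grants have already been made. The sketch below scans SetIamPolicy audit entries for added members outside an allow-list. It assumes the entries carry binding deltas under protoPayload.serviceData.policyDelta.bindingDeltas, which is where Resource Manager records them; other services may log the change differently. The function names and ALLOWED_DOMAINS are hypothetical.

```python
# Hypothetical allow-list: your own domain plus your project's service
# account domain. Adjust before use.
ALLOWED_DOMAINS = ("example.com", "proj.iam.gserviceaccount.com")


def is_external(member, allowed_domains=ALLOWED_DOMAINS):
    """True if an IAM member string falls outside the allowed domains."""
    if member in ("allUsers", "allAuthenticatedUsers"):
        return True
    _, _, identity = member.partition(":")      # drop "user:", "group:", ...
    domain = identity.rsplit("@", 1)[-1]
    return domain not in allowed_domains


def external_grants(entries, allowed_domains=ALLOWED_DOMAINS):
    """List (role, member) pairs that SetIamPolicy calls added for outsiders."""
    flagged = []
    for entry in entries:
        payload = entry.get("protoPayload", {})
        if "SetIamPolicy" not in payload.get("methodName", ""):
            continue
        deltas = (payload.get("serviceData", {})
                         .get("policyDelta", {})
                         .get("bindingDeltas", []))
        for delta in deltas:
            member = delta.get("member", "")
            if delta.get("action") == "ADD" and is_external(member, allowed_domains):
                flagged.append((delta.get("role"), member))
    return flagged


# Made-up audit entries for illustration.
SAMPLE_ENTRIES = [
    {"protoPayload": {"methodName": "SetIamPolicy", "serviceData": {"policyDelta": {
        "bindingDeltas": [
            {"action": "ADD", "role": "roles/owner", "member": "user:mallory@gmail.com"},
            {"action": "ADD", "role": "roles/viewer", "member": "user:bob@example.com"},
            {"action": "REMOVE", "role": "roles/editor", "member": "user:old@gmail.com"},
        ]}}}},
    {"protoPayload": {"methodName": "SetIamPolicy", "serviceData": {"policyDelta": {
        "bindingDeltas": [
            {"action": "ADD", "role": "roles/storage.objectViewer", "member": "allUsers"},
        ]}}}},
]

for role, member in external_grants(SAMPLE_ENTRIES):
    print(f"{member} was granted {role}")
```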
5. VPC Service Controls during active response
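If the attacker reached BigQuery or Cloud Storage, VPC Service Controls is the data-plane firewall. During the incident you can create a perimeter, or extend an existing one, and place sensitive projects inside it. Requests that cross the perimeter boundary are then denied, which stops new exfiltration cold. Allow for propagation: it usually takes a few minutes, but Google documents that perimeter changes can take up to 30 minutes to take effect.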
gcloud access-context-manager perimeters update prod-perimeter --add-resources=projects/123456789 --policy=POLICY_ID
Watch the VPC SC logs after the change. Denied requests appear in protoPayload.metadata with violationReason values. Each denial is a signal of where the attacker tried to go next.
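A minimal sketch of that watch step, assuming the denied entries were exported as JSON and loaded as a list of dicts. The function name summarize_denials and the sample entries are illustrative; the field paths follow the protoPayload.metadata location described above.

```python
from collections import Counter


def summarize_denials(entries):
    """Count VPC SC denials per (violationReason, serviceName, callerIp)."""
    summary = Counter()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        reason = payload.get("metadata", {}).get("violationReason")
        if not reason:
            continue  # not a VPC Service Controls denial
        service = payload.get("serviceName", "unknown")
        ip = payload.get("requestMetadata", {}).get("callerIp", "unknown")
        summary[(reason, service, ip)] += 1
    return summary


# Made-up entries for illustration.
SAMPLE_DENIALS = [
    {"protoPayload": {"serviceName": "bigquery.googleapis.com",
                      "requestMetadata": {"callerIp": "203.0.113.7"},
                      "metadata": {"violationReason": "NO_MATCHING_ACCESS_LEVEL"}}},
    {"protoPayload": {"serviceName": "bigquery.googleapis.com",
                      "requestMetadata": {"callerIp": "203.0.113.7"},
                      "metadata": {"violationReason": "NO_MATCHING_ACCESS_LEVEL"}}},
    {"protoPayload": {"serviceName": "storage.googleapis.com",
                      "requestMetadata": {"callerIp": "203.0.113.7"},
                      "metadata": {"violationReason": "RESOURCES_NOT_IN_SAME_SERVICE_PERIMETER"}}},
    {"protoPayload": {"serviceName": "storage.googleapis.com",
                      "requestMetadata": {"callerIp": "10.0.0.5"}}},
]

for (reason, service, ip), count in summarize_denials(SAMPLE_DENIALS).most_common():
    print(f"{count:>3}  {ip:<15}  {service:<28}  {reason}")
```

Ranking by count surfaces which service the attacker is probing hardest, which is usually the next place to check for data that left before the perimeter went up.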
6. Recovery and the evidence package
Rotate every key the principal had access to. For KMS keys, create a new primary version with gcloud kms keys versions create --primary, then disable the prior version with gcloud kms keys versions disable. Do the same for Secret Manager versions, and for any user-managed service account keys the principal could have created. Image any affected GCE instances by creating a machine image and copying it to a forensics project.
The GCP runbook lives or dies on Data Access logging being enabled before the incident. If yours is not, that is the action item. The rest of this playbook waits for the next pager.