BIPI

SaaS Startup IR: Multi-Tenant Blast Radius and Customer Notification

Cloud Security

A security incident in a multi-tenant SaaS platform can affect every customer simultaneously. This playbook covers blast radius assessment, tenant isolation failures, SOC 2 obligations, and GitHub secrets leaks.

By Arjun Raghavan, Security & Systems Lead, BIPI · October 11, 2024 · 10 min read

#incident-response #saas #multi-tenant #soc-2 #github #cloud-security

The 2023 CircleCI incident began with a single stolen session cookie and resulted in customer secrets being exposed across the entire platform. The 2023 Okta support system breach exposed customer session tokens that were then used to compromise several high-profile Okta customers including MGM Resorts and Caesars Entertainment. Multi-tenant SaaS breaches have a blast radius that scales with the customer count, and the IR response must match that scale.

The Multi-Tenant Threat Model

In a properly isolated multi-tenant architecture, a compromise originating in one tenant's environment should not expose another tenant's data. In practice, implementation gaps in tenant isolation are common and are a primary target for attackers who want to maximize impact. Shared infrastructure, shared databases with row-level security, shared message queues, and shared caching layers are all potential isolation failure points.

  • Shared database with row-level security (RLS): A bug in RLS policy can expose all tenant data to any authenticated user.
  • Shared application tier: A server-side request forgery (SSRF) vulnerability can be used to access internal APIs that serve data for other tenants.
  • Shared message queues: Improper tenant_id validation in queue consumers can result in one tenant processing another tenant's events.
  • Shared logging infrastructure: Log aggregation pipelines that do not enforce tenant isolation can leak data across tenant boundaries.
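The message-queue failure mode above is worth making concrete. Here is a minimal sketch, assuming a hypothetical JSON message schema in which each event carries its own tenant_id; the vulnerable version of this consumer simply omits the guard and trusts queue routing alone:

```python
import json

def handle_event(raw_message: str, consumer_tenant_id: str) -> dict:
    """Process a queue event, rejecting messages for other tenants.

    Hypothetical schema: every message carries a tenant_id field.
    """
    event = json.loads(raw_message)
    msg_tenant = event.get("tenant_id")
    if msg_tenant != consumer_tenant_id:
        # Isolation failure point: without this check, tenant A's
        # consumer would silently process tenant B's event.
        raise PermissionError(
            f"cross-tenant event: message for {msg_tenant!r} "
            f"delivered to consumer for {consumer_tenant_id!r}"
        )
    return event
```

The same pattern applies anywhere a shared component fans work out per tenant: validate the tenant_id carried in the data against the tenant context of the code path, on every hop.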

First Response: What Is the Blast Radius?

The first question in a SaaS incident is not who attacked you but how many customers are affected. This requires understanding the architecture well enough to trace the attack path from the initial compromise to the data that was accessed, and then mapping that data to the tenants who own it.

  1. Identify the attack vector: Was it a credential compromise, a code vulnerability, a supply chain attack, or a GitHub secrets leak?
  2. Determine which system was initially compromised and what that system has access to. A developer laptop has different blast radius potential than a production database host.
  3. Pull API access logs and database query logs for the compromise window. Identify every tenant_id that was accessed by the compromised credential or service account.
  4. Check for cross-tenant data access: were any queries or API calls made that accessed data belonging to tenants other than the one associated with the compromised credential?
  5. Identify all data types accessed: PII, financial data, health data, credentials, API keys. Each data type may trigger different notification obligations.
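Steps 3-5 amount to a log-reduction exercise. A minimal sketch, assuming hypothetical access-log entries that record the acting principal, the tenant whose data was touched, the principal's home tenant, and a coarse data-type label:

```python
from collections import defaultdict

def blast_radius(access_log: list[dict], compromised_principal: str) -> dict:
    """Map every tenant touched by a compromised credential.

    Hypothetical log entry shape:
      {"principal": "svc-backup", "tenant_id": "t-42",
       "home_tenant": "t-42", "data_type": "pii"}
    """
    tenants = defaultdict(set)  # tenant_id -> data types accessed (steps 3, 5)
    cross_tenant = []           # accesses outside the home tenant (step 4)
    for entry in access_log:
        if entry["principal"] != compromised_principal:
            continue
        tenants[entry["tenant_id"]].add(entry["data_type"])
        if entry["tenant_id"] != entry["home_tenant"]:
            cross_tenant.append(entry)
    return {"tenants": dict(tenants), "cross_tenant": cross_tenant}
```

The output drives the rest of the response: the tenant list is your notification list, the data types drive which obligations apply, and a non-empty cross_tenant list means the incident is an isolation failure, not just a single-tenant compromise.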

GitHub Secrets Leak: A Common SaaS IR Trigger

Leaked secrets in public GitHub repositories are responsible for a significant percentage of SaaS startup security incidents. A single committed .env file, API key, or database connection string can give an attacker full production access. GitHub's secret scanning alerts are valuable but are not a substitute for pre-commit hooks and secrets scanning in the CI pipeline.

  • Immediately rotate all credentials that were exposed, in the following priority: cloud provider access keys (AWS, GCP, Azure), database credentials, third-party API keys, signing secrets and JWT secrets.
  • After rotating, check cloud provider access logs (AWS CloudTrail, GCP Audit Logs) for any API calls using the exposed key. The window is the time between the commit that exposed the secret and the time of rotation.
  • Check for any IAM roles or service accounts created using the exposed key. Attackers establish persistence by creating new credentials before the original secret is rotated.
  • Audit all resources in the affected cloud account for unexpected modifications: new EC2 instances, new Lambda functions, modified S3 bucket policies, new IAM users.
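The exposure-window check in the second bullet can be scripted against exported CloudTrail events. A minimal sketch: CloudTrail records the access key under userIdentity.accessKeyId and the timestamp as ISO 8601 eventTime, so filtering the window is a few lines (the field names here match CloudTrail's documented JSON shape; everything else is illustrative):

```python
from datetime import datetime, timezone

def calls_with_exposed_key(events: list[dict], access_key_id: str,
                           exposed_at: datetime, rotated_at: datetime) -> list[dict]:
    """Return API calls made with the exposed key during the window
    between the commit that leaked it and its rotation."""
    hits = []
    for ev in events:
        # CloudTrail eventTime is ISO 8601 with a trailing "Z".
        when = datetime.fromisoformat(ev["eventTime"].replace("Z", "+00:00"))
        key = ev.get("userIdentity", {}).get("accessKeyId")
        if key == access_key_id and exposed_at <= when <= rotated_at:
            hits.append({"time": ev["eventTime"], "call": ev["eventName"],
                         "source_ip": ev.get("sourceIPAddress")})
    return hits
```

Any hit from an unfamiliar source IP, and especially any IAM write call (CreateUser, CreateAccessKey, AttachUserPolicy), escalates the incident from "leaked secret" to "active intrusion with possible persistence."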

SOC 2 Breach Notification Obligations

SOC 2 Type II is a controls attestation, not a regulatory framework. It does not prescribe specific breach notification timelines. However, your SOC 2 report almost certainly includes a description of your incident response and breach notification procedures, and your customers have likely contractually relied on those descriptions. Violating your stated procedures is both a SOC 2 audit finding and a potential contract breach.

Your SOC 2 report is a contractual representation to your customers. If your actual incident response does not match what your controls say, you have a simultaneous security failure and a contract risk.

Customer Notification: What to Say and When

  • Notify affected customers within the timeframe specified in your MSA or DPA, typically 72 hours.
  • The initial notification should include: the date of discovery, what happened (at a high level), what data was involved, what you have done to contain the incident, and what affected customers should do.
  • Do not wait for a complete forensic investigation to send the initial notification. Send what you know, acknowledge what you do not yet know, and commit to updates.
  • Designate a single point of contact for customer questions. Do not have multiple team members giving different answers to the same customer.
  • Follow up with a detailed post-incident report within 30 days covering root cause, full scope, and remediation steps taken.
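The notification window in the first bullet usually differs across customers, so it helps to compute deadlines up front and work the shortest windows first. A minimal sketch, assuming a hypothetical mapping of customer names to the notification windows (in hours) taken from each MSA or DPA:

```python
from datetime import datetime, timedelta, timezone

def notification_schedule(discovered_at: datetime,
                          customer_windows: dict[str, int]) -> list[tuple[str, datetime]]:
    """Return (customer, deadline) pairs sorted by earliest deadline.

    customer_windows is hypothetical, e.g. {"acme": 24, "globex": 72},
    sourced from each customer's contractual notification clause.
    """
    return sorted(
        ((name, discovered_at + timedelta(hours=hours))
         for name, hours in customer_windows.items()),
        key=lambda pair: pair[1],  # earliest contractual deadline first
    )
```

The sorted output doubles as the checklist for your single point of contact: who must hear from you, by when, in what order.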

Hardening Multi-Tenant Architecture Post-Incident

  • Implement tenant isolation testing as part of your CI/CD pipeline. Automated tests that verify one tenant cannot access another tenant's data should run on every deployment.
  • Deploy pre-commit hooks and secrets scanning (truffleHog, detect-secrets) in all developer environments and as a required CI check.
  • Implement just-in-time (JIT) access for production systems. Engineers should not have standing production database access; access should be time-limited and logged.
  • Enable VPC flow logs, CloudTrail, and database audit logging in all production environments. You cannot investigate what you cannot see.
  • Conduct an annual third-party penetration test specifically targeting tenant isolation, in addition to your SOC 2 audit.
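The tenant isolation tests in the first bullet have a simple shape: create data as tenant B, attempt to read it as tenant A, and assert denial. A minimal sketch using a hypothetical in-memory stand-in for a tenant-scoped data API; in a real pipeline the same assertion runs against a staging deployment on every deploy:

```python
class DocumentStore:
    """In-memory stand-in for a tenant-scoped data API (illustrative only)."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[str, str]] = {}  # doc_id -> (owner, body)

    def put(self, doc_id: str, tenant_id: str, body: str) -> None:
        self._docs[doc_id] = (tenant_id, body)

    def get(self, doc_id: str, requesting_tenant: str) -> str:
        owner, body = self._docs[doc_id]
        if owner != requesting_tenant:
            # The check every tenant-scoped read path must enforce.
            raise PermissionError("cross-tenant read denied")
        return body

def cross_tenant_read_denied(store: DocumentStore) -> bool:
    """The CI assertion: tenant A must not read tenant B's document."""
    store.put("doc-1", "tenant-b", "secret")
    try:
        store.get("doc-1", requesting_tenant="tenant-a")
    except PermissionError:
        return True
    return False  # isolation breach: the read succeeded
```

Run a test like this for every read path (API, search, export, reporting), not just the primary CRUD endpoints; isolation bugs cluster in the secondary paths.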

Read more field notes, explore our services, or get in touch at info@bipi.in.