BIPI

Internet Archive: 31M Records, A Token Leak, and a DDoS Combo

Threat Intelligence

The Internet Archive took on a multi-vector attack in October 2024 with data theft, JavaScript defacement, and DDoS. The Zendesk token leak that followed was its own crisis.

By Arjun Raghavan, Security & Systems Lead, BIPI · May 13, 2024 · 8 min read

#internet-archive #breach #ddos

On October 9, 2024, visitors to archive.org saw a JavaScript pop-up announcing that the Internet Archive had been breached and that 31 million user records had been leaked to Have I Been Pwned. At the same time, the site came under a sustained DDoS attack. Over the following weeks, the same actor kept harassing the Archive, including through a separate leak of Zendesk support tokens that exposed support correspondence with users who had filed copyright takedown requests. That is three distinct attack vectors against a small non-profit running on a shoestring.

Timeline

The 31-million-record exfiltration appears to have happened in late September 2024, and the actor sat on the data for more than a week. On October 9, they used either a compromised admin account or an unpatched JavaScript injection point to push a defacement banner to every site visitor. A simultaneous DDoS came from a separate, likely allied, actor. The Internet Archive took the site fully offline on October 9 to triage. The Wayback Machine returned in read-only mode on October 13, and full functionality was restored over the following days. Then, on October 20, BleepingComputer reported that the same actor had used a leaked Zendesk API token to access the Archive's support ticket history.

Root cause: a stack hardened against archival, not adversaries

The Internet Archive is open about its operating model: a small team, a narrow budget, and a mission focused on preservation. The breach details that have emerged point to an exposed GitLab authentication token, which gave the actor access to source code and, from there, to credentials embedded in that source. Those credentials led to production user data. The Zendesk leak was its own chain: an authentication token tied to the Archive's Zendesk instance had leaked and was not rotated even after the initial breach disclosure. That gave the actor a second window into a different sensitive data set.
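The chain above begins with a credential sitting in source control. What follows is a minimal sketch of a commit-time secret scanner, not a description of the Archive's tooling. The two rules are illustrative assumptions: GitLab personal access tokens use a `glpat-` prefix, and the second rule is a generic catch for hard-coded password or token assignments. The names `RULES`, `Finding`, and `scan_text` are hypothetical. A real deployment would run a maintained scanner with a far larger rule set as a pre-commit hook and again in CI.

```python
import re
from dataclasses import dataclass

# Illustrative rules only; production scanners ship hundreds of patterns.
RULES = {
    # GitLab personal access tokens carry a "glpat-" prefix.
    "gitlab_pat": re.compile(r"glpat-[0-9A-Za-z_\-]{20,}"),
    # Generic hard-coded assignment such as DB_PASSWORD = "...".
    "hardcoded_secret": re.compile(
        r"(?i)(password|passwd|secret|api_key|token)\w*\s*[:=]\s*['\"][^'\"]{6,}['\"]"
    ),
}


@dataclass
class Finding:
    path: str
    line_no: int
    rule: str


def scan_text(path: str, text: str) -> list[Finding]:
    """Return one Finding per (line, rule) pair that matches in a file's text."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append(Finding(path, line_no, rule))
    return findings


if __name__ == "__main__":
    sample = (
        'db_host = "10.0.0.5"\n'
        'DB_PASSWORD = "s3cr3t-value"\n'
        'git_token = "glpat-abcdefghij0123456789"\n'
    )
    for finding in scan_text("config.py", sample):
        print(finding)
    # A pre-commit hook would exit non-zero here if any findings exist.
```

On the sample, line 2 trips the generic rule and line 3 trips both rules, because the GitLab token is also assigned to a variable named like a secret. That overlap is acceptable: the scanner's job is to block the commit, not to classify the secret.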

Attacker actions

The actor's playbook had three distinct phases. Data theft came first: pull the user database and extract emails, bcrypt password hashes, and screen names. Defacement came second: a JavaScript banner shipped on October 9, timed to maximize public attention. DDoS came third, plausibly via a separate aligned actor, to keep the Archive offline while the defacement carried the news cycle. The Zendesk token leak came later and reads more like opportunistic harvesting than a coordinated phase: the token sat in a different secret store, the exposed credential had not been rotated, and the actor walked through that door because it was open.
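Rotation is the control that closes the kind of door the Zendesk token left open. A minimal sketch, assuming a hypothetical credential inventory that records when each secret was last rotated: after a source-code compromise, any secret whose last rotation predates the disclosure should be treated as still exposed. The credential names, stores, and rotation dates below are made up for illustration; only the October 9, 2024 disclosure date comes from the timeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Credential:
    name: str               # hypothetical inventory label
    store: str              # where the secret lives: vault, CI variables, SaaS admin console
    last_rotated: datetime  # timezone-aware UTC timestamp


def stale_after_incident(inventory: list[Credential], disclosed_at: datetime) -> list[str]:
    """Return names of credentials not rotated since the breach disclosure.

    After a source-code compromise, assume every secret the attacker could read
    is burned; anything last rotated before disclosure may still work for them.
    """
    return sorted(c.name for c in inventory if c.last_rotated < disclosed_at)


if __name__ == "__main__":
    disclosed = datetime(2024, 10, 9, tzinfo=timezone.utc)
    # Hypothetical inventory: names and rotation dates are illustrative only.
    inventory = [
        Credential("source-host-token", "ci-variables", datetime(2024, 10, 10, tzinfo=timezone.utc)),
        Credential("user-db-password", "vault", datetime(2024, 10, 11, tzinfo=timezone.utc)),
        Credential("support-desk-api", "saas-admin", datetime(2023, 3, 1, tzinfo=timezone.utc)),
    ]
    print(stale_after_incident(inventory, disclosed))  # ['support-desk-api']
```

The check deliberately treats a SaaS support token exactly like a database password; a support-platform credential kept outside the main secret store is the one most likely to be missed.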

Detection

We do not have public post-mortem detail on whether internal telemetry caught the September exfiltration. What is known is that the defacement was the disclosure event: users saw the pop-up before the Archive had announced anything. That sequence is the worst outcome for an incident response timeline. The detective controls that should have caught this earlier are secret scanning on the source repository (with alerting on any commit that introduces a secret) and egress monitoring on database hosts. Both are achievable on non-profit budgets. Have I Been Pwned ingesting the 31M-record file as 'Internet Archive' was the second public detection event.
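Egress monitoring on database hosts can start very simply. A minimal sketch, assuming per-host outbound byte counts are already collected per time window (for example from flow logs): a bulk export of a user table shows up as a host sending far more than its own baseline. The function name, the 5x multiplier, and the 100 MB floor are hypothetical tuning choices, not values from any incident report.

```python
from statistics import median


def egress_alerts(
    history: dict[str, list[int]],
    current: dict[str, int],
    factor: float = 5.0,
    floor_bytes: int = 100_000_000,
) -> list[str]:
    """Return hosts whose outbound bytes in the current window look anomalous.

    history: per-host outbound byte counts for past windows (e.g. hourly).
    current: per-host outbound bytes for the window being evaluated.
    A host alerts only when it exceeds both `factor` times its historical
    median and an absolute floor, so tiny baselines don't page on noise.
    """
    alerts = []
    for host, sent in current.items():
        past = history.get(host)
        if not past:
            # No baseline yet: flag anything above the floor for manual review.
            if sent > floor_bytes:
                alerts.append(host)
            continue
        if sent > floor_bytes and sent > factor * median(past):
            alerts.append(host)
    return sorted(alerts)


if __name__ == "__main__":
    # Hypothetical hosts and byte counts for illustration.
    history = {"db-1": [20_000_000] * 24, "db-2": [300_000_000] * 24}
    current = {"db-1": 4_000_000_000, "db-2": 320_000_000, "db-3": 50_000_000}
    print(egress_alerts(history, current))  # ['db-1']
```

Using each host's own median rather than a fleet-wide threshold keeps a busy replica from masking a quiet primary that suddenly starts shipping gigabytes out.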

Lessons

First, the Archive's situation previews where attacks on public-interest internet infrastructure are heading. Preservation, journalism, civic tech, and research data are mission-critical, yet they run on lean operations budgets and rely heavily on SaaS support tools that get treated as low-stakes. The threat model for those organizations has escalated, but the security investment has not. Second, the Zendesk angle is the part to dwell on. Support inboxes contain extraordinarily sensitive material: copyright disputes, harassment reports, account recovery details, user-submitted documents. Treat support platforms with the same care as production databases.

The BIPI take

Multi-vector campaigns against single targets are now a normal threat pattern, not an APT specialty. The combination of data theft, defacement, DDoS, and follow-on token abuse against the Archive shows what one motivated crew can do against a small operations team. The defensive answer is not bigger budgets the Archive cannot raise; it is targeted hardening of the highest-leverage paths: source secrets, support platforms, and out-of-band incident comms.

Read more field notes, explore our services, or get in touch at info@bipi.in.