BIPI

AI-Generated Phishing: Text Detection Is Dead

AI Security

Generated phishing copy is now indistinguishable from human writing. Detection signal has shifted entirely to behavior, link patterns, and sender provenance. Teams still relying on text-based filters are losing ground every quarter.

By Arjun Raghavan, Security & Systems Lead, BIPI · April 20, 2024 · 7 min read

#ai #phishing #detection

Two years ago we could often spot generated phishing by tone. Slightly too polished, oddly formal, weird transitions. Today we cannot, and neither can any text classifier we have tested. The signal moved. The teams that have not moved with it are still measuring detection rate against last year's threat.

On a recent engagement we ran two phishing campaigns against an insurance client's employees. One was hand-crafted by a senior consultant. One was generated end-to-end with a frontier model and a custom prompt. The generated campaign had a 23 percent click rate. The hand-crafted one had 19 percent. The generated one was cheaper to produce by a factor of fifty.

What stopped working

The text-classifier era of phishing detection assumed generated content had artifacts. It did, until recently. Frontier models with good prompting now produce copy that human editors cannot reliably distinguish from peer correspondence. The classifiers that scored 89 percent on 2023 phishing corpora score 51 percent on current generated samples in our internal testing, barely better than a coin flip.

Tone analysis: dead. Generated copy matches register and vocabulary of the impersonated sender given a few examples.
Grammar artifacts: dead. Output is cleaner than most internal email.
Boilerplate detection: dying. Models avoid templated phrasing when prompted to vary.
Length distribution: weakly useful. Generated phishing tends to be more concise than corporate norms, but the gap is closing.

What still works

Detection has shifted to signals that are not in the text body. An attacker can generate perfect copy but cannot easily forge sender history, link infrastructure, or behavioral context. These signals require investment in pipelines and data, which is why most teams under-invest in them.

Sender reputation graph: history of communication between sender and recipient, frequency, recent changes. New senders to high-value targets get scrutiny.
Link infrastructure analysis: domain age, registrar, certificate transparency, hosting provider, similarity to legitimate brand domains.
Behavioral context: time of day relative to sender's normal pattern, device, geolocation, typing fingerprint where available.
Authentication signals: SPF, DKIM, DMARC, ARC chain, BIMI presence. However polished the copy, a message that fails authentication is easier to flag.
Recipient action context: clicks from outside normal device or network get challenged regardless of text content.
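To make the list concrete, here is a minimal sketch of a metadata-only risk score. It assumes the sender history count, the age of linked domains, and the sender's usual sending window are already available from your mail logs and domain lookups; the weights, the 30-day threshold, and the names `MessageContext` and `metadata_risk_score` are illustrative assumptions, not a tuned model. Nothing in it reads the message body.

```python
import re
from dataclasses import dataclass

# Illustrative weights and threshold (assumptions, not tuned values).
WEIGHTS = {"first_contact": 3.0, "young_domain": 3.0, "auth_failure": 2.0, "off_hours": 1.0}
YOUNG_DOMAIN_DAYS = 30

# Matches verdicts such as "spf=pass" or "dkim=fail" in an Authentication-Results header.
AUTH_RE = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.IGNORECASE)


def parse_auth_results(header: str) -> dict[str, str]:
    """Extract SPF, DKIM and DMARC verdicts from an Authentication-Results header value."""
    results: dict[str, str] = {}
    for method, verdict in AUTH_RE.findall(header):
        # Keep the first verdict seen for each method.
        results.setdefault(method.lower(), verdict.lower())
    return results


@dataclass
class MessageContext:
    auth_results_header: str         # raw Authentication-Results header value
    prior_messages_from_sender: int  # sender-to-recipient history, from mail logs (assumed)
    link_domain_age_days: int        # age of the youngest linked domain, e.g. from RDAP (assumed)
    send_hour: int                   # hour the message was sent, 0-23
    sender_usual_hours: range        # the sender's normal sending window


def metadata_risk_score(ctx: MessageContext) -> float:
    """Score a message from non-text signals only; higher means more suspicious."""
    score = 0.0
    if ctx.prior_messages_from_sender == 0:
        score += WEIGHTS["first_contact"]
    if ctx.link_domain_age_days < YOUNG_DOMAIN_DAYS:
        score += WEIGHTS["young_domain"]
    auth = parse_auth_results(ctx.auth_results_header)
    # A missing verdict counts as a failure, the same as an explicit "fail".
    if any(auth.get(m) != "pass" for m in ("spf", "dkim", "dmarc")):
        score += WEIGHTS["auth_failure"]
    if ctx.send_hour not in ctx.sender_usual_hours:
        score += WEIGHTS["off_hours"]
    return score


# First contact, a 4-day-old link domain, failed DKIM and DMARC, sent at 3 a.m.
msg = MessageContext(
    auth_results_header="mx.example.com; spf=pass; dkim=fail header.d=vendor.example; dmarc=fail",
    prior_messages_from_sender=0,
    link_domain_age_days=4,
    send_hour=3,
    sender_usual_hours=range(8, 19),
)
print(metadata_risk_score(msg))  # 9.0
```

Every input here survives a perfectly written email, which is the point. In practice the weights would be fit against labelled incidents rather than set by hand.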

23%: click rate on generated phishing in our recent test
51%: detection rate of legacy text classifiers on current generated samples
50x: production cost reduction vs hand-crafted phishing

What we tell defenders to do

Stop spending on text classifier upgrades. Spend on link infrastructure analysis, sender reputation, and authentication enforcement. The marginal value is in the metadata layer, not the content layer. Every dollar spent on better text models for detection is a dollar that could go to better link and sender analysis.

Browser-side click protection is underrated. By the time a user clicks, the email has already passed your gateway. A click intercept that checks the destination against threat intel, certificate age, and similarity to internal SSO domains catches what the inbox filter missed. Three of our clients have moved click protection to a Tier 1 control in the last year.
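A click intercept of this kind can be sketched briefly. This version assumes a hypothetical set of internal SSO hostnames, a stand-in threat-intel blocklist, and a certificate age supplied by the caller (for example from certificate transparency logs); the 14-day and two-edit thresholds are illustrative assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical internal SSO hostnames and threat-intel blocklist, for illustration only.
SSO_HOSTS = {"login.example-corp.com", "sso.example-corp.com"}
THREAT_INTEL_BLOCKLIST = {"evil.example"}
MIN_CERT_AGE_DAYS = 14   # assumed threshold for a "freshly issued" certificate
MAX_LOOKALIKE_EDITS = 2  # assumed edit distance that counts as a lookalike


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def click_verdict(url: str, cert_age_days: int) -> str:
    """Return "allow", "challenge" or "block" for a clicked URL."""
    host = (urlsplit(url).hostname or "").lower()
    if host in THREAT_INTEL_BLOCKLIST:
        return "block"
    if host in SSO_HOSTS:
        return "allow"
    # Close to, but not exactly, a real SSO hostname: a classic credential-harvesting lookalike.
    if any(levenshtein(host, sso) <= MAX_LOOKALIKE_EDITS for sso in SSO_HOSTS):
        return "block"
    # A very new certificate on an unknown host gets a warning page rather than a hard block.
    if cert_age_days < MIN_CERT_AGE_DAYS:
        return "challenge"
    return "allow"


print(click_verdict("https://login.examp1e-corp.com/auth", cert_age_days=2))  # block
```

Edit distance only catches near-miss spellings such as a swapped character. Homoglyph and punycode lookalikes, or a brand name embedded in an unrelated domain, need additional checks.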

User training in the AI-phishing era

The 'check for typos' guidance is harmful. It teaches users to trust well-written email. Replace it with action-based guidance. Did the email ask you to do something out of band, urgent, or financial? Did it come from a sender you do not normally hear from? Does the link go where the visible text claims? The questions that work do not depend on the quality of the text.
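The last question, whether the link goes where the visible text claims, is one a filter can also ask mechanically. A minimal sketch using Python's standard `html.parser`, assuming the email's HTML body is available: it flags any anchor whose visible text names a domain different from the hostname in its `href`. The helper names `AnchorCollector` and `mismatched_links` are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Finds the first domain-like token (labels separated by dots) in a link's visible text.
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE)


class AnchorCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element with an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html: str) -> list[tuple[str, str]]:
    """Return (visible text, real hostname) for links whose text shows a different domain."""
    parser = AnchorCollector()
    parser.feed(html)
    flagged = []
    for href, text in parser.links:
        shown = DOMAIN_RE.search(text)
        actual = (urlsplit(href).hostname or "").lower()
        # Links whose text names no domain ("help page") are skipped, not flagged.
        if shown and shown.group(1).lower() != actual:
            flagged.append((text, actual))
    return flagged


sample = ('<p>Reset your password at '
          '<a href="https://acct-verify.example/reset">https://portal.example-corp.com/reset</a>'
          ' or read the <a href="https://portal.example-corp.com/help">help page</a>.</p>')
print(mismatched_links(sample))
# [('https://portal.example-corp.com/reset', 'acct-verify.example')]
```

This compares exact hostnames, so a harmless `www.` difference would also be flagged; a production check would compare registrable domains using the Public Suffix List.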

We rewrote one client's training program along these lines, and their phishing report rate rose from 1.2 percent to 4.8 percent. That is the right direction: users were reporting suspicious behavior instead of hunting for typos that no longer exist. The rate of successful compromise dropped, because reported phish gets pulled from inboxes faster than other recipients click it.

Read more field notes, explore our services, or get in touch at info@bipi.in.