AI Watermarking: What It Promises, What It Delivers
AI Security
Watermarking AI outputs sounds like a clean solution to provenance. Reality is messier: text watermarks survive paraphrasing poorly, image watermarks fight an arms race with edits, and deployment across providers is patchy. Here is the state in 2026.
By Arjun Raghavan, Security & Systems Lead, BIPI · April 29, 2026 · 7 min read
A media customer asked us last quarter whether they should require all of their generated-content vendors to use watermarking. The vendors' marketing copy said yes. The technical reality is that watermarking covers a smaller fraction of the threat than the marketing suggests, and the customer ended up with a hybrid policy that combined watermarking with provenance metadata and out-of-band registration.
Watermarking is real and useful. It is not a complete solution, and treating it like one is how organizations end up surprised when their detection rate on adversarial content turns out to be 30 percent.
How text watermarking works in 2026
The state of the art descends from Kirchenbauer et al.'s green-list approach. At each sampling step, the model biases token selection toward a pseudo-random subset of the vocabulary (the green list), seeded by a hash of the preceding tokens. Detection recomputes the same hashes and counts how many of the produced tokens fall in their green lists; a statistical test then yields a confidence score that the text was generated by the watermarked model.
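The detection side can be sketched in a few lines. This is a toy, not any provider's scheme: real deployments hash a window of context rather than one token, bias logits during sampling, and tune the green fraction; the hash construction and key below are illustrative.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # fraction of the vocabulary that is "green" at each step

def is_green(prev_token: int, token: int, key: int = 42) -> bool:
    # Hash the previous token with a secret key to decide whether
    # `token` lands in this step's pseudo-random green list.
    h = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return h[0] < 256 * GREEN_FRACTION

def detect(tokens: list[int], key: int = 42) -> float:
    # z-score: how far the observed green count sits above what
    # unwatermarked text would produce by chance.
    n = len(tokens) - 1
    greens = sum(is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    mean = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (greens - mean) / std
```

A watermarked generator picks green tokens more often than chance, so the z-score grows with text length. A paraphrase replaces tokens wholesale and erodes exactly this count, which is why the detection rates below fall off so sharply.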
- Detection rate on unmodified output: 95 to 99 percent depending on text length.
- Detection rate after light paraphrase by another LLM: 60 to 80 percent.
- Detection rate after aggressive paraphrase or human editing: 20 to 50 percent.
- False positive rate on human-written text: well below 1 percent at standard thresholds.
The paraphrase attack is the central limitation. An adversary who runs the output through another model strips most of the watermark signal. For benign use cases such as detecting student AI use, this is acceptable. For adversarial use cases such as tracing deepfake content back to a model, it is a hole.
Image and video watermarking
Image watermarks operate either by perturbing pixels in patterns invisible to humans or by embedding signatures in the latent representation before decoding (Stable Signature, Tree-Ring). Robustness against common edits varies widely: JPEG compression at quality 80 leaves most modern watermarks intact; cropping by 30 percent destroys some schemes and leaves others intact; adversarial editing tools built specifically for watermark removal exist and work.
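The pixel-perturbation family can be illustrated with a toy spread-spectrum sketch. This is not any production scheme: real systems embed in transform or latent domains, and the key, strength, and pattern here are illustrative. What it shows is that detection is a correlation test against a key-derived pattern, which is why edits that decorrelate the pattern (resampling, inpainting, targeted removal tools) defeat it.

```python
import random

def pattern(w: int, h: int, key: int = 7) -> list[list[int]]:
    # Pseudo-random +/-1 pattern derived from a secret key.
    rng = random.Random(key)
    return [[rng.choice((-1, 1)) for _ in range(w)] for _ in range(h)]

def embed(img: list[list[int]], strength: int = 8, key: int = 7) -> list[list[int]]:
    # Add a faint key-derived pattern to every pixel.
    pat = pattern(len(img[0]), len(img), key)
    return [[px + strength * p for px, p in zip(row, prow)]
            for row, prow in zip(img, pat)]

def score(img: list[list[int]], key: int = 7) -> float:
    # Correlation with the key pattern: ~strength if watermarked, ~0 if not.
    pat = pattern(len(img[0]), len(img), key)
    n = len(img) * len(img[0])
    return sum(px * p for row, prow in zip(img, pat)
               for px, p in zip(row, prow)) / n
```

Because the detector only measures correlation, a removal attack does not need to recover the pattern; it only needs to perturb pixels enough to push the score back toward the unwatermarked baseline.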
Video adds temporal redundancy that helps watermarks survive, though re-encoding through a heavily compressed pipeline still degrades them. The watermarks Anthropic, Google, OpenAI, and Meta have deployed all have public attack literature against them, and the gap between a new embedding scheme and a working removal attack runs roughly 6 to 12 months at any given time.
Deployment status across major providers
Patchy. Google has SynthID for text, image, and audio across Gemini and Imagen. OpenAI has not deployed text watermarking to production despite repeatedly announcing intent. Anthropic ships C2PA metadata on Claude image outputs. Meta deploys watermarks on Imagine outputs but not consistently across surfaces.
From a buyer's perspective, this means you cannot assume any specific output is watermarked even if the provider has watermark technology. The deployment decision is per-product and changes quietly. We tell clients to press vendors on, at minimum: which models, on which surfaces, and with what documented detection rate against paraphrase.
What customers should ask vendors
- Which of your model surfaces watermark output, and what is the documented detection threshold and false positive rate?
- What is your published detection rate against paraphrase by GPT-4-class models?
- Do you publish a detection API and on what terms is it accessible?
- Do you support C2PA or another cryptographic provenance standard alongside watermarking?
- What is your incident response if a customer reports content that should be watermarked but is not detected?
Where this is heading
Watermarking will keep improving on the embed side. Removal will keep improving on the attack side. The gap is not closing. Long term, cryptographic signing at capture plus registration of generated content in trusted registries does more than watermarking ever will. The media customer we mentioned ended up requiring vendors to register every generated asset in a shared registry with a hash and provenance chain. Watermarking became a fallback signal, not the primary control.
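The registry pattern is simple to sketch. This is a minimal illustration assuming a hash-keyed store; the field names and helper functions below are hypothetical, not the customer's actual schema.

```python
import hashlib
import time

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def register(registry: dict, asset: bytes, model: str, request_id: str) -> str:
    # Key the record by content hash; the value is the provenance chain.
    digest = sha256_of(asset)
    registry[digest] = {
        "model": model,
        "request_id": request_id,
        "registered_at": int(time.time()),
    }
    return digest

def lookup(registry: dict, asset: bytes):
    # Exact-match lookup: any re-encode changes the hash.
    return registry.get(sha256_of(asset))
```

Exact hashing breaks the moment an asset is re-encoded or cropped, which is why the hybrid policy keeps watermarking in place as the fuzzy fallback layer rather than discarding it.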
If you are designing a content provenance program in 2026, build for layered provenance with watermarking as one of three layers. Build assuming the watermark layer will fail on adversarial content. The buyers who understand this are the ones quietly winning trust with regulated customers and platforms that care about provenance for real reasons, not just compliance theater.
Read more field notes, explore our services, or get in touch at info@bipi.in.