AI watermarking refers to techniques that embed imperceptible signals — statistical patterns in text, or pixel-level modifications in images and video — into AI-generated content, enabling later detection of whether a piece of content was produced by an AI system. Text watermarking modifies the probability distributions used during token sampling to create a detectable statistical fingerprint. Image watermarking embeds patterns imperceptible to humans but detectable by algorithms. In 2026, AI watermarking is an active area of research and regulation, required by the EU AI Act for certain high-risk AI outputs and adopted voluntarily by Google, Meta, and OpenAI.
How text watermarking works
The most studied text watermarking approach (Kirchenbauer et al., 2023) divides the vocabulary into 'green' and 'red' token lists at each generation step; the partition is re-derived from a hash of the preceding token, so it changes as generation proceeds. The sampler adds a small boost to green-token probabilities, biasing the model toward the green list. Text generated with this bias contains a statistically anomalous proportion of green tokens, detectable by anyone holding the secret key used to derive the partitions. Without the key, the watermark is invisible and the text appears natural.
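The scheme above can be sketched in a few lines. This is a toy illustration, not a production implementation: the 50-token vocabulary, the key, and the values of `GAMMA` (green-list fraction) and `DELTA` (logit boost) are all illustrative assumptions, though the hash-seeded partition and biased sampling follow the Kirchenbauer et al. design.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(50)]  # toy vocabulary; real models use ~50k tokens
GAMMA = 0.5   # fraction of the vocabulary placed on the green list (assumption)
DELTA = 4.0   # logit boost added to green-list tokens (assumption)

def green_list(prev_token: str, key: str = "demo-key") -> set:
    """Re-derive the green list from a hash of the preceding token and the secret key."""
    seed = int(hashlib.sha256((key + prev_token).encode()).hexdigest(), 16)
    shuffled = VOCAB[:]
    random.Random(seed).shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(VOCAB))])

def sample_watermarked(prev_token: str, logits: dict) -> str:
    """Add DELTA to green-list logits, then sample from the resulting softmax."""
    greens = green_list(prev_token)
    boosted = {t: v + (DELTA if t in greens else 0.0) for t, v in logits.items()}
    total = sum(math.exp(v) for v in boosted.values())
    r, acc = random.random() * total, 0.0
    for t, v in boosted.items():
        acc += math.exp(v)
        if acc >= r:
            return t
    return t  # fallback for floating-point rounding at the tail
```

With uniform logits and `DELTA = 4`, roughly 98% of sampled tokens land on the green list, versus the 50% expected by chance — the statistical anomaly a detector looks for.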
| Watermarking type | How it works | Detectability | Evasion resistance |
|---|---|---|---|
| Statistical token bias (text) | Green/red token lists — green tokens sampled more often | Detectable with key; near-zero false positives | Vulnerable to paraphrasing; robust to minor edits |
| Semantic watermarking (text) | Encodes bits in synonym choices, sentence order | More robust to paraphrasing | Harder to implement; still vulnerable to aggressive rewriting |
| Perceptual image watermarking | Imperceptible pixel modifications using steganography | Invisible to humans; detectable algorithmically | Vulnerable to compression, resizing, colour editing |
| SynthID (Google DeepMind) | Watermarks in both pixel space and spectral domain | Detectable with Google's detection tool | Robust to many transformations; used in Gemini-generated images |
| C2PA metadata (Adobe/Microsoft) | Cryptographic provenance metadata attached to file | Easy to check with compatible tools | Trivially removed by screenshot or recompression |
The EU AI Act watermarking requirement
The EU AI Act (in force from August 2026) requires that AI-generated content in high-risk categories — deepfakes, AI-generated audio and video in specific political or judicial contexts, and AI-synthesised text used to deceive — must be marked as AI-generated in a machine-readable format. For general-purpose AI systems above the systemic risk threshold (>10²⁵ FLOP), labelling of AI-generated synthetic media is required. This regulatory requirement is driving standardisation efforts including C2PA (Coalition for Content Provenance and Authenticity), backed by Adobe, Microsoft, Google, and Truepic.
Limitations — the fundamental challenge
- Paraphrase attacks: A determined actor can ask another AI to rewrite watermarked text in different words — preserving meaning but removing the statistical watermark pattern.
- Screenshot attacks: Screenshotting an image and resaving it discards provenance metadata (such as C2PA manifests) and scrambles fragile pixel-level watermarks; only schemes designed to survive re-rendering, such as SynthID, are likely to persist.
- No central authority: Text watermarking only works if the generating model is known and the detection key is available. Without a universal watermarking standard and verification infrastructure, detecting AI origin across multiple models remains practically difficult.
- False negatives for non-watermarked AI: Absence of a detectable watermark does not prove human authorship — it proves only that the content was not generated by a watermarking-enabled model. Outputs from models that do not watermark are indistinguishable from human-written content.
Practice questions
- What statistical property does text watermarking modify to embed a signal? (Answer: Text watermarking modifies the token sampling probability distributions. The token vocabulary is randomly partitioned into green and red lists. During generation, the model is biased to sample from the green list with increased probability. Watermarked text therefore contains more green-list tokens than expected by chance, detectable via a z-score test. A detector holding the secret key re-derives each token's green/red partition from its context and counts green-list hits.)
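The z-score test mentioned in the answer can be written out directly. This is the standard one-proportion z-test; the token counts in the usage note are illustrative, not real detector output.

```python
import math

def watermark_z_score(green_hits: int, total: int, gamma: float = 0.5) -> float:
    """How many standard deviations the observed green-token count sits above
    the gamma * total count expected if the text were not watermarked."""
    expected = gamma * total
    std = math.sqrt(total * gamma * (1 - gamma))
    return (green_hits - expected) / std
```

For example, 180 green tokens out of 200 with gamma = 0.5 gives z ≈ 11.3 (overwhelming evidence of a watermark), while 105 out of 200 gives z ≈ 0.7, entirely consistent with chance.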
- Why is paraphrasing a powerful attack against text watermarks? (Answer: Current text watermarks are fragile under paraphrasing — rewriting watermarked text with different words while preserving meaning destroys the green/red token pattern. Since the watermark is token-sequence-specific, synonym substitution, translation to another language and back, or summarisation removes the signal. Semantic watermarking (embedding meaning-level patterns) is more robust but harder to implement without affecting generation quality.)
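The fragility described in this answer can be simulated without a language model: treat each watermarked token as landing on the green list with high probability, and model paraphrasing as redrawing every token at the chance rate. The 90% green rate and the sequence length are illustrative assumptions.

```python
import math
import random

def z_score(green: int, total: int, gamma: float = 0.5) -> float:
    """One-proportion z-test for an excess of green-list tokens."""
    return (green - gamma * total) / math.sqrt(total * gamma * (1 - gamma))

random.seed(1)
TOTAL, GAMMA = 200, 0.5

# Watermarked text: each token is green ~90% of the time (illustrative rate).
watermarked = sum(random.random() < 0.9 for _ in range(TOTAL))

# Paraphrased text: the rewriter picks words with no green-list bias,
# so each token is green only at the chance rate GAMMA.
paraphrased = sum(random.random() < GAMMA for _ in range(TOTAL))

print(z_score(watermarked, TOTAL))  # large z: watermark detected
print(z_score(paraphrased, TOTAL))  # near zero: signal destroyed
```

The watermark signal lives entirely in which surface tokens were chosen, so replacing the tokens — even meaning-preservingly — resets the green-token rate to chance.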
- What is the difference between visible and invisible AI watermarks in images? (Answer: Visible: a logo or label overlaid on the image (easily removed with cropping or inpainting). Invisible: imperceptible pixel-level modifications detected only by algorithms. StegaStamp and Stable Signature embed robust patterns in frequency domain (DCT coefficients) that survive JPEG compression, resizing, and minor editing. Tree-Rings embeds patterns in the initial noise latent of diffusion models — more robust but requires access to the original generation parameters for detection.)
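A minimal frequency-domain embed illustrates why invisible watermarks can survive mild editing. This is a toy single-bit scheme on one 8×8 block — not StegaStamp or Stable Signature — and the choice of coefficient slot `[3, 4]` and strength are arbitrary assumptions.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix for an n-point transform."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2 / n)
    m[0, :] = np.sqrt(1 / n)  # DC row gets the smaller normalisation
    return m

def embed_bit(block: np.ndarray, bit: int, strength: float = 8.0) -> np.ndarray:
    """Force one mid-frequency DCT coefficient positive (bit=1) or negative (bit=0)."""
    d = dct_matrix()
    coeffs = d @ block @ d.T
    coeffs[3, 4] = strength if bit else -strength  # arbitrary mid-frequency slot
    return d.T @ coeffs @ d  # inverse transform back to pixel space

def read_bit(block: np.ndarray) -> int:
    """Recover the bit from the sign of the chosen coefficient."""
    d = dct_matrix()
    return int((d @ block @ d.T)[3, 4] > 0)
```

Because the DCT is orthonormal, modest additive pixel noise perturbs the chosen coefficient only slightly, so the bit survives light editing; heavy recompression or resizing can still destroy it, which is the vulnerability noted in the table above.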
- Under the EU AI Act, which types of AI content require watermarking or disclosure? (Answer: Article 50 requires: AI-generated audio/video/image content used to mislead (deepfakes) must be labelled. Providers of AI systems (including general-purpose models) generating synthetic media must implement technical measures to mark content as AI-generated in a machine-readable format. Chatbots must inform users they are talking to AI. The requirement is labelling/disclosure, not necessarily technical watermarking — though technical watermarks support automated enforcement.)
- C2PA (Coalition for Content Provenance and Authenticity) provides an alternative to statistical watermarking. How? (Answer: C2PA uses cryptographic signing rather than invisible pixel/token patterns. Each AI-generated image carries a digitally signed manifest (metadata) from the creator's hardware or software that records: who created it, when, with what AI system, and any editing history. This is attached to the file (not the pixels) so it survives most sharing but can be stripped. Adobe, Microsoft, Google, and major camera manufacturers all support C2PA.)
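The signing flow can be sketched with Python's standard library. Real C2PA manifests use COSE signatures over X.509 certificate chains embedded in a JUMBF container; here HMAC with a demo key stands in for the signature, purely to show how a manifest binds provenance claims to the file's content hash.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # stand-in: real C2PA uses certificate-based signatures

def make_manifest(file_bytes: bytes, generator: str, created: str) -> dict:
    """Build a provenance claim bound to the file's content hash, then sign it."""
    claim = {
        "content_sha256": hashlib.sha256(file_bytes).hexdigest(),
        "generator": generator,
        "created": created,
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    claim["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return claim

def verify_manifest(file_bytes: bytes, manifest: dict) -> bool:
    """Check both the signature and that the hash still matches the file bytes."""
    claim = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claim, sort_keys=True).encode()
    sig_ok = hmac.compare_digest(
        manifest["signature"],
        hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest())
    hash_ok = claim["content_sha256"] == hashlib.sha256(file_bytes).hexdigest()
    return sig_ok and hash_ok
```

Verification fails both when the claim fields are tampered with and when the file bytes change — but, as noted above, a screenshot simply drops the manifest entirely, leaving nothing to verify.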