
Text Watermarking: How Green-List, SynthID-Text, and Kirchenbauer's Z-Test Actually Work

A working green-list watermark in your browser, the math behind Kirchenbauer's z-test, what SynthID-Text changed, and the attack landscape — paraphrasing, translation, and adversarial prompting. With a live round-trip you can run.

There are two ways to know whether a piece of text was written by an AI. One is forensic: read the text and look for the tells (we've covered these in How to Spot AI-Written Text). The other is cryptographic-ish: bias the LLM's sampling at generation time so a hash function downstream can prove it. The second approach is what people mean when they say "watermarked text."

This post covers the second approach: the math, the working detector, the attacks, and the honest limits.

We just shipped a real Kirchenbauer-style watermark detector at /check/text. Paste any text and click Generate watermarked sample to see the z-score flip from ≈ 0 to over 4 in the same panel. The code is at lib/watermark-text.ts — about 130 lines.

The Kirchenbauer scheme in five sentences

From Kirchenbauer, Geiping, Wen, Katz, Miers, Goldstein. A Watermark for Large Language Models. ICML 2023.

  1. At each generation step, hash the previously-emitted token together with a secret key.
  2. Use that hash to pseudorandomly partition the model's vocabulary into a "green list" (fraction γ) and a "red list" (1 − γ).
  3. Add a small bias δ to the logits of green-list tokens before sampling.
  4. The model still picks coherent text, but green-list tokens occur at a frequency higher than γ.
  5. To detect: run the same hash function on candidate text, count green-list hits, compute a z-score against the null hypothesis of uncoordinated sampling.

That's it. The genius of the paper is that this is all you need — no neural network, no model surgery, no per-text key. Just a hash function, a bias, and a one-tailed binomial test.

The math

For text of N tokens, under the null hypothesis that the text was generated independently of the key, each adjacent pair (prev, curr) is green with probability γ. So the count of green hits is Binomial(N − 1, γ).

  • Expected green hits: (N − 1) · γ
  • Standard deviation: sqrt((N − 1) · γ · (1 − γ))
  • Z-score: z = (observed − expected) / stdDev
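
Plugging in numbers as a sanity check: a 100-token text gives N − 1 = 99 pairs. With γ = 0.5, the expected green count is 49.5 and the standard deviation is sqrt(99 · 0.25) ≈ 4.97, so observing 90 green pairs yields z = (90 − 49.5) / 4.97 ≈ 8.1.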

If z > 2.33, you can reject the no-watermark null hypothesis at the 1% one-sided significance level. The paper's threshold for "robust" detection is z > 4 — that's about a 3 × 10⁻⁵ false positive rate. Our detector reports the raw z-score so you can pick your own threshold.

With γ = 0.5 and a strong bias δ, a 100-token watermarked text typically reports z ≈ 8–10; the all-green ceiling at that length is about 10, so higher scores need longer texts. The detection is statistically airtight — provided you have the right key.

What we implement, exactly

Look at the actual code in lib/watermark-text.ts:

function isGreen(prev: string, curr: string, key: number, gamma: number): boolean {
  // Hash the token pair (joined with a unit separator so "ab"+"c" ≠ "a"+"bc")
  // together with the secret key.
  const h = hash32(prev + "\x1f" + curr, key);
  // Treat the 32-bit hash as uniform in [0, 1); the pair is green iff it falls below γ.
  return h / 0x100000000 < gamma;
}

hash32 is an FNV-1a-style hash. The PRF takes (prev_token, curr_token) and the secret key, returns a 32-bit hash, and the detector treats that hash as a uniform random variable in [0, 1). The token pair is "green" if that variable falls below γ.

Detection:

let greenCount = 0;
for (let i = 1; i < tokens.length; i++) {
  if (isGreen(tokens[i - 1], tokens[i], key, gamma)) greenCount++;
}
const n = tokens.length - 1; // number of (prev, curr) pairs
const z = (greenCount - n * gamma) / Math.sqrt(n * gamma * (1 - gamma));

The watermarked-sample generator is a greedy Markov walk: starting from a prompt, pick each next token from a small vocabulary, preferring tokens that yield isGreen = true for the current prev. The output is repetitive and not human-quality prose — but it deterministically passes the z-test. That's the point: the round-trip proves the math without us shipping a 7B-parameter model in your browser.
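
Here's a minimal sketch of that walk, reusing isGreen from above. The function name and the rotate-through-the-greens tie-breaking are our illustration, not the actual shape of lib/watermark-text.ts:

// Illustrative sketch only; not the actual export of lib/watermark-text.ts.
function generateWatermarkedSample(
  prompt: string[],
  vocab: string[],
  key: number,
  gamma: number,
  steps: number
): string[] {
  const out = [...prompt];
  for (let step = 0; step < steps; step++) {
    const prev = out[out.length - 1];
    // Prefer tokens that hash green for the current prev; rotate through the
    // green set so the output isn't one repeated word.
    const green = vocab.filter((t) => isGreen(prev, t, key, gamma));
    out.push(green.length > 0 ? green[step % green.length] : vocab[step % vocab.length]);
  }
  return out;
}

With γ = 0.5 roughly half the vocab is green at each step, so nearly every emitted pair is green and the z-test saturates after a few dozen tokens.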

Differences from a "real" production scheme

Our demo is mathematically identical to Kirchenbauer in the detection path. It differs in two practical ways:

  1. Scope of "vocabulary" — Kirchenbauer partitions the model's actual subword vocabulary (50k+ tokens for a typical LLM). We hash on whitespace-split words. The math is the same; the hash buckets are just keyed on word pairs instead of subword pairs. For our demo this is fine.

  2. Embed pathway — Kirchenbauer embeds at generation time by biasing logits. We aren't running an LLM in your browser, so our embed is the synthetic Markov walk above. A real watermarked LLM would produce coherent prose with the same z > 4 detection.

The detector itself is the same shape regardless of how the text got watermarked. Plug in the issuer's key and γ and you can verify any text they signed.

SynthID-Text — what Google changed

SynthID-Text (Google DeepMind, 2024) is the production-grade evolution. Key differences from Kirchenbauer:

  1. Tournament sampling instead of logit bias. Instead of adding δ to green-list tokens, SynthID-Text runs a small "tournament" between candidate tokens at each step, where the score is the hash output. This avoids the quality degradation of logit bias. (A toy sketch of tournaments and g-values follows this list.)

  2. g-values per token, not just green/red. Each token gets a real-valued "g-score" instead of a binary green/red. Detection averages g-scores; a high mean across the document indicates watermark presence. More information per token = lower text-length requirement for detection.

  3. Detection sensitivity at lower text lengths. Kirchenbauer needs ~25 tokens for reliable detection; SynthID-Text gets there at ~15.

  4. Open-sourced under a permissive license. The detector is available — unlike SynthID for images, where Google holds the key. This is the most significant shift: SynthID-Text wants to be the de facto standard.
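
To make items 1 and 2 concrete, here's a toy sketch in the style of our demo detector; this is our illustration, not SynthID-Text's actual implementation. It reuses the hash32 PRF for per-pair g-values, picks tournament winners by g-value, and detects via the mean g (a Uniform[0, 1) draw has mean 0.5 and variance 1/12):

// Toy illustration only; SynthID-Text's real g-values come from layered
// tournaments inside the sampler. Reuses hash32 from lib/watermark-text.ts.
function gValue(prev: string, curr: string, key: number): number {
  return hash32(prev + "\x1f" + curr, key) / 0x100000000; // uniform in [0, 1)
}

// One tournament layer: among candidate tokens, keep the highest g-value.
// Assumes candidates is non-empty.
function tournamentPick(prev: string, candidates: string[], key: number): string {
  return candidates.reduce((a, b) => (gValue(prev, b, key) > gValue(prev, a, key) ? b : a));
}

// Detection: z-test the mean g against the null mean of 0.5.
function meanGZScore(tokens: string[], key: number): number {
  let sum = 0;
  for (let i = 1; i < tokens.length; i++) sum += gValue(tokens[i - 1], tokens[i], key);
  const n = tokens.length - 1;
  return (sum / n - 0.5) / Math.sqrt(1 / (12 * n));
}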

If you want production-grade detection of Google-watermarked text, integrate SynthID-Text directly. Our demo is for understanding the principle; SynthID-Text is the deployment.

The attack landscape

Every watermark has an attack section. Text watermarks, like image watermarks, sit on a tradeoff frontier between robustness and quality.

Easy attacks (defeat naive Kirchenbauer):

  • Paraphrase by hand. Rewrite each sentence in your own words. Tokens change → hash buckets change → green-list rate drops to baseline.
  • Run through another LLM with a "rewrite this for me" prompt. Same effect: token replacement at scale.
  • Translate to another language and back. Different tokenizers, different vocabulary, watermark gone.
  • Substitute synonyms. Even simple word-by-word synonym replacement breaks the hash chain, because the previous token determines the green-list partition (toy demo after this list).
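
A toy illustration of that last point, using isGreen from above and an arbitrary demo key:

// One swapped word re-rolls the hash for BOTH pairs it touches: ("the",
// "quick") and ("quick", "fox") become ("the", "rapid") and ("rapid", "fox"),
// each landing in a fresh pseudorandom bucket.
const key = 42; // arbitrary demo key
const gamma = 0.5;
const original = ["the", "quick", "fox"];
const attacked = ["the", "rapid", "fox"];
for (let i = 1; i < original.length; i++) {
  console.log(
    isGreen(original[i - 1], original[i], key, gamma),
    isGreen(attacked[i - 1], attacked[i], key, gamma)
  );
}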

Harder attacks (defeat robust schemes too):

  • Chunk and shuffle. Break the text into sentences, shuffle them, glue back. The local hash chain breaks.
  • Insert nonsense tokens. A few junk insertions per paragraph dilute the green-list signal.
  • Combine multiple watermarked outputs. If two LLMs from different vendors watermark differently, mix their outputs to break both detectors.

Hardest attacks (defeat all known schemes):

  • Adversarial prompting at generation time. Some prompts produce output that scores low on the watermark detector despite being from the watermarked LLM. The model can essentially be asked to write in a way that minimizes the watermark signal.

The published robustness benchmarks for Kirchenbauer (and SynthID-Text) include these attacks. Real numbers from the Kirchenbauer paper: under heavy paraphrasing, detection accuracy drops from 99.6% to roughly 70%; under a translation round trip, it falls below 60%.

The bottom line for buyers: a watermark is a good signal when present, but absence of watermark is not absence of AI authorship. Use detection as one piece of evidence, not the entire case.

What this means for our detector

The watermark scan in our text dashboard does one thing well: detect our specific demo key. It's an educational artifact and a foundation for issuer-specific deployments. Real-world workflows (a teacher checking student work, a journalist verifying a source, a moderator triaging UGC) need:

  • The issuer's key. Without it, we can't detect.
  • A length threshold. Below ~50 tokens, the z-score is too noisy to be useful (quick arithmetic after this list).
  • An understanding that the absence of detection means very little.
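
Quick arithmetic on that length threshold, assuming a hypothetical 75% green rate for strongly watermarked text: at 50 tokens there are 49 pairs, expected 24.5 green with stdDev 3.5, so ~37 observed green gives z ≈ (37 − 24.5) / 3.5 ≈ 3.6, still under the robust threshold of 4.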

These limits aren't a failure of watermarking; they're a structural property. Watermarking adds evidence to text that was signed at generation. It doesn't help with text that wasn't.

When watermarks make sense vs. when they don't

The clearest fit:

  • AI labs that want to identify their own outputs in the wild. OpenAI's reported watermarking prototype, Anthropic's experiments, Google's SynthID-Text — all aimed at "is this from us?", not "is this from anyone?"
  • Educational institutions that mandate use of a specific watermarked LLM. When the workflow is "you can use this LLM, and only this LLM, and we have its key," detection is robust.
  • News organizations watermarking AI-generated companion content (image captions, summaries) that's labeled as AI in the publication itself, but that readers might re-share without context.

The unclear fits:

  • General-purpose "is this AI?" tooling for arbitrary text. Watermarking helps for the cooperating subset; you still need forensic detection for everything else. Our text detector ensembles both.

The math goes beyond text

The same statistical machinery shows up in image watermarking (we wrote about this last week) — different signal carrier (DCT coefficients vs. tokens), same z-score detection, same attack ladder. Audio watermarking adds a third carrier (cepstral coefficients or echo delays); that's a future post.

If you understand:

z = (observed - expected) / sqrt(variance)

…and you understand "the issuer hashes (state, output) with a secret key to bias the output distribution," you understand the whole watermarking literature. The differences between papers are mostly about which signal carrier and how robustly to bury the bias.

Try the round-trip

Drop in any text at /check/text. The Watermark Scan panel runs on every input ≥ 5 token-pairs. Click Generate watermarked sample to fill the textarea with our greedy-watermarked text — z-score should report 8+. Edit a few words, click around, watch the detection move.

The demo is honest about its limits. The math is real.

Further reading

  • J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, T. Goldstein. A Watermark for Large Language Models. ICML 2023.
  • S. Dathathri, A. See, S. Ghaisas, et al. Scalable watermarking for identifying large language model outputs. Nature 2024 (the SynthID-Text paper).
  • J. Kirchenbauer, J. Geiping, Y. Wen, et al. On the Reliability of Watermarks for Large Language Models. ICLR 2024 (attack analysis).
  • M. Christ, S. Gunn, O. Zamir. Undetectable Watermarks for Language Models. COLT 2024.

For the image-side story, see Invisible Watermarks in AI Images. For the broader detection-vs-provenance argument, see What is C2PA.