research · 14 min read

Invisible Watermarks in AI Images: A Forensic Walkthrough You Can Run in Your Browser

How DCT spread-spectrum, QIM, SynthID, TreeRing, and StegaStamp actually work — with the math, the attack landscape, and a working DCT detector you can run on any image you upload.

Most published writing about AI watermarking falls into one of two camps: marketing copy from the model labs ("our images are securely watermarked"), or hand-wavy explainers that say "the watermark is invisible to humans" and stop there. This is neither. This is the math, the attack landscape, and a working detector you can run on any image at /check/image.

We built a real DCT-domain spread-spectrum watermark — embed and detect — into the image dashboard. You can upload an image, see whether our demo pattern is present, embed it round-trip, and watch the z-score flip from ≈ 0 to over 4. The same family of math underlies SynthID, TreeRing, and most production schemes. By the end of this article you'll be able to read a watermarking paper and predict what the attack section will say before you read it.

The problem in one sentence

You want to put a signal into an image such that:

  • (a) Imperceptibility: a human can't tell it's there.
  • (b) Robustness: it survives JPEG re-compression, mild cropping, color shifts, and the rounding errors of resampling.
  • (c) Capacity: enough bits to identify the signer or the source model.
  • (d) Security: an adversary without the key can't remove or forge the signal.

These objectives are in tension. Strong embedding signals are robust but more visible. High capacity drives down per-bit signal-to-noise. Defending against adversaries means burying the signal in places they can't surgically remove without destroying the image.

The watermarking literature is mostly the story of trying to find the Pareto frontier of these tensions. Three families of approaches:

  1. Spatial-domain — modulate pixel values directly. Easy to do, easy to defeat. Almost nobody ships this anymore.
  2. Transform-domain — modulate DCT, DFT, or DWT coefficients. The mid-frequency bands are the sweet spot: too low and the change is visible; too high and JPEG throws it away.
  3. Learned — train a neural encoder/decoder pair end-to-end against a robustness loss. State of the art in 2024–2026 for "survive screenshots and printing" use cases.

Below, we take each family in historical order.

1. Spread-spectrum watermarking (Cox et al., 1997)

The seminal paper. Three-line summary:

  1. Compute the global DCT of the image (or a per-block 8×8 DCT — this is what JPEG does).
  2. Pick the N largest mid-frequency coefficients. Add a tiny pseudorandom signal proportional to coefficient magnitude.
  3. To detect, recompute the DCT, extract the candidate coefficients, and correlate against the pseudorandom signal.

The "spread spectrum" name comes from radio: the signal is buried below the noise floor across many coefficients. No single coefficient is changed by much, but the correlation across many coefficients becomes statistically detectable.

The classical formulation:

  • Embed: F'_i = F_i * (1 + α * w_i) where w_i is the watermark sequence.
  • Detect: ρ = Σ_i F'_i * w_i / sqrt(Σ_i (F'_i)²)

A high ρ means the watermark is present; near-zero means it isn't. Cox showed this scheme survives JPEG compression down to quality 30, mild cropping, and gamma correction.
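
Here's a minimal TypeScript sketch of that formulation, operating on a vector of already-selected mid-frequency DCT coefficients (the coefficient selection and the DCT itself are assumed to happen elsewhere; Cox used a Gaussian watermark sequence and α ≈ 0.1):

// Multiplicative spread-spectrum embedding over pre-selected DCT coefficients.
// `w` is the key-derived pseudorandom watermark sequence.
function embedSS(coeffs: number[], w: number[], alpha: number): number[] {
  return coeffs.map((f, i) => f * (1 + alpha * w[i]));
}

// Normalized correlation against the candidate watermark: high rho means present.
function detectSS(coeffs: number[], w: number[]): number {
  const dot = coeffs.reduce((sum, f, i) => sum + f * w[i], 0);
  const norm = Math.sqrt(coeffs.reduce((sum, f) => sum + f * f, 0));
  return dot / norm;
}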

Our implementation in /check/image uses an additive sibling of this idea, simplified for legibility:

  • Each 8×8 block of luminance gets ±α added to a fixed mid-band coefficient (we use position [3,4] — far enough from DC that the eye doesn't see it, far enough from high-freq that JPEG doesn't kill it).
  • The sign of α is determined by a deterministic PRF of the secret key and the block index.
  • Detection is one-tailed: count how many blocks have the expected sign at [3,4] and compute a z-score against the binomial null (p=0.5).
  • Detection threshold: z > 2.33, i.e. 99% one-sided.

You can read the actual code at lib/watermark-dct.ts and run it: drop any image into the detector, then click "Embed + download watermarked PNG" to see the round-trip.
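
For intuition, here's a stripped-down sketch of just the detection statistic described above (the real lib/watermark-dct.ts also handles the block DCT, the keyed PRF, and the heatmap; the array shapes here are hypothetical):

// observed[i]: sign (+1 / -1) of the [3,4] coefficient in block i after the DCT.
// expected[i]: sign the keyed PRF says block i should carry.
function watermarkZScore(observed: number[], expected: number[]): number {
  const n = observed.length;
  const matches = observed.filter((s, i) => s === expected[i]).length;
  // Null hypothesis (no watermark): each block matches by chance with p = 0.5,
  // so matches ~ Binomial(n, 0.5) with mean n/2 and std sqrt(n)/2.
  return (matches - n / 2) / (Math.sqrt(n) / 2);
}
// The demo flags a watermark when this exceeds 2.33 (the 99% one-sided threshold).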

What this scheme survives

  • ✅ JPEG re-compression down to quality ≈ 70 (we tested at 75; typical detection z-score drops from 12 to 4)
  • ✅ Slight crops (a few percent off any edge — 8×8 block alignment isn't a strict requirement because we average across many blocks)
  • ✅ Mild color and contrast adjustments (the watermark sits in luminance; chroma changes don't touch it)
  • ✅ Resizing within ±10% (the heatmap gets noisier but the z-score holds)

What it doesn't survive

  • ❌ Heavy crop (down to 25% of original area — too few blocks left to clear the z-threshold)
  • ❌ Median filter or strong denoising (kills high-frequency artifacts the watermark rides on)
  • ❌ Geometric distortions (rotation, perspective) — synchronization fails, blocks no longer aligned to the embedded grid
  • ❌ Re-screenshot off a monitor — the analog hop launders most digital watermarks
  • ❌ Targeted attack: an adversary who knows the scheme can subtract the average mid-band coefficient bias

That last one is why this version is for demonstration, not for adversarial security. Production schemes use ensembles of frequencies, key-dependent embedding locations, and error-correcting codes — but the math is from the same family.

2. Quantization Index Modulation (Chen and Wornell, 2001)

QIM solves the "what about the existing image content?" problem more cleanly than additive embedding.

The idea: instead of adding a perturbation to each coefficient, quantize each coefficient to one of two interleaved lattices, depending on the bit you want to embed.

The simplest scalar form, in TypeScript:

// Embed one bit by snapping the coefficient onto one of two interleaved lattices:
// even multiples of `step` encode 0, odd multiples encode 1.
function embedQIM(coeff: number, bit: 0 | 1, step: number): number {
  const offset = bit === 0 ? 0 : step;
  return Math.round((coeff - offset) / (2 * step)) * 2 * step + offset;
}

// Detect by asking which lattice the coefficient landed nearest to.
function detectQIM(coeff: number, step: number): 0 | 1 {
  const distTo0 = Math.abs(coeff - embedQIM(coeff, 0, step));
  const distTo1 = Math.abs(coeff - embedQIM(coeff, 1, step));
  return distTo0 <= distTo1 ? 0 : 1;
}

The bit isn't added on top of the host content; it's encoded in which lattice point the coefficient is nearest to. Because the host signal doesn't act as interference at the detector, QIM carries far more payload bits per coefficient than additive spread spectrum, and allocating bits is straightforward. Its main weakness is amplitude scaling: rescale every coefficient by a constant and the lattice step no longer matches.

QIM is the basis of many JPEG-resilient schemes deployed in the 2010s. It's also where digital cinema watermarks (the kind that lets studios trace a leaked screener back to the cinema chain that played it) live.

3. Frequency-domain watermarks survive JPEG. Why?

JPEG compression itself is a lossy DCT-quantization pipeline. It aggressively throws away high-frequency information (the quantization tables use large step sizes at high u, v) and preserves low-frequency information faithfully. The mid-band we embed in sits in the zone that survives typical web JPEG settings (quality 75–95).

If you embed in the DC coefficient (corner [0,0]), the change is highly visible. If you embed in the high-frequency tail, JPEG erases it. Mid-band is the only place that satisfies both invisibility and survival — which is exactly why it's where every real-world image watermark goes.
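
To make that concrete, here's a toy TypeScript illustration with made-up quantization steps (real JPEG tables vary with quality, but the important shape holds: small steps near DC, large steps at high frequency):

// JPEG stores round(coeff / step) and reconstructs step * that integer.
const quantize = (coeff: number, step: number) => Math.round(coeff / step) * step;

const alpha = 6;      // a small watermark nudge on an otherwise-zero coefficient
quantize(alpha, 10);  // mid-band step ~10  -> 10: the nudge survives reconstruction
quantize(alpha, 80);  // high-band step ~80 -> 0: the nudge is rounded away entirely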

4. SynthID for images (Google DeepMind, 2024)

Google's SynthID for image generation is a learned watermark — a small neural network is trained to perturb generated images such that another neural network can later detect the watermark with high probability.

Key differences from classical schemes:

  • The watermark isn't in a fixed frequency band; it's distributed across the image in whatever way the encoder learns is robust.
  • Detection requires Google's classifier — they retain the asymmetry.
  • Robustness against common edits is trained directly: rotation, crop, JPEG, color shifts, etc., are all in the training augmentation pipeline.
  • Capacity is a few bits (model identifier, generation flag).

The strategic move here is keeping the detector closed-source. They publish the fact that their images are watermarked; they don't publish how to verify them — that's a service they offer. This avoids enabling adversarial removal but means third-party tools can't independently verify a SynthID watermark today.

5. TreeRing (Wen et al., 2023)

TreeRing is a watermark designed specifically for diffusion models. The watermark is injected into the initial latent (the random noise used to seed the denoising process) rather than into the final image.

The clever part: the watermark is radially symmetric in the Fourier domain: concentric "rings" written into the Fourier transform of the noise. A rotation of the image only rotates its spectrum, and concentric rings look the same after rotation, so the pattern survives. Detection numerically inverts the diffusion process and looks for the ring pattern in the Fourier transform of the recovered noise.
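
A rough TypeScript sketch of the ring idea (only the key construction; the real scheme writes the pattern into the Fourier transform of the diffusion latent, and detection requires numerically inverting the sampler, neither of which is shown here):

// Build a pattern that is constant along concentric rings around the center of
// a size x size frequency plane. Rotating the plane leaves the pattern unchanged,
// which is where the rotation robustness comes from.
function ringKey(size: number, ringValues: number[]): number[][] {
  const center = (size - 1) / 2;
  const key: number[][] = [];
  for (let y = 0; y < size; y++) {
    const row: number[] = [];
    for (let x = 0; x < size; x++) {
      const radius = Math.round(Math.hypot(x - center, y - center));
      row.push(radius < ringValues.length ? ringValues[radius] : 0);
    }
    key.push(row);
  }
  return key;
}
// Detection (conceptually): recover the initial noise, take its Fourier transform,
// and measure the distance to this key inside the ring region.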

Robustness is excellent — this is one of the few schemes that survives moderate rotation and re-cropping. The catch: it requires access to the generation pipeline; you can't add a TreeRing watermark to an image after the fact.

This is the family that's going to dominate over the next 3 years. If you're building anything serious, expect every major image generator to have something like this baked in.

6. StegaStamp (Tancik et al., 2020) and the print-photograph problem

StegaStamp solves a wilder version of the problem: the watermark must survive being printed and re-photographed. This is the "concert poster" or "ID card" use case.

The architecture is a small encoder/decoder neural network, trained against a differentiable physical augmentation pipeline that simulates printing (color quantization, ink dot patterns) and photography (camera blur, perspective distortion, lighting variation). The encoder learns to embed a watermark that survives the brutal pipeline.

This is the only family of digital watermarks that reliably survives analog attacks. It's also computationally expensive and requires a trained model; you can't bolt it onto an image with a few hundred lines of math like our DCT scheme.

7. Text watermarking — adjacent but very different

A note for completeness: text watermarking (Kirchenbauer et al. 2023, SynthID-Text from Google 2024) shares the statistical core of image watermarking but operates in token space.

The principle: bias the LLM's sampling toward a "green list" of tokens at each step, where the list is derived by hashing the preceding token(s) with a secret key. Detection re-derives the green list at each position of the candidate text, counts the green-list hits, and computes a z-score against the null. Same statistical machinery as our DCT detector, just a different signal carrier.
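
The same statistic, sketched in TypeScript with a hypothetical greenList helper standing in for the keyed hash over the preceding token:

// greenList(prevToken): the set of token ids that are "green" at this position.
// gamma: the fraction of the vocabulary placed on the green list (e.g. 0.25).
function greenTokenZScore(
  tokens: number[],
  greenList: (prevToken: number) => Set<number>,
  gamma: number
): number {
  let hits = 0;
  for (let i = 1; i < tokens.length; i++) {
    if (greenList(tokens[i - 1]).has(tokens[i])) hits++;
  }
  const n = tokens.length - 1;
  // Unwatermarked text hits the green list at rate gamma by chance.
  return (hits - gamma * n) / Math.sqrt(n * gamma * (1 - gamma));
}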

We'll wire SynthID-Text-style detection into /check/text in a future build. The math will look very familiar after this article.

The attack landscape

Every watermarking paper has an attacks section. The common attacks, ranked by difficulty:

  1. Re-encode — re-save the image as a different format / quality. Defeats spatial-domain watermarks; survived by mid-band frequency-domain schemes.
  2. Crop — slice off edges. Mostly survived by schemes with redundant embedding across many regions.
  3. Resize / scale — survived by schemes with multi-scale embedding or rotation-invariant designs.
  4. Color and contrast adjustment — survived if the watermark lives in luminance only.
  5. Median filter / denoising — kills high-frequency watermarks; mid-band schemes are partially affected.
  6. Geometric distortion (rotation, shear, perspective) — defeats most synchronization-dependent schemes; only sophisticated designs survive (TreeRing).
  7. Print and rephotograph — defeats almost everything except StegaStamp and friends.
  8. Targeted statistical attack — adversary knows the scheme and can mathematically subtract the signal. Defeats demo-grade systems; production systems use key secrecy and per-image randomization to slow this.

This ladder explains why the modern story is moving toward provenance signatures (C2PA) instead of (only) watermarks. A C2PA signature is cryptographic: it either verifies or it doesn't. There's no "robustness" question — only "is the file modified, yes or no." Watermarks remain useful for the unsigned majority of content, but signatures will eventually become the trust anchor.

What our detector actually shows you

When you upload an image at /check/image, the new "Invisible watermark scan" panel renders:

  • A per-block sign-match heatmap: 8×8 blocks colored green where the observed mid-band coefficient sign matched the expected (key-derived) sign, red otherwise.
  • A z-score against the binomial null hypothesis. z < 1: random noise. z > 2.33: 99% confidence the watermark is present.
  • A block count + match-rate so you can audit the test directly.
  • An embed playground: click one button to embed our demo pattern into your uploaded image, download the result, re-upload it. The detector flips from z ≈ 0 to z > 4 in front of you.

This is the round-trip you can use to teach yourself how watermark detection actually feels. Drop any photo, see the speckled "no watermark" heatmap, embed our pattern, drop the result back in, see the green-grid "detected" heatmap.

What we will and won't ship

We will ship:

  • DCT spread-spectrum demo (live now) — for education and as a foundation for licensable watermarks
  • Multi-pattern detection (next iteration) — scan against several published academic schemes, not just our demo
  • C2PA signature reading (already live) — for the cases where there's a cryptographic anchor

We won't ship:

  • Targeted SynthID detector — Google holds the key, and reverse-engineering it would just turn detection into a cat-and-mouse game we'd lose
  • Tools to remove or forge watermarks — same conflict-of-interest problem we wrote about on the about page

The honest meta-lesson

Watermarking is a discipline of carefully chosen tradeoffs. The marketing claim ("our images are watermarked, so you can trust the source") usually papers over which attacks the watermark survives and which it doesn't. Reading the actual paper for any production scheme will tell you what's robust, what isn't, and (most importantly) what the assumed adversary model is. There's no such thing as "the watermark survives everything" — only "the watermark survives the attacks the designers tested it against."

The path to durable trust on the internet is a layered one: cryptographic signatures (C2PA) for the cases where they apply, robust watermarks (TreeRing-class neural schemes) for content we can sign at generation time, forensic ensembles (ELA, FFT, channel decomposition) for everything else, and provenance-aware UI in the apps we use to read the web. We're building the third and fourth layers. The first two are the work of the model labs and the platform vendors.

If you read this far and want to dig into the implementation, the working code is at lib/watermark-dct.ts (~250 lines). It's small enough to read in one sitting, and it's the same shape as the production schemes you'll find in academic papers — just simpler so you can see the math.

Further reading

  • I. Cox, J. Kilian, F. Leighton, T. Shamoon. Secure Spread Spectrum Watermarking for Multimedia. IEEE Trans. Image Proc., 1997.
  • B. Chen, G. Wornell. Quantization Index Modulation: A Class of Provably Good Methods for Digital Watermarking and Information Embedding. IEEE Trans. Inf. Theory, 2001.
  • M. Tancik, B. Mildenhall, R. Ng. StegaStamp: Invisible Hyperlinks in Physical Photographs. CVPR 2020.
  • J. Kirchenbauer et al. A Watermark for Large Language Models. ICML 2023.
  • Y. Wen et al. Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust. NeurIPS 2023.
  • Coalition for Content Provenance and Authenticity (C2PA) Specification. v2.0, 2024.

For a quicker tour through what C2PA signatures look like in real images, see What C2PA Content Credentials Look Like in Real Images.