How detection works

Methodology

Most detectors hand you a percentage and ask you to trust it. We show our work instead. Here is exactly what every check measures, signal by signal — and, just as important, what it can't tell you.

The short version

These are deterministic, explainable heuristics: transparent measurements of known artifacts, not a black-box neural classifier and not a polygraph. Each signal is scored from 0 to 1, and the scores are combined into one of three verdict bands: likely real, ambiguous, or likely AI. Every signal is shown alongside the verdict so you can audit the call yourself. When a file carries cryptographic provenance (C2PA), that evidence outranks every heuristic below.
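As a rough illustration of how 0-to-1 signals could roll up into a band, here is a minimal Python sketch. The unweighted mean, the two thresholds, the convention that a higher score means more AI-like, and the `provenance` parameter are all assumptions made for this example; the actual weights and cut-offs are not stated on this page.

```python
from statistics import mean
from typing import Dict, Optional

# Hypothetical cut-offs, chosen only for illustration.
REAL_MAX = 0.35
AI_MIN = 0.65


def verdict(signals: Dict[str, float], provenance: Optional[str] = None) -> str:
    """Combine 0-1 signal scores (assumed: 1 = more AI-like) into a band."""
    if provenance is not None:
        # Verified provenance outranks every heuristic, so it decides the band.
        return provenance
    if not signals:
        return "ambiguous"
    score = mean(signals.values())
    if score <= REAL_MAX:
        return "likely real"
    if score >= AI_MIN:
        return "likely AI"
    return "ambiguous"
```

Returning the per-signal dictionary alongside the band is what would let a reader audit the call.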

Text

Human writing is uneven: it varies its rhythm, reaches for surprising words, and repeats itself in messy ways. LLM prose trends smooth and predictable. We score five structural signals plus a token-predictability estimate. Nothing is uploaded; the analysis runs in your browser.

Burstiness

The coefficient of variation of sentence lengths: their standard deviation divided by their mean. People mix long and short sentences; models drift toward a uniform mid-length rhythm. Low variation reads as machine-even.
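The measurement can be sketched in a few lines of Python. The sentence splitter here is a naive assumption (split on `.`, `!`, `?`), and the function names are illustrative rather than the tool's own.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text):
    """Word count of each sentence, using a naive punctuation split."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation: standard deviation / mean of sentence lengths."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # too little text to measure variation
    m = mean(lengths)
    return pstdev(lengths) / m if m else 0.0
```

Three sentences of equal length give 0.0; a one-word sentence next to an eleven-word one pushes the value above 1.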

Lexical tics

Density of filler phrases and “tell-words” that appear far more often in generic model output than in human drafts: hedges, throat-clearing transitions, and stock connective phrases.
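A density of this kind could be computed as hits per 100 words. The phrase list below is hypothetical and deliberately tiny; the real list is not published here.

```python
import re

# Hypothetical tell-phrases, for illustration only.
TELL_PHRASES = (
    "it is important to note",
    "in conclusion",
    "furthermore",
    "moreover",
)


def tic_density(text):
    """Tell-phrase occurrences per 100 words."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return 0.0
    hits = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in TELL_PHRASES
    )
    return 100.0 * hits / len(words)
```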

N-gram repetition

Repeated short word sequences. Models sometimes reuse a phrasing a human editor would have varied; unusually high repetition is a weak synthetic signal.
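One way to quantify this, sketched under the assumption that "short word sequences" means word trigrams, is the fraction of n-grams that duplicate an earlier one.

```python
import re
from collections import Counter


def ngram_repetition(text, n=3):
    """Fraction of word n-grams that repeat an n-gram seen earlier in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c - 1 for c in counts.values())  # every occurrence after the first
    return repeated / len(grams)
```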

Sentence-start variety

How often consecutive sentences open the same way. Templated, parallel openings skew machine-like; human paragraphs start more unpredictably.
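A simple version of this check compares the first word of each sentence with the first word of the next. Using only the first word, and the naive sentence split, are assumptions of this sketch.

```python
import re


def repeated_openings(text):
    """Share of consecutive sentence pairs that open with the same word."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    openers = [s.split()[0].lower() for s in sentences]
    if len(openers) < 2:
        return 0.0
    same = sum(a == b for a, b in zip(openers, openers[1:]))
    return same / (len(openers) - 1)
```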

Punctuation pattern

The rhythm and distribution of punctuation. An overly even comma/period cadence is a mild tell; human punctuation is more erratic.

Token predictability (perplexity proxy)

A self-contained estimate of how predictable each token is from its neighbours — no external language model is called. Uniformly predictable text is more AI-like; we render it as a per-token heatmap so you can see which spans drove the score.
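To make "predictable from its neighbours" concrete, here is one possible self-contained estimator: a bigram model built from the text itself with add-one smoothing, reporting per-token surprisal in bits. This particular model is an assumption of the sketch, not a description of the estimator the tool uses.

```python
import math
import re
from collections import Counter


def token_surprisal(text):
    """Return (token, surprisal in bits) pairs; low values mean predictable."""
    tokens = re.findall(r"[a-z']+", text.lower())
    vocab = len(set(tokens))
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    out = []
    for prev, tok in zip(tokens, tokens[1:]):
        # Add-one smoothed estimate of P(tok | prev) from the text alone.
        p = (bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab)
        out.append((tok, -math.log2(p)))
    return out
```

The per-token values are the kind of data a heatmap would colour: spans with uniformly low surprisal stand out.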

Image

Real photographs carry sensor and compression fingerprints that generators struggle to reproduce. We reconstruct several of those fingerprints visually, so you see the artifact, not just a number. Decoding and analysis happen in your browser.

ELA uniformity (Error Level Analysis)

We re-compress the image and diff it against the original. Camera photos show uneven error concentrated along edges; a flat, uniform error field can indicate synthesis or heavy global editing.

Noise residual variance

A high-pass map isolating sensor noise (a PRNU-style proxy). Real cameras leave structured, consistent noise; many generated images have noise that is too clean or unnaturally distributed.
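A high-pass map of this sort can be approximated by subtracting a local average from each pixel. The 3x3 box filter and the plain nested-list grayscale input are assumptions of this sketch; a real pipeline would likely use a stronger denoiser.

```python
from statistics import pvariance


def noise_residual(gray):
    """High-pass map: each interior pixel minus the mean of its 3x3 neighbourhood."""
    h, w = len(gray), len(gray[0])
    residual = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            block = [gray[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(gray[y][x] - sum(block) / 9.0)
        residual.append(row)
    return residual


def residual_variance(gray):
    """Variance of the residual map; near zero means suspiciously clean noise."""
    flat = [v for row in noise_residual(gray) for v in row]
    return pvariance(flat) if flat else 0.0
```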

Frequency spectrum (FFT)

The magnitude spectrum of the image. Some generative pipelines leave periodic or atypical energy in specific frequency bands that a lens-and-sensor capture would not.

RGB channel decomposition

Per-channel statistics. Mismatches between the red, green, and blue planes can flag compositing or manipulation.
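As an illustration of per-channel statistics, the sketch below computes the mean and standard deviation of each plane and one possible mismatch measure, the gap between the most and least variable channel. That gap metric is a hypothetical choice for this example.

```python
from statistics import mean, pstdev


def channel_stats(pixels):
    """pixels: list of (r, g, b) tuples. Returns {channel: (mean, std)}."""
    planes = list(zip(*pixels))
    return {name: (mean(p), pstdev(p)) for name, p in zip("RGB", planes)}


def max_std_gap(pixels):
    """Difference between the largest and smallest per-channel standard deviation."""
    stds = [std for _, std in channel_stats(pixels).values()]
    return max(stds) - min(stds)
```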

Audio

Synthetic speech (text-to-speech and vocoders) tends to be too stable and too smooth in ways a human vocal tract is not. We run a short-time Fourier analysis and score temporal stability. Audio is decoded and analyzed in your browser.

Spectral flatness uniformity

How tonal versus noise-like the spectrum is, tracked over time. Vocoded and synthetic audio trends toward unnatural, uniform flatness.
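Spectral flatness is conventionally the geometric mean of the power spectrum divided by its arithmetic mean: close to 0 for a pure tone, close to 1 for a noise-like spectrum. The sketch below uses a naive DFT, non-overlapping 64-sample frames, and no window function, all of which are simplifying assumptions.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of each positive-frequency DFT bin (DC skipped), via a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum."""
    p = [v + eps for v in power_spectrum(frame)]  # eps avoids log(0)
    geo = math.exp(sum(math.log(v) for v in p) / len(p))
    return geo / (sum(p) / len(p))


def flatness_over_time(samples, frame_size=64):
    """Flatness of each consecutive frame, so its uniformity can be inspected."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [spectral_flatness(samples[i:i + frame_size]) for i in starts]
```

A track of values that barely moves from frame to frame is the "uniform flatness" the signal looks for.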

Harmonic stability

How steady the harmonic structure stays. Real voices wobble; many synthetic voices hold their harmonics too rigidly.

Spectral centroid jumps

Frame-to-frame shifts in spectral “brightness.” Real speech is jumpy and transient-rich; synthesis is often smoother than speech captured by a microphone in a real room.
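The spectral centroid is the magnitude-weighted mean frequency of a frame, and a jump is the absolute change between consecutive frames. In this sketch the centroid is expressed in DFT bin units rather than hertz, and the naive DFT and 64-sample frames are assumptions.

```python
import cmath
import math


def magnitudes(frame):
    """Magnitude of DFT bins 0 .. n/2 - 1, via a naive DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def spectral_centroid(frame):
    """Magnitude-weighted mean bin index (the frame's 'brightness')."""
    mags = magnitudes(frame)
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total else 0.0


def centroid_jumps(samples, frame_size=64):
    """Absolute centroid change between consecutive frames."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    centroids = [spectral_centroid(samples[i:i + frame_size]) for i in starts]
    return [abs(b - a) for a, b in zip(centroids, centroids[1:])]
```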

Video

We sample frames and check properties that should stay coherent across time (noise, lighting, and motion): real cameras preserve them, while generated or heavily edited footage tends to break them. Frames are sampled and analyzed in your browser.

Noise consistency

Whether per-frame sensor noise stays coherent across the clip. Real footage is consistent; spliced or generated frames drift.

Lighting plausibility

Frame-to-frame luminance changes. Implausible jumps in overall lighting can indicate editing or synthesis.
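A minimal version of this check tracks the mean luminance of each sampled frame and flags large jumps. The threshold of 40 (on a 0 to 255 scale) is a hypothetical value for illustration, and frames are assumed to be grayscale nested lists.

```python
def mean_luminance(frame):
    """Average pixel value of one grayscale frame (2D list)."""
    return sum(map(sum, frame)) / (len(frame) * len(frame[0]))


def luminance_jumps(frames, threshold=40.0):
    """Indices of frames whose mean luminance differs from the previous frame by more than threshold."""
    means = [mean_luminance(f) for f in frames]
    return [
        i + 1
        for i, (a, b) in enumerate(zip(means, means[1:]))
        if abs(b - a) > threshold
    ]
```

A legitimate hard cut would also trip this check, which is one reason such a jump is treated as something that can indicate editing rather than as proof.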

Motion plausibility

How realistic inter-frame motion is. Generated video can produce motion that does not match physical continuity.

Provenance — Content Credentials (C2PA)

When a file carries Content Credentials (a C2PA manifest), we read and cryptographically verify it directly in the browser. A valid manifest (who signed it, when, and whether AI generation was declared) is stronger evidence than any forensic heuristic, so when it is present it leads the verdict. Absence is not suspicious on its own: most files simply don't carry credentials yet, and we fall back to the forensic signals above. We never fabricate provenance.

Watermark detection

Our detectors can also scan for embedded watermarks: a statistical green-list mark in text, a DCT-domain mark in images, echo-hiding in audio, and a temporal mark in video. These are demonstration schemes that show the watermarking technique end to end: you can embed a mark in the playground, then detect it. Production schemes from model vendors, such as Google's SynthID, require the issuer's keys, so we do not claim to detect those.
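For the text case, a green-list mark can be illustrated with a toy embed-then-detect round trip. A keyed hash of each (previous token, token) pair decides whether the token is "green"; embedding prefers green tokens, and detection counts them and reports a z-score against the fraction expected by chance. The key, the 0.5 green fraction, the hash construction, and the greedy embedder are all assumptions of this sketch, not the playground's actual scheme.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of tokens that are green by chance


def is_green(prev_token, token, key="demo-key"):
    """Keyed hash of the token pair decides green-list membership."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 256.0 < GAMMA


def embed(vocab, length, key="demo-key"):
    """Toy embedder: always pick the first vocabulary word that is green."""
    tokens = [vocab[0]]
    while len(tokens) < length:
        tokens.append(next(w for w in vocab if is_green(tokens[-1], w, key)))
    return tokens


def green_z_score(tokens, key="demo-key"):
    """How many standard deviations the green count sits above chance."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    greens = sum(is_green(p, t, key) for p, t in pairs)
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

Without the right key the green/non-green split looks random to a detector, which is the same reason a vendor's production watermark cannot be checked without the issuer's keys.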

What this can — and can't — tell you

  • No detector — ours included — reliably beats careful adversarial editing or the newest generators. We don't publish a single headline accuracy number because it would be misleading across the range of real-world inputs.
  • Short inputs carry less signal. A two-sentence paragraph or a three-second clip can't be scored with confidence, and the result says so rather than guessing.
  • A verdict is evidence, not proof. False positives have caused real harm — students wrongly accused, photographers wrongly flagged. Treat every result as the start of a closer look, not the end of one.
  • Nothing is uploaded for the in-browser checks. Your text, images, audio, and video are decoded and analyzed locally on your device.