AI Detector Accuracy: What '87% AI' Actually Means
Detector confidence scores are widely misunderstood. Here's what '87%' really tells you, why the same number means different things in different conditions, and how to read a score without overcommitting.
Every AI detector in 2026 outputs a number. "87% AI." "92% likely human." "Confidence: high." Buyers, students, teachers, journalists, and judges read those numbers and draw conclusions — usually the wrong ones. Here's what the math actually says.
"Confidence" is not "probability you're right"
When a detector tells you "87% AI," it almost never means "there's an 87% chance this content was AI-generated." It means: under the model's training distribution, content that looks like this scored 87 on a 0–100 internal scale.
That scale is not calibrated to real-world prevalence. If a detector has 87 confidence on a piece of writing, the actual probability that the writing was AI-generated depends on:
- The base rate — how common AI content is in the population you're sampling from
- The detector's true and false positive rates at that confidence threshold
- The text length — most detectors degrade hard below 200 words
- Whether the AI was lightly edited — a 5-minute human revision cuts most signals
The base rate fallacy in one paragraph
Suppose a detector flags AI text correctly 95% of the time and falsely flags human text as AI 5% of the time. Sounds great. Now apply it to a class of 1,000 student essays where 100 were AI-written. Of the 100 AI essays it correctly flags 95. Of the 900 human essays it incorrectly flags 45. So when the detector says "AI," it's right 95 / (95 + 45) = 68% of the time. You're going to falsely accuse 45 students unless you treat the score as evidence rather than proof.
This is the calculation almost no detector tutorial walks you through. The number you trust is positive predictive value, not raw accuracy. And positive predictive value depends on the base rate, which you usually don't know.
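To make that concrete, here is a minimal Bayes'-rule sketch of the positive-predictive-value arithmetic. The rates are the ones from the classroom example above; the second call uses an invented 1% base rate purely for contrast:

```python
def positive_predictive_value(sensitivity, false_positive_rate, base_rate):
    """P(actually AI | detector says AI), via Bayes' rule.

    sensitivity:          P(flag | AI)    -- the true positive rate
    false_positive_rate:  P(flag | human)
    base_rate:            P(AI) in the population you're sampling from
    """
    true_positives = sensitivity * base_rate
    false_positives = false_positive_rate * (1 - base_rate)
    return true_positives / (true_positives + false_positives)

# The classroom example: 95% sensitivity, 5% false positives, 10% base rate.
print(positive_predictive_value(0.95, 0.05, 0.10))  # ~0.68

# Same detector on a mostly-human corpus (1% base rate): PPV collapses.
print(positive_predictive_value(0.95, 0.05, 0.01))  # ~0.16
```

Note that the detector didn't change between those two calls. Only the base rate did.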
What length does to the number
Most academic-grade detectors publish accuracy figures on text of 300+ words. Below 200 words, detection is essentially chance. Detectors don't tell users this on the result page. They show you 87% on a 50-word email and let you draw the wrong conclusion.
A useful rule of thumb (sketched in code after this list):
- Under 100 words: detection is unreliable — both false positives and false negatives jump
- 100–200 words: signal is weak, treat any verdict as a hint, not a call
- 200–500 words: most-cited accuracy ranges land here
- 500+ words: best accuracy, but heavy editing still defeats most detectors
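As a sketch, those bands reduce to a trivial lookup. The thresholds are the rule-of-thumb numbers from this list, heuristics rather than published figures from any specific detector:

```python
def length_reliability(word_count: int) -> str:
    """Map text length to a rough reliability band for a detector verdict.
    Thresholds mirror the rule of thumb above; they are heuristics only."""
    if word_count < 100:
        return "unreliable: false positives and false negatives both jump"
    if word_count < 200:
        return "weak signal: treat any verdict as a hint, not a call"
    if word_count <= 500:
        return "usable: most published accuracy figures were measured here"
    return "best case, but heavy editing still defeats most detectors"

print(length_reliability(50))    # the 50-word email from above: unreliable
print(length_reliability(1200))  # long-form text: best case
```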
What editing does to the number
Five minutes of human editing (rephrasing, breaking long sentences, adding personal asides, removing em-dashes) drops most detectors from 95% accurate to barely above chance. This is partly a feature (the user is now actually involved in producing the text) and partly a bug (a quick edit doesn't actually erase the underlying generation; the artifacts are still there if a detector checks for the right ones).
The honest framing: detection accuracy refers to the unedited output of the specific generators in the test set. Apply that number to a different generator (especially newer ones) or to edited text and the number is meaningless.
Why the same number means different things
Consider two cases, both flagged at 87% AI:
- Case A: A 1,200-word college essay, never edited, written entirely by GPT-4 from a one-line prompt. The detector saw thousands of similar samples in training. The score is reliable; treat 87% as roughly 85% likely AI.
- Case B: A 320-word email written by a non-native English speaker, polished by a grammar checker. No AI involved. The detector misfires on certain phrasing patterns common in ESL writing. The score is meaningless; the text is human.
Same number. Wildly different probabilities of being right. The detector cannot tell you which case you're in — only context can.
How an ensemble fixes (some of) this
We run six independent forensic signals. Each has its own failure modes: burstiness misfires on technical writing, lexical-tic analysis misfires on certain genres, repetition analysis misfires on poetry. But those failure modes rarely line up, so all six signals almost never fire at once unless the writing is truly AI.
The math behind this: if the signals fail independently with rates p₁, p₂, …, p₆, the probability of all six failing simultaneously is the product p₁p₂⋯p₆, which gets small fast. Reality is messier (the failures are correlated), but the ensemble's error reduction is still real, and the visualization makes the failure modes visible. You can audit which signals fired.
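Here is a hedged sketch of that arithmetic. The per-signal failure rates are invented for illustration, not our measured rates, and the correlation model is a deliberately crude common-cause simulation:

```python
import random
from math import prod

# Invented per-signal misfire rates -- illustration only, not measured.
failure_rates = [0.10, 0.08, 0.12, 0.09, 0.11, 0.10]

# Independent case: P(all six misfire at once) is just the product.
print(prod(failure_rates))  # ~9.5e-07

def joint_failure_rate(rates, shared_prob=0.5, trials=200_000):
    """Monte Carlo estimate of P(all signals misfire) when failures share a
    common cause: with probability shared_prob a signal reuses one shared
    draw (the same hard text trips everything), otherwise it draws fresh.
    Per-signal marginal rates are unchanged; only the dependence grows."""
    hits = 0
    for _ in range(trials):
        shared = random.random()
        if all(
            (shared if random.random() < shared_prob else random.random()) < r
            for r in rates
        ):
            hits += 1
    return hits / trials

# Correlated case lands near 2e-3: thousands of times the independent
# product, which is why "independent" claims deserve scrutiny.
print(joint_failure_rate(failure_rates))
```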
This is also why we never show a single number without the per-signal breakdown. The single number flatters us with false confidence; the breakdown forces honesty.
What to do with a score
Practical rules (turned into code after this list):
- Below 30%: treat as a confident "not AI." Most detectors agree at this end.
- 30–60%: ambiguous. Read the source yourself. Don't act on this number alone.
- 60–85%: hint of AI. Worth investigating with non-detector evidence (drafts, version history, conversation).
- Above 85% with multiple independent signals firing: strong evidence. Still not proof — false positives have ended careers and student records. Always combine with process-based checks.
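A sketch of those bands as code. The thresholds are the editorial guidance above, not calibrated probabilities; the signal count assumes a detector that exposes a per-signal breakdown, and downgrading a high score with few signals firing to a "hint" is our reading of the rules, not a standard:

```python
def interpret_score(score: float, signals_fired: int = 0) -> str:
    """Turn a raw detector score (0-100) into a recommended action.
    Bands follow the practical rules above; they are guidance, not math."""
    if score < 30:
        return "confident 'not AI': most detectors agree at this end"
    if score < 60:
        return "ambiguous: read the source yourself, don't act on this alone"
    if score <= 85 or signals_fired < 2:
        return "hint of AI: gather non-detector evidence (drafts, history)"
    return "strong evidence, not proof: combine with process-based checks"
```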
The C2PA exit ramp
Detection is statistical inference. C2PA is cryptographic verification. The detector says "this looks AI." A C2PA signature says "this is AI, signed by OpenAI's DALL-E 3 at 14:32 UTC on May 1, 2026." There is no comparable error rate.
When a piece of content carries a valid C2PA signature, no detector score should override it. When it doesn't, you're stuck doing inference, and all the caveats above apply.
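As a sketch of that precedence rule: `verify_c2pa_manifest` below is a hypothetical stand-in (real verification would go through a C2PA SDK and its trust list), but the decision logic is the point:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Manifest:
    signature_valid: bool
    claim_generator: str  # e.g. "OpenAI DALL-E 3"

def verify_c2pa_manifest(content: bytes) -> Optional[Manifest]:
    """Hypothetical stand-in for a real C2PA SDK call: parse the manifest,
    check the signature chain, return None if no manifest is present."""
    return None  # stub -- real verification needs the C2PA trust list

def assess(content: bytes, detector_score: float) -> str:
    """Cryptographic provenance outranks statistical inference."""
    manifest = verify_c2pa_manifest(content)
    if manifest and manifest.signature_valid:
        # Signed provenance: the generator is known; no detector needed.
        return f"verified AI: produced by {manifest.claim_generator}"
    # No valid manifest: fall back to inference, with every caveat above.
    return f"unverified: score {detector_score} is evidence, not proof"
```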
The summary
"87% AI" is not "87% probably AI." It's "this content is in the 87th percentile of AI-likeness on this detector's internal scale." The actual probability of being AI depends on length, editing, generator, base rate, and which signals fired. None of that is on the result page. All of it should be.
If you want the breakdown (every signal, every misfire condition, every reason a score is what it is), you can run any text through our detector and see every forensic signal plus the perplexity heatmap, with no marketing-grade single-number theater.