
How to Spot AI-Written Text: 9 Signs That Actually Work

A field guide to recognizing AI-generated writing — from burstiness and perplexity to the telltale tics of GPT, Claude, and Gemini. Practical examples, not theory.

Most AI-detection advice on the internet is either wishful thinking or someone selling a tool. This is neither. It's nine signals you can actually use, with examples, and an honest note on where each one breaks. None of them is a silver bullet. Combined, they're a decent forensic kit.

1. Burstiness — the rhythm test

Human writing oscillates. We pile a long sentence onto a short one, then drop a one-word fragment for emphasis. AI writing — especially from instruction-tuned models like GPT or Claude — tends to settle into a uniform pace. Eighteen words. Twenty words. Sixteen words. A flat line, sentence after sentence.

Measure it: count words per sentence and compute the standard deviation divided by the mean (the coefficient of variation). Human casual writing typically lands at CV ≈ 0.5–0.9. AI prose clusters at CV ≈ 0.2–0.4.
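The measurement above fits in a few lines of Python. This is a rough sketch: the naive sentence splitter and the two example strings are illustrative assumptions, not a production tokenizer.

```python
import re
import statistics

def burstiness_cv(text: str) -> float:
    """Coefficient of variation of sentence lengths (std dev / mean).
    Low CV (~0.2-0.4) suggests the flat rhythm typical of AI prose."""
    # Naive split on ., !, ? is good enough for a rough signal.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variance
    return statistics.pstdev(lengths) / statistics.mean(lengths)

flat = "The model works well. The data looks clean here. The output seems fine."
spiky = ("It failed. Nobody expected that, least of all the team who had "
         "spent six months tuning it. Why?")
print(burstiness_cv(flat) < burstiness_cv(spiky))  # uniform prose scores lower
```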

Where it breaks: heavily edited AI prose, or short snippets without enough sentences to compute variance.

2. Lexical tics — the "delve" problem

Modern LLMs over-use a recognizable vocabulary. The most-flagged offenders, in roughly the order you'll encounter them: delve, tapestry, navigate, leverage, robust, comprehensive, multifaceted, intricate, pivotal, groundbreaking, seamless, fostering, ever-evolving, myriad, plethora, underscore. None of these words is forbidden in good writing. The signal is density — five of them in a single paragraph is a strong tell on its own.
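A density check can be sketched like this, using the word list from the article (the example sentence is a contrived illustration):

```python
import re

# The over-used vocabulary listed above; extend as new tics emerge.
TIC_WORDS = {
    "delve", "tapestry", "navigate", "leverage", "robust", "comprehensive",
    "multifaceted", "intricate", "pivotal", "groundbreaking", "seamless",
    "fostering", "ever-evolving", "myriad", "plethora", "underscore",
}

def tic_density(text: str) -> float:
    """Fraction of words that are known AI tics; density, not presence,
    is the signal."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)?", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in TIC_WORDS)
    return hits / len(words)

sample = "We delve into a robust, comprehensive tapestry of pivotal insights."
print(f"{tic_density(sample):.2f}")  # 0.50
```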

3. Filler phrases

These are the phrases AI uses to pad transitions: "in today's fast-paced digital landscape," "it is important to note," "in conclusion," "moreover," "furthermore," "in the realm of." Real humans occasionally use them, but rarely stack them. If you see three or more in a single piece, the probability of an LLM rises sharply.
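Because these are multi-word phrases rather than single words, matching them takes a slightly different shape than the vocabulary check. A minimal sketch, using the phrase list above (substring matching is deliberately crude; word-boundary handling is left out for brevity):

```python
import re

FILLER_PHRASES = [
    "in today's fast-paced digital landscape",
    "it is important to note",
    "in conclusion",
    "moreover",
    "furthermore",
    "in the realm of",
]

def count_fillers(text: str) -> int:
    """Count filler-phrase occurrences; three or more in one piece
    is a red flag."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(p), lowered)) for p in FILLER_PHRASES)

text = ("Moreover, it is important to note that results vary. "
        "Furthermore, in the realm of text analysis, context matters.")
print(count_fillers(text))  # 4
```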

4. N-gram repetition

Look at trigrams (three-word sequences) and count how many appear more than once. AI-generated text tends to recycle phrasing across paragraphs at a higher rate than human draft writing. The mechanism is boring: the model is sampling from a probability distribution that favors what it has already produced.
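The trigram check described above might look like this (the example text and the exact repeat-rate metric are illustrative choices, not a standard):

```python
from collections import Counter
import re

def repeated_trigram_rate(text: str) -> float:
    """Fraction of distinct trigrams that occur more than once."""
    words = re.findall(r"[a-z']+", text.lower())
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeated = sum(1 for c in counts.values() if c > 1)
    return repeated / len(counts)

text = ("The model is able to learn. The model is able to adapt. "
        "It keeps improving over time.")
print(f"{repeated_trigram_rate(text):.2f}")  # 0.25
```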

5. Sentence-start variety

Pull the first word of every sentence and count uniques. A 12-sentence paragraph by a human typically uses 9–12 different opening words. The same paragraph by GPT-4 might open with "The" six times and "Moreover" twice.
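As a ratio, this is easy to compute. A sketch, with a contrived example paragraph (the same naive sentence splitter caveat applies):

```python
import re

def opener_variety(text: str) -> float:
    """Ratio of unique sentence-opening words to sentence count.
    Values near 1.0 suggest varied, human-like openers."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    openers = [s.split()[0].lower() for s in sentences]
    return len(set(openers)) / len(openers)

text = "The model runs. The output appears. The logs confirm it. Nothing fails."
print(f"{opener_variety(text):.2f}")  # 0.50
```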

6. Punctuation patterns

Recent LLMs have a specific tic: heavy use of em-dashes (—) and semicolons. Most casual writers use neither. If you see an em-dash in every other sentence and the writer claims to be a tenth-grader, raise an eyebrow.
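Normalizing the count per sentence keeps the tic comparable across texts of different lengths. A minimal sketch (the example string is contrived):

```python
def dash_semicolon_rate(text: str) -> float:
    """Em-dashes and semicolons per sentence; a high rate is an LLM tic."""
    cleaned = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in cleaned.split(".") if s.strip()]
    if not sentences:
        return 0.0
    marks = text.count("\u2014") + text.count(";")  # \u2014 is the em-dash
    return marks / len(sentences)

text = ("The idea is simple \u2014 count the marks; then divide. "
        "Humans rarely do this.")
print(dash_semicolon_rate(text))  # 1.0
```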

7. Hedging and disclaiming

AI text is structurally polite. It hedges ("some might argue"), disclaims ("it is worth noting that"), and avoids strong claims unless prompted. A piece with zero unhedged opinions across 800 words is suspicious.

8. Structural over-balancing

If every section has three bullet points, every list contains exactly five items, and every comparison gives both sides equal weight — that's the model's love of symmetry. Human writing is lopsided.

9. Confident wrongness on niche facts

AI hallucinates fluently. If a piece is technically smooth but confidently wrong on a verifiable specific (a name, a date, a statistic), that's a strong signal. Humans hedge when they're unsure of a fact; AI delivers fabrications with the same fluent confidence as truths.

How to combine the signals

No single signal is definitive. Three independent signals lighting up on the same piece is much harder to dismiss. We run all of these in parallel in our text detector and produce an aggregate score plus a per-signal breakdown, so you can see exactly which signals fired and which didn't.
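The aggregation step can be sketched simply. This is an assumed shape, not our detector's actual implementation: each hypothetical per-signal check is reduced to a boolean, and the score stays explainable rather than becoming one opaque number.

```python
def aggregate(signals: dict) -> tuple:
    """Return the fraction of signals that fired, plus which ones,
    so the score remains auditable per signal."""
    fired = [name for name, hit in signals.items() if hit]
    return len(fired) / len(signals), fired

# Hypothetical results; in practice each boolean wraps one heuristic above.
score, fired = aggregate({
    "burstiness": True,
    "lexical_tics": True,
    "fillers": False,
    "trigrams": True,
    "openers": False,
})
print(score, fired)  # 0.6 ['burstiness', 'lexical_tics', 'trigrams']
```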

Where AI-text detection fails — be honest about it

  • Short text. Below 200 words, signals are too noisy.
  • Edited AI. Five minutes of human revision destroys burstiness and lexical tics. Detection drops to chance.
  • Non-native English speakers. Some heuristics fire falsely on writing by ESL authors. This has caused real harm in academic settings — never use a single detector to accuse a student.
  • Newer models. Each model release adjusts the tics. Detectors have to adapt continuously.

The takeaway: detection is evidence, not proof. Treat it like a smoke alarm — useful for triggering closer inspection, never sufficient to convict.