
Open-Source AI Detectors You Can Actually Self-Host in 2026

A production-oriented shortlist of MIT / Apache / BSD AI-content detectors: RoBERTa text classifiers, DIRE diffusion reconstruction, UniversalFakeDetect, AASIST, RawNet3, Wav2Vec2 anti-spoof, XceptionNet, LipForensics. Each entry lists realistic VRAM and latency estimates, ONNX export status, and how it maps to Cloudflare Workers AI vs Modal vs RunPod vs client-side WebGPU.

There's a wide gap in 2026 between "models with a paper" and "models you can ship behind an API tomorrow." This post is the latter — production-oriented OSS AI-content detectors with commercial-safe licenses, realistic deployment estimates, and concrete integration sketches against a Next.js + Cloudflare stack.

GPL and "research-only / non-commercial" repos are excluded as do-not-use for any SaaS. The bias is toward MIT, Apache-2.0, and BSD.

Latency assumptions

All numbers are realistic production estimates drawn from published benchmarks and community deployments (not optimistic lab runs). Unless noted, they assume an A10G / L4-class GPU (24 GB), a 1 MP image (1024×1024), 30 fps video scored one frame at a time, 10 s of mono 16 kHz audio, and 1k tokens of text.
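The assumptions above translate into concrete input sizes and a per-frame time budget. This is a quick sketch that assumes a single sequential stream with no batching. The constant and function names are hypothetical.

```typescript
// Input sizes implied by the benchmark assumptions.
const IMAGE_PIXELS = 1024 * 1024;  // "1 MP" image: 1,048,576 pixels
const AUDIO_SAMPLES = 10 * 16_000; // 10 s of mono 16 kHz audio: 160,000 samples
const FPS = 30;
const FRAME_BUDGET_MS = 1000 / FPS; // ~33.3 ms available per frame at 30 fps

// True when a per-frame p99 latency keeps one unbatched stream real-time at 30 fps.
function keepsUpAt30fps(p99Ms: number): boolean {
  return p99Ms <= FRAME_BUDGET_MS;
}
```

By this yardstick, a ~15 ms per-frame classifier has roughly 2× headroom. Anything slower than ~33 ms per frame needs frame sampling or more GPUs to keep up.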

Text detection — the 2025–2026 reality

Pure "LLM watermark detection" is fragile: few commercial models expose a watermark that third parties can verify, and paraphrasing or editing weakens the ones that exist. Production text detection in 2026 relies on classifier ensembles trained on multi-model corpora, combined with stylometric features and perplexity deltas. The strongest OSS options are classifier-based, not watermark-based.

Top 3 for commercial deployment

1. OpenAI RoBERTa-based AI Text Classifiers (community forks). Hugging Face Transformers fine-tunes, MIT (license varies by checkpoint). 125M–355M parameters (RoBERTa-base to RoBERTa-large), ~1.2–2.5 GB FP16 VRAM. RoBERTa accepts at most 512 tokens, so 1k tokens are scored as two windows → p50 ~18 ms / p99 ~40 ms on A10G. ONNX export yes; GGUF via llama.cpp has limited support for encoder classifiers, so verify per checkpoint; WASM via ONNX-Runtime-Web. Integration effort: LOW — runs on Workers AI or client-side. Overfits to specific model fingerprints and breaks under heavy human editing. Confidence: high. Dissent: vulnerable to adversarial paraphrasing.

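2. DetectGPT-style zero-shot curvature detectors (open implementations). Paper: arXiv:2301.11305 with 2025 reimplementations, MIT. Needs a local scoring LLM (1–3B) plus a mask-filling model to generate perturbations, 6–12 GB VRAM. 1k tokens → p50 ~300 ms / p99 ~600 ms; cost grows linearly with the number of perturbations scored per passage. Integration effort: MEDIUM — host the 1–3B LLM on RunPod or Modal and cache per-token log-probabilities in KV. Sensitive to the generator's sampling temperature and to mismatch between the scoring LLM and the model that wrote the text. Confidence: medium. Dissent: costly at scale compared to simple classifiers.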

3. fastText stylometric baselines. facebookresearch/fastText, MIT. ~100 MB CPU-only model. 1k tokens → p50 ~3 ms on CPU. Integration effort: VERY LOW — run inside a Worker via WASM build. Confidence: high as ensemble component. Dissent: weak standalone.
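Two mechanics from the list above are easy to get wrong in practice:

- **RoBERTa windowing.** RoBERTa's position embeddings cap input at 512 tokens, so a 1k-token document has to be scored in windows.
- **DetectGPT scoring.** DetectGPT's score is a normalized gap between a passage's log-likelihood and the log-likelihoods of its perturbed rewrites.

The sketch below assumes token IDs come from the checkpoint's tokenizer and log-likelihoods come from the local scoring LLM. `chunkTokenIds`, `aggregateWindowScores`, and `detectGptScore` are hypothetical helpers. Length-weighted averaging is only one way to combine windows; max-pooling over windows is a common alternative.

```typescript
// RoBERTa accepts at most 512 positions; <s> and </s> take two of them.
const ROBERTA_WINDOW = 512 - 2;

// Split token ids into windows of at most ROBERTA_WINDOW tokens, optionally overlapping.
function chunkTokenIds(ids: number[], overlap = 0): number[][] {
  if (overlap < 0 || overlap >= ROBERTA_WINDOW) {
    throw new RangeError("overlap must be in [0, ROBERTA_WINDOW)");
  }
  const step = ROBERTA_WINDOW - overlap;
  const windows: number[][] = [];
  for (let start = 0; start < ids.length; start += step) {
    windows.push(ids.slice(start, start + ROBERTA_WINDOW));
    if (start + ROBERTA_WINDOW >= ids.length) break; // this window reached the end
  }
  return windows;
}

// Length-weighted mean of per-window "AI" probabilities.
function aggregateWindowScores(scores: number[], windows: number[][]): number {
  if (scores.length !== windows.length) throw new RangeError("one score per window");
  let weighted = 0;
  let tokens = 0;
  windows.forEach((win, i) => {
    weighted += scores[i] * win.length;
    tokens += win.length;
  });
  return tokens === 0 ? 0 : weighted / tokens;
}

// DetectGPT-style normalized perturbation discrepancy. Inputs are total
// log-likelihoods of the passage and of each perturbed rewrite, all scored
// by the same LLM; generating the perturbations happens elsewhere.
function detectGptScore(originalLogLik: number, perturbedLogLiks: number[]): number {
  const n = perturbedLogLiks.length;
  if (n < 2) throw new RangeError("need at least two perturbations");
  const mean = perturbedLogLiks.reduce((a, b) => a + b, 0) / n;
  const variance = perturbedLogLiks.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  const std = Math.sqrt(variance);
  const discrepancy = originalLogLik - mean;
  // Large positive values mean perturbations consistently lower the likelihood,
  // the local-maximum pattern DetectGPT associates with machine-generated text.
  return std > 0 ? discrepancy / std : discrepancy;
}
```

With no overlap, a 1,000-token input yields windows of 510 and 490 tokens. A modest overlap keeps sentences that straddle a boundary visible to at least one window.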

Image detection

State of play: pure CNN classifiers still work against mainstream diffusion models; frequency + residual hybrids outperform single-backbone models; provenance (C2PA) verification is the critical differentiator.

Top 3

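1. DIRE (Diffusion Reconstruction Error). arXiv:2303.09295, 2025 forks, MIT. Needs a diffusion backbone for DDIM inversion and reconstruction (the paper uses ADM; this estimate assumes Stable Diffusion 1.5), 8–10 GB VRAM. 1 MP → p50 ~180 ms / p99 ~350 ms; every inversion or reconstruction step is a full U-Net pass, so step count dominates cost. Integration effort: MEDIUM — host on Modal or RunPod GPU; not Workers AI-friendly yet. Confidence: high. Dissent: fails on GAN or non-diffusion generators.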

2. CNN Spectral Forensics (ResNet50 variants). Multiple MIT repos updated 2025. 25M params, ~1.5 GB VRAM. 1 MP → p50 ~12 ms / p99 ~25 ms. Integration effort: LOW — export to ONNX and run on Workers AI. Confidence: medium-high. Dissent: arms race — artifacts disappearing in SDXL-class models.

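3. UniversalFakeDetect (MIT). A linear probe over frozen CLIP ViT-L/14 image features (not a CNN backbone; inputs are resized to 224×224), ~2 GB VRAM. 1 MP → p50 ~20 ms. The original paper trains on ProGAN images only; the multi-model 2024–2025 corpora refer to retrained checkpoints. Strong cross-model AUROC ~0.85–0.9. Integration effort: LOW — good Workers AI candidate. Confidence: high. Dissent: needs frequent retraining.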
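Two of the image signals above are compact enough to sketch:

- **DIRE.** A diffusion model reconstructs diffusion-generated images more faithfully than real photos, so their per-pixel reconstruction error is lower. In the paper, that error image is the input to a trained binary classifier, not a thresholded scalar.
- **Spectral detectors.** These key on high-frequency upsampling artifacts.

This is a minimal sketch under these assumptions:

- The reconstruction comes from the GPU-hosted diffusion model.
- `direMap` and `highFrequencyRatio` are hypothetical helpers.
- The single high-frequency ratio is a toy feature, not what the ResNet repos actually compute.

```typescript
// DIRE error map: |x - R(I(x))| per pixel, where I is DDIM inversion and R is
// DDIM reconstruction. Both arrays share layout and value range; the
// reconstruction is assumed to come from the GPU host.
function direMap(image: Float32Array, reconstruction: Float32Array): Float32Array {
  if (image.length !== reconstruction.length) throw new RangeError("shape mismatch");
  const out = new Float32Array(image.length);
  for (let i = 0; i < image.length; i++) out[i] = Math.abs(image[i] - reconstruction[i]);
  return out; // in DIRE this map is the input to a trained classifier
}

// Toy spectral feature: share of non-DC power above a radial cutoff, where 1.0
// is the Nyquist frequency along one axis. Naive O(N^4) 2D DFT for readability;
// a real pipeline would use an FFT on the full-resolution image.
function highFrequencyRatio(img: number[][], cutoff = 0.5): number {
  const height = img.length;
  if (height === 0 || img[0].length === 0) return 0;
  const width = img[0].length;
  // Subtract mean brightness so a flat image has exactly zero non-DC power.
  let sum = 0;
  for (const row of img) for (const px of row) sum += px;
  const mean = sum / (height * width);
  let high = 0;
  let total = 0;
  for (let v = 0; v < height; v++) {
    for (let u = 0; u < width; u++) {
      if (u === 0 && v === 0) continue; // skip the DC bin
      let re = 0;
      let im = 0;
      for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
          const angle = -2 * Math.PI * ((u * x) / width + (v * y) / height);
          const val = img[y][x] - mean;
          re += val * Math.cos(angle);
          im += val * Math.sin(angle);
        }
      }
      // Bins above Nyquist correspond to negative frequencies.
      const fu = u <= width / 2 ? u : u - width;
      const fv = v <= height / 2 ? v : v - height;
      const radius = Math.sqrt((fu / (width / 2)) ** 2 + (fv / (height / 2)) ** 2);
      const power = re * re + im * im;
      total += power;
      if (radius > cutoff) high += power;
    }
  }
  return total > 0 ? high / total : 0;
}
```

A pixel checkerboard puts all of its power at the Nyquist corner (ratio ≈ 1). A slow cosine ramp keeps it near the origin (ratio ≈ 0).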

Audio detection — raw-waveform models won

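Audio deepfake detection improved significantly in 2025 with end-to-end raw-waveform approaches, from graph-attention models such as AASIST to detectors built on self-supervised transformer front-ends such as Wav2Vec2.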

Top 3

1. AASIST (MIT forks). arXiv:2110.01200, 2025 forks. ~0.3M params (AASIST-L ~85K), well under 1 GB VRAM. 10 s audio → p50 ~60 ms / p99 ~110 ms. Integration effort: LOW — export to ONNX, host on Workers AI or Modal. Confidence: high. Dissent: requires periodic retraining; degrades on unseen TTS engines.

2. RawNet3 (Apache-2.0). Maintained 2025. ~45M params, 3–5 GB VRAM. 10 s → p50 ~90 ms. Strong generalization to new vocoders. Integration effort: MEDIUM — better on GPU host. Confidence: medium-high.

3. Wav2Vec2-based binary classifier (MIT). 95M params, 4–6 GB VRAM. 10 s → p50 ~120 ms. Leverages pretrained SSL embeddings. Integration effort: MEDIUM. Larger and more expensive than the alternatives.
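The 10 s benchmark clip is 160,000 samples at 16 kHz. That is longer than the fixed ~4 s input the reference AASIST configuration uses (64,600 samples, an assumption to confirm against the checkpoint you deploy). Below is a minimal sketch of splitting a clip into model-sized windows:

- Repeat-padding the short tail is an assumed choice, modeled on common anti-spoofing data loaders.
- `windowWaveform` is a hypothetical helper.

```typescript
// Assumed input length of the reference AASIST config (~4.04 s at 16 kHz).
const AASIST_WINDOW = 64_600;

// Split a mono 16 kHz waveform into fixed-length windows for an anti-spoofing
// model. The final, shorter chunk is repeat-padded (tiled) to the full length.
function windowWaveform(
  samples: Float32Array,
  windowLen = AASIST_WINDOW,
  hop = AASIST_WINDOW,
): Float32Array[] {
  if (windowLen <= 0 || hop <= 0) throw new RangeError("windowLen and hop must be positive");
  const windows: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += hop) {
    const chunk = samples.subarray(start, Math.min(start + windowLen, samples.length));
    const win = new Float32Array(windowLen);
    for (let i = 0; i < windowLen; i++) win[i] = chunk[i % chunk.length];
    windows.push(win);
    if (start + windowLen >= samples.length) break; // this window reached the end
  }
  return windows;
}
```

A 10 s clip becomes three windows, the last tiled from its final 30,800 samples. Per-window spoof scores can then be pooled, for example by taking the maximum so that a single spoofed segment flags the whole clip.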

Video detection — frame-level + temporal consistency wins

The best practical approach in 2026 is a frame-level ensemble plus a temporal-consistency model. Fully 3D CNNs are heavy and rarely justify the cost.

Top 3

1. XceptionNet Deepfake Detector (MIT forks). 23M params, 2 GB VRAM. Per frame → ~15 ms. Still a strong baseline for face swaps. Integration effort: LOW. Weak against diffusion video; fails on non-face AI video.

2. TimeSformer-based detector (Apache-2.0 forks; check weight licenses, since the upstream facebookresearch/TimeSformer release is CC BY-NC 4.0). 121M params, 8–12 GB VRAM. 30 frames → p50 ~350 ms. Captures temporal inconsistencies. Integration effort: HIGH. Heavy and expensive.

3. LipForensics (MIT). Lightweight temporal CNN, 2–3 GB VRAM. 30 frames → p50 ~120 ms. Good for talking-head deepfakes. Integration effort: MEDIUM. Narrow scope.
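One way to turn per-frame scores into a clip verdict that reflects both halves of the frame-level plus temporal approach:

- **Top-k mean**, so a short manipulated segment is not averaged away.
- **Score jitter**, a crude temporal-consistency signal: the mean absolute change between consecutive frame scores.

Both functions are hypothetical illustrations. A trained temporal model such as TimeSformer or LipForensics learns far richer cues than score jitter.

```typescript
// Mean of the highest-scoring fraction of frames (always at least one frame).
function topKMean(frameScores: number[], fraction = 0.1): number {
  if (frameScores.length === 0) return 0;
  const k = Math.max(1, Math.ceil(frameScores.length * fraction));
  const sorted = frameScores.slice().sort((a, b) => b - a); // descending copy
  let sum = 0;
  for (let i = 0; i < k; i++) sum += sorted[i];
  return sum / k;
}

// Mean absolute frame-to-frame change in fake probability. Flicker between
// "real" and "fake" on adjacent frames may hint at per-frame manipulation.
function scoreJitter(frameScores: number[]): number {
  if (frameScores.length < 2) return 0;
  let sum = 0;
  for (let i = 1; i < frameScores.length; i++) {
    sum += Math.abs(frameScores[i] - frameScores[i - 1]);
  }
  return sum / (frameScores.length - 1);
}

interface VideoVerdict {
  topK: number;   // frame-level evidence
  jitter: number; // temporal-consistency evidence
}

function scoreVideo(frameScores: number[]): VideoVerdict {
  return { topK: topKMean(frameScores), jitter: scoreJitter(frameScores) };
}
```

Keeping the two numbers separate, rather than collapsing them early, lets a downstream fusion step weight them independently.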

Client-side (WASM / WebGPU) — what's viable today

Yes:

  • RoBERTa-base AI text classifier via ONNX-Runtime-Web
  • ResNet spectral image detector quantized INT8
  • FastText stylometry
  • Lightweight Xception frame classifier

No:

  • Diffusion-based detectors (DIRE class)
  • RawNet3 full precision
  • TimeSformer
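Much of this yes/no split comes down to how many bytes a browser has to download and keep resident. The back-of-envelope sketch below counts weights only; activations and runtime overhead come on top. At 4, 2, and 1 bytes per parameter:

- RoBERTa-base's 125M parameters come to about 500 MB at FP32 and 125 MB at INT8.
- A 25M-parameter ResNet at INT8 is about 25 MB.

The helper name is hypothetical.

```typescript
type Precision = "fp32" | "fp16" | "int8";

const BYTES_PER_PARAM: Record<Precision, number> = { fp32: 4, fp16: 2, int8: 1 };

// Approximate weight size in decimal megabytes (1 MB = 1e6 bytes).
// Activations, runtime buffers, and tokenizer files are not included.
function weightMegabytes(params: number, precision: Precision): number {
  return (params * BYTES_PER_PARAM[precision]) / 1e6;
}
```

Diffusion detectors fail this test twice over. Their weights alone are large, and each image also needs many full U-Net passes.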

Do-not-use (license conflicts)

  • GPTZero models (proprietary, not OSS)
  • Turnitin detector (proprietary)
  • Any GPL-3.0 repo without a separate commercial license
  • Models marked "research only" — common in deepfake repos

Integration architecture for Next.js + Cloudflare

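  • Workers AI: image CNN detector, text RoBERTa classifier, audio AASIST (if memory fits).
  • Modal / RunPod GPU: diffusion reconstruction detector, temporal video models.
  • Client-side (WASM / WebGPU): fastText stylometry, lightweight image FFT detector.

Workers AI primarily serves a curated model catalog, so running custom ONNX exports there assumes your account has custom-model deployment; otherwise the same exports fit a small GPU host.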

Store artifacts in R2. Scores + feature vectors in D1. Cache model outputs in KV. Queue heavy jobs via Cloudflare Queues.
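The tier split above can be expressed as a small routing table. This is a minimal sketch with hypothetical detector IDs and a hypothetical registry:

- Light models run inline.
- GPU-bound models become queued jobs.
- Client-side models are excluded from server routing.
- KV keys include the detector version, so a model upgrade cannot serve stale scores.

It assumes the caller has already hashed the upload (for example with `crypto.subtle.digest("SHA-256", ...)`, which Workers support). The Workers bindings themselves are left out.

```typescript
type Modality = "text" | "image" | "audio" | "video";
type Tier = "workers-ai" | "gpu-host" | "client";

interface DetectorSpec {
  id: string;      // hypothetical detector identifier
  modality: Modality;
  tier: Tier;
  version: string; // bump on every model or preprocessing change
}

// Hypothetical registry mirroring the tier split above.
const DETECTORS: DetectorSpec[] = [
  { id: "roberta-text", modality: "text", tier: "workers-ai", version: "1" },
  { id: "fasttext-stylo", modality: "text", tier: "client", version: "1" },
  { id: "cnn-spectral", modality: "image", tier: "workers-ai", version: "1" },
  { id: "dire", modality: "image", tier: "gpu-host", version: "1" },
  { id: "aasist", modality: "audio", tier: "workers-ai", version: "1" },
  { id: "timesformer", modality: "video", tier: "gpu-host", version: "1" },
];

interface RoutePlan {
  inline: DetectorSpec[]; // answered within the request
  queued: DetectorSpec[]; // enqueued as GPU jobs (e.g. via Cloudflare Queues)
}

// Server-side plan for one upload; client-tier detectors run in the browser instead.
function routeDetectors(modality: Modality): RoutePlan {
  const server = DETECTORS.filter((d) => d.modality === modality && d.tier !== "client");
  return {
    inline: server.filter((d) => d.tier === "workers-ai"),
    queued: server.filter((d) => d.tier === "gpu-host"),
  };
}

// KV key: detector id + version + content hash.
function kvCacheKey(contentSha256Hex: string, det: DetectorSpec): string {
  return `score:${det.id}:v${det.version}:${contentSha256Hex}`;
}
```

Putting the version in the key trades some cache hit rate after an upgrade for never mixing scores from two model generations.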

What we ship next

1. Image Ensemble (Highest ROI). FFT-ResNet + UniversalFakeDetect + ELA heuristic. Both models run as ONNX on Workers AI; ELA (error level analysis) is a lightweight heuristic alongside them. Heatmap visualization. Aligns directly with "shows its work" positioning. Effort: 2–3 weeks.

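2. Audio AASIST Deployment. ONNX version on Modal GPU endpoint. Returns a waveform anomaly visualization plus spectro-temporal attention maps from AASIST's graph-attention layers (AASIST consumes the raw waveform, not mel-spectrograms). Effort: 2 weeks.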

3. Text Ensemble (RoBERTa + Stylometry + Perplexity delta). RoBERTa via Workers AI. Perplexity using a small 1B LLM on RunPod. Feature breakdown stored for the transparency panel. Effort: 3–4 weeks.
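Both ensembles above need a fusion step, and the text ensemble also needs its two hand-computed feature families. This is a minimal sketch under stated assumptions:

- Fusion is weighted averaging in logit space, a simple swapped-in choice; a learned logistic-regression stacker is the natural upgrade.
- The stylometric features are illustrative picks.
- "Perplexity delta" is assumed to mean the gap between a document's log-perplexity under the small LLM and a reference perplexity from known-human text. That baseline is a hypothetical calibration input.
- All function names are hypothetical.

```typescript
// Illustrative stylometric features.
interface StyloFeatures {
  meanSentenceLength: number; // words per sentence
  typeTokenRatio: number;     // distinct words / total words
  punctuationRate: number;    // punctuation marks per word
}

function styloFeatures(text: string): StyloFeatures {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  const punctuation = text.match(/[.,;:!?]/g) || [];
  const n = words.length;
  return {
    meanSentenceLength: sentences.length > 0 ? n / sentences.length : 0,
    typeTokenRatio: n > 0 ? new Set(words).size / n : 0,
    punctuationRate: n > 0 ? punctuation.length / n : 0,
  };
}

// Perplexity = exp(-mean token log-probability) under the scoring LLM.
function perplexity(tokenLogProbs: number[]): number {
  if (tokenLogProbs.length === 0) throw new RangeError("no tokens");
  let sum = 0;
  for (const lp of tokenLogProbs) sum += lp;
  return Math.exp(-sum / tokenLogProbs.length);
}

// Assumed definition: log-perplexity gap versus a human-text baseline.
// Negative means the text is more predictable to the LLM than typical human text.
function perplexityDelta(tokenLogProbs: number[], humanBaselinePpl: number): number {
  return Math.log(perplexity(tokenLogProbs)) - Math.log(humanBaselinePpl);
}

interface DetectorScore {
  detector: string;
  probability: number; // detector's "AI-generated" probability
  weight: number;      // hypothetical, tuned on a validation set
}

function logit(p: number): number {
  const q = Math.min(1 - 1e-6, Math.max(1e-6, p)); // clamp to avoid ±Infinity
  return Math.log(q / (1 - q));
}

// Weighted average in logit space, plus per-detector contributions.
function fuseScores(scores: DetectorScore[]): {
  probability: number;
  contributions: { [detector: string]: number };
} {
  const totalWeight = scores.reduce((acc, s) => acc + s.weight, 0);
  if (!(totalWeight > 0)) throw new RangeError("weights must sum to a positive value");
  const contributions: { [detector: string]: number } = {};
  let fusedLogit = 0;
  for (const s of scores) {
    const c = (s.weight / totalWeight) * logit(s.probability);
    contributions[s.detector] = c; // signed share for the transparency panel
    fusedLogit += c;
  }
  return { probability: 1 / (1 + Math.exp(-fusedLogit)), contributions };
}
```

Storing `contributions` next to the fused probability lets the transparency panel show which detector pushed the verdict in which direction.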

Open questions

  • How robust are 2025 detectors against hybrid generation pipelines (e.g., a diffusion image later edited in Photoshop)?
  • Can lightweight client-side models meaningfully reduce server cost without hurting trust?
  • Will EU AI Act watermark mandates (2026–2027 enforcement) reduce the need for forensic detection?
  • Are diffusion-reconstruction methods sustainable as SDXL-class models remove spectral artifacts?
  • What benchmark should we publicly anchor to for credibility — ASVspoof 2025, DFDC, or a proprietary corpus?

That last question is the one we answer with our monthly benchmark dashboard.