Research · 12 min read

Citable, Tamper-Evident Result Pages for AI-Detection — A 2026 Implementation Spec

How to build /check/<id> result pages that journalists can cite in print and courts can trust. Concrete spec for permalink design, hash chains, Sigstore + Rekor anchoring, dual Ed25519 + keyless signing, multi-archive redundancy via Internet Archive and IPFS, and deterministic PDF export — all on Cloudflare Workers + R2 + D1.

If your detection results can't be cited in print or audited by an opposing expert, you don't have evidence — you have a dashboard. This post is our working implementation spec for /check/<id> result pages that are immutable, cryptographically anchored, and citable like a DOI.

The patterns here are drawn from Sigstore (keyless signing + Rekor transparency log), C2PA (content provenance), IPFS (content-addressed identifiers), VirusTotal (immutable analysis IDs), DOI (versioned permanence), Have I Been Pwned (stable breach pages), and arXiv 2403.04121 on scalable provenance verification pipelines.

Four design principles

1. Content addressing first. The URL must be derived from a stable cryptographic digest of canonicalized input plus model bundle metadata. IPFS popularized CID-based addressing; the relevant pattern is to treat the URL as a commitment to immutable bytes. VirusTotal analysis IDs are similarly content-derived.

2. Transparent signing. Sigstore's keyless flow and Rekor transparency log have become the default for software artifacts. The key idea is public verifiability plus inclusion proof in a transparency log — not just a detached signature.

3. Separation of frozen verdict vs. evolving knowledge. Academic DOI pages freeze the cited artifact while allowing later versions. Snopes and JSTOR maintain a stable citation for the version cited, even if updates are appended. That versioning model is appropriate for courts and journalists.

4. Deterministic rendering. For PDF citation parity, the rendering pipeline must be reproducible byte-for-byte given the same manifest. This mirrors the reproducible-builds practice in software supply chain security.

URL and identifier design

Route: /check/<analysis_id>

Where analysis_id is multibase(base58btc) of:

SHA-256(
  canonical_input_hash
  || model_bundle_hash
  || analysis_config_hash
  || created_at_utc_truncated_to_seconds
)

canonical_input_hash — SHA-256 over canonicalized input bytes. Text: UTF-8 NFC normalized, whitespace normalized. Image / audio / video: exact uploaded container bytes (no transcoding). Canonicalization prevents trivial slug drift.
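A minimal sketch of the input canonicalization, in Python for brevity. The spec above fixes NFC and UTF-8 but does not define "whitespace normalized", so the rule used here (collapse every whitespace run to one space, trim both ends) is an assumption, as are the function names and the sha256: prefix on the return value.

```python
import hashlib
import re
import unicodedata


def canonical_text_hash(text: str) -> str:
    """Hash text after NFC and whitespace normalization."""
    nfc = unicodedata.normalize("NFC", text)
    # Assumed whitespace rule (not fixed by the spec): collapse every run of
    # whitespace to a single space and trim both ends.
    collapsed = re.sub(r"\s+", " ", nfc).strip()
    return "sha256:" + hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def media_hash(container_bytes: bytes) -> str:
    """Hash image/audio/video uploads exactly as received, with no transcoding."""
    return "sha256:" + hashlib.sha256(container_bytes).hexdigest()
```

With this rule, a decomposed "e" plus combining accent and the precomposed "é" hash identically, as do inputs that differ only in spacing. Whatever rule you choose has to be frozen and versioned, since changing it changes every text ID.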

model_bundle_hash — SHA-256 over a JSON manifest:

{
  "models": [
    { "name": "...", "version": "...", "weights_digest": "sha256:..." }
  ],
  "feature_extractors": [...],
  "thresholds": "...",
  "commit": "git sha"
}
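Hashing a JSON manifest only works if the serialization is stable. The sketch below assumes a simple canonical form (sorted keys, compact separators, UTF-8); RFC 8785 (JCS) is the stricter standard to reach for if the manifest has to be re-serialized in other languages. All field values are placeholders.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    # This approximates, but is not, RFC 8785 (number formatting differs).
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def model_bundle_hash(bundle: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical_json(bundle)).hexdigest()


# Placeholder values for illustration only.
example_bundle = {
    "models": [
        {"name": "example-detector", "version": "1.0.0", "weights_digest": "sha256:00"}
    ],
    "feature_extractors": [],
    "thresholds": "default",
    "commit": "0000000",
}
print(model_bundle_hash(example_bundle))
```

Key order in the source dict does not affect the digest, but any change to a version string, weights digest, or commit does.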

analysis_config_hash covers runtime flags: temperature, segmentation window, FFT params, ELA params, etc.

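Slug best practice: use multibase + multihash semantics inspired by IPFS CIDv1. Example: /check/bafybeigdyrzt6.... Note that a bafy… string is the base32 form of a CIDv1 (multibase prefix b), whereas a base58btc encoding starts with z, so pick one encoding and apply it consistently. Even without running IPFS, a CIDv1-style identifier signals to technical readers that the ID is derived from content and cannot be silently changed.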
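The derivation above can be sketched as follows. Two details are assumptions, because the spec leaves them open: the three component hashes are concatenated as raw 32-byte digests, and the timestamp is serialized as an ASCII string of the form YYYY-MM-DDTHH:MM:SSZ. The base58btc encoder is written out because Python's standard library does not ship one.

```python
import hashlib
from datetime import datetime, timezone

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Bitcoin-alphabet base58; each leading zero byte becomes a '1'."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def analysis_id(input_hash: bytes, bundle_hash: bytes, config_hash: bytes,
                created_at: datetime) -> str:
    for digest in (input_hash, bundle_hash, config_hash):
        if len(digest) != 32:
            raise ValueError("expected raw 32-byte SHA-256 digests")
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    # Assumed encoding: UTC, truncated to whole seconds, fixed-width ASCII.
    ts = created_at.astimezone(timezone.utc).replace(microsecond=0)
    ts_bytes = ts.strftime("%Y-%m-%dT%H:%M:%SZ").encode("ascii")
    digest = hashlib.sha256(input_hash + bundle_hash + config_hash + ts_bytes).digest()
    return "z" + base58btc(digest)  # 'z' is the multibase prefix for base58btc
```

Because every component has a fixed width, plain concatenation is unambiguous here. If a variable-length field is ever added, length-prefix each part first.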

Frozen vs. live semantics

Adopt explicit versioning.

The primary permalink /check/<analysis_id> is immutable. It renders the original verdict, original model versions, original visualizations, the embedded signed manifest, and the transparency log proof.

If models improve later, create a new analysis with a new ID. On the old page, show a non-intrusive banner: "Newer analyses of this input exist: <link>."

A convenience route /recheck/<analysis_id> recomputes with current models and generates a new /check/<new_id>. It never mutates the old page.

For legal defensibility, do not auto-rerun on load. Courts prefer fixed artifacts. Journalists need citation stability.

Signed manifest and verifiable claims

On every /check/<id>, embed a signed JSON manifest:

{
  "analysis_id": "...",
  "input_hash": "sha256:...",
  "input_media_type": "...",
  "model_bundle_hash": "sha256:...",
  "analysis_config_hash": "sha256:...",
  "output_hash": "sha256:...",
  "verdict": { ... structured ... },
  "created_at": "...",
  "service_version": "..."
}

output_hash is SHA-256 over canonical JSON of numeric scores, structured findings, references to stored artifacts (R2 object digests), and visualization data arrays.
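As a rough sketch, manifest assembly could look like this. The function name, the keyword arguments, and the shape of the output and verdict dicts are hypothetical; the manifest keys are the ones listed above. The canonical JSON form (sorted keys, compact separators) is an assumption.

```python
import hashlib
import json
from typing import Tuple


def canonical_json(obj) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sha256_tag(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(*, analysis_id: str, input_hash: str, input_media_type: str,
                   model_bundle_hash: str, analysis_config_hash: str,
                   output: dict, verdict: dict, created_at: str,
                   service_version: str) -> Tuple[bytes, str]:
    """Return (manifest_bytes, manifest_hash); the bytes are what gets signed."""
    manifest = {
        "analysis_id": analysis_id,
        "input_hash": input_hash,
        "input_media_type": input_media_type,
        "model_bundle_hash": model_bundle_hash,
        "analysis_config_hash": analysis_config_hash,
        # Scores, findings, R2 artifact digests and visualization arrays all
        # live inside `output`, so one digest commits to all of them.
        "output_hash": sha256_tag(canonical_json(output)),
        "verdict": verdict,
        "created_at": created_at,
        "service_version": service_version,
    }
    manifest_bytes = canonical_json(manifest)
    return manifest_bytes, sha256_tag(manifest_bytes)
```

Store and sign the exact bytes returned here instead of re-serializing later. Floating-point scores are a known hazard for cross-language canonical JSON, so fixed-precision strings or integers may be the safer representation.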

Signing options

Primary: Sigstore keyless signing with Rekor inclusion. Generate manifest JSON. Sign with cosign using OIDC identity for your service account. Record signature + certificate + Rekor log index. Store bundle (manifest, signature, certificate, Rekor inclusion proof) in R2 and hash it.

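Why Sigstore. The transparency log gives tamper-evidence beyond your own database, because the log entry exists outside systems you control. Inclusion proofs are an established verification mechanism in software supply chain contexts, though their evidentiary weight in court remains jurisdiction-dependent. The ecosystem has stayed active through 2024–2026.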

Alternative / additional: JWS (RFC 7515) with your own Ed25519 key for offline verification, using the EdDSA algorithm identifier from RFC 8037. Publish your public key at /.well-known/couldthisbetrue.pub.

Best pattern: dual layer.

  • Internal Ed25519 detached JWS for lightweight verification
  • Sigstore keyless signature for public verifiability and log anchoring
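For the Ed25519 layer, here is a sketch of how a detached compact JWS (RFC 7515, Appendix F) is assembled. The actual signing is deliberately left to a caller-supplied function, since Python's standard library has no Ed25519 implementation; `sign` and `kid` are hypothetical parameters, and in practice `sign` would wrap a vetted library's Ed25519 signing call.

```python
import base64
import json
from typing import Callable


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWS requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def detached_jws(payload: bytes, sign: Callable[[bytes], bytes], kid: str) -> str:
    """Build header..signature with the payload omitted from the output."""
    header = {"alg": "EdDSA", "kid": kid}  # EdDSA is the JOSE name (RFC 8037)
    protected = b64url(
        json.dumps(header, sort_keys=True, separators=(",", ":")).encode("utf-8")
    )
    # The signature still covers header AND payload; only the serialized
    # form drops the middle segment.
    signing_input = (protected + "." + b64url(payload)).encode("ascii")
    signature = sign(signing_input)
    return protected + ".." + b64url(signature)
```

A verifier re-inserts the base64url-encoded manifest bytes between the two dots and then runs a normal JWS verification against the published key. Keeping the manifest out of the JWS avoids storing it twice and keeps manifest.sig small.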

C2PA. If the input is an image/video, optionally emit a C2PA claim referencing your analysis manifest hash and signature as an assertion. C2PA 2.1 supports multiple assertions and external references. Doing so positions the service within emerging newsroom provenance pipelines.

Storage layout on Cloudflare

D1 holds the relational index — metadata only:

analyses(analysis_id, input_hash, model_bundle_hash, analysis_config_hash, output_hash, manifest_hash, created_at, verdict_summary_json, rekor_log_index, signature_type, previous_analysis_id)

artifacts(analysis_id, artifact_type, r2_key, sha256_digest, byte_length, content_type)
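D1 is built on SQLite, so the two tables can be prototyped locally with Python's sqlite3 module. The column names come from the layout above; the types, constraints, index, and the `newer_analyses` helper (which would back the "newer analyses exist" banner) are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE analyses (
  analysis_id          TEXT PRIMARY KEY,
  input_hash           TEXT NOT NULL,
  model_bundle_hash    TEXT NOT NULL,
  analysis_config_hash TEXT NOT NULL,
  output_hash          TEXT NOT NULL,
  manifest_hash        TEXT NOT NULL,
  created_at           TEXT NOT NULL,  -- ISO 8601 UTC, so string order = time order
  verdict_summary_json TEXT,
  rekor_log_index      INTEGER,
  signature_type       TEXT,
  previous_analysis_id TEXT REFERENCES analyses(analysis_id)
);
CREATE INDEX idx_analyses_input ON analyses(input_hash, created_at);
CREATE TABLE artifacts (
  analysis_id   TEXT NOT NULL REFERENCES analyses(analysis_id),
  artifact_type TEXT NOT NULL,
  r2_key        TEXT NOT NULL,
  sha256_digest TEXT NOT NULL,
  byte_length   INTEGER NOT NULL,
  content_type  TEXT NOT NULL,
  PRIMARY KEY (analysis_id, r2_key)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def newer_analyses(conn: sqlite3.Connection, analysis_id: str) -> list:
    """IDs of later analyses of the same input, oldest first."""
    rows = conn.execute(
        """
        SELECT newer.analysis_id
        FROM analyses AS old
        JOIN analyses AS newer ON newer.input_hash = old.input_hash
        WHERE old.analysis_id = ? AND newer.created_at > old.created_at
        ORDER BY newer.created_at
        """,
        (analysis_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Rows are only ever inserted, never updated in place, which keeps the index consistent with the immutable objects in R2.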

R2 holds immutable object storage:

/inputs/<input_hash>
/analyses/<analysis_id>/manifest.json
/analyses/<analysis_id>/manifest.sig
/analyses/<analysis_id>/rekor.bundle
/analyses/<analysis_id>/artifacts/<name>.bin or .png
/analyses/<analysis_id>/report.pdf

KV caches high-traffic public analyses. Durable Objects can host an optional rate-limited verifier service that fetches manifest, verifies JWS, verifies Rekor inclusion, returns status JSON.

Hash chain model

User input bytes        → canonical_input_hash
Model bundle JSON       → model_bundle_hash
Config JSON             → analysis_config_hash
Structured output JSON  → output_hash
Manifest JSON           → manifest_hash (includes all above)
Signature over manifest → signature + certificate + Rekor log entry

Tampering at any stage breaks at least one link: the recomputed output_hash or manifest_hash no longer matches, the signature fails to verify, or the Rekor inclusion proof no longer corresponds to the manifest. This mirrors in-toto and supply chain attestation patterns.
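The hash portion of that chain can be re-checked with a few lines. This is a sketch under the same assumed canonical JSON form (sorted keys, compact separators); `verify_chain` and its parameters are hypothetical names, and signature and Rekor verification are out of scope here.

```python
import hashlib
import json
from typing import List, Optional


def canonical_json(obj) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sha256_tag(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_chain(manifest_bytes: bytes, expected_manifest_hash: str,
                 canonical_input: Optional[bytes] = None,
                 output: Optional[dict] = None) -> List[str]:
    """Return the names of the links that fail; an empty list means consistent."""
    failures = []
    # expected_manifest_hash should come from the signed or logged record,
    # not from the same store that served manifest_bytes.
    if sha256_tag(manifest_bytes) != expected_manifest_hash:
        failures.append("manifest_hash")
    manifest = json.loads(manifest_bytes)
    if canonical_input is not None and sha256_tag(canonical_input) != manifest["input_hash"]:
        failures.append("input_hash")
    if output is not None and sha256_tag(canonical_json(output)) != manifest["output_hash"]:
        failures.append("output_hash")
    return failures
```

Input and output checks are optional because an outside verifier may hold only the manifest. When the original input is available, pass its canonicalized bytes.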

Cite-this-analysis affordance

On the result page, include a "Cite this analysis" button providing the stable permalink, manifest hash, date accessed, BibTeX, plain-text citation, and a short newsroom blurb.

Plain text:

CouldThisBeTrue. "AI Content Analysis Report." Analysis ID: bafybeigdyrzt6abc123. SHA-256(input)=3f2a… Model bundle hash=91ac… Generated 2026-05-28. Available at: https://couldthisbetrue.com/check/bafybeigdyrzt6abc123

BibTeX:

@misc{couldthisbetrue_bafybeigdyrzt6abc123_2026,
  author       = {{CouldThisBeTrue}},
  title        = {AI Content Analysis Report},
  howpublished = {\url{https://couldthisbetrue.com/check/bafybeigdyrzt6abc123}},
  year         = {2026},
  note         = {Analysis ID: bafybeigdyrzt6abc123. SHA256(input)=3f2a... ModelBundle=91ac...}
}
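Both citation formats can be generated from fields already in D1. The sketch below reproduces the two templates above; the function names are hypothetical, and truncating digests to a four-character prefix is an assumption read off the examples (a longer prefix or the full digest is the safer choice for print).

```python
BASE_URL = "https://couldthisbetrue.com"


def plain_citation(analysis_id: str, input_hash_hex: str, bundle_hash_hex: str,
                   generated: str, prefix: int = 4) -> str:
    return (
        f'CouldThisBeTrue. "AI Content Analysis Report." '
        f"Analysis ID: {analysis_id}. "
        f"SHA-256(input)={input_hash_hex[:prefix]}… "
        f"Model bundle hash={bundle_hash_hex[:prefix]}… "
        f"Generated {generated}. "
        f"Available at: {BASE_URL}/check/{analysis_id}"
    )


def bibtex_citation(analysis_id: str, input_hash_hex: str, bundle_hash_hex: str,
                    year: int, prefix: int = 4) -> str:
    # Doubled braces are literal braces inside f-strings.
    return "\n".join([
        f"@misc{{couldthisbetrue_{analysis_id}_{year},",
        "  author       = {{CouldThisBeTrue}},",
        "  title        = {AI Content Analysis Report},",
        f"  howpublished = {{\\url{{{BASE_URL}/check/{analysis_id}}}}},",
        f"  year         = {{{year}}},",
        f"  note         = {{Analysis ID: {analysis_id}. "
        f"SHA256(input)={input_hash_hex[:prefix]}... "
        f"ModelBundle={bundle_hash_hex[:prefix]}...}}",
        "}",
    ])
```

Generating citations server-side from stored fields, not from whatever the page currently displays, keeps the cited hashes tied to the signed manifest.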

For a high-value enterprise tier, optionally mint a DOI via Crossref. This adds operational overhead, but a DOI may carry more weight with courts and publishers than a bare URL.

Archive integration

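After successful generation and signing, trigger a background job to submit the URL to Internet Archive's "Save Page Now" endpoint, store the returned snapshot URL in D1, and optionally push the manifest JSON to IPFS. Store the resulting CID in D1 alongside the snapshot URL, not in the manifest itself: the manifest is already signed, and adding its own CID would change its bytes and therefore the CID. Display: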

  • "Archived at Internet Archive: <link>"
  • "IPFS CID: …"

This mirrors the multi-archive redundancy Bellingcat uses in investigations. Do not block page rendering on archive confirmation; make it asynchronous and append status later.

Deterministic PDF export

Route: /check/<id>/report.pdf

Constraints: server-side render with fixed headless Chromium version, fixed fonts embedded, no timestamps except those already in manifest, all images embedded from R2 via digest-verified fetch.

PDF metadata fields: analysis_id, manifest_hash, input_hash, signature fingerprint, rekor_log_index.

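After generation and before first serving: compute the SHA-256 of the generated PDF, store the PDF in R2, add its hash to the artifacts table, and optionally include the PDF hash inside a secondary signed addendum manifest. Later requests serve the stored bytes unchanged, so the PDF becomes independently citable.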
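A small harness can catch non-determinism before a PDF is ever published. In this sketch, `render` is a hypothetical callable standing in for the pinned headless-Chromium pipeline, and the artifact_type value "report_pdf" is an assumption; the row keys and R2 key follow the storage layout above.

```python
import hashlib
from typing import Callable


def check_reproducible(render: Callable[[dict], bytes], manifest: dict,
                       runs: int = 2) -> str:
    """Render several times and return the digest, or raise if outputs differ."""
    digests = {hashlib.sha256(render(manifest)).hexdigest() for _ in range(runs)}
    if len(digests) != 1:
        raise ValueError("PDF rendering is not byte-reproducible")
    return digests.pop()


def pdf_artifact_row(analysis_id: str, pdf_bytes: bytes) -> dict:
    """Row for the artifacts table describing the stored report."""
    return {
        "analysis_id": analysis_id,
        "artifact_type": "report_pdf",  # assumed label
        "r2_key": f"/analyses/{analysis_id}/report.pdf",
        "sha256_digest": hashlib.sha256(pdf_bytes).hexdigest(),
        "byte_length": len(pdf_bytes),
        "content_type": "application/pdf",
    }
```

Running the check in CI against a fixed sample manifest is a cheap way to notice when a Chromium or font update silently changes the output bytes.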

Journalist workflow

  1. Upload media.
  2. Receive /check/<analysis_id>.
  3. On the page, see verdict + visual evidence panels + "View signed manifest" + "Verify signature" + "Archived snapshot".
  4. Click "Cite this analysis."
  5. Paste citation into article.

In a story:

According to an independent analysis by CouldThisBeTrue (Analysis ID: bafybeigdyrzt6abc123), the image shows statistical artifacts consistent with diffusion-based synthesis. Full signed report: https://couldthisbetrue.com/check/bafybeigdyrzt6abc123

A court or opposing expert can download the manifest, verify the JWS against the public key, verify Rekor inclusion, and recompute hashes if the original input is available.

What we ship next

1. Immutable manifest + dual signature (Ed25519 JWS + Sigstore), no DOI yet. ROI: very high. Effort: moderate. Add a manifest builder in the Workers pipeline. After analysis completes, canonicalize JSON and compute hashes. Sign first with an internal Ed25519 key stored in Cloudflare Secrets. Then run cosign keyless signing via a secure microservice to produce Rekor inclusion proof. Store signature bundle in R2 and surface verification UI. Delivers courtroom-grade tamper evidence without DOI registration overhead.

2. Immutable permalink semantics with explicit recheck flow. ROI: high. Effort: low. Refactor routing so /check/<id> only reads from stored manifest and artifacts. Add a "Re-run with current models" button that calls a new job and generates a new analysis_id. Link them via previous_analysis_id in D1. Banner if newer analyses exist. Prevents silent drift, aligns with academic citation norms.

3. Deterministic PDF with embedded manifest hash and archive auto-snapshot. ROI: medium-high. Effort: moderate. Fixed Dockerized headless Chromium for byte-reproducible PDFs. Embed manifest_hash and signature fingerprint in PDF metadata and footer. After storing PDF, asynchronously call Internet Archive Save Page Now API and persist returned snapshot URL. Display archive badge once available.

Open questions

  • Whether courts will accept Sigstore transparency log inclusion as equivalent to traditional PKI timestamping remains jurisdiction-dependent.
  • Whether to anchor manifest_hash periodically to a public blockchain for additional timestamp assurance is unresolved; cost and optics may outweigh practical evidentiary gain.
  • How to handle user-deleted inputs in privacy-sensitive contexts while maintaining public verifiability requires a policy decision. One approach is to retain only hashes publicly while encrypting original input with customer-managed keys.
  • Whether to pursue formal DOI issuance depends on newsroom demand and enterprise willingness to pay for citation prestige.

The core architectural decision is clear: treat every analysis as a signed, content-addressed artifact with transparent inclusion proofs, immutable permalinks, and reproducible exports. That posture aligns the product with modern supply chain integrity norms and materially differentiates it from score-only detectors.