
Busted

A multi-agent AI-text-detection framework with cross-examination debate.

Busted wraps published AI-generated-text (AIGT) detectors — DivEye, BENADV, DeTeCtive, DetectGPT, LLM-as-judge — inside a structured debate protocol (cross-examination, novelty gates, anti-groupthink steelman, dissent quotas). A moderator agent aggregates verdicts via skill-proportional weighted voting and issues a final ruling with consensus zones, irreducible tensions, and a minority report.

⚠️ This is an engineering case study, not a SOTA detector. See Honest Benchmark Comparison below. The single best detector (DivEye, 97.7% on HC3) outperforms our 4-agent ensemble (81.2% on a 32-text validation set). The value of this repo is in the negative results, the methodology, and the reproducible engineering log.


What this project actually delivers

  1. A working multi-agent debate framework for AIGT detection, with FastAPI backend, WebSocket streaming, and a D3 graph-viz frontend.
  2. A reproducible subset benchmark showing that a 4-detector ensemble beats a 7-detector ensemble (81.2% vs 78.1%) on a held-out set — evidence that "more agents ≠ better" for cross-exam protocols.
  3. A Cohen's d feature-gate methodology that falsified four candidate detectors (MFD, BENATTEN, fractals, "semantic curvature") before any integration cost was incurred.
  4. Documented negative results — the kind that papers don't usually publish, but that save downstream researchers weeks of work.

Architecture

┌───────────────┐     ┌──────────────────────────────────────────┐
│ POST /api/    │────▶│ TEXT_INPUT node (TemporalKnowledgeGraph) │
│ analyze       │     └───────────────────┬──────────────────────┘
└───────────────┘                         │ EventBus broadcast
                                          ▼
        ┌──────────────┬──────────────┬──────────────┬──────────────┐
        ▼              ▼              ▼              ▼              ▼
   DivEyeAgent    BENADVAgent   LLMJudgeAgent  DetectiveAgent  (others
   (XGBoost on   (RandomForest  (NIM Nemotron  (SimCSE+FAISS    disabled)
   surprisal     on multi-      reasoning)    KNN)
   stats)        encoder        
                 Benford)
        │              │              │              │
        └──────────────┴──────┬───────┴──────────────┘
                              ▼
              ┌─────────────────────────────────┐
              │  Phase 1: blind-first verdicts   │
              │  Phase 2: groupthink / dissent   │
              │  Phase 3: cross-examination      │
              │           (max 2 rounds, with    │
              │           novelty gate &         │
              │           PROTECTED-PAIR rule)   │
              │  Phase 4: weighted aggregation   │
              │           + FINAL_RULING node    │
              └─────────────┬───────────────────┘
                            │
                            ▼
                      WebSocket stream → frontend
                      (D3 dagre graph + per-agent cards)

The 4 active agents (diveye, benadv, llm_judge, detective) are configurable via the BUSTED_DETECTORS environment variable; the disabled ones (statistical, stylometric, logprob, plus archived mfd, benatten, zitnh, roberta_detector) remain in the codebase for further experimentation.
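The selection mechanism can be sketched roughly like this (a hypothetical illustration of how BUSTED_DETECTORS might be parsed; the actual registry lives in config.py and may differ):

```python
import os

# Assumed default: the 4-agent winning ensemble from the benchmark below.
DEFAULT_DETECTORS = ("diveye", "benadv", "llm_judge", "detective")

def active_detectors() -> tuple[str, ...]:
    """Parse the comma-separated BUSTED_DETECTORS env var, falling back
    to the default ensemble when it is unset or empty."""
    raw = os.environ.get("BUSTED_DETECTORS", "")
    names = tuple(n.strip() for n in raw.split(",") if n.strip())
    return names or DEFAULT_DETECTORS
```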


Subset benchmark — more agents ≠ better

Each row was run end-to-end (server restart, full validation pipeline) against the same 32-text validation set (20 base + 12 adversarial register-flipped). The validation set is described in tests/validation_set.py.

Subset                                    N   Overall   Base   Adversarial   Time/text
solo_diveye                               1   65.6 %    70 %   58.3 %        4.7 s
diveye + benadv                           2   65.6 %    70 %   58.3 %        12 s
diveye + benadv + llm_judge               3   65.6 %    75 %   50 %          16 s
diveye + benadv + llm_judge + detective   4   81.2 %    95 %   58.3 %        15 s
Full 7-detector ensemble                  7   78.1 %    95 %   50 %          22 s

Per-text wall-clock includes debate, LLM rounds, and graph commits.

Why 1–3 agents collapse to DivEye-solo accuracy: weights were calibrated to per-detector accuracy (DivEye = 4.5, BENADV = 2.5, others ≤ 1.7). With only 1–3 agents, DivEye dominates the vote; cross-exam needs at least one counterweight set to function.
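The arithmetic behind this collapse: DivEye's weight of 4.5 exceeds the combined 2.5 + 1.7 of the next two agents, so any 1–3-agent weighted vote reduces to DivEye's verdict. The weights are from the calibration above; the voting function below is an illustrative sketch, not the repo's actual moderator code:

```python
# Weights from the per-detector calibration described above.
WEIGHTS = {"diveye": 4.5, "benadv": 2.5, "llm_judge": 1.7}

def weighted_vote(verdicts: dict[str, str]) -> str:
    """Return the label ("ai" or "human") with the highest total weight."""
    score = {"ai": 0.0, "human": 0.0}
    for agent, label in verdicts.items():
        score[label] += WEIGHTS[agent]
    return max(score, key=score.get)

# Even when both counterweights disagree, DivEye prevails: 4.5 > 2.5 + 1.7
assert weighted_vote({"diveye": "ai", "benadv": "human", "llm_judge": "human"}) == "ai"
```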

Why the 7th agent destabilizes the system: documented across six candidates (MFD, BENATTEN, ZiTNH, RoBERTa-Hello-SimpleAI, DivEye when added to a stable 6-set, and "semantic curvature"). Each one degraded adversarial recall by 8–25 percentage points. We hypothesize that the cross-exam protocol has a structural ceiling on ensemble size.

Reproduce: python tests/subset_benchmark.py


Methodology highlights

Cohen's d gate before integration

Every candidate feature is evaluated on 3000 HC3 samples before any classifier training. We require |Cohen's d| ≥ 0.5 on at least one feature plus an ablation against length / register confounds.
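The gate statistic is Cohen's d with a pooled standard deviation; a candidate feature must reach |d| ≥ 0.5 between human and AI samples before any classifier training. A minimal sketch (the repo's exact implementation may differ):

```python
import statistics

def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d: standardized mean difference with pooled std deviation."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = (((len(a) - 1) * va + (len(b) - 1) * vb) / (len(a) + len(b) - 2)) ** 0.5
    return (ma - mb) / pooled

def passes_gate(human: list[float], ai: list[float], threshold: float = 0.5) -> bool:
    """Feature gate: require |d| >= threshold before integration."""
    return abs(cohens_d(human, ai)) >= threshold
```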

Falsified candidates (saved for posterity in docs/negative_results.md):

Candidate            Best |d|   Verdict
Fractals (Hurst)     0.42       Below gate
MFD register-inv.    ~0.2       Below gate
BENATTEN aggregate   ~0.5       Centroid CV 68 % (below ensemble)
Semantic curvature   1.44       Failed length-confound ablation

Passed candidates went into the active ensemble: BENADV (|d|=1.1), DivEye (|d|=2.84 on mean_surprisal).

Skill-proportional weights

weight = 5 × max(0.05, accuracy − 0.5) from per-detector HC3 cross-validation. The mapping pushes accurate detectors to ~4–5× the weight of mediocre ones, which is necessary once you have a strong standalone detector like DivEye, but it creates the small-ensemble collapse documented above.
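The mapping as a function, with hypothetical accuracies (the 0.95 / 0.60 / 0.52 values are illustrative, not the repo's calibration numbers); it shows both the ~4–5× spread and the 0.05 floor that kicks in at or below 55 % accuracy:

```python
def skill_weight(accuracy: float) -> float:
    """Skill-proportional weight: 5 x max(0.05, accuracy - 0.5)."""
    return 5 * max(0.05, accuracy - 0.5)

strong   = skill_weight(0.95)  # ~2.25
mediocre = skill_weight(0.60)  # ~0.50, a 4.5x ratio vs. strong
weak     = skill_weight(0.52)  # 0.25: hits the 0.05 floor
```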

Cross-examination debate protocol

Inspired by the Council of High Intelligence skill, adapted to AIGT detection:

  • Phase 1 — Blind-first: all detectors emit a DETECTION_VERDICT node without seeing each other (prevents anchoring bias).
  • Phase 2 — Disagreement detection: moderator detects polarity pairs and applies anti-groupthink (≥ 70 % agreement → forced steelman of opposite).
  • Phase 3 — Cross-exam (max 2 rounds): each dissenter must respond to ≥ 1 specific opposing evidence and introduce ≥ 1 new claim (novelty gate). Strong-pair rule (PROTECTED_PAIR) prevents the two highest-weight detectors from being flipped by majority pressure when in the minority.
  • Phase 4 — Weighted aggregation with explicit consensus_zones, irreducible_tensions, and minority_report in the final ruling.
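The Phase 2/3 gates can be sketched as follows; the Verdict structure and the claim-set representation are illustrative assumptions, not the repo's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    agent: str
    label: str                       # "ai" or "human"
    weight: float
    claims: set[str] = field(default_factory=set)  # evidence IDs raised so far

def needs_steelman(verdicts: list[Verdict], threshold: float = 0.7) -> bool:
    """Anti-groupthink: >= 70% agreement forces a steelman of the opposite view."""
    ai_share = sum(v.label == "ai" for v in verdicts) / len(verdicts)
    return max(ai_share, 1 - ai_share) >= threshold

def passes_novelty_gate(rebuttal_claims: set[str], prior_claims: set[str]) -> bool:
    """Novelty gate: a cross-exam rebuttal must introduce >= 1 new claim."""
    return bool(rebuttal_claims - prior_claims)
```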

Honest benchmark comparison

We do not claim state of the art. The table is what it is:

System                               HC3 accuracy   Adversarial   Notes
DivEye standalone (paper)            97.7 %         n/a           Strongest single detector
Binoculars (2024)                    ~92 %          ~80 %         Cross-perplexity ratio
DeTeCtive (NeurIPS 2024)             ~91 %          ~75 %         Multi-level contrastive
RADAR (2023)                         ~88 %          ~85 %         Adversarial-trained
Busted 4-agent ensemble              81.2 %         58.3 %        This repo (32-text test set)
GPTZero / Originality (commercial)   75–85 %        60–70 %       Industry baseline

Caveats: our validation set is 32 texts; HC3 has 24,000+. Numbers in the "Adversarial" column use register-flipped prompts (formal-AI / casual-human) and are not directly comparable across papers. Treat the table as order-of-magnitude orientation.


Quickstart

Requirements

  • Python 3.11+
  • CUDA 12.4 GPU recommended (CPU works but is 10–30× slower)
  • NVIDIA NIM API key (free tier sufficient for testing) — or a local Ollama server with llama3.2

Install

git clone https://github.com/MorkMindy74/Busted.git
cd Busted
pip install -r requirements.txt

# Set your NIM key
cp .env.example .env
# edit .env and set NVIDIA_API_KEY

# Pull third-party deps (DeTeCtive, diveye)
# See VENDOR.md for instructions

# Build HC3 subset (only if you want to retrain classifiers)
python tests/build_hc3_subset.py

Run

# Default: 4-agent winning ensemble
uvicorn backend.main:app --host 127.0.0.1 --port 8765 --reload

# Custom subset
BUSTED_DETECTORS=diveye,benadv uvicorn backend.main:app --port 8765

Open http://127.0.0.1:8765 in your browser. Paste text, hit Analizza ("Analyze"), and watch the debate unfold in real time.

Reproduce the subset benchmark

python tests/subset_benchmark.py
# results -> tests/subset_benchmarks/summary.json

Allow ~3 hours for the full 5-subset sweep on an RTX 2050.


Repo layout

Busted/
├── backend/              # FastAPI app, agents, detectors, event bus, KG
│   ├── agents/           #   one wrapper per detector + moderator
│   ├── detectors/        #   pure detection logic (no framework)
│   ├── events/           #   pub/sub EventBus
│   ├── graph/            #   TemporalKnowledgeGraph + schema
│   ├── llm/              #   NIMScheduler with fallback pool
│   └── routes/           #   /api/analyze, /api/graph, WebSocket
├── frontend/             # vanilla HTML/JS + D3 graph viz
├── detective_models/     # our trained classifiers (joblib)
├── docs/                 # research log, methodology, negative results
├── tests/                # benchmark + extraction + training scripts
├── tasks/                # internal todo + lessons (development log)
├── config.py             # detector registry, weights, model pools
├── requirements.txt
├── LICENSE               # MIT (original code)
├── NOTICE.md             # third-party attribution
└── VENDOR.md             # how to fetch third-party deps

What's not in this repo (and why)

  • vendor/DeTeCtive — upstream has no LICENSE file at time of writing. Clone yourself; see VENDOR.md.
  • vendor/diveye — CC BY-NC-SA 4.0 from IBM. Clone yourself.
  • detective_models/M4_monolingual_best.pth (~476 MB) — third-party DeTeCtive checkpoint; download from HuggingFace.
  • hc3_data/ — Hello-SimpleAI HC3 raw dump. Build with tests/build_hc3_subset.py.
  • API keys — .env is git-ignored. Use .env.example as a template.

Contributing

Issues and PRs welcome. Ideas particularly worth exploring:

  • A 5th detector that doesn't trip the Nth-agent destabilization (theory: it must contribute orthogonal evidence — Cohen's d on independent features, not just any signal).
  • Evaluation on a larger held-out set (HC3 test split, RAID, M4-monolingual test).
  • A "fast" mode without the LLM judge (would cut latency from ~15 s to ~3 s per text).

Please run pytest and the subset benchmark before submitting changes that touch detector logic.


Citation

If you use this work, please cite the underlying detectors (the real science) in addition to this repo:

@misc{busted2026,
  title  = {Busted: A multi-agent AIGT detection framework with cross-examination debate},
  author = {Rossi, Marco},
  year   = {2026},
  url    = {https://github.com/MorkMindy74/Busted}
}

For the wrapped methods, cite their original papers (DivEye, DeTeCtive, DetectGPT, etc.) — see NOTICE.md.


License

MIT for the original code in this repo. Third-party components retain their respective licenses; see NOTICE.md.
