Neural Polygraph

Detecting hallucinations in language models using Sparse Autoencoder (SAE) spectral signatures and geometric analysis.

Quick Start

# 1. Install
pip install -e .

# 2. Verify (quick check)
python verify_setup.py

# 3. Run experiment
python run_experiment.py 01_spectroscopy

# 4. Visualize
python experiments/visualize_spectroscopy.py

Note: Use python test_setup.py for a comprehensive verification of all dependencies.

Structure

neural-polygraph/
├── src/hallucination_detector/    # Core package
│   ├── sae_utils.py                # SAE feature extraction
│   ├── geometry.py                 # Geometric analysis
│   ├── data_loader.py              # HB-1000 benchmark loader
│   └── storage.py                  # Experiment storage
│
├── experiments/                    # Experiment protocols
│   ├── 01_spectroscopy.py          # Experiment A
│   ├── visualize_spectroscopy.py   # Visualization
│   └── data/                       # HB-1000 benchmark (~1000 samples)
│
├── run_experiment.py               # Universal runner
├── verify_setup.py                 # Quick setup check
├── test_setup.py                   # Comprehensive setup verification
└── TESTING-PLANS.MD                # Research plan

Usage

Run Experiments

# List available experiments
python run_experiment.py --list

# Run Experiment A: Spectroscopy
python run_experiment.py 01_spectroscopy

# View results
python run_experiment.py --view 01_spectroscopy

Programmatic Usage

from pathlib import Path

from hallucination_detector import (
    HB_Benchmark,
    ExperimentStorage,
    compute_inertia_tensor,
)

# Load the benchmark data, then the model and its SAE
benchmark = HB_Benchmark("experiments/data")
benchmark.load_datasets()
benchmark.load_model_and_sae(layer=5, width="16k")

# Get SAE activations for a single prompt
activations = benchmark.get_activations("The Eiffel Tower is in Paris")
print(f"L0 Norm: {activations.l0_norm}")
print(f"Reconstruction Error: {activations.reconstruction_error:.4f}")

# Save results
storage = ExperimentStorage(Path("experiments/my_experiment"))
storage.write_manifest({"experiment": "my_experiment"})
storage.write_metrics({"metric": [...]})

Experiments

Experiment A: Spectroscopy ✅

Goal: Demonstrate distinct spectral signatures of hallucinations

Metrics: L0 Norm, Reconstruction Error, Gini Coefficient

Run: python run_experiment.py 01_spectroscopy
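
For readers unfamiliar with the three metrics, here is a minimal, dependency-free sketch of how they could be computed on a single vector of SAE feature activations. It is not the code used in 01_spectroscopy.py: the function names, the threshold parameter, and the choice of mean squared error as the reconstruction error are assumptions made for illustration.

```python
from typing import Sequence


def l0_norm(activations: Sequence[float], threshold: float = 0.0) -> int:
    """Count features whose absolute activation exceeds the threshold."""
    return sum(1 for a in activations if abs(a) > threshold)


def reconstruction_error(
    original: Sequence[float], reconstructed: Sequence[float]
) -> float:
    """Mean squared error between a vector and its reconstruction (assumed metric)."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    if not original:
        raise ValueError("vectors must not be empty")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def gini_coefficient(activations: Sequence[float]) -> float:
    """Gini coefficient of absolute activations: 0 when uniform, higher when concentrated."""
    values = sorted(abs(a) for a in activations)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form on ascending-sorted values with 1-based ranks.
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


if __name__ == "__main__":
    sparse = [0.0, 0.0, 0.0, 4.0]   # all mass on one feature
    uniform = [1.0, 1.0, 1.0, 1.0]  # mass spread evenly
    print("L0:", l0_norm(sparse), l0_norm(uniform))
    print("Gini:", gini_coefficient(sparse), gini_coefficient(uniform))
    print("MSE:", reconstruction_error([1.0, 2.0], [1.0, 4.0]))
```

A high Gini coefficient together with a low L0 norm indicates that a few features carry most of the activation mass; the experiment compares such statistics between facts and hallucinations.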

Experiment B: Geometry 🚧

Goal: Measure the "shape" of thoughts using inertia tensors

Status: Coming soon
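
Until the experiment lands, the idea of an inertia tensor can be illustrated with a generic sketch. The package exports compute_inertia_tensor, but its signature is not documented here, so the function below is a separate, hypothetical inertia_tensor_sketch. It assumes each active feature is represented as a point (for example a direction vector) with a non-negative weight (for example its activation magnitude), and applies the textbook definition about the weighted center of mass.

```python
from typing import List, Sequence


def inertia_tensor_sketch(
    points: Sequence[Sequence[float]], masses: Sequence[float]
) -> List[List[float]]:
    """Inertia tensor of weighted points about their center of mass.

    I[j][k] = sum_i m_i * (|r_i|^2 * delta_jk - r_i[j] * r_i[k]),
    where r_i is the position of point i relative to the center of mass.
    """
    if not points or len(points) != len(masses):
        raise ValueError("need one mass per point and at least one point")
    total = sum(masses)
    if total <= 0:
        raise ValueError("total mass must be positive")

    dim = len(points[0])
    center = [
        sum(m * p[k] for p, m in zip(points, masses)) / total for k in range(dim)
    ]

    tensor = [[0.0] * dim for _ in range(dim)]
    for p, m in zip(points, masses):
        r = [p[k] - center[k] for k in range(dim)]
        r_sq = sum(c * c for c in r)
        for j in range(dim):
            for k in range(dim):
                diagonal_term = r_sq if j == k else 0.0
                tensor[j][k] += m * (diagonal_term - r[j] * r[k])
    return tensor


if __name__ == "__main__":
    # Two unit masses on the x-axis: no inertia about x, equal inertia about y and z.
    for row in inertia_tensor_sketch([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], [1.0, 1.0]):
        print(row)
```

The eigenvalues of such a tensor summarize how elongated or spread out the weighted point cloud is, which is one way to make the "shape" in the goal above concrete.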

Experiment C: Ghost Features 🚧

Goal: Identify features unique to hallucinations

Status: Coming soon
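
One simple way to read "features unique to hallucinations" is as a set difference over active feature indices for a fact/hallucination pair. The sketch below assumes sparse activations are given as a mapping from feature index to activation value; the function names, the dict representation, and the threshold are all hypothetical and not taken from the package.

```python
from typing import Dict, Set


def active_features(activations: Dict[int, float], threshold: float = 0.0) -> Set[int]:
    """Indices of features whose absolute activation exceeds the threshold."""
    return {idx for idx, value in activations.items() if abs(value) > threshold}


def ghost_features(
    fact: Dict[int, float],
    hallucination: Dict[int, float],
    threshold: float = 0.0,
) -> Set[int]:
    """Features active on the hallucination but not on the matching fact."""
    return active_features(hallucination, threshold) - active_features(fact, threshold)


if __name__ == "__main__":
    # Illustrative values only, not benchmark data.
    fact = {3: 1.2, 17: 0.4}
    hallucination = {3: 0.9, 42: 2.1, 99: 0.0}
    print(sorted(ghost_features(fact, hallucination)))
```

Raising the threshold filters out weakly active features, at the cost of possibly missing faint ones.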

Data: HB-1000 Benchmark

Dataset             Samples  Description
Entity Swaps        230      Geographic/entity errors
Temporal Shifts     270      Temporal errors
Logical Inversions  250      Logical flips
Adversarial Traps   250      High-probability misconceptions

Total: ~1,000 fact/hallucination pairs in experiments/data/

Dependencies

Core: torch, transformer-lens, sae-lens, numpy, polars

Viz: matplotlib, seaborn, plotly

Analysis: scikit-learn, umap-learn

See pyproject.toml for complete list.

Troubleshooting

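Import errors: Reinstall the package in editable mode from the repository root with pip install -e .

Memory issues: Run on CPU or use smaller batches.

Model download: Model weights are downloaded from Hugging Face (~2GB), so the first run needs network access and free disk space.

Test setup: Run python test_setup.py for a comprehensive check of all dependencies.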

Research Plan

See TESTING-PLANS.MD for detailed experimental protocols and hypotheses.

License

MIT License
