TransluceAI/circuits

ADAG (Automatically Describing Attribution Graphs) is Transluce's circuit tracing library, developed and maintained by Aryaman Arora and Zhengxuan Wu (and, of course, Claude Code). This library includes circuit tracing code for MLP neurons in Llama- and Qwen-family Transformer LLMs, along with a full analysis pipeline that automates circuit interpretation.

The pipeline goes like this:

  1. Prepare a circuit -- trace neuron attributions across a dataset
  2. Cluster neurons -- group neurons by their activation/contribution patterns
  3. Generate descriptions -- explain what each cluster does (input attribution + output contribution)
  4. Summarize -- produce short labels for each cluster
  5. Analyze -- steering, correlation, visualization

We also adapt the interactive frontend from Decode Research's excellent circuit-tracer library to enable viewing of all this information. We explain how to do each step below, along with downstream analysis like steering.

If you use this code, please cite:

@article{arora2026sparse,
    title={Language Model Circuits Are Sparse in the Neuron Basis},
    author={Aryaman Arora and Zhengxuan Wu and Jacob Steinhardt and Sarah Schwettmann},
    year={2026},
    journal={arXiv:2601.22594},
    url={https://arxiv.org/abs/2601.22594},
}

@article{arora2026adag,
    title={ADAG: Automatically Describing Attribution Graphs},
    author={Aryaman Arora and Zhengxuan Wu and Jacob Steinhardt and Sarah Schwettmann},
    year={2026},
    journal={arXiv:2604.07615},
    url={https://arxiv.org/abs/2604.07615},
}

Quick Start

# Install dependencies
uv sync

# Copy and configure environment
cp .env.template .env
# Fill in API keys in .env

# Trace a circuit from a config
uv run python scripts/circuit_prep/prep.py --config configs/capitals.yaml

# Or submit as a SLURM job
sbatch --export=CONFIG=configs/capitals.yaml scripts/circuit_prep/prep.sbatch

1. Preparing a Circuit

Dataset format

Create a Python module in scripts/circuit_prep/data/ that exports:

# scripts/circuit_prep/data/my_dataset.py
prompts = ["What is the capital of Texas?", ...]
seed_responses = ["Answer:"] * len(prompts)  # prefix for model response
labels = [" Austin", ...]                    # target tokens to trace

For dynamic datasets that need model access, export a function instead:

def get_dataset(model, tokenizer) -> tuple[list[str], list[str], list[str]]:
    # ... generate prompts dynamically ...
    return prompts, seed_responses, labels
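
As an illustration, a dynamic dataset built from a fixed mapping might look like this (a hypothetical example, not part of the library; the model and tokenizer arguments can be used for filtering or generation):

```python
# Hypothetical dynamic dataset module: prompts built from a mapping.
STATE_CAPITALS = {"Texas": " Austin", "California": " Sacramento"}

def get_dataset(model, tokenizer) -> tuple[list[str], list[str], list[str]]:
    # model/tokenizer are available for filtering or generation; unused here.
    prompts = [f"What is the capital of {state}?" for state in STATE_CAPITALS]
    seed_responses = ["Answer:"] * len(prompts)
    labels = list(STATE_CAPITALS.values())
    return prompts, seed_responses, labels
```

The three returned lists must be aligned one-to-one: one seed response and one target label per prompt.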

Config file

# scripts/circuit_prep/configs/my_dataset.yaml
dataset: my_dataset
output_path: results/case_studies/output_circuit.pkl
model_id: meta-llama/Llama-3.1-8B-Instruct
percentage_threshold: 0.005  # attribution pruning cutoff
batch_size: 4
k: 5                         # top-k logits to trace
apply_blacklist: true

Tracing

# Single GPU
uv run python scripts/circuit_prep/prep.py --config configs/my_dataset.yaml

# Multi-GPU data parallelism (e.g. 4 GPUs, 1 per worker = 4 workers)
sbatch --gres=gpu:4 --export=CONFIG=configs/my_dataset.yaml scripts/circuit_prep/prep.sbatch

# Large model across 2 GPUs per copy (e.g. 8 GPUs = 4 workers)
uv run python scripts/circuit_prep/prep.py --config configs/my_dataset.yaml --gpus-per-model 2

This produces a CircuitData pickle containing per-neuron attributions, activation maps, and edge weights.

Tracing API

from circuits.tracing.trace import convert_inputs_to_circuits, CircuitData
from circuits.tracing.clja import ADAGConfig

config = ADAGConfig(
    device="cuda:0",
    percentage_threshold=0.005,
    use_relp_grad=True,
    apply_blacklist=True,
    # ... see ADAGConfig for all options
)

data = convert_inputs_to_circuits(
    model, tokenizer, prompts,
    config=config,
    seed_responses=seed_responses,
    labels=labels,
    batch_size=4,
    k=5,
)
data.save_to_pickle("circuit.pkl")

2. Clustering

Load the circuit and cluster neurons into interpretable groups:

from circuits.analysis.circuit_ops import Circuit
from transformers import AutoTokenizer

c = Circuit.load_from_pickle("circuit.pkl")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
c.set_tokenizer(tokenizer, num_layers=32)

# Multi-view spectral clustering
c.cluster_multiview(n_clusters=20, get_desc=True, combine="harmonic")

# Save cluster assignments
c.save_cluster_state("cluster_state.json")

To reload cluster assignments later:

c = Circuit.load_from_pickle("circuit.pkl")
c.set_tokenizer(tokenizer, num_layers=32)
c.load_cluster_state("cluster_state.json")

3. Generating Descriptions

Generate natural-language explanations for each cluster. There are two types:

  • Attribution (attr): what input tokens cause this cluster to activate
  • Contribution (contrib): what output tokens this cluster promotes/suppresses

result = c.label_clusters_simulator_v2(
    score_explanations=True,
    num_expl_samples=5,
    attr_backend="vllm",                       # local finetuned model (no refusals)
    contrib_model_name="claude-haiku-4-5-20251001",
    verbose=True,
)
attr_results, contrib_results, attr_exemplars, contrib_exemplars, cluster_to_neurons = result

Important: Use attr_backend="vllm" (not "api") for attribution descriptions. The finetuned local model won't refuse on adversarial or safety-related content, whereas the Claude API may refuse to analyze jailbreak-style excerpts.

This requires 2 GPUs: one for the vllm explainer/simulator, one for scoring.

4. Summarizing Clusters

Generate short (1--3 word) labels for each cluster:

# Best results: attr + contrib descriptions only, no neuron descriptions
labels = c.summarize_clusters(mode="rich", attr_only=True)

# Full rich mode: attr + contrib + individual neuron descriptions
c.fetch_descriptions()  # fetch per-neuron descriptions from database
labels = c.summarize_clusters(mode="rich")

# Include top dataset examples where cluster is most active
labels = c.summarize_clusters(mode="batch", top_k_examples=10)

# Save (labels are stored in cluster state)
c.save_cluster_state("cluster_state.json")

summarize_clusters() uses claude-opus-4-6 with adaptive thinking by default.

Mode                           | Description                                           | Best for
mode="rich", attr_only=True    | Per-cluster API call with attr + contrib only         | Task-specific circuits
mode="rich"                    | Per-cluster with attr + contrib + neuron descriptions | General circuits
mode="rich", neurons_only=True | Per-cluster with neuron descriptions only             | When attr/contrib are missing
mode="batch"                   | All clusters in one prompt                            | Quick/cheap labeling

There's also a standalone script:

uv run python scripts/case_studies/sensitivity_analysis/3b_summarize_clusters.py \
    --mode rich --attr-only \
    --circuit circuit.pkl --cluster-state cluster_state.json

5. Visualization

Circuit Tracer frontend

The built-in web UI shows the full circuit interactively -- clusters, neurons, edges, token attributions, and descriptions. Uses the circuit-tracer frontend.

# Serve directly (launches local HTTP server)
c.serve(port=8032, slug="my_circuit")
# Open http://localhost:8032/index.html?slug=my_circuit_0

# Or export JSON files for hosting elsewhere
c.export_to_circuit_tracer("output_dir/", slug="my_circuit")
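
The exported directory is plain static files, so any static file server works. A minimal sketch using only the Python standard library ("output_dir" is the export path from above; port 0 asks the OS for any free port, and you would use a fixed port like 8032 in practice):

```python
# Serve exported circuit-tracer JSON as static files.
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

handler = functools.partial(SimpleHTTPRequestHandler, directory="output_dir")
server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
print(f"Serving on http://127.0.0.1:{server.server_address[1]}")
```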

Circuit graph (Graphviz)

For a static cluster-level graph of a single circuit instance:

uv run python scripts/case_studies/capitals/plot_circuit_graph.py circuit.pkl \
    --label 0 --cluster-state cluster_state.json --dot-only
dot -Tpdf circuit_graph.dot -o circuit_graph.pdf

6. Downstream Analysis

Steering

Intervene on cluster activations and measure effects on model output:

# Ablate a cluster (0x multiplier)
diversity_stats, cluster_to_cluster, cluster_to_output = c.steer(
    model, multiplier=0.0, store_results=True
)

# Amplify a cluster (2x multiplier)
c.steer(model, multiplier=2.0, store_results=True)

For more controlled steering with ASR (attack success rate) measurement and LLM judging, see the sensitivity analysis scripts.

Correlation analysis

Correlate cluster attribution with a metric (e.g., ASR) across circuit instances:

from scipy import stats
import numpy as np

# Per-cluster: sum attribution across neurons, correlate with metric
for cl, attr_vec in cluster_attr.items():
    r, p = stats.pearsonr(attr_vec, metric_vec)

See scripts/case_studies/sensitivity_analysis/3_describe_and_correlate.py for a full example.
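
A self-contained version with synthetic data, where cluster_attr and metric_vec are illustrative stand-ins for values computed from the circuit:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
metric_vec = rng.random(50)                      # e.g. per-instance ASR
cluster_attr = {                                 # cluster id -> attribution vector
    0: 2 * metric_vec + rng.normal(0, 0.1, 50),  # correlated with the metric
    1: rng.random(50),                           # unrelated to the metric
}
results = {cl: stats.pearsonr(v, metric_vec) for cl, v in cluster_attr.items()}
for cl, (r, p) in results.items():
    print(f"cluster {cl}: r={r:.3f}, p={p:.3g}")
```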

Case Studies

Directory             | Description
sensitivity_analysis/ | Medical safety jailbreak analysis on Llama-3.1-8B-Instruct
capitals/             | State capitals factual recall
capitals_qwen3/       | Capitals with Qwen3
math/                 | Arithmetic circuits
user_modelling/       | User behavior modeling
causalgym/            | CausalGym SVA evaluation
benchmark/            | Tracing throughput benchmarks

Sensitivity analysis pipeline (numbered scripts)

1_serve.py                     Browse circuit interactively
2_correlate_asr.py             Per-neuron attribution vs ASR correlation
3_describe_and_correlate.py    Cluster, describe, correlate
3b_summarize_clusters.py       Generate summary labels (standalone)
4_steer_and_judge.py           Steer individual neurons, judge ASR
5_steer_cluster_and_judge.py   Steer clusters, judge ASR
6_sweep_clusters.py            Sweep all clusters (0x/2x), report ASR
7_plot_asr_vs_attribution.py   Scatter plots
8_make_steering_table.py       LaTeX table for paper

Performance

Benchmarked on H100 80GB with Llama-3.1-8B-Instruct (capitals, 50 prompts, k=5):

Batch Size | 1 GPU (s/prompt) | 2 GPUs (s/prompt) | Peak GPU mem (1 GPU)
1          | 22.3             | 11.2              | 16.7 GB
2          | 13.7             | 7.1               | 17.7 GB
4          | 10.0             | 5.5               | 22.1 GB
8          | 15.1             | 8.0               | 40.9 GB

Batch size 4 is optimal. Batch size 8 is slower due to memory pressure.

Multi-GPU uses data parallelism: each GPU gets its own model copy and processes a shard of the dataset. For models that don't fit on a single GPU (e.g. Qwen3-32B), use --gpus-per-model 2.
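
One way to picture the data-parallel sharding (an illustrative sketch, not the library's actual scheduling code):

```python
# Each worker (one model copy) processes a strided shard of the prompts.
def shard(prompts: list[str], num_workers: int, worker_id: int) -> list[str]:
    return prompts[worker_id::num_workers]

prompts = [f"prompt {i}" for i in range(10)]
shards = [shard(prompts, 4, w) for w in range(4)]
# Every prompt lands in exactly one shard.
assert sorted(p for s in shards for p in s) == sorted(prompts)
```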

Project Structure

circuits/
  tracing/              Core tracing algorithm
    trace.py            High-level API: convert_inputs_to_circuits()
    clja.py             ADAGConfig, core jacobian attribution
    attribution.py      Gradient attribution computation
    grad/               Model-specific gradient wrappers (RelP, stop-grad)
  analysis/
    circuit_ops.py      Circuit class: clustering, labeling, steering, serving
    steer.py            Activation steering implementation
    label.py            Batch summarization
  descriptions/         Description generation
    label.py            Orchestration: generate + score explanations
    prompts.py          All LLM prompts (attr, contrib, summary)
    exemplars.py        Token highlighting and exemplar formatting
    vllm_backend.py     Local finetuned explainer/simulator
    api_backend.py      Anthropic API-based explainer
  frontend/             Web UI for interactive circuit exploration
scripts/
  circuit_prep/         Dataset loading, config, tracing entry point
  case_studies/         Per-task analysis scripts
