awei05/ecb-deliberation

Beyond DeGroot: LLM Deliberation Preserves Heterogeneity Despite Consensus

Code and data for the NeurIPS 2026 submission.

Overview

This repository contains the code to reproduce the experiments in the paper. The framework simulates structured multi-agent deliberation using persona-grounded LLM agents and benchmarks their belief dynamics against DeGroot and Friedkin-Johnsen (F-J) opinion-dynamics models.

Two institutional settings are studied:

  • ECB Governing Council (16 agents, 4 deliberation rounds): Consensus-based monetary policy with the ECB's informal "consensus without formal vote" tradition.
  • FOMC (6 or 12 agents, 3 deliberation rounds): Formal voting-based monetary policy, replicating and extending the design of Kazinnik & Sinclair (2025).

The key finding is that LLM agents preserve belief heterogeneity across rounds of deliberation: the spread of rate recommendations stays non-zero, whereas DeGroot dynamics converge to unanimity. This preservation is also asymmetric (hawks entrench, doves accommodate), whereas Friedkin-Johnsen preserves spread symmetrically.
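To make the two benchmark behaviours concrete, here is a minimal sketch of the textbook update rules. It is not the implementation in src/benchmarks/friedkin_johnsen.py; the function name `simulate`, the uniform weight matrix `W`, and the starting rates `x0` are assumptions chosen for illustration. Setting stubbornness to 0 gives DeGroot; a positive value gives Friedkin-Johnsen with the same stubbornness for every agent.

```python
import numpy as np

def simulate(x0, W, stubbornness=0.0, rounds=4):
    """Iterate x(t+1) = (1 - s) * W @ x(t) + s * x(0) and track the spread.

    s = 0 is DeGroot averaging; s > 0 anchors each agent to its initial belief
    (Friedkin-Johnsen with a common stubbornness, an illustrative simplification).
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    spreads = [x.max() - x.min()]
    for _ in range(rounds):
        x = (1.0 - stubbornness) * (W @ x) + stubbornness * x0
        spreads.append(x.max() - x.min())
    return x, spreads

# Hypothetical 4-agent committee with uniform influence weights.
x0 = [2.0, 2.25, 2.5, 3.0]
W = np.full((4, 4), 0.25)

_, degroot_spreads = simulate(x0, W, stubbornness=0.0)
_, fj_spreads = simulate(x0, W, stubbornness=0.5)
print("DeGroot spread by round:", degroot_spreads)
print("F-J spread by round:   ", fj_spreads)
```

In this toy setup the DeGroot spread collapses to zero, while the F-J spread shrinks by the same factor for the highest and lowest agents, which is the symmetric benchmark the LLM results are compared against.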

Repository Structure

code_release/
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── config/
│   ├── ecb_config.yaml        # ECB simulation parameters and committee composition
│   └── fomc_config.yaml       # FOMC simulation parameters and committee composition
├── data/
│   └── fomc_july2025_macro.txt  # Macroeconomic briefing data for FOMC simulation
├── prompts/
│   ├── ecb/                   # Jinja2 prompt templates for ECB deliberation (4 rounds)
│   │   ├── round1_initial.j2
│   │   ├── round2_discussion.j2
│   │   ├── round3_engagement.j2
│   │   ├── round4_synthesis.j2
│   │   └── extended_round.j2
│   └── fomc/                  # Jinja2 prompt templates for FOMC deliberation (3 rounds)
│       ├── fomc_round1.j2
│       ├── fomc_round2.j2
│       └── fomc_round3.j2
├── results/
│   ├── summary/               # Aggregated results for paper tables
│   │   ├── table1_ecb_results.csv       # Table 1: ECB consolidated results
│   │   ├── table3_fomc_results.csv      # Table 3: FOMC replication results
│   │   ├── cgp_calibration.json         # CGP model calibration parameters
│   │   └── fj_benchmark_results.json    # Friedkin-Johnsen sensitivity sweep
│   └── processed/             # Per-agent per-round rate data
│       ├── ecb_rates_panel.csv          # ECB baseline: 47 runs x 16 agents x 4 rounds
│       └── fomc_rates_panel.csv         # FOMC main: 10 runs x 12 agents x 3 rounds
├── scripts/
│   ├── reproduce_table1.sh    # Instructions for reproducing Table 1
│   └── reproduce_table3.sh    # Instructions for reproducing Table 3
└── src/
    ├── simulation/            # LLM deliberation engine
    ├── experiments/           # Experiment runners (multi-model, ablation, etc.)
    ├── benchmarks/
    │   └── friedkin_johnsen.py   # DeGroot and F-J opinion dynamics implementation
    └── utils/
        ├── llm_adapter.py        # Multi-provider LLM adapter (OpenAI, Anthropic, DeepSeek)
        ├── aggregation.py        # Rate aggregation and consensus mechanisms
        └── dissent_calculator.py # Role-based dissent cost calculation

Setup

pip install -r requirements.txt

Set API keys as environment variables:

export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"  # for Claude Sonnet experiments
export DEEPSEEK_API_KEY="your-key"   # for DeepSeek experiments
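Before launching a long run, it can help to confirm which keys are visible to Python. The following is a small sketch, not part of the repository; the helper name `missing_keys` is hypothetical. Only the keys for the providers you actually use need to be set.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "DEEPSEEK_API_KEY")

def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the names of variables that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    missing = missing_keys()
    print("Missing keys:", ", ".join(missing) if missing else "none")
```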

Reproducing Results

From processed data (no API keys needed)

Tables and figures can be reproduced from the pre-computed results in results/:

| File | Description |
| --- | --- |
| results/summary/table1_ecb_results.csv | Table 1: ECB consolidated results across all conditions |
| results/summary/table3_fomc_results.csv | Table 3: FOMC replication results |
| results/summary/fj_benchmark_results.json | Friedkin-Johnsen sensitivity sweep (stubbornness 0.0--0.99) |
| results/summary/cgp_calibration.json | CGP model calibration (matched moments) |
| results/processed/ecb_rates_panel.csv | Per-agent per-round rates for ECB baseline (47 runs) |
| results/processed/fomc_rates_panel.csv | Per-agent per-round rates for FOMC main (10 runs) |

Re-running the full API-based deliberation experiments requires user-provided API keys and may not exactly reproduce the original raw generations, because commercial model snapshots can change. The processed outputs in results/processed/ and results/summary/ are sufficient to reproduce all numerical tables reported in the paper without any API calls. For example, ecb_rates_panel.csv has the columns run_id, agent, round, and rate, and can be used to compute spread, compression, and per-agent trajectories.
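As a rough illustration of working with the panel format, the sketch below computes per-round spread and compression from rows with the columns run_id, agent, round, and rate. The rows are synthetic and the helper names are hypothetical; values are left in whatever units the rate column uses, so any conversion to basis points is up to the reader.

```python
import csv
import io
from collections import defaultdict

def spread_by_round(rows):
    """Return {run_id: {round: max(rate) - min(rate)}} for panel rows."""
    rates = defaultdict(lambda: defaultdict(list))
    for row in rows:
        rates[row["run_id"]][int(row["round"])].append(float(row["rate"]))
    return {
        run: {rnd: max(vals) - min(vals) for rnd, vals in by_round.items()}
        for run, by_round in rates.items()
    }

def compression(spreads):
    """Final-round spread minus round-1 spread per run (negative = convergence)."""
    return {
        run: by_round[max(by_round)] - by_round[min(by_round)]
        for run, by_round in spreads.items()
    }

# Synthetic data for illustration only. To use the released file instead, pass
# csv.DictReader(open("results/processed/ecb_rates_panel.csv", newline="")).
sample = io.StringIO(
    "run_id,agent,round,rate\n"
    "0,a,1,2.00\n"
    "0,b,1,2.50\n"
    "0,a,4,2.10\n"
    "0,b,4,2.40\n"
)
spreads = spread_by_round(csv.DictReader(sample))
print(spreads)
print(compression(spreads))
```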

Re-running experiments (requires API keys)

ECB baseline (Table 1, row 1):

python -m src.experiments.multi_model_runner --model gpt-4o --n-runs 47

Multi-model comparison (Table 1, rows 2--4):

python -m src.experiments.multi_model_runner --model claude-sonnet-4-20250514 --n-runs 15
python -m src.experiments.multi_model_runner --model gpt-4o-mini --n-runs 15
python -m src.experiments.multi_model_runner --model deepseek-chat --n-runs 15

Temperature robustness (Table 1, rows 5--7):

python -m src.experiments.temperature_robustness --temperature 0.0 --n-runs 10
python -m src.experiments.temperature_robustness --temperature 0.3 --n-runs 10
python -m src.experiments.temperature_robustness --temperature 0.7 --n-runs 10

Order randomization (Table 1, row 8):

python -m src.experiments.order_randomization --n-runs 15 --seed 42

Persona ablation (Table 1, rows 9--10):

python -m src.experiments.persona_ablation --mode name_only --n-runs 10
python -m src.experiments.persona_ablation --mode bio_only --n-runs 10

Component ablation (Table 1, rows 11--12):

python -m src.experiments.component_ablation --mode framework_only --n-runs 10
python -m src.experiments.component_ablation --mode bio_institutional --n-runs 10

FOMC replication (Table 3):

python -m src.simulation.run_fomc --n-runs 10 --full
python -m src.simulation.run_fomc --n-runs 10 --pilot
python -m src.simulation.run_fomc --n-runs 5 --full --temperature 0.0

Friedkin-Johnsen benchmark:

python -m src.benchmarks.friedkin_johnsen --sweep
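For intuition about what a stubbornness sweep measures, the sketch below solves for the Friedkin-Johnsen steady state directly rather than iterating. It is an assumption-laden illustration, not the code behind the --sweep flag: the function `fj_steady_state`, the uniform weight matrix, and the initial rates are all hypothetical, and every agent shares one stubbornness value.

```python
import numpy as np

def fj_steady_state(x0, W, stubbornness):
    """Solve x = (1 - s) * W @ x + s * x0 for x, with 0 < s <= 1.

    At s = 0 (pure DeGroot) the linear system is singular for a
    row-stochastic W, so that limit is excluded here.
    """
    x0 = np.asarray(x0, dtype=float)
    A = np.eye(len(x0)) - (1.0 - stubbornness) * W
    return np.linalg.solve(A, stubbornness * x0)

x0 = [2.0, 2.25, 2.5, 3.0]       # hypothetical initial rates
W = np.full((4, 4), 0.25)        # hypothetical uniform influence weights

sweep = {}
for s in (0.1, 0.5, 0.9, 0.99):
    x_star = fj_steady_state(x0, W, s)
    sweep[s] = float(x_star.max() - x_star.min())
print(sweep)
```

With uniform weights the steady-state spread is the initial spread scaled by the stubbornness, so the sweep traces a straight line from near-unanimity to near-complete persistence.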

See scripts/reproduce_table1.sh and scripts/reproduce_table3.sh for convenience wrappers.

Prompt Templates

All prompts used in the deliberation are stored as Jinja2 templates in prompts/.

  • ECB deliberation uses 4 rounds: initial assessment, peer discussion, direct engagement, and synthesis.
  • FOMC deliberation uses 3 rounds, following the structure of Kazinnik & Sinclair (2025).

Each round's prompt template receives the agent's persona, macroeconomic data, and (for rounds 2+) the previous round's statements from other agents. The agent returns a policy rate recommendation along with its reasoning.
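The rendering step described above might look roughly like the following. This is a sketch with an inline template, not one of the files in prompts/; the variable names `persona`, `macro_data`, and `previous_statements`, and the template wording, are assumptions for illustration.

```python
from jinja2 import Template

# Inline stand-in for a round template; the released .j2 files may differ.
ROUND_TEMPLATE = Template(
    "You are {{ persona.name }}, {{ persona.role }}.\n"
    "Briefing:\n{{ macro_data }}\n"
    "{% if previous_statements %}"
    "Statements from the previous round:\n"
    "{% for s in previous_statements %}- {{ s.agent }}: {{ s.text }}\n{% endfor %}"
    "{% endif %}"
    "State your recommended policy rate and your reasoning."
)

persona = {"name": "Agent A", "role": "a committee member"}  # hypothetical persona
macro = "Headline inflation: (value from the briefing file)"

# Round 1: no peer statements are available yet.
round1 = ROUND_TEMPLATE.render(persona=persona, macro_data=macro, previous_statements=[])

# Rounds 2+: the previous round's statements from other agents are included.
round2 = ROUND_TEMPLATE.render(
    persona=persona,
    macro_data=macro,
    previous_statements=[{"agent": "Agent B", "text": "Hold."}],
)
print(round2)
```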

Key Metrics

All spread values are reported in basis points (bp). Key metrics:

  • R1/R2/R3 spread: Max minus min of agent beliefs in rounds 1, 2, and 3. For ECB runs, which have 4 rounds, R3 refers to round 4 (the final round).
  • Compression: Change in spread from round 1 to final round (negative = convergence)
  • Hawk/dove delta: Change in distance of extreme agents from the mean across rounds
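One possible reading of the hawk/dove delta is sketched below: take the highest-rate and lowest-rate agents in round 1, and compare their distance from the committee mean in round 1 with their distance in the final round. The paper's exact definition may differ; the function name and the example rates are hypothetical.

```python
def hawk_dove_delta(first, last):
    """Change in distance from the mean for the round-1 extreme agents.

    `first` and `last` map agent -> rate in round 1 and in the final round.
    Negative values mean the agent ended closer to the committee mean.
    """
    mean_first = sum(first.values()) / len(first)
    mean_last = sum(last.values()) / len(last)
    hawk = max(first, key=first.get)   # highest initial rate
    dove = min(first, key=first.get)   # lowest initial rate

    def delta(agent):
        return abs(last[agent] - mean_last) - abs(first[agent] - mean_first)

    return {"hawk": delta(hawk), "dove": delta(dove)}

# Synthetic example: the dove moves toward the group, the hawk stays put.
first = {"a": 2.0, "b": 2.5, "c": 3.0}
last = {"a": 2.4, "b": 2.5, "c": 3.0}
result = hawk_dove_delta(first, last)
print(result)
```

Here the dove's distance from the mean falls by more than the hawk's, which is the kind of asymmetry the metric is meant to surface.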

Models Used

| Model | Provider | Role |
| --- | --- | --- |
| gpt-4o (May 2024 snapshot) | OpenAI | Baseline and all robustness checks |
| claude-sonnet-4-20250514 | Anthropic | Multi-model comparison |
| gpt-4o-mini | OpenAI | Multi-model comparison |
| deepseek-chat | DeepSeek | Multi-model comparison |

Data

Macroeconomic briefing data is drawn from publicly available sources (ECB SDW, Eurostat, FRED) as of the target meeting date. The data/ directory contains the briefing documents provided to agents.

The full persona JSON files are withheld during anonymous review and will be released upon acceptance. The prompt templates and anonymized processed outputs are included. The paper appendix describes the persona construction procedure and provides representative examples. The released processed outputs allow verification of all reported spread, compression, benchmark, and calibration results without requiring access to raw persona files.

License

MIT License
