Self-Consistency with Chain-of-Thought

A from-scratch Python implementation of "Self-Consistency Improves Chain of Thought Reasoning in Language Models" (Wang et al., ICLR 2023).


The Idea

Standard CoT prompting decodes a single reasoning chain greedily (temperature 0).
Self-Consistency samples N chains at temperature 0.7, extracts the numeric answer from each, then takes a majority vote across them.
The intuition: correct reasoning paths tend to converge on the same answer, while errors scatter across different wrong answers and rarely outvote it.

question ──► [prompt + few-shot CoT]
                │
                ├──► sample 1 ──► parse ──► 8
                ├──► sample 2 ──► parse ──► 8
                ├──► sample 3 ──► parse ──► 9  ← outlier
                ├──► ...
                └──► sample N ──► parse ──► 8

majority_vote([8, 8, 9, 8, ...]) ──► answer = 8  ✓
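
The voting step in the diagram can be sketched in a few lines. This is a minimal illustration, not the code in core.py: the name majority_vote_sketch, the choice to skip unparsed (None) answers, and the first-seen tie-break are all assumptions made for the example.

```python
from collections import Counter
from fractions import Fraction


def majority_vote_sketch(answers):
    """Return (winner, votes, distribution) for a list of parsed answers.

    Assumptions of this sketch: None marks a chain whose answer could not
    be parsed and is ignored; ties go to the answer that appeared first.
    """
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None, 0, []
    # most_common() sorts by count (descending); equal counts keep
    # first-encountered order.
    distribution = counts.most_common()
    winner, votes = distribution[0]
    return winner, votes, distribution


winner, votes, dist = majority_vote_sketch(
    [Fraction(8), Fraction(8), Fraction(9), Fraction(8)]
)
print(winner, votes)  # 8 3
```

Voting over Fraction values rather than raw strings means "8", "8.0" and "16/2" would count as the same answer once parsed.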

Wang et al. report +17.9 pp accuracy on GSM8K (56.5 % → 74.4 %) over standard CoT using PaLM 540B, and consistent gains across 9 benchmarks without any training or extra supervision.


Features

  • Zero external dependencies — pure Python 3.11 stdlib
  • OpenRouter back-end — drop in any model via OPENROUTER_API_KEY
  • Parallel sampling — N calls via ThreadPoolExecutor (respects rate limits)
  • Robust answer parsing — handles $8, 8,000, 3/4, 50%, decimal, signed
  • GSM8K benchmark mode — compare SC vs CoT vs direct I/O side-by-side
  • 29 offline unit tests — no LLM calls needed for the test suite
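
To make the answer-parsing bullet concrete, here is a rough sketch of a Fraction-based extractor covering the listed formats. It is not the parser in gsm8k.py. The name parse_answer_sketch, the "take the last number in the text" rule, and reading "50%" as 50 (rather than 1/2) are assumptions made for the example.

```python
import re
from fractions import Fraction

# Optional sign, digits with optional thousands separators, optional
# decimal part, optional "/denominator". The lookbehind keeps the "-" in
# a range such as "5-8" from being read as a minus sign.
_NUMBER = re.compile(r"(?<!\d)[-+]?\d+(?:,\d{3})*(?:\.\d+)?(?:/\d+)?")


def parse_answer_sketch(text):
    """Return the last number in text as a Fraction, or None if there is none."""
    cleaned = text.replace("$", "")          # "$8" -> "8"
    matches = _NUMBER.findall(cleaned)
    if not matches:
        return None
    token = matches[-1].replace(",", "")     # "8,000" -> "8000"
    numerator, _, denominator = token.partition("/")
    try:
        value = Fraction(numerator)          # accepts "18", "-2.5", "+7"
        if denominator:
            value /= Fraction(denominator)   # "3/4" -> Fraction(3, 4)
    except (ValueError, ZeroDivisionError):
        return None                          # e.g. "3/0"
    return value


for sample in ["The answer is $8.", "8,000", "3/4", "50%", "-2.5"]:
    print(sample, "->", parse_answer_sketch(sample))
```

Exact rational arithmetic avoids float comparisons such as 0.1 + 0.2 != 0.3 when the parsed answer is later checked against the gold answer.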

Installation

git clone https://github.com/MONISMALIK1/self_consistency
cd self_consistency
pip install -e .
export OPENROUTER_API_KEY="sk-or-..."   # https://openrouter.ai/keys

Quick Start

# Solve a single problem (N=8 samples, majority vote)
python -m self_consistency "Janet's ducks lay 16 eggs per day. She eats 3 for breakfast and bakes 4 into muffins. She sells the remainder for $2 each. How much does she make daily?"

# Plain CoT baseline (N=1)
python -m self_consistency "..." --n 1

# Direct I/O baseline (no reasoning)
python -m self_consistency "..." --io

# Download the GSM8K test set (no API calls)
python -m self_consistency --download

# Benchmark: SC vs CoT on 25 problems
python -m self_consistency --bench --num 25 --methods sc,cot

# All three methods on 10 problems
python -m self_consistency --bench --num 10 --methods sc,cot,io

Python API

from self_consistency import solve, is_correct, load_test

sol = solve("Janet's ducks lay 16 eggs...", n=8)
print(sol.answer)        # 18  (a Fraction; print shows its str form)
print(sol.votes)         # 6   (6 of 8 chains agreed)
print(sol.distribution)  # [('18', 6), ('16', 1), ('20', 1)]

# Benchmark
for p in load_test(n=20):
    sol = solve(p["question"], n=8)
    print(is_correct(sol.answer, p["gold"]))
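
The benchmark loop above prints one boolean per problem. A small helper can turn that into a single accuracy figure. The sketch below is hypothetical and not part of the package: accuracy_sketch and the toy stand-ins are made up so that it runs offline, and only the "question" and "gold" keys are taken from the loop above.

```python
from fractions import Fraction


def accuracy_sketch(problems, solver, check):
    """Share of problems for which check(solver(question), gold) is true."""
    problems = list(problems)
    if not problems:
        return 0.0
    hits = sum(1 for p in problems if check(solver(p["question"]), p["gold"]))
    return hits / len(problems)


# Offline stand-ins (hypothetical) so the sketch needs no API key.
toy_problems = [
    {"question": "2 + 2", "gold": Fraction(4)},
    {"question": "3 * 3", "gold": Fraction(9)},
]
toy_solver = lambda question: Fraction(4)        # always answers 4
toy_check = lambda answer, gold: answer == gold

print(accuracy_sketch(toy_problems, toy_solver, toy_check))  # 0.5
```

Going by the signatures shown above, the real pieces would presumably plug in as accuracy_sketch(load_test(n=20), lambda q: solve(q, n=8).answer, is_correct). That call is untested here because it requires an API key.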

Project Structure

self_consistency/
├── llm.py        OpenRouter HTTP wrapper + parallel n-sampling
├── prompts.py    8-shot GSM8K CoT prompt (Wei et al. 2022)
├── gsm8k.py      Dataset loader + Fraction-based answer parser
├── core.py       majority_vote() + solve() → Solution
├── __init__.py   Public API
├── __main__.py   CLI
└── tests/
    ├── test_core.py    Voting logic + mocked solve()
    └── test_gsm8k.py   Parser edge cases + dataset loader

Configuration

Env var             Default                     Purpose
OPENROUTER_API_KEY  (required)                  OpenRouter API key
SC_MODEL            openai/gpt-oss-120b:free    Model slug
SC_CONCURRENCY      4                           Parallel threads for sample_n
SC_DATA_DIR         ~/.cache/self_consistency   GSM8K cache location
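
To show how a setting like SC_CONCURRENCY can bound parallel sampling, here is a minimal sketch built on ThreadPoolExecutor. It is not the sample_n in llm.py: the name sample_n_sketch and the generate callable (a stand-in for one HTTP request) are assumptions for the example, while the env var name and its default of 4 come from the table above.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def sample_n_sketch(generate, prompt, n, max_workers=None):
    """Call generate(prompt) n times on a bounded thread pool.

    Results are returned in submission order. At most max_workers calls
    are in flight at once, which is what caps the request rate.
    """
    if max_workers is None:
        max_workers = int(os.environ.get("SC_CONCURRENCY", "4"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(generate, prompt) for _ in range(n)]
        # result() re-raises any exception from the worker thread.
        return [future.result() for future in futures]


# Hypothetical stand-in for a model call, so the sketch runs offline.
def fake_generate(prompt):
    return f"echo: {prompt}"


print(sample_n_sketch(fake_generate, "2 + 2 = ?", n=3, max_workers=2))
```

Threads suit this workload because each call spends most of its time waiting on the network, so the GIL is not a bottleneck.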

Running Tests

No API key required:

python -m unittest discover -s self_consistency/tests -t . -v

Citation

@inproceedings{wang2023selfconsistency,
  title     = {Self-Consistency Improves Chain of Thought Reasoning in Language Models},
  author    = {Xuezhi Wang and Jason Wei and Dale Schuurmans and Quoc Le and
               Ed Chi and Sharan Narang and Aakanksha Chowdhery and Denny Zhou},
  booktitle = {ICLR},
  year      = {2023},
  url       = {https://arxiv.org/abs/2203.11171}
}
