Self-Consistency with Chain-of-Thought

A from-scratch Python implementation of "Self-Consistency Improves Chain of Thought Reasoning in Language Models" (Wang et al., ICLR 2023).

The Idea

Standard CoT prompting samples one reasoning chain at temperature 0.
Self-Consistency samples N chains at temperature 0.7, extracts the numeric answer from each, then majority-votes across them.
The intuition: correct reasoning paths all converge on the same answer, while errors tend to be diverse and cancel out.

question ──► [prompt + few-shot CoT]
                │
                ├──► sample 1 ──► parse ──► 8
                ├──► sample 2 ──► parse ──► 8
                ├──► sample 3 ──► parse ──► 9  ← outlier
                ├──► ...
                └──► sample N ──► parse ──► 8

majority_vote([8, 8, 9, 8, ...]) ──► answer = 8  ✓

Wang et al. report +17.9 pp accuracy on GSM8K (67.9 % → 85.7 %) over standard CoT using PaLM 540B, and consistent gains across 9 benchmarks without any training or extra supervision.

Features

Zero external dependencies — pure Python 3.11 stdlib
OpenRouter back-end — drop in any model via OPENROUTER_API_KEY
Parallel sampling — N calls via ThreadPoolExecutor (respects rate limits)
Robust answer parsing — handles $8, 8,000, 3/4, 50%, decimal, signed
GSM8K benchmark mode — compare SC vs CoT vs direct I/O side-by-side
29 offline unit tests — no LLM calls needed for the test suite

Installation

git clone https://github.com/MONISMALIK1/self_consistency
cd self_consistency
pip install -e .
export OPENROUTER_API_KEY="sk-or-..."   # https://openrouter.ai/keys

Quick Start

# Solve a single problem (N=8 samples, majority vote)
python -m self_consistency "Janet's ducks lay 16 eggs per day. She eats 3 for breakfast and bakes 4 into muffins. She sells the remainder for $2 each. How much does she make daily?"

# Plain CoT baseline (N=1)
python -m self_consistency "..." --n 1

# Direct I/O baseline (no reasoning)
python -m self_consistency "..." --io

# Download the GSM8K test set (no API calls)
python -m self_consistency --download

# Benchmark: SC vs CoT on 25 problems
python -m self_consistency --bench --num 25 --methods sc,cot

# All three methods on 10 problems
python -m self_consistency --bench --num 10 --methods sc,cot,io

Python API

from self_consistency import solve, is_correct, load_test

sol = solve("Janet's ducks lay 16 eggs...", n=8)
print(sol.answer)        # Fraction(18)
print(sol.votes)         # 6   (6 of 8 chains agreed)
print(sol.distribution)  # [('18', 6), ('16', 1), ('20', 1)]

# Benchmark
for p in load_test(n=20):
    sol = solve(p["question"], n=8)
    print(is_correct(sol.answer, p["gold"]))

Project Structure

self_consistency/
├── llm.py        OpenRouter HTTP wrapper + parallel n-sampling
├── prompts.py    8-shot GSM8K CoT prompt (Wei et al. 2022)
├── gsm8k.py      Dataset loader + Fraction-based answer parser
├── core.py       majority_vote() + solve() → Solution
├── __init__.py   Public API
├── __main__.py   CLI
└── tests/
    ├── test_core.py    Voting logic + mocked solve()
    └── test_gsm8k.py   Parser edge cases + dataset loader

Configuration

Env var	Default	Purpose
`OPENROUTER_API_KEY`	(required)	OpenRouter API key
`SC_MODEL`	`openai/gpt-oss-120b:free`	Model slug
`SC_CONCURRENCY`	`4`	Parallel threads for sample_n
`SC_DATA_DIR`	`~/.cache/self_consistency`	GSM8K cache location

Running Tests

No API key required:

python -m unittest discover -s self_consistency/tests -t . -v

Citation

@inproceedings{wang2023selfconsistency,
  title     = {Self-Consistency Improves Chain of Thought Reasoning in Language Models},
  author    = {Xuezhi Wang and Jason Wei and Dale Schuurmans and Quoc Le and
               Ed Chi and Sharan Narang and Aakanksha Chowdhery and Denny Zhou},
  booktitle = {ICLR},
  year      = {2023},
  url       = {https://arxiv.org/abs/2203.11171}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github/workflows		.github/workflows
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
__main__.py		__main__.py
core.py		core.py
gsm8k.py		gsm8k.py
llm.py		llm.py
prompts.py		prompts.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Self-Consistency with Chain-of-Thought

The Idea

Features

Installation

Quick Start

Python API

Project Structure

Configuration

Running Tests

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Self-Consistency with Chain-of-Thought

The Idea

Features

Installation

Quick Start

Python API

Project Structure

Configuration

Running Tests

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages