Ball-by-ball analytics across every major T20 competition, fronted by CAIS — a context-adjusted impact score that re-prices every delivery.
13 competitions. ~2.1 million legal deliveries. One lens that refuses to let a death-over six against an elite attack count the same as a middle-over single off the fifth bowler in a dead game.
Conventional batting and bowling tables flatten the thing that actually matters in T20: context. A strike rate of 140 is average in the powerplay and elite at the death. A 30-run spell is cheap in the 18th over and expensive in the middle. A four-for against Sri Lanka and Namibia at a World Cup is not a four-for against Sri Lanka and India.
Most public databases paper over this with career averages and raw economy rates. The T20 Analyst doesn't. Every legal delivery in the dataset is scored with context attached — phase of the innings, batter quality, bowler type, match pressure, tournament stage, opponent quality — and rolled up from there.
The result is a leaderboard where the order actually reflects the quality of what happened, not just the volume.
| Code | Competition |
|---|---|
psl |
Pakistan Super League |
ipl |
Indian Premier League |
bbl |
Big Bash League |
cpl |
Caribbean Premier League |
ntb |
Vitality Blast |
hnd |
The Hundred |
sa20 |
SA20 |
ilt |
International League T20 |
mlc |
Major League Cricket |
lpl |
Lanka Premier League |
bpl |
Bangladesh Premier League |
t20is |
Men's T20 Internationals (Full Members only) |
wc |
T20 World Cups |
The T20I view is filtered to the 12 ICC Full Members — Afghanistan, Australia, Bangladesh, England, India, Ireland, New Zealand, Pakistan, South Africa, Sri Lanka, West Indies, Zimbabwe — so leaderboards reflect first-tier opposition, not tier-6 warm-up fixtures.
- CAIS — batting and bowling leaderboards with the context-adjusted score, a bar chart of the top 20, a CAIS-vs-raw-SR scatter, season filters, and a team filter. The flagship view; everything else is supporting cast.
- Batting — runs, average, strike rate, and an average-vs-SR bubble plot.
- Bowling — wickets, economy, and a wickets-vs-economy scatter.
- Records — highest scores, century makers, team win-rates, dismissal breakdowns.
- Player profile modal — career stats, per-season trends, innings history, and the player's CAIS rank within the current competition. Opens from any name in any leaderboard.
- Flask dev server — dynamic, reads the parquet cache, recomputes on demand. Used locally.
- Static GitHub Pages — precomputed per-competition JSON under
static/data/, served as plain files. No backend, no pandas, sub-100ms switching. Used for the public deployment.
The same index.html handles both: it hits /api/... first when running under
Flask, falls back to static/data/<comp>/<file>.json otherwise.
Every legal delivery is re-priced before it lands on the scorecard. The weights
are stored in CricketAnalyser.py — search for _build_enriched — but the
gist is:
CAIS = Σ (runs × phase × pressure × stage × opponent) / balls × 100 × form
| factor | value |
|---|---|
| phase | 0.95 powerplay · 1.15 middle · 1.35 death |
| pressure | 1.0 → 1.5, combining wickets-lost and 2nd-innings required rate |
| stage | 1.00 group · 1.10 QF · 1.20 SF · 1.30 final (tournaments only) |
| opponent | WC only: 1.20 associate-vs-Test, 0.85 Test-vs-associate |
| form | rolling-5 innings ratio vs the field mean, clipped 1.0–1.45 |
The phase weight is flipped on purpose — strike rate in the powerplay is inflated by fielding restrictions, death bowlers aim at your toes. A 20-ball 30 at the death is genuinely harder than a 20-ball 30 in the first six.
wicket_value = 30 × (phase × role) × batter_tier × form × pressure
× partnership × early_wicket × stage × opponent
run_cost = runs_conceded × phase_bowl_weight × 0.5 × stage × opponent
CAIS = Σ (is_wicket × wicket_value − run_cost) / overs
| factor | value |
|---|---|
| phase × role | pace: 2.0 / 1.2 / 1.8 · spin: 1.5 / 1.5 / 1.2 (PP/mid/death) |
| batter_tier | derived per competition from runs + SR percentile |
| form | per-batter rolling form at the time of the wicket |
| pressure | same wicket + chase pressure as batting |
| partnership | 1.0 at 20-run stand, ramps to 1.6 at 70+ |
| early_wicket | 1.35 in first over, 1.15 in overs 2-3, 1.0 otherwise |
| stage | same group/QF/SF/final ladder as batting |
| opponent | WC only: 1.20 associate-vs-Test, 0.85 Test-vs-associate |
| phase_bowl_weight | 1.2 powerplay · 1.0 middle · 1.3 death (run cost only) |
A pace powerplay wicket that breaks a 60-run stand against an elite batter in a high-RRR chase can score 5–6× more than a tailender in a dead middle over. That's the whole point.
Cricsheet's CSV schema stores match_number as plain integers even for finals,
so the stage is inferred from match dates: within each (competition, season)
the latest match is the Final (1.30), the 1–3 matches within 7 days of it are
the Semis / Qualifiers (1.20), and the 4–5 matches within 10 days are
Quarters / Eliminators (1.10). Everything else is group stage (1.00).
Bilateral T20Is are excluded from the stage ladder — the last match of a calendar year is not a "final" in any meaningful sense.
Only applies inside the T20 World Cup, where the main draw mixes Full Members and qualified associates:
- Test batter vs associate bowling → 0.85 (less credit)
- Associate batter vs Test bowling → 1.20 (more credit)
- Test bowler vs associate batting → 0.85 (less credit)
- Associate bowler vs Test batting → 1.20 (more credit)
Everywhere else this stays at 1.00.
Inside the CAIS view on the World Cup competition, a Pool chip lets you further restrict the leaderboard to the 12 Test nations. Useful for apples-to-apples comparisons when you want the knockout-heavyweight view without Namibia or USA diluting the pool.
pip install flask pandas numpy pyarrow
python app.py # http://localhost:5000
First request loads data/all.parquet (~2.8s, or ~10s on first-ever run while
the parquet cache is built from CSV). Switching competitions after that is
sub-second thanks to per-competition memoisation inside CricketAnalyser.
open index.html # any modern browser
The page detects it's not running under Flask, reads from
static/data/<comp>/*.json, and works identically. This is how the GitHub
Pages deployment is served.
python precompute.py # every competition (~10 min)
python precompute.py ipl psl wc # just the named ones
precompute.py bakes a per-competition folder containing career and
per-season JSON for every view, plus a top-level competitions.json
manifest. Commit the output and it's live.
Gotcha: running
precompute.pywith specific codes will overwritestatic/data/competitions.jsonwith a manifest covering only those codes. If you want to rebuild one competition without losing the manifest, rungit checkout HEAD -- static/data/competitions.jsonafter.
pip install pandas
python ingest.py # every supported league
python ingest.py ipl bbl # just these
ingest.py downloads and unpacks Cricsheet's per-competition ball-by-ball
zips into data/<code>.csv, concatenates them into data/all.csv, derives
data/wc.csv from T20I events whose name contains "World Cup", and filters
T20Is to ICC Full Members.
The t20is season is overridden to the calendar year of the first match in
each fixture so that Cricsheet's 2025/26-style season labels don't
disappear behind a 2025 truncation.
.
├── index.html # Single-page UI (Plotly + vanilla JS). Dual-mode.
├── app.py # Flask backend — /api routes for dynamic mode.
├── CricketAnalyser.py # Analytics engine. Enrichment + CAIS v3.
├── ingest.py # Cricsheet zips → project CSV schema.
├── precompute.py # Pre-bakes every static/data/<comp>/*.json file.
├── CAIS_Methodology.docx # Longer methodology write-up.
├── data/ # (gitignored)
│ ├── raw/ # cached Cricsheet zips
│ ├── <code>.csv # per-competition ball-by-ball
│ ├── all.csv # concatenated ball-by-ball
│ └── all.parquet # 5× faster sidecar, auto-built on first load
└── static/
└── data/
├── competitions.json # top-level manifest
└── <comp>/
├── cais-batting.json # career CAIS
├── cais-batting-2024.json # per-season CAIS
├── cais-bowling.json
├── batting.json
├── strike-rates.json
├── bowling.json
├── teams.json
├── highest-scores.json
├── centuries.json
├── dismissals.json
├── over-heatmap.json
├── phase-stats.json
├── matchup.json
├── seasons.json
└── players.json # search list
The first version of this app across 2M rows was visibly sluggish — full CSV
parse on every request, full groupby scan on every league switch. The
current hot path:
- Parquet sidecar cache —
pandas.read_parquetis ~5× faster thanread_csvon this dataset. Built on first boot, reused every time. - Per-competition memoisation —
_build_enriched,_batter_tiers,_batter_form_scores, and_infer_bowler_roleseach cache per-comp output keyed off the competition code. Switching leagues doesn't retouch the full 2M-row frame; it slices ~70k–300k rows once and reuses everything after. - Static mode — every view reads a 10–200 KB JSON. No server, no pandas, no runtime cost. This is what GitHub Pages serves.
Measured on an M2 laptop after the parquet cache is warm:
| operation | time |
|---|---|
| app boot (parquet) | ~2.8s |
| first CAIS for PSL | ~0.35s |
| repeat CAIS for PSL | ~0.13s |
| first CAIS for IPL (new comp) | ~1.6s |
| repeat CAIS for IPL | ~0.14s |
| any static JSON fetch | <50ms |
- CAIS returns a
teamfield. Every aggregated row carries the player's most-frequent team (balls-weighted). This powers both the client-side team chip and the WC Pool filter without an extra round-trip. - Client-side re-ranking after any filter. When you narrow the board by team or pool, the visible list is re-numbered so it starts at #1. The raw CAIS is never recomputed, just re-sorted and re-ranked.
- Multi-season merge. Selecting multiple seasons in the CAIS view fetches each season's JSON and combines them per-player using an endpoint-specific merger (balls-weighted CAIS, additive runs/wickets, weighted SR/economy). A player who played three of the selected seasons appears once, aggregated, not three times.
match_idcollision guard. WC matches also appear inside the raw t20is frame under the samematch_id. The stage-multiplier map is keyed on(competition, match_id)so knockout bumps from the WC can't leak into a bilateral view.
All ball-by-ball data is from Cricsheet — the definitive open archive for international and T20 league cricket scorecards.
CAIS, the enrichment pipeline, the UI, and every line of analysis code are original to this project.
MIT — see LICENSE if present; otherwise treat as MIT with attribution.
Data belongs to Cricsheet under their
ODbL-style terms.