An end-to-end Python pipeline that downloads, cleans, enriches, analyses, and visualises the Caldara & Iacoviello Geopolitical Risk Index — the most widely cited daily measure of geopolitical risk in academic economics and finance.
- Background & Data Source
- Repository Structure
- Pipeline Overview
- Key Variables
- Feature Engineering
- Exploratory Analysis & Key Findings
- Visualisations
- Installation & Usage
- Output Files
- Citation
The Geopolitical Risk (GPR) Index was developed by Dario Caldara and Matteo Iacoviello and published in the American Economic Review (2022). It measures adverse geopolitical events and associated risks by counting newspaper articles related to geopolitical tensions in 10 major newspapers.
| Property | Detail |
|---|---|
| Source | https://www.matteoiacoviello.com/gpr.htm |
| Frequency | Daily |
| Coverage | 1 January 1985 → present |
| Baseline | 1985–2019 average = 100 |
| Reference | Caldara, D. & Iacoviello, M. (2022). Measuring Geopolitical Risk. American Economic Review, 112(4), 1194–1225. |
The headline GPRD index is decomposed into two sub-components:
- GPRD_ACT — Geopolitical Acts: articles covering actual adverse geopolitical events (wars, terrorist attacks, military conflicts).
- GPRD_THREAT — Geopolitical Threats: articles covering threatened or feared geopolitical events (war threats, diplomatic crises, sanctions rhetoric).
This decomposition is analytically valuable: threat spikes tend to be sharper and more transient, while act-driven spikes correspond to sustained periods of elevated risk.

    GeoPolitical Risk/
    │
    ├── GPR_Pipeline.ipynb               ← Main pipeline notebook
    ├── requirements.txt                 ← Python dependencies
    ├── .gitignore                       ← Git exclusions
    ├── README.md                        ← This file
    │
    └── output/                          ← Generated on first run (git-ignored content)
        ├── gpr_daily_clean.csv          ← Enriched dataset (CSV)
        ├── gpr_daily_clean.parquet      ← Enriched dataset (Parquet)
        ├── plot_01_full_timeseries.png
        ├── plot_02_decomposition.png
        ├── plot_03_distribution.png
        ├── plot_04_annual_bar.png
        ├── plot_05_heatmap.png
        ├── plot_06_volatility.png
        ├── plot_07_correlation.png
        ├── plot_08_acts_threats.png
        ├── plot_09_events_spotlight.png
        ├── plot_10_regime.png
        └── plot_11_articles_gpr.png

The `output/` directory is tracked by Git (via `.gitkeep`), but its contents are listed in `.gitignore` to avoid committing large binary and data files.
The notebook GPR_Pipeline.ipynb runs a full end-to-end pipeline in 7 steps:
Sets up all dependencies, global plot aesthetics (seaborn whitegrid theme, 130 dpi), a shared colour palette, and creates the output/ directory if it does not exist.
The GPRFiles class provides a clean interface to the official data files. Two frequencies are supported:

    gpr = GPRFiles()
    raw = gpr.download('daily')    # 15,000+ daily rows
    raw = gpr.download('monthly')  # monthly series

The daily Excel file (`data_gpr_daily_recent.xls`) is fetched directly from matteoiacoviello.com on every run, ensuring the data is always up to date.
The raw file embeds a variable-label lookup table in the last few rows. The cleaning step:
- Selects only the nine core columns defined in `VAR_LABELS`
- Drops metadata rows (identified by null values in the `date` column)
- Parses dates to `datetime`, casts numeric columns, and converts `event` to string
- Sets `date` as the DataFrame index and sorts chronologically
After cleaning: 15,099 daily observations spanning 1985-01-01 → 2026-05-04 (zero missing values in core series).
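The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the notebook's actual code: the `clean_gpr` helper, the contents of `VAR_LABELS` shown here (a subset of the nine columns), and the `var_name` stand-in for the embedded lookup table are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the column mapping the README calls VAR_LABELS.
VAR_LABELS = {
    "date": "Calendar date",
    "GPRD": "Daily GPR index",
    "GPRD_ACT": "GPR acts sub-index",
    "GPRD_THREAT": "GPR threats sub-index",
    "N10D": "Article count in 10 newspapers",
    "event": "Named major event",
}


def clean_gpr(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the cleaning steps to a raw GPR frame."""
    # Keep only the core columns.
    df = raw.loc[:, list(VAR_LABELS)].copy()
    # Parse dates; missing or unparseable values become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    # Drop metadata rows, which carry no valid date.
    df = df.dropna(subset=["date"])
    # Cast numeric columns; keep event as a string (empty when absent).
    numeric_cols = [c for c in df.columns if c not in ("date", "event")]
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
    df["event"] = df["event"].fillna("").astype(str)
    # Index by date and sort chronologically.
    return df.set_index("date").sort_index()


# Toy raw frame: two data rows (out of order) plus one label-lookup row.
raw = pd.DataFrame({
    "date": ["1985-01-02", "1985-01-01", None],
    "GPRD": [110.0, 95.5, np.nan],
    "GPRD_ACT": [120.0, 80.0, np.nan],
    "GPRD_THREAT": [100.0, 105.0, np.nan],
    "N10D": [400, 380, np.nan],
    "event": [None, "Example event", None],
    "var_name": [None, None, "GPRD"],  # stand-in for the embedded lookup table
})
clean = clean_gpr(raw)
print(clean)
```

Parsing dates before dropping rows means that rows with a missing date and rows with an unparseable date are removed in the same pass.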
Nineteen new features, in seven groups, are derived from the clean series:
| Feature group | Variables created |
|---|---|
| Calendar | year, month, month_name, dow, week, decade, decade_lbl |
| Rolling averages | MA7_calc, MA30_calc, MA90, MA365 |
| Volatility | GPRD_roll_std30, GPRD_roll_std90 |
| Change | GPRD_pct_chg, GPRD_diff |
| Normalisation | GPRD_z (z-score), GPRD_rel (re-indexed to baseline = 100) |
| Regime | regime — three-level categorical: Normal / Elevated / Crisis |
| Spread | act_threat_spread = GPRD_ACT − GPRD_THREAT |
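A rough sketch of how most of these features could be derived with pandas is shown below. The `add_features` helper is hypothetical, and several details are assumptions rather than the notebook's confirmed choices: trailing windows with the default `min_periods`, sample standard deviation, percentage change expressed in percent, and a z-score over the full sample. `GPRD_rel` and the label-style calendar fields are omitted because their exact definitions are not given here.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive calendar, rolling, change and spread features."""
    out = df.copy()
    gprd = out["GPRD"]

    # Calendar fields taken from the DatetimeIndex.
    out["year"] = out.index.year
    out["month"] = out.index.month
    out["dow"] = out.index.dayofweek
    out["decade"] = (out.index.year // 10) * 10

    # Trailing rolling means (NaN until a full window is available).
    out["MA7_calc"] = gprd.rolling(7).mean()
    out["MA30_calc"] = gprd.rolling(30).mean()
    out["MA90"] = gprd.rolling(90).mean()
    out["MA365"] = gprd.rolling(365).mean()

    # Rolling volatility (sample standard deviation, ddof=1).
    out["GPRD_roll_std30"] = gprd.rolling(30).std()
    out["GPRD_roll_std90"] = gprd.rolling(90).std()

    # Day-over-day change, in percent and in index points.
    out["GPRD_pct_chg"] = gprd.pct_change() * 100
    out["GPRD_diff"] = gprd.diff()

    # Full-sample z-score and the acts-minus-threats spread.
    out["GPRD_z"] = (gprd - gprd.mean()) / gprd.std()
    out["act_threat_spread"] = out["GPRD_ACT"] - out["GPRD_THREAT"]
    return out


# Synthetic, linearly increasing series used purely for illustration.
idx = pd.date_range("1985-01-01", periods=400, freq="D", name="date")
base = 100.0 + np.arange(400, dtype=float)
demo = pd.DataFrame(
    {"GPRD": base, "GPRD_ACT": base + 5.0, "GPRD_THREAT": base - 5.0},
    index=idx,
)
feat = add_features(demo)
print(feat[["GPRD", "MA7_calc", "GPRD_roll_std30", "GPRD_diff"]].iloc[28:32])
```

Because the z-score uses the full-sample mean and standard deviation, its values change whenever new observations are appended.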
Regime thresholds are defined by the empirical distribution:
- Normal: GPRD < 75th percentile (~123.7)
- Elevated: 75th ≤ GPRD < 90th percentile (~164.2)
- Crisis: GPRD ≥ 90th percentile
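The threshold rule above maps naturally onto `pandas.cut` with left-closed bins. The following is a minimal sketch under that assumption; `classify_regime` is a hypothetical helper name, and the notebook may implement the rule differently.

```python
import numpy as np
import pandas as pd

LABELS = ["Normal", "Elevated", "Crisis"]


def classify_regime(gprd: pd.Series) -> pd.Series:
    """Hypothetical helper: label each day from full-sample percentiles."""
    p75, p90 = gprd.quantile([0.75, 0.90])
    # right=False makes the bins [-inf, p75), [p75, p90), [p90, inf),
    # matching "Normal < p75 <= Elevated < p90 <= Crisis".
    bins = [-np.inf, p75, p90, np.inf]
    return pd.cut(gprd, bins=bins, labels=LABELS, right=False)


# Synthetic values 1..100, used only to make the shares easy to check.
toy = pd.Series(np.arange(1, 101, dtype=float))
regime = classify_regime(toy)
counts = regime.value_counts()
print(counts)
```

On this toy series the three labels cover 75, 15 and 10 observations respectively, which is the split the percentile definition implies.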
Five analytical tables and a set of statistical tests are produced:
- Descriptive statistics — mean, std, min/max, skewness, kurtosis, median for GPRD, GPRD_ACT, GPRD_THREAT, N10D
- Annual summary — yearly mean/max/std, sub-index averages, article counts, crisis-day counts
- Pearson correlation matrix — between GPRD, ACT, THREAT, N10D, and 30-day rolling volatility
- Top-10 spike days — the ten highest single-day GPRD readings in the history
- Monthly seasonality — average GPRD by calendar month
- Statistical tests — Shapiro-Wilk normality test + Augmented Dickey-Fuller stationarity test
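Several of these tables reduce to one-line pandas aggregations. The sketch below shows one possible shape for four of them on synthetic data; the function names are hypothetical, and the annual summary is simplified to mean, max and standard deviation (the sub-index averages, article counts and crisis-day counts are left out).

```python
import numpy as np
import pandas as pd


def descriptive_stats(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    # Note: pandas reports *excess* kurtosis (a normal distribution scores 0).
    return df[cols].agg(["mean", "median", "std", "min", "max", "skew", "kurt"])


def annual_summary(df: pd.DataFrame) -> pd.DataFrame:
    # Yearly mean / max / std of the headline index.
    return df.groupby(df.index.year)["GPRD"].agg(["mean", "max", "std"])


def top_spikes(df: pd.DataFrame, n: int = 10) -> pd.Series:
    # The n highest single-day readings, largest first.
    return df["GPRD"].nlargest(n)


def monthly_seasonality(df: pd.DataFrame) -> pd.Series:
    # Average index level by calendar month (1 = January).
    return df.groupby(df.index.month)["GPRD"].mean()


# Synthetic two-year series: flat at 100 with a single artificial spike.
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D", name="date")
demo = pd.DataFrame({"GPRD": np.full(len(idx), 100.0)}, index=idx)
demo.loc[pd.Timestamp("2001-09-25"), "GPRD"] = 1000.0

stats = descriptive_stats(demo, ["GPRD"])
annual = annual_summary(demo)
spikes = top_spikes(demo, n=3)
season = monthly_seasonality(demo)
print(annual)
```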
Eleven publication-quality figures are generated and saved to output/. See Section 7 for details.
The enriched dataset is written to output/ as:
- `gpr_daily_clean.csv`: human-readable, compatible with Excel and R
- `gpr_daily_clean.parquet`: compressed columnar format, optimal for large-scale analysis
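As an illustration of the export step, the sketch below writes both formats and reads the CSV back. `export_dataset` is a hypothetical helper, and the fallback when no Parquet engine (`pyarrow` or `fastparquet`) is installed is an assumption added for the example; the notebook may simply require one.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_dataset(df: pd.DataFrame, out_dir: str = "output",
                   stem: str = "gpr_daily_clean") -> list[Path]:
    """Hypothetical helper: write CSV always, Parquet when an engine exists."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    csv_path = out / f"{stem}.csv"
    df.to_csv(csv_path)
    written = [csv_path]

    try:
        pq_path = out / f"{stem}.parquet"
        df.to_parquet(pq_path)  # needs pyarrow or fastparquet
        written.append(pq_path)
    except ImportError:
        pass  # no Parquet engine available; CSV only
    return written


# Small synthetic frame, written to a temporary directory for the demo.
idx = pd.date_range("1985-01-01", periods=3, freq="D", name="date")
demo = pd.DataFrame({"GPRD": [100.0, 101.5, 99.0]}, index=idx)

with tempfile.TemporaryDirectory() as tmp:
    paths = export_dataset(demo, tmp)
    restored = pd.read_csv(paths[0], index_col="date", parse_dates=True)

print(restored)
```

Reading the CSV back requires `parse_dates` to recover the `DatetimeIndex`, whereas Parquet stores column types alongside the data.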
| Column | Type | Description |
|---|---|---|
| `date` | DatetimeIndex | Calendar date |
| `GPRD` | float | Daily GPR Index (baseline 1985–2019 = 100) |
| `GPRD_ACT` | float | GPR Acts sub-index |
| `GPRD_THREAT` | float | GPR Threats sub-index |
| `GPRD_MA7` | float | Official 7-day moving average |
| `GPRD_MA30` | float | Official 30-day moving average |
| `N10D` | int | Total number of articles published in the 10 newspapers |
| `event` | str | Label for named major events (sparse) |
| `MA90` | float | Computed 90-day moving average |
| `MA365` | float | Computed 365-day moving average |
| `GPRD_pct_chg` | float | Day-over-day percentage change |
| `GPRD_roll_std30` | float | 30-day rolling standard deviation |
| `GPRD_roll_std90` | float | 90-day rolling standard deviation |
| `GPRD_z` | float | Z-score of GPRD |
| `regime` | category | Normal / Elevated / Crisis |
| `act_threat_spread` | float | GPRD_ACT minus GPRD_THREAT |
The three-tier regime classification is based on the full-sample empirical distribution of GPRD:
- Normal (< p75): ~75% of all days
- Elevated (p75 to p90): ~15% of all days
- Crisis (≥ p90): ~10% of all days
Recent years show a structural shift: the share of Crisis days has risen sharply since 2022, with 2024 and 2025 each recording over 90 crisis days — a level previously seen only around 9/11 and the 2003 Iraq War.
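Crisis-day counts and annual regime shares of this kind can be tabulated with a cross-tabulation of year against regime label. The sketch below uses a handful of hand-labelled toy days; `regime_by_year` is a hypothetical helper and the labels are not real data.

```python
import pandas as pd

LABELS = ["Normal", "Elevated", "Crisis"]


def regime_by_year(regime: pd.Series) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: per-year day counts and shares for each regime."""
    counts = pd.crosstab(
        index=regime.index.year.to_numpy(),
        columns=regime.to_numpy(),
        rownames=["year"],
        colnames=["regime"],
    )
    # Make sure every regime has a column, even if it never occurs.
    counts = counts.reindex(columns=LABELS, fill_value=0)
    shares = counts.div(counts.sum(axis=1), axis=0)
    return counts, shares


# Toy labels: four days in each of two years.
dates = pd.to_datetime([
    "2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04",
    "2001-01-01", "2001-01-02", "2001-01-03", "2001-01-04",
])
labels = ["Normal", "Normal", "Normal", "Crisis",
          "Crisis", "Crisis", "Elevated", "Elevated"]
regime = pd.Series(pd.Categorical(labels, categories=LABELS), index=dates)

counts, shares = regime_by_year(regime)
print(counts)
print(shares)
```

The `counts["Crisis"]` column gives the number of crisis days per year, and `shares` is the table a stacked annual bar chart would plot.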
The spread GPRD_ACT − GPRD_THREAT is a useful leading indicator:
- A positive spread (Acts > Threats) signals that coverage has shifted from feared to realised conflict, typically corresponding to peaks in the overall index.
- A negative spread (Threats > Acts) is characteristic of periods of prolonged diplomatic tension, cold-war style standoffs, and pre-conflict phases.
| Statistic | GPRD | GPRD_ACT | GPRD_THREAT |
|---|---|---|---|
| Mean | 103.5 | 101.5 | 106.4 |
| Median | 91.6 | 83.0 | 92.9 |
| Std Dev | 61.3 | 90.3 | 64.6 |
| Max | 1045.6 | 1627.4 | 809.5 |
| Skewness | 3.80 | 5.74 | 2.41 |
| Kurtosis | 29.5 | 59.1 | 11.8 |
The series is heavily right-skewed with fat tails, reflecting the rarity but extreme magnitude of geopolitical shock events.
- Shapiro-Wilk test decisively rejects normality (p ≈ 1.6 × 10⁻⁶⁸), consistent with fat-tailed financial time series.
- Augmented Dickey-Fuller test rejects the unit-root null hypothesis (ADF = −9.70, p ≈ 1.1 × 10⁻¹⁶), confirming the series is stationary in levels.
The ten highest single-day readings all fall in September–October 2001 in the aftermath of the 9/11 attacks, with the peak reading of 1045.6 on 25 September 2001. This was 10× the long-run baseline and remains the single largest shock in the dataset.
Other notable single-event peaks:
- Gulf War – Operation Desert Storm (Jan 1991): 572.3
- Russia / Ukraine (Feb 2022): 515.9
- U.S. Invades Afghanistan (Oct 2001): 819.0
- Beginning of the Iraq War (Mar 2003): 595.0
The annual mean GPRD has been above the 1985–2019 baseline of 100 in most years since 2016, with a marked structural break in 2022 (annual mean: 153.0) driven by the Russia–Ukraine war. Years 2023–2026 remain elevated, suggesting a new higher-risk regime.
March shows the highest average GPRD (112.6), while May–July and November show the lowest readings (97–100). This seasonal pattern partially reflects the historical clustering of military and diplomatic events in Q1.
| Pair | Pearson r |
|---|---|
| GPRD ↔ GPRD_ACT | 0.87 |
| GPRD ↔ GPRD_THREAT | 0.84 |
| GPRD_ACT ↔ GPRD_THREAT | 0.48 |
| GPRD ↔ GPRD_roll_std30 | 0.72 |
| GPRD ↔ N10D | −0.05 |
The near-zero GPRD–N10D correlation is consistent with the index being driven by the nature of coverage (the share of geopolitical content), not by the total volume of news. The moderate ACT–THREAT correlation (0.48) indicates meaningful divergence between the two sub-components over time.
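A Pearson matrix like the one above, and the lower-triangular view used for a heatmap, can be produced along the following lines. The data here are randomly generated purely for illustration, so the resulting coefficients are unrelated to the values reported in the table.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the real series (fixed seed for repeatability).
rng = np.random.default_rng(0)
acts = rng.normal(100, 30, 500)
threats = rng.normal(100, 20, 500)
demo = pd.DataFrame({
    "GPRD": 0.5 * acts + 0.5 * threats,
    "GPRD_ACT": acts,
    "GPRD_THREAT": threats,
    "N10D": rng.integers(300, 500, 500),
})

# Full symmetric Pearson correlation matrix.
corr = demo.corr(method="pearson")

# Keep only the strictly lower triangle; everything else becomes NaN,
# which is a convenient input for a lower-triangular heatmap.
mask = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
lower = corr.where(mask)
print(lower.round(2))
```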
All 11 plots are saved to output/ at 130 dpi.
| File | Description |
|---|---|
| `plot_01_full_timeseries.png` | Full daily GPRD time-series (1985–present) with MA-7, MA-30, MA-365 overlays and annotated major events |
| `plot_02_decomposition.png` | Three-panel stacked decomposition: Overall · Acts · Threats with 90-day MA |
| `plot_03_distribution.png` | Histogram + KDE, Q-Q plot vs. normal, and boxplot by decade |
| `plot_04_annual_bar.png` | Annual mean GPRD bar chart colour-coded by regime (Normal / Elevated / Crisis) |
| `plot_05_heatmap.png` | Month × Year heatmap of mean daily GPRD |
| `plot_06_volatility.png` | Dual-panel: GPRD level vs. 30 and 90-day rolling standard deviation |
| `plot_07_correlation.png` | Lower-triangular Pearson correlation heatmap |
| `plot_08_acts_threats.png` | Scatter of Acts vs. Threats (coloured by year) + Acts–Threats spread time-series |
| `plot_09_events_spotlight.png` | Individual ±90-day windows around each named major event |
| `plot_10_regime.png` | Stacked bar chart showing the annual share of Normal / Elevated / Crisis days |
| `plot_11_articles_gpr.png` | Dual-axis: mean daily article count (bars) vs. mean daily GPRD (line) |
- Python 3.10 or higher
- `pip` package manager
- Internet access (to download data from matteoiacoviello.com)
# 1. Clone the repository
git clone https://github.com/<your-username>/GeoPolitical-Risk.git
cd GeoPolitical-Risk
# 2. (Recommended) Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # macOS / Linux
.venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Launch Jupyter
jupyter notebook GPR_Pipeline.ipynb

Open `GPR_Pipeline.ipynb` in Jupyter and run all cells (Kernel → Restart & Run All).
The pipeline will:
- Download the latest daily data directly from the official source
- Create the `output/` directory if needed
- Generate all 11 plots and save them to `output/`
- Export the enriched dataset to `output/gpr_daily_clean.csv` and `output/gpr_daily_clean.parquet`
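Each run fetches fresh data, so the outputs always reflect the latest release. The figures quoted in this README come from data through 2026-05-04; sample-dependent values such as the percentile thresholds and z-scores will shift as new observations arrive.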
After running the notebook, the output/ directory will contain:
| File | Format | Description |
|---|---|---|
| `gpr_daily_clean.csv` | CSV | Enriched daily dataset (25 columns, ~15 000 rows) |
| `gpr_daily_clean.parquet` | Parquet | Same dataset in compressed columnar format |
| `plot_01_full_timeseries.png` | PNG 130 dpi | Full time-series chart |
| `plot_02_decomposition.png` | PNG 130 dpi | Acts / Threats decomposition |
| `plot_03_distribution.png` | PNG 130 dpi | Distribution analysis |
| `plot_04_annual_bar.png` | PNG 130 dpi | Annual bar chart |
| `plot_05_heatmap.png` | PNG 130 dpi | Month × Year heatmap |
| `plot_06_volatility.png` | PNG 130 dpi | Rolling volatility |
| `plot_07_correlation.png` | PNG 130 dpi | Correlation heatmap |
| `plot_08_acts_threats.png` | PNG 130 dpi | Acts vs Threats |
| `plot_09_events_spotlight.png` | PNG 130 dpi | Event spotlights |
| `plot_10_regime.png` | PNG 130 dpi | Regime distribution |
| `plot_11_articles_gpr.png` | PNG 130 dpi | Articles vs GPR |
If you use this pipeline or the underlying data in your work, please cite the original paper:
Caldara, D. & Iacoviello, M. (2022). Measuring Geopolitical Risk.
American Economic Review, 112(4), 1194–1225.
https://doi.org/10.1257/aer.20191823
Pipeline authored by Narcisse. Data © Caldara & Iacoviello — see the official data page for terms of use.