sandwidinarcisse/GPR


🌍 Geopolitical Risk (GPR) Daily Pipeline

An end-to-end Python pipeline that downloads, cleans, enriches, analyses, and visualises the Caldara & Iacoviello Geopolitical Risk Index, one of the most widely cited daily measures of geopolitical risk in academic economics and finance.


Table of Contents

  1. Background & Data Source
  2. Repository Structure
  3. Pipeline Overview
  4. Key Variables
  5. Feature Engineering
  6. Exploratory Analysis & Key Findings
  7. Visualisations
  8. Installation & Usage
  9. Output Files
  10. Citation

1. Background & Data Source

The Geopolitical Risk (GPR) Index was developed by Dario Caldara and Matteo Iacoviello and published in the American Economic Review (2022). It measures adverse geopolitical events and associated risks by counting, each day, the articles related to geopolitical tensions in 10 major newspapers and expressing that count relative to the total number of articles published.

| Property  | Detail |
|-----------|--------|
| Source    | https://www.matteoiacoviello.com/gpr.htm |
| Frequency | Daily |
| Coverage  | 1 January 1985 → present |
| Baseline  | 1985–2019 average = 100 |
| Reference | Caldara, D. & Iacoviello, M. (2022). Measuring Geopolitical Risk. American Economic Review, 112(4), 1194–1225. |

Sub-indices

The headline GPRD index is decomposed into two sub-components:

  • GPRD_ACT — Geopolitical Acts: articles covering actual adverse geopolitical events (wars, terrorist attacks, military conflicts).
  • GPRD_THREAT — Geopolitical Threats: articles covering threatened or feared geopolitical events (war threats, diplomatic crises, sanctions rhetoric).

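This decomposition is analytically valuable because the two components behave differently. In the full sample GPRD_ACT produces the more extreme spikes (maximum 1627.4, skewness 5.74), GPRD_THREAT is less skewed (maximum 809.5, skewness 2.41), and the two are only moderately correlated (r = 0.48), so threat-driven and act-driven episodes can be told apart.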


2. Repository Structure

GeoPolitical Risk/
│
├── GPR_Pipeline.ipynb       ← Main pipeline notebook
├── requirements.txt         ← Python dependencies
├── .gitignore               ← Git exclusions
├── README.md                ← This file
│
└── output/                  ← Generated on first run (git-ignored content)
    ├── gpr_daily_clean.csv      ← Enriched dataset (CSV)
    ├── gpr_daily_clean.parquet  ← Enriched dataset (Parquet)
    ├── plot_01_full_timeseries.png
    ├── plot_02_decomposition.png
    ├── plot_03_distribution.png
    ├── plot_04_annual_bar.png
    ├── plot_05_heatmap.png
    ├── plot_06_volatility.png
    ├── plot_07_correlation.png
    ├── plot_08_acts_threats.png
    ├── plot_09_events_spotlight.png
    ├── plot_10_regime.png
    └── plot_11_articles_gpr.png

The output/ directory is tracked by Git (via .gitkeep) but its contents are listed in .gitignore to avoid committing large binary and data files.


3. Pipeline Overview

The notebook GPR_Pipeline.ipynb runs a full end-to-end pipeline in 7 steps:

Step 0 — Setup & Imports

Sets up all dependencies, global plot aesthetics (seaborn whitegrid theme, 130 dpi), a shared colour palette, and creates the output/ directory if it does not exist.

Step 1 — Data Download

The GPRFiles class provides a clean interface to the official data files. Two frequencies are supported:

gpr = GPRFiles()                     # class defined in GPR_Pipeline.ipynb
daily = gpr.download('daily')        # 15,000+ daily rows
monthly = gpr.download('monthly')    # monthly series

The daily Excel file (data_gpr_daily_recent.xls) is fetched directly from matteoiacoviello.com on every run, ensuring the data is always up to date.
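As a rough illustration of the download step (not the notebook's actual GPRFiles class), the sketch below wraps pandas.read_excel behind a small function. The base URL and the monthly file name are assumptions and should be checked against the official data page; only data_gpr_daily_recent.xls is named in this README. Reading .xls files with pandas also requires the xlrd package. The reader is injectable so the function can be tried without network access.

```python
from typing import Callable

import pandas as pd

# Assumed location of the official files; verify against the data page before use.
BASE_URL = "https://www.matteoiacoviello.com/gpr_files"
FILES = {
    "daily": "data_gpr_daily_recent.xls",  # file name given in this README
    "monthly": "data_gpr_export.xls",      # assumed name of the monthly file
}


def download(
    freq: str = "daily",
    reader: Callable[[str], pd.DataFrame] = pd.read_excel,
) -> pd.DataFrame:
    """Return the raw sheet for the requested frequency.

    `reader` defaults to pandas.read_excel and can be swapped for a stub.
    """
    if freq not in FILES:
        raise ValueError(f"unknown frequency {freq!r}; expected one of {sorted(FILES)}")
    return reader(f"{BASE_URL}/{FILES[freq]}")


# Offline demonstration with a stand-in reader that only records the URL.
requested = []


def fake_reader(url: str) -> pd.DataFrame:
    requested.append(url)
    return pd.DataFrame({"date": ["1985-01-01"], "GPRD": [100.0]})


raw_demo = download("daily", reader=fake_reader)
```

Calling download("daily") without the reader argument would hit the network on every call, which matches the always-fresh behaviour described above.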

Step 2 — Data Cleaning

The raw file embeds a variable-label lookup table in the last few rows. The cleaning step:

  • Selects only the nine core columns defined in VAR_LABELS
  • Drops metadata rows (identified by null values in the date column)
  • Parses dates to datetime, casts numeric columns, and converts event to string
  • Sets date as the DataFrame index and sorts chronologically

After cleaning: 15,099 daily observations spanning 1985-01-01 → 2026-05-04 (zero missing values in core series).
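The cleaning rules above can be sketched on a toy frame as follows. CORE_COLS is a hypothetical subset standing in for the notebook's VAR_LABELS (which is not reproduced in this README), and the toy rows are invented for illustration.

```python
import pandas as pd

# Hypothetical subset of the core columns; the real list lives in VAR_LABELS.
CORE_COLS = ["date", "GPRD", "GPRD_ACT", "GPRD_THREAT", "N10D", "event"]
NUMERIC_COLS = ["GPRD", "GPRD_ACT", "GPRD_THREAT", "N10D"]


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    # Keep core columns and drop the trailing label rows, which carry no date.
    df = raw.loc[raw["date"].notna(), CORE_COLS].copy()
    df["date"] = pd.to_datetime(df["date"])
    df[NUMERIC_COLS] = df[NUMERIC_COLS].apply(pd.to_numeric, errors="coerce")
    df["event"] = df["event"].fillna("").astype(str)
    return df.set_index("date").sort_index()


# Toy input: two data rows out of order, one metadata row, one extra column.
raw_toy = pd.DataFrame({
    "date": ["1985-01-02", "1985-01-01", None],
    "GPRD": ["110.5", "95.0", None],
    "GPRD_ACT": [120.0, 90.0, None],
    "GPRD_THREAT": [100.0, 99.0, None],
    "N10D": [410, 395, None],
    "event": [None, "Example event", None],
    "var_name": [None, None, "GPRD"],
})
clean_toy = clean(raw_toy)
```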

Step 3 — Feature Engineering

Seven groups of new features are derived from the clean series:

| Feature group    | Variables created |
|------------------|-------------------|
| Calendar         | year, month, month_name, dow, week, decade, decade_lbl |
| Rolling averages | MA7_calc, MA30_calc, MA90, MA365 |
| Volatility       | GPRD_roll_std30, GPRD_roll_std90 |
| Change           | GPRD_pct_chg, GPRD_diff |
| Normalisation    | GPRD_z (z-score), GPRD_rel (re-indexed to baseline = 100) |
| Regime           | regime (three-level categorical: Normal / Elevated / Crisis) |
| Spread           | act_threat_spread = GPRD_ACT − GPRD_THREAT |
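A minimal sketch of how several of these features could be computed with pandas, using random data in place of the real series. The window lengths follow the variable names; expressing GPRD_pct_chg in percent and computing the z-score over the full sample are assumptions, since the notebook code is not shown here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned frame (random values, not real GPR data).
rng = np.random.default_rng(0)
idx = pd.date_range("1985-01-01", periods=400, freq="D", name="date")
df = pd.DataFrame({
    "GPRD": rng.gamma(shape=4.0, scale=25.0, size=len(idx)),
    "GPRD_ACT": rng.gamma(shape=3.0, scale=33.0, size=len(idx)),
    "GPRD_THREAT": rng.gamma(shape=4.0, scale=26.0, size=len(idx)),
}, index=idx)

# Rolling means and volatility; the first (window - 1) rows are NaN.
df["MA90"] = df["GPRD"].rolling(90).mean()
df["MA365"] = df["GPRD"].rolling(365).mean()
df["GPRD_roll_std30"] = df["GPRD"].rolling(30).std()

# Day-over-day change, in percent (assumed unit) and in index points.
df["GPRD_pct_chg"] = df["GPRD"].pct_change() * 100
df["GPRD_diff"] = df["GPRD"].diff()

# Full-sample z-score (assumed) and the Acts minus Threats spread.
df["GPRD_z"] = (df["GPRD"] - df["GPRD"].mean()) / df["GPRD"].std()
df["act_threat_spread"] = df["GPRD_ACT"] - df["GPRD_THREAT"]

# Calendar fields read straight off the DatetimeIndex.
df["year"] = df.index.year
df["month"] = df.index.month
df["decade"] = (df.index.year // 10) * 10
```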

Regime thresholds are defined by the empirical distribution:

  • Normal: GPRD < 75th percentile (~123.7)
  • Elevated: 75th ≤ GPRD < 90th percentile (~164.2)
  • Crisis: GPRD ≥ 90th percentile
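One way to express these percentile rules is pandas.cut with left-closed bins. This is a sketch on synthetic values and may differ from how the notebook implements the classification.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
gprd = pd.Series(rng.gamma(shape=4.0, scale=25.0, size=1000), name="GPRD")  # synthetic

p75, p90 = gprd.quantile([0.75, 0.90])

# right=False closes each bin on the left, so a value equal to p75 is
# "Elevated" and a value equal to p90 is "Crisis", matching the rules above.
regime = pd.cut(
    gprd,
    bins=[-np.inf, p75, p90, np.inf],
    labels=["Normal", "Elevated", "Crisis"],
    right=False,
)
shares = regime.value_counts(normalize=True)
```

Because the cut-offs are full-sample percentiles, the three shares come out at roughly 75%, 15%, and 10% by construction, and every label can change when new data extends the sample.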

Step 4 — Exploratory Analysis

Six analytical outputs are produced:

  1. Descriptive statistics — mean, std, min/max, skewness, kurtosis, median for GPRD, GPRD_ACT, GPRD_THREAT, N10D
  2. Annual summary — yearly mean/max/std, sub-index averages, article counts, crisis-day counts
  3. Pearson correlation matrix — between GPRD, ACT, THREAT, N10D, and 30-day rolling volatility
  4. Top-10 spike days — the ten highest single-day GPRD readings in the history
  5. Monthly seasonality — average GPRD by calendar month
  6. Statistical tests — Shapiro-Wilk normality test + Augmented Dickey-Fuller stationarity test
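The annual summary, top-10 spike days, and monthly seasonality tables map onto standard pandas group-by operations. The sketch below uses synthetic data with one planted spike so the result is easy to check; the column names in `annual` are illustrative, not the notebook's.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
idx = pd.date_range("2000-01-01", "2002-12-31", freq="D", name="date")
df = pd.DataFrame({"GPRD": rng.gamma(4.0, 25.0, size=len(idx))}, index=idx)  # synthetic
df.loc[pd.Timestamp("2001-09-25"), "GPRD"] = 1000.0  # planted spike for the demo

crisis_cut = df["GPRD"].quantile(0.90)

# One row per calendar year: level, dispersion, and number of crisis days.
annual = df.groupby(df.index.year).agg(
    mean=("GPRD", "mean"),
    max=("GPRD", "max"),
    std=("GPRD", "std"),
    crisis_days=("GPRD", lambda s: int((s >= crisis_cut).sum())),
)

top10 = df["GPRD"].nlargest(10)                        # highest single-day readings
by_month = df.groupby(df.index.month)["GPRD"].mean()   # average by calendar month
```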

Step 5 — Visualisations

Eleven publication-quality figures are generated and saved to output/. See Section 7 for details.

Step 6 — Export

The enriched dataset is written to output/ as:

  • gpr_daily_clean.csv — human-readable, compatible with Excel and R
  • gpr_daily_clean.parquet — compressed columnar format, optimal for large-scale analysis
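A sketch of the export step on a two-row frame. It writes to a temporary directory rather than output/ so it can be run anywhere. DataFrame.to_parquet needs an engine such as pyarrow or fastparquet, so that call is guarded here; the notebook may handle this differently.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {"GPRD": [95.0, 110.5]},
    index=pd.to_datetime(["1985-01-01", "1985-01-02"]).rename("date"),
)

out_dir = Path(tempfile.mkdtemp()) / "output"  # temporary location for the demo only
out_dir.mkdir(parents=True, exist_ok=True)

csv_path = out_dir / "gpr_daily_clean.csv"
df.to_csv(csv_path)

# Parquet requires an optional engine; fall back to None if none is installed.
parquet_path = out_dir / "gpr_daily_clean.parquet"
try:
    df.to_parquet(parquet_path)
except ImportError:
    parquet_path = None

# Reading the CSV back: the date index must be re-parsed explicitly.
restored = pd.read_csv(csv_path, index_col="date", parse_dates=True)
```

The round trip shows the practical difference between the two formats: CSV stores dates as text and needs parse_dates on load, while Parquet preserves column types.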

4. Key Variables

| Column            | Type          | Description |
|-------------------|---------------|-------------|
| date              | DatetimeIndex | Calendar date |
| GPRD              | float         | Daily GPR Index (baseline 1985–2019 = 100) |
| GPRD_ACT          | float         | GPR Acts sub-index |
| GPRD_THREAT       | float         | GPR Threats sub-index |
| GPRD_MA7          | float         | Official 7-day moving average |
| GPRD_MA30         | float         | Official 30-day moving average |
| N10D              | int           | Number of articles published in the 10 newspapers (total daily volume) |
| event             | str           | Label for named major events (sparse) |
| MA90              | float         | Computed 90-day moving average |
| MA365             | float         | Computed 365-day moving average |
| GPRD_pct_chg      | float         | Day-over-day percentage change |
| GPRD_roll_std30   | float         | 30-day rolling standard deviation |
| GPRD_roll_std90   | float         | 90-day rolling standard deviation |
| GPRD_z            | float         | Z-score of GPRD |
| regime            | category      | Normal / Elevated / Crisis |
| act_threat_spread | float         | GPRD_ACT minus GPRD_THREAT |

5. Feature Engineering

Regime Classification

The three-tier regime classification is based on the full-sample empirical distribution of GPRD:

Normal    (< p75):        ~75% of all days
Elevated  (p75 to < p90): ~15% of all days
Crisis    (≥ p90):        ~10% of all days

Recent years show a structural shift: the share of Crisis days has risen sharply since 2022, with 2024 and 2025 each recording over 90 crisis days — a level previously seen only around 9/11 and the 2003 Iraq War.
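Counting crisis days per year, and the annual share of each regime, takes one group-by and one cross-tabulation. The labels below are a hand-made toy example, not real data.

```python
import pandas as pd

# Toy labelled frame; in the pipeline `regime` comes from the percentile rule.
toy = pd.DataFrame(
    {"regime": ["Normal", "Normal", "Crisis", "Elevated", "Crisis", "Crisis"]},
    index=pd.to_datetime([
        "2021-01-01", "2021-01-02", "2021-01-03",
        "2022-01-01", "2022-01-02", "2022-01-03",
    ]),
)
toy["year"] = toy.index.year

# Number of Crisis days in each year.
crisis_days = (toy["regime"] == "Crisis").groupby(toy["year"]).sum()

# Share of each regime within a year; every row sums to 1.
regime_share = pd.crosstab(toy["year"], toy["regime"], normalize="index")
```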

Acts–Threats Spread

The spread GPRD_ACT − GPRD_THREAT can serve as a rough indicator of which phase a crisis is in:

  • A positive spread (Acts > Threats) signals that coverage has shifted from feared to realised conflict, typically around peaks in the overall index.
  • A negative spread (Threats > Acts) is characteristic of prolonged diplomatic tension, cold-war-style standoffs, and pre-conflict phases.

6. Exploratory Analysis & Key Findings

Descriptive Statistics (full sample, 1985–2026)

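| Statistic | GPRD   | GPRD_ACT | GPRD_THREAT |
|-----------|--------|----------|-------------|
| Mean      | 103.5  | 101.5    | 106.4       |
| Median    | 91.6   | 83.0     | 92.9        |
| Std Dev   | 61.3   | 90.3     | 64.6        |
| Max       | 1045.6 | 1627.4   | 809.5       |
| Skewness  | 3.80   | 5.74     | 2.41        |
| Kurtosis  | 29.5   | 59.1     | 11.8        |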

The series is heavily right-skewed with fat tails, reflecting the rarity but extreme magnitude of geopolitical shock events.
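For reference, the same summary statistics can be obtained directly from pandas. The sketch runs on a synthetic right-skewed sample. Note that Series.kurt() returns excess kurtosis (a normal distribution scores 0); whether the table above reports excess or raw kurtosis is not stated in this README.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
s = pd.Series(rng.lognormal(mean=4.5, sigma=0.5, size=5000))  # synthetic, right-skewed

summary = pd.Series({
    "mean": s.mean(),
    "median": s.median(),
    "std": s.std(),
    "max": s.max(),
    "skewness": s.skew(),
    "kurtosis": s.kurt(),  # excess kurtosis: 0 for a normal distribution
})
```

A mean above the median together with positive skewness is the same signature seen in the table for all three GPR series.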

Stationarity & Normality

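  • Shapiro-Wilk test decisively rejects normality (p ≈ 1.6 × 10⁻⁶⁸), consistent with the heavy right tail of the series. With roughly 15,000 observations the exact p-value should be read as indicative only: SciPy, for example, notes that its Shapiro-Wilk p-values may be inaccurate above 5,000 samples.
  • Augmented Dickey-Fuller test rejects the unit-root null hypothesis (ADF = −9.70, p ≈ 1.1 × 10⁻¹⁶), which supports treating the series as stationary in levels.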

Top Geopolitical Shocks (by GPRD)

The ten highest single-day readings all fall in September–October 2001, in the aftermath of the 9/11 attacks, with the peak reading of 1045.6 on 25 September 2001. At roughly 10.5× the long-run baseline of 100, it remains the single largest shock in the dataset.

Other notable single-event peaks:

  • Gulf War – Operation Desert Storm (Jan 1991): 572.3
  • U.S. Invades Afghanistan (Oct 2001): 819.0
  • Beginning of the Iraq War (Mar 2003): 595.0
  • Russia / Ukraine (Feb 2022): 515.9

Secular Trend

The annual mean GPRD has been above the 1985–2019 baseline of 100 in most years since 2016, with a marked upward shift in 2022 (annual mean: 153.0) driven by the Russia–Ukraine war. Years 2023–2026 remain elevated (2026 is a partial year, with data through 4 May), suggesting a new higher-risk regime.

Monthly Seasonality

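March shows the highest average GPRD (112.6), while May–July and November show the lowest readings (97–100). This pattern likely reflects, at least in part, the historical clustering of large military and diplomatic events in the first quarter, such as Operation Desert Storm (January 1991), the invasion of Ukraine (February 2022), and the start of the Iraq War (March 2003).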

Correlation Structure

| Pair                    | Pearson r |
|-------------------------|-----------|
| GPRD ↔ GPRD_ACT         | 0.87      |
| GPRD ↔ GPRD_THREAT      | 0.84      |
| GPRD_ACT ↔ GPRD_THREAT  | 0.48      |
| GPRD ↔ GPRD_roll_std30  | 0.72      |
| GPRD ↔ N10D             | −0.05     |

The near-zero GPRD–N10D correlation is consistent with the index being driven by the nature of coverage (geopolitical content) rather than by the total volume of news. The moderate ACT–THREAT correlation (0.48) indicates meaningful divergence between the two sub-components over time.
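A Pearson matrix like the one above, and the lower-triangular view used for a correlation heatmap, can be produced as sketched below. The data are synthetic and the toy composite is not the official index formula. It is built only so that the sub-indices correlate with the headline series while the volume series does not.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 2000
act = rng.gamma(3.0, 33.0, size=n)
threat = 0.4 * act + rng.gamma(4.0, 15.0, size=n)
df = pd.DataFrame({
    "GPRD": 0.5 * act + 0.5 * threat,        # toy composite, not the official formula
    "GPRD_ACT": act,
    "GPRD_THREAT": threat,
    "N10D": rng.integers(300, 500, size=n),  # unrelated volume series
})

corr = df.corr(method="pearson")

# Keep the strictly lower triangle; everything else becomes NaN.
mask = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
lower = corr.where(mask)
```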


7. Visualisations

All 11 plots are saved to output/ at 130 dpi.

| File                         | Description |
|------------------------------|-------------|
| plot_01_full_timeseries.png  | Full daily GPRD time-series (1985–present) with MA-7, MA-30, MA-365 overlays and annotated major events |
| plot_02_decomposition.png    | Three-panel stacked decomposition: Overall · Acts · Threats with 90-day MA |
| plot_03_distribution.png     | Histogram + KDE, Q-Q plot vs. normal, and boxplot by decade |
| plot_04_annual_bar.png       | Annual mean GPRD bar chart colour-coded by regime (Normal / Elevated / Crisis) |
| plot_05_heatmap.png          | Month × Year heatmap of mean daily GPRD |
| plot_06_volatility.png       | Dual-panel: GPRD level vs. 30 and 90-day rolling standard deviation |
| plot_07_correlation.png      | Lower-triangular Pearson correlation heatmap |
| plot_08_acts_threats.png     | Scatter of Acts vs. Threats (coloured by year) + Acts–Threats spread time-series |
| plot_09_events_spotlight.png | Individual ±90-day windows around each named major event |
| plot_10_regime.png           | Stacked bar chart showing the annual share of Normal / Elevated / Crisis days |
| plot_11_articles_gpr.png     | Dual-axis: mean daily article count (bars) vs. mean daily GPRD (line) |
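As an example of the data preparation behind one of these figures, the Month × Year grid for a heatmap such as plot_05 can be built with a pivot table. This is a sketch on synthetic data and stops short of the plotting call, so the notebook's actual styling is not reproduced.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"GPRD": rng.gamma(4.0, 25.0, size=len(idx))}, index=idx)  # synthetic
df["year"] = df.index.year
df["month"] = df.index.month

# Rows are calendar months, columns are years, cells are mean daily GPRD.
heat = df.pivot_table(index="month", columns="year", values="GPRD", aggfunc="mean")
```

The resulting frame can be passed directly to a heatmap function such as seaborn.heatmap.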

8. Installation & Usage

Prerequisites

  • Python 3.10 or higher
  • pip package manager
  • Internet access (to download data from matteoiacoviello.com)

Setup

# 1. Clone the repository
git clone https://github.com/<your-username>/GeoPolitical-Risk.git
cd GeoPolitical-Risk

# 2. (Recommended) Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate        # macOS / Linux
# .venv\Scripts\activate         # Windows: run this instead of the line above

# 3. Install dependencies
pip install -r requirements.txt

# 4. Launch Jupyter
jupyter notebook GPR_Pipeline.ipynb

Running the pipeline

Open GPR_Pipeline.ipynb in Jupyter and run all cells (Kernel → Restart & Run All).
The pipeline will:

  1. Download the latest daily data directly from the official source
  2. Create the output/ directory if needed
  3. Generate all 11 plots and save them to output/
  4. Export the enriched dataset to output/gpr_daily_clean.csv and output/gpr_daily_clean.parquet

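Each run fetches fresh data, so results always reflect the latest release. The figures quoted in this README correspond to the sample ending 2026-05-04 and will shift as new observations are added.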


9. Output Files

After running the notebook, the output/ directory will contain:

| File                         | Format      | Description |
|------------------------------|-------------|-------------|
| gpr_daily_clean.csv          | CSV         | Enriched daily dataset (25 columns, ~15 000 rows) |
| gpr_daily_clean.parquet      | Parquet     | Same dataset in compressed columnar format |
| plot_01_full_timeseries.png  | PNG 130 dpi | Full time-series chart |
| plot_02_decomposition.png    | PNG 130 dpi | Acts / Threats decomposition |
| plot_03_distribution.png     | PNG 130 dpi | Distribution analysis |
| plot_04_annual_bar.png       | PNG 130 dpi | Annual bar chart |
| plot_05_heatmap.png          | PNG 130 dpi | Month × Year heatmap |
| plot_06_volatility.png       | PNG 130 dpi | Rolling volatility |
| plot_07_correlation.png      | PNG 130 dpi | Correlation heatmap |
| plot_08_acts_threats.png     | PNG 130 dpi | Acts vs Threats |
| plot_09_events_spotlight.png | PNG 130 dpi | Event spotlights |
| plot_10_regime.png           | PNG 130 dpi | Regime distribution |
| plot_11_articles_gpr.png     | PNG 130 dpi | Articles vs GPR |

10. Citation

If you use this pipeline or the underlying data in your work, please cite the original paper:

Caldara, D. & Iacoviello, M. (2022). Measuring Geopolitical Risk.
American Economic Review, 112(4), 1194–1225.
https://doi.org/10.1257/aer.20191823


Pipeline authored by Narcisse. Data © Caldara & Iacoviello — see the official data page for terms of use.
