A scalable, open-source framework for designing balanced geographic experiments at marketing scale.
Geographic ("geo-lift") experiments remain a widely-used methodology for measuring the incremental return on ad spend (iROAS) of large advertising campaigns. Yet their design presents significant challenges: the number of markets is small, heterogeneity is large, and the exact Supergeo partitioning problem is NP-hard.
OSD addresses these challenges with a two-stage approach:
- Stage 1: PCA-based dimensionality reduction creates interpretable geo-embeddings, and hierarchical clustering of these embeddings generates candidate supergeos.
- Stage 2: Mixed-Integer Linear Programming (MILP) selects a treatment/control partition that balances both baseline outcomes and effect modifiers.
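To make the pipeline concrete, here is a minimal sketch of both stages using scikit-learn, SciPy, and PuLP. The function names, the min-max balance objective, and the equal-arms constraint are illustrative assumptions, not the actual src/osd API.

```python
# Illustrative sketch of OSD's two-stage design (not the src/osd API).
import numpy as np
import pulp
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA

def candidate_supergeos(X, n_components=5, n_supergeos=20):
    """Stage 1: PCA geo-embeddings, then Ward clustering into supergeos."""
    Z = PCA(n_components=n_components).fit_transform(X)
    return fcluster(linkage(Z, method="ward"), t=n_supergeos,
                    criterion="maxclust")

def balanced_partition(features, labels):
    """Stage 2: MILP assigning each supergeo to treatment (x=1) or
    control (x=0), minimizing the worst absolute imbalance across the
    covariate columns (baseline outcomes and effect modifiers)."""
    groups = np.unique(labels)
    # Aggregate unit-level covariates to the supergeo level.
    agg = np.vstack([features[labels == g].sum(axis=0) for g in groups])
    prob = pulp.LpProblem("osd_partition", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x_{g}", cat="Binary") for g in groups]
    d = pulp.LpVariable("max_imbalance", lowBound=0)
    prob += d  # objective: minimize the largest covariate imbalance
    for j in range(agg.shape[1]):
        # (2x - 1) maps assignment to +1/-1, so diff = treatment - control.
        diff = pulp.lpSum((2 * x[i] - 1) * agg[i, j]
                          for i in range(len(groups)))
        prob += diff <= d
        prob += -diff <= d
    prob += pulp.lpSum(x) == len(groups) // 2  # equal-sized arms
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {g: int(x[i].value()) for i, g in enumerate(groups)}

# Example: 200 geos with 8 covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
assignment = balanced_partition(X, candidate_supergeos(X))
```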
Key findings:
- Statistical Parity with Randomization: PCA-based clustering achieves statistical performance equivalent to unit-level randomization (1.07× RMSE ratio, p = 0.68, n.s.)
- Operational Benefits: Enables coarser granularity for media buying without sacrificing statistical power
- Scalability: Completes in < 60 seconds on a single core for N=1,000 DMAs
- Excellent Balance: All standardized mean differences < 1%
Under mild community-structure assumptions, OSD's objective value is provably close to that of the optimal Supergeo partition; see the paper for the formal bound.
```bash
# Clone the repository
git clone https://github.com/shawcharles/osd.git
cd osd

# Create a virtual environment and install dependencies
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Run the ablation study (generates CSV results)
python src/experiments/ablation_study.py

# Generate all publication plots
python scripts/generate_paper_plots.py

# Run the scalability benchmark (optional)
python scripts/benchmark_scalability.py --output scalability_results.csv
```

Note: The ablation study runs 50 Monte Carlo replications at N=40 and N=200, taking approximately 15-20 minutes on a modern laptop.
| Path | Purpose |
|---|---|
| latex/ | LaTeX source for the research paper |
| src/osd/ | Core Python implementation (PCA, clustering, MILP optimization) |
| src/experiments/ | Monte Carlo ablation studies and robustness tests |
| scripts/ | Plotting and visualization tools |
| results/ | Experimental outputs (CSVs) |
| tests/ | Unit tests for core algorithms and statistical methods |
All results in the paper can be reproduced from the codebase. Here's a step-by-step guide:
```bash
# Run 50 Monte Carlo replications comparing methods at N=40 and N=200
python src/experiments/ablation_study.py
```

Outputs:
- ablation_results_small_n.csv — Aggregated metrics (RMSE, bias, SMD, bootstrap CIs)
- ablation_per_rep_small_n.csv — Per-replication data for all methods
- ablation_stats_small_n.csv — Statistical test results (Holm-Bonferroni adjusted p-values)
Runtime: ~15-20 minutes on a modern laptop
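To inspect these outputs without rerunning the study, the CSVs can be loaded directly with pandas; the results/ location is an assumption based on the repository layout above.

```python
# Quick look at the ablation outputs; paths assume the scripts write
# into results/ (see the repository layout table).
import pandas as pd

agg = pd.read_csv("results/ablation_results_small_n.csv")
stats = pd.read_csv("results/ablation_stats_small_n.csv")
print(agg.head())    # aggregated RMSE, bias, SMD, bootstrap CIs
print(stats.head())  # Holm-Bonferroni adjusted p-values
```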
```bash
# Create all publication-ready plots
python scripts/generate_paper_plots.py
```

Outputs (saved to latex/figures/):
- ablation_comparison.pdf — RMSE and runtime comparison (Section 8.3)
- ablation_boxplot.pdf — Error distributions across replications
- covariate_balance.pdf — SMD balance metrics
- power_analysis.pdf — Statistical power curves (illustrative)
- scalability.pdf — Runtime vs. N comparison
```bash
cd latex
pdflatex main.tex
bibtex main
pdflatex main.tex
pdflatex main.tex
```

Output: main.pdf — Full research paper with all figures and tables
This codebase implements rigorous statistical methodology:
- Bootstrap Confidence Intervals: Percentile method with 1,000 resamples
- Multiple Testing Correction: Holm-Bonferroni for family-wise error rate control
- Effect Sizes: Cohen's d for practical significance
- True Random Baseline: Unit-level randomization for unbiased comparison
- Proper SMD Calculation: Pooled within-group standard deviation with Bessel's correction (see the sketch below)
Reproducibility Score: 9/10 (per a comprehensive zero-trust review)
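As an illustration of two of the conventions above, here is a minimal sketch of the percentile bootstrap and the pooled-SD SMD. This is an assumed reading of the bullets, not the project's actual functions; tests/test_smd.py remains the authoritative definition.

```python
# Illustrative sketches of the percentile bootstrap CI and the pooled-SD
# SMD described above (not the project's actual implementations).
import numpy as np

def percentile_ci(x, stat=np.mean, n_boot=1_000, alpha=0.05, seed=0):
    """Percentile-method bootstrap CI with 1,000 resamples."""
    rng = np.random.default_rng(seed)
    boots = [stat(rng.choice(x, size=len(x), replace=True))
             for _ in range(n_boot)]
    return np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])

def smd(treat, control):
    """Standardized mean difference using the pooled within-group
    standard deviation with Bessel's correction (ddof=1)."""
    n_t, n_c = len(treat), len(control)
    pooled_var = ((n_t - 1) * np.var(treat, ddof=1)
                  + (n_c - 1) * np.var(control, ddof=1)) / (n_t + n_c - 2)
    return (np.mean(treat) - np.mean(control)) / np.sqrt(pooled_var)
```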
Run unit tests to verify core algorithms:
```bash
# Run all tests
pytest tests/ -v

# Run specific test modules
pytest tests/test_smd.py -v
pytest tests/test_statistical_methods.py -v
```

Pull requests are welcome! Please open an issue first to discuss major changes.
See CONTRIBUTING.md for detailed guidelines.
Apache 2.0 — see LICENSE for details.
If you use this codebase in academic work, please cite:
```bibtex
@article{Shaw2025OSD,
  title   = {Optimized Supergeo Design: A Scalable Framework for Geographic Marketing Experiments},
  author  = {Charles Shaw},
  journal = {Under Review},
  year    = {2025}
}
```