CrashScope

A forecasting and decision-support app built on 776,000+ Chicago traffic crashes (2016-2023). Pick any historical date, forecast the next N days, and compare predictions against what actually happened.

Architecture

Traffic_Crashes.csv
        │
        ▼
  ┌─────────────┐     ┌──────────────┐     ┌──────────────┐
  │   Ingest     │────▶│   Features   │────▶│   H3 Index   │
  │  (clean,     │     │ (wet weather,│     │ (lat/lon →   │
  │   filter)    │     │  time, speed)│     │  hex cells)  │
  └─────────────┘     └──────────────┘     └──────────────┘
                                                   │
                              ┌─────────────────────┤
                              ▼                     ▼
                    ┌──────────────┐      ┌──────────────┐
                    │  City Panel  │      │  Cell Panel   │
                    │ (1 row/day,  │      │ (1 row/day/  │
                    │  city totals)│      │  H3 cell)    │
                    └──────┬───────┘      └──────┬───────┘
                           │                     │
                           ▼                     ▼
                    ┌──────────────┐      ┌──────────────┐
                    │  City Model  │      │  Cell Model  │
                    │  (LightGBM)  │      │  (LightGBM)  │
                    └──────┬───────┘      └──────┬───────┘
                           │                     │
                           ▼                     ▼
                    ┌────────────────────────────────────┐
                    │          FastAPI Service            │
                    │  /forecast/city  /hotspot/{cell}   │
                    │  /hotspots/top   /health           │
                    └──────────────┬─────────────────────┘
                                  │
                                  ▼
                    ┌────────────────────────────────────┐
                    │        React Frontend              │
                    │  deck.gl H3 map + Recharts         │
                    │  date picker + hotspot drill-down  │
                    └────────────────────────────────────┘
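
As a rough illustration of the two panel stages in the diagram, the sketch below builds a zero-filled cell panel and rolls it up into a city panel. It is not the code in src/panel.py: the function names, the in-memory dict layout, and the placeholder cell ids ("cellA", "cellB") are assumptions made for the example, and the real pipeline writes parquet files rather than dicts.

```python
from collections import Counter
from datetime import date, timedelta


def build_cell_panel(crashes, start, end):
    """Count crashes per (cell, day) and zero-fill days with no crashes.

    crashes: iterable of (day, cell_id) pairs, one per crash record.
    Returns {(cell_id, day): count} for every cell seen in [start, end]
    and every day in that range.
    """
    counts = Counter(
        (cell, day) for day, cell in crashes if start <= day <= end
    )
    cells = sorted({cell for cell, _ in counts})
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    # Zero-filling: a (cell, day) pair with no crashes still gets a row.
    return {
        (cell, day): counts.get((cell, day), 0)
        for cell in cells
        for day in days
    }


def build_city_panel(cell_panel):
    """Sum the cell panel over cells to get one total per day."""
    totals = Counter()
    for (_, day), n in cell_panel.items():
        totals[day] += n
    return dict(totals)


# Toy input: two crashes in one cell on June 1, one in another on June 3.
crashes = [
    (date(2022, 6, 1), "cellA"),
    (date(2022, 6, 1), "cellA"),
    (date(2022, 6, 3), "cellB"),
]
cell_panel = build_cell_panel(crashes, date(2022, 6, 1), date(2022, 6, 3))
city_panel = build_city_panel(cell_panel)
```

Zero-filling matters for the lag and rolling features: without explicit zero rows, "7 rows back" would not mean "7 days back" for a quiet cell.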

Data Flow Trace

A typical request, where the user picks as_of_date=2022-06-15 and horizon=7:

1. Frontend calls GET /forecast/city?as_of_date=2022-06-15&horizon=7.
2. API slices the city panel to rows dated on or before 2022-06-15.
3. City model takes the last row's features (lag_1, lag_7, rolling_7_mean, day_of_week, etc.).
4. Recursive predict: for each of the 7 days, advance the calendar features, update the lags from prior predictions, and predict the next day's crash count.
5. Actuals lookup: API fetches real crash counts from 2022-06-16 through 2022-06-22.
6. Response: [{date, predicted_value, actual_value}, ...]
7. Chart: renders predicted (red) vs actual (green dashed) lines.
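
The slicing and actuals-lookup steps above can be sketched in a few lines. This is a minimal illustration, not the handler in src/api/routes.py: the helper names and the date-to-count dict standing in for the city panel are assumptions. Only the response keys (date, predicted_value, actual_value) come from the trace.

```python
from datetime import date, timedelta


def history_as_of(panel, as_of):
    """Keep only rows the model is allowed to see: dates <= as_of."""
    return {day: count for day, count in panel.items() if day <= as_of}


def build_response(panel, as_of, predictions):
    """Pair each forecast step with the actual count, when one exists."""
    rows = []
    for step, predicted in enumerate(predictions, start=1):
        day = as_of + timedelta(days=step)
        rows.append(
            {
                "date": day.isoformat(),
                "predicted_value": predicted,
                # None when the panel has no row for that day, e.g. when
                # the forecast runs past the end of the data.
                "actual_value": panel.get(day),
            }
        )
    return rows


# Toy panel with made-up counts; the forecast values are also made up.
panel = {
    date(2022, 6, 14): 300,
    date(2022, 6, 15): 310,
    date(2022, 6, 16): 295,
}
as_of = date(2022, 6, 15)
history = history_as_of(panel, as_of)
response = build_response(panel, as_of, [301.5, 299.0])
```

Keeping the slice and the lookup separate makes the hindcast honest: the model sees only `history`, while the full panel is consulted afterwards for comparison.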

Setup

cd .worktrees/crashscope   # or wherever the project lives

# 1. Install dependencies
make install                # creates venv + installs deps
cd frontend && npm install && cd ..

# 2. Symlink the data (if needed)
ln -s /path/to/Traffic_Crashes.csv Traffic_Crashes.csv

# 3. Build data + train
make data                   # ~1 min — creates data/*.parquet
make train                  # ~30s — trains cell + city LightGBM models

# 4. Run
make serve                  # API on http://localhost:8000
cd frontend && npm run dev  # Frontend on http://localhost:5173

API Endpoints

| Endpoint | Description |
| --- | --- |
| `GET /forecast/city?horizon=7&as_of_date=2022-06-15` | City-wide daily forecast with actuals |
| `GET /hotspot/{h3_cell}?horizon=7&as_of_date=2022-06-15` | Per-cell forecast with actuals |
| `GET /hotspots/top?n=20` | Top N cells by total crash count |
| `GET /health` | Health check |
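
For scripted access, a small standard-library client might look like the sketch below. The paths, query parameters, and port come from this README; the helper names are hypothetical, and the response is assumed to be JSON as suggested by the data flow trace.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # where `make serve` exposes the API


def city_forecast_url(as_of_date, horizon=7, base_url=BASE_URL):
    """Build the URL for the city-wide forecast endpoint."""
    query = urlencode({"horizon": horizon, "as_of_date": as_of_date})
    return f"{base_url}/forecast/city?{query}"


def hotspot_url(h3_cell, as_of_date, horizon=7, base_url=BASE_URL):
    """Build the URL for the per-cell forecast endpoint."""
    query = urlencode({"horizon": horizon, "as_of_date": as_of_date})
    return f"{base_url}/hotspot/{quote(h3_cell)}?{query}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the body as JSON (requires a running API)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the API running, `fetch_json(city_forecast_url("2022-06-15"))` would be expected to return the list of date, predicted_value, and actual_value entries described in the data flow trace.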

Models

Two LightGBM regressors, both trained on 2016-2022 data (2023 held out for hindcast evaluation):

| Model | Target | Training rows | Features |
| --- | --- | --- | --- |
| City-level (`lgbm_city_v1.txt`) | Daily city-wide crash count | ~2,500 | Calendar (day_of_week, month, is_weekend, day_of_year) + lags (1, 7, 14, 28 days) + rolling mean/sum (7, 14, 28 days) |
| Cell-level (`lgbm_cell_v1.txt`) | Daily per-H3-cell crash count | ~2.2M | Same feature set, per cell |
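
To make the lag and rolling features in the table concrete, here is a minimal sketch over a plain list of daily counts. The column names and the choice to exclude the current day from each rolling window (so that features for day t use only earlier days) are assumptions for illustration; src/panel.py may define them differently.

```python
def lag_and_rolling(series, lags=(1, 7, 14, 28), windows=(7, 14, 28)):
    """Build one feature dict per day from a list of daily counts.

    Features for index t use only values before t. Entries are None
    where there is not yet enough history.
    """
    rows = []
    for t in range(len(series)):
        row = {}
        for k in lags:
            row[f"lag_{k}"] = series[t - k] if t >= k else None
        for w in windows:
            if t >= w:
                window = series[t - w:t]  # the w days before day t
                row[f"rolling_{w}_sum"] = sum(window)
                row[f"rolling_{w}_mean"] = sum(window) / w
            else:
                row[f"rolling_{w}_sum"] = None
                row[f"rolling_{w}_mean"] = None
        rows.append(row)
    return rows


# Toy series: counts 1, 2, ..., 30 on consecutive days.
features = lag_and_rolling(list(range(1, 31)))
```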

Recursive forecasting: Each step advances calendar features and feeds the prediction back into lag/rolling features for the next step.
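
The loop below is a minimal sketch of that recursion, assuming a reduced feature set and a stand-in model passed as a plain callable (`predict_one`). The function name, the feature subset, and the clipping at zero are illustrative assumptions and not the code in src/models/lgbm.py.

```python
from datetime import date, timedelta


def recursive_forecast(history, last_date, horizon, predict_one):
    """Forecast `horizon` days ahead, feeding predictions back as lags.

    history: daily counts, oldest first, ending at last_date (>= 7 values).
    predict_one: callable mapping a feature dict to a predicted count.
    """
    series = list(history)  # copy so the caller's list is not mutated
    forecasts = []
    for step in range(1, horizon + 1):
        day = last_date + timedelta(days=step)
        features = {
            # Calendar features are advanced to the day being predicted.
            "day_of_week": day.weekday(),
            "month": day.month,
            "is_weekend": int(day.weekday() >= 5),
            # Lag and rolling features read from the extended series, so
            # from step 2 onward they include earlier predictions.
            "lag_1": series[-1],
            "lag_7": series[-7],
            "rolling_7_mean": sum(series[-7:]) / 7,
        }
        y_hat = max(0.0, predict_one(features))  # counts cannot be negative
        forecasts.append((day, y_hat))
        series.append(y_hat)
    return forecasts


# Stand-in model: average of last week's value and the 7-day mean.
def toy_model(f):
    return 0.5 * f["lag_7"] + 0.5 * f["rolling_7_mean"]


result = recursive_forecast([10] * 7, date(2022, 6, 15), 3, toy_model)
```

Because each step consumes earlier predictions, errors can compound as the horizon grows, which is one reason to compare forecasts against actuals day by day.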

Tests

make test    # 49 tests, ~3s

| Module | Tests | What's covered |
| --- | --- | --- |
| `test_ingest.py` | 5 | CSV loading, year filtering, null handling |
| `test_features.py` | 4 | Binary flags, time periods, speed categories |
| `test_h3_index.py` | 4 | H3 cell assignment, bounds checking, null coords |
| `test_panel.py` | 8 | Zero-filling, lags, rolling features, city panel aggregation |
| `test_lgbm.py` | 6 | Fit, predict shape, non-negative, save/load, horizon variation |
| `test_naive.py` | 3 | Seasonal naive, moving average baselines |
| `test_evaluate.py` | 5 | MAE, RMSE, WAPE, rolling backtest |
| `test_api.py` | 9 | All endpoints, as_of_date, 404s, actuals |
| `test_schemas.py` | 3 | Pydantic serialization |
| `test_settings.py` | 2 | Config defaults |
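
For readers unfamiliar with the metrics named for test_evaluate.py, the sketch below gives their standard definitions. The function signatures are assumptions and may not match those in src/models/evaluate.py.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def wape(actual, predicted):
    """Weighted absolute percentage error: total |error| / total |actual|.

    Unlike a per-row percentage error, this stays defined when individual
    days have zero crashes, as long as the actual total is non-zero.
    """
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_error / sum(abs(a) for a in actual)


# Toy values, not results from the project.
actual = [10, 20, 30]
predicted = [12, 18, 33]
scores = (mae(actual, predicted), rmse(actual, predicted), wape(actual, predicted))
```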

Project Structure

src/
  ingest.py          # CSV loading + cleaning
  features.py        # Feature engineering (flags, categories)
  h3_index.py        # H3 spatial indexing
  panel.py           # Daily panel builder (cell + city level)
  pipeline.py        # End-to-end orchestrator
  models/
    naive.py         # Baseline models (seasonal naive, moving average)
    lgbm.py          # LightGBM forecaster with recursive predict
    evaluate.py      # MAE, RMSE, WAPE, rolling backtest
  api/
    app.py           # FastAPI factory with lifespan loader
    routes.py        # Endpoint handlers
    schemas.py       # Pydantic request/response models
    deps.py          # Dependency injection
config/
  settings.py        # Pydantic settings (paths, params, API config)
frontend/
  src/
    components/      # Map, Filters, ForecastChart, HotspotPanel
    api/client.ts    # Typed fetch wrapper
    types/api.ts     # TypeScript interfaces
tests/               # 49 tests across 10 modules

Data

Chicago Traffic Crashes — 785,000+ crash records from the City of Chicago open data portal.

Original Analysis

The original statistical analysis (run_analysis.py) is preserved in the repo. It generates 11 charts, 2 interactive maps, and a 24-slide PowerPoint covering temporal patterns, weather-severity relationships, hit-and-run analysis, and geographic hotspots. See answerslogic.md for detailed methodology.
