This repository contains two complementary approaches to analyzing and predicting accessibility barriers in urban environments:
- GNN Hotspot Detection - Graph Neural Network-based spatial hotspot identification
- Temporal Forecasting - Transformer-based predictive modeling for accessibility trends
Both projects analyze the Access to Everyday Life Dataset (~82K accessibility barriers across Seattle neighborhoods) to support urban planning and infrastructure improvement decisions.
datathon26/
├── data/ # Shared dataset
│ └── Access_to_Everyday_Life_Dataset.csv
├── GNN/ # Graph Neural Network Hotspot Detection
│ ├── src/ # Source code
│ ├── outputs/ # Results, visualizations, dashboards
│ ├── models/ # Trained models
│ ├── dashboard.html # Interactive dashboard
│ ├── RESULTS_SUMMARY.md # Detailed results
│ └── GNN-README.md # Project documentation
├── temporal_calc/ # Temporal Forecasting
│ ├── outputs/ # Results, plots, forecasts
│ ├── checkpoints/ # Model checkpoints
│ ├── dashboard.html # Interactive dashboard
│ └── transformer_README.md # Project documentation
└── README.md # This file
Research Question: Identify high-risk accessibility hotspots using clustering and/or spatial modeling.
Approach: Hybrid Graph Neural Network (GAT) that combines:
- Spatial Modeling: KNN-based graph construction, spatial autocorrelation analysis
- Clustering: DBSCAN on learned embeddings with multi-factor risk scoring
- Graph-Based Learning: Contrastive learning to capture spatial-contextual patterns
Results (test set, 70/15/15 train/val/test split):
- 255 hotspots identified (3.74% of spatial units on test set)
- 100% coverage of high-severity problems (3,634 issues with severity ≥ 4)
- Spatial coherence: Moran's I = 0.2376 (moderate positive spatial autocorrelation)
- Baseline comparison: Evaluated against 8 baseline methods (KDE, Getis-Ord Gi*, DBSCAN, etc.). Jaccard similarity with each baseline ranges from 0.01 to 0.05, indicating the GNN identifies different spatial patterns than classical methods
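The two comparison metrics quoted above can be sketched in a few lines. This is a minimal, standard-library illustration and not the project's evaluation code; the function names `jaccard` and `morans_i`, and the dense list-of-lists weight matrix, are assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two sets of hotspot unit ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def morans_i(values, weights):
    """Global Moran's I for `values` under a dense spatial weight matrix.

    weights[i][j] > 0 marks unit j as a neighbour of unit i (diagonal is 0).
    Values near +1 mean similar values cluster; values near -1 mean they alternate.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Four units on a line, each adjacent to the next (hypothetical toy data).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(jaccard({1, 2, 3}, {2, 3, 4}))   # overlap between two hotspot sets
print(morans_i([1, 1, 0, 0], chain))   # clustered flags give a positive value
print(morans_i([1, 0, 1, 0], chain))   # alternating flags give a negative value
```

A Jaccard value in the 0.01 to 0.05 range means two methods share only a small fraction of their flagged units, which is why the README reads it as pattern divergence rather than agreement.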
Rationale for Graph Structure: Accessibility barriers exhibit spatial dependencies: nearby issues may share common causes (infrastructure age, neighborhood planning, terrain). Classical spatial statistics (KDE, Getis-Ord Gi*) score each location from local density or fixed-weight neighborhood sums of a single variable, which may miss these contextual relationships. Graph structure explicitly models spatial neighborhoods, allowing the GNN to learn representations that encode these dependencies. However, this comes with computational cost and requires careful graph construction (KNN-based, k=15 in our implementation).
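To make the KNN-based construction concrete, here is a brute-force sketch using only the standard library. It is an illustration under stated assumptions, not the repository's implementation: the helper names `knn_edges` and `symmetrize` are hypothetical, coordinates are assumed to be already projected so that Euclidean distance is meaningful, and only the value k=15 comes from this README.

```python
import math


def knn_edges(points, k):
    """Return directed edges (i, j) from each point to its k nearest neighbours.

    Brute force, O(n^2 log n); fine for a toy example, too slow for ~82K points,
    where a spatial index (e.g. a KD-tree) would normally be used instead.
    """
    edges = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


def symmetrize(edges):
    """Add the reverse of every edge so message passing runs in both directions."""
    both = set(edges) | {(j, i) for i, j in edges}
    return sorted(both)


# Hypothetical projected coordinates; the README reports k=15 on the real data.
points = [(0, 0), (0, 1), (0, 3), (5, 5)]
directed = knn_edges(points, k=1)
print(directed)
print(symmetrize(directed))
```

KNN graphs give every node the same out-degree regardless of local density, which is one reason the choice of k matters: a small k can disconnect sparse areas, while a large k links barriers that are far apart.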
View GNN Dashboard | Read Full Results
Research Question: Predict future neighborhood accessibility scores based on historical barrier patterns.
Approach: Transformer-based sequence-to-sequence model for multi-step forecasting:
- Architecture: Transformer encoder (6 layers, 12 heads, d_model=192) with sinusoidal positional encoding
- Features: Temporal lags (t-1, t-2, t-3), rolling statistics (3 and 5-period), cyclical time encoding (sin/cos), demographic equity scores
- Training: Multi-step prediction (3 steps ahead) with Huber loss. Data split: temporal train/test split (not random, to preserve temporal structure)
- Limitation: Dataset lacks true timestamps; synthetic time bins were created from `attribute_id` ordering as a temporal proxy, which may not reflect actual temporal dynamics
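The feature set listed above (lags, rolling statistics, cyclical encoding) can be illustrated with a small standard-library sketch. The function name `build_features`, the column names, and `period=12` are assumptions for the example; the README does not state the cycle length or the exact rolling statistics used.

```python
import math


def build_features(series, lags=(1, 2, 3), windows=(3, 5), period=12):
    """Build one feature row per time step that has enough history.

    `period` is a hypothetical cycle length used only for the sin/cos encoding.
    """
    start = max(max(lags), max(windows))
    rows = []
    for t in range(start, len(series)):
        row = {f"lag_{lag}": series[t - lag] for lag in lags}
        for w in windows:
            past = series[t - w:t]  # strictly before t, so the target never leaks in
            row[f"roll_mean_{w}"] = sum(past) / w
        angle = 2 * math.pi * (t % period) / period
        row["t_sin"] = math.sin(angle)
        row["t_cos"] = math.cos(angle)
        row["target"] = series[t]
        rows.append(row)
    return rows


# Hypothetical accessibility scores for one neighborhood over 7 time bins.
scores = [10, 20, 30, 40, 50, 60, 70]
rows = build_features(scores)
print(rows[0])
```

Computing rolling windows over values strictly before `t` keeps the features consistent with a temporal train/test split, since no row sees its own target or any later value.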
Results (test set, temporal train/test split):
- R² = 0.8372 (Transformer, test set)
- MAE = 16.01 points (vs. LSTM: 19.13, XGBoost: 40.69 on same test set)
- 55.56% threshold accuracy (±15 points on accessibility score scale)
- Model comparison: Transformer achieves higher R² than LSTM (0.7893) and XGBoost (0.1008) on the test set. Note: XGBoost's poor performance (R² = 0.10) suggests it struggles with the temporal structure, while LSTM and Transformer both capture temporal dependencies, with Transformer showing better fit.
Model Comparison:
| Model | MAE | R² | Pct Accuracy | Threshold Accuracy |
|---|---|---|---|---|
| Transformer | 16.01 | 0.8372 | 57.92% | 55.56% |
| LSTM | 19.13 | 0.7893 | 60.67% | 48.15% |
| XGBoost | 40.69 | 0.1008 | -23.91% | 0.00% |
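For readers who want to check how metrics like those in the table are computed, the sketch below uses the standard definitions of MAE and R². The `threshold_accuracy` function assumes that "threshold accuracy" means the share of predictions within ±15 points of the true score, which matches the wording in this README but is not confirmed against the project's code. The "Pct Accuracy" column is not reproduced because its definition is not given here.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination; can be negative when the model is worse
    than always predicting the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def threshold_accuracy(y_true, y_pred, tol=15.0):
    """Share of predictions within +/- tol of the true value (assumed definition)."""
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical scores, not taken from the project's outputs.
y_true = [50, 60, 70, 80]
y_pred = [55, 40, 72, 100]
print(mae(y_true, y_pred))
print(r2(y_true, y_pred))
print(threshold_accuracy(y_true, y_pred))
```

The toy example also shows why a high MAE and a low threshold accuracy tend to move together: two predictions that miss by 20 points raise the MAE and both fall outside the ±15 band.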
View Temporal Dashboard | Read Full Documentation
| Aspect | GNN Hotspot Detection | Temporal Forecasting |
|---|---|---|
| Goal | Identify where high-risk areas are now | Predict how accessibility will change |
| Time Horizon | Current state analysis | Future trend prediction |
| Output | Spatial hotspot map (255 regions) | Neighborhood risk forecasts |
| Use Case | Prioritize infrastructure investment | Plan long-term interventions |
| Method | Graph-based spatial clustering | Sequence-to-sequence forecasting |
Together, they provide:
- Spatial Analysis: Current spatial distribution of high-risk areas (GNN)
- Temporal Projections: Forecasted accessibility trends (Temporal), with caveats about synthetic time bins
- Complementary Information: Combining current hotspot locations with predicted trends may inform resource allocation, though both models have limitations (GNN: parameter sensitivity, Temporal: synthetic time proxy) that should be considered in decision-making
cd GNN
# Install dependencies
pip install -r requirements.txt
# Run full pipeline (train, evaluate, visualize)
python -m src.main --train --eval --visualize
# Generate map data for dashboard
python generate_map_data.py
# View dashboard (start local server)
python -m http.server 8000
# Open http://localhost:8000/dashboard.html
cd temporal_calc
# Install dependencies
pip install torch numpy pandas scikit-learn xgboost matplotlib seaborn
# Run full pipeline
bash run_all.sh
# Or run individual steps:
python max_performance.py # Train and compare models
python evaluate.py # Evaluate best model
python generate_map_data.py # Generate dashboard data
# View dashboard (start local server)
python -m http.server 8000
# Open http://localhost:8000/dashboard.html
- Detection Coverage: Identifies 255 hotspots on test set (3.3x more than best baseline method) while maintaining 100% coverage of high-severity issues. This suggests the GNN captures a broader set of high-risk areas compared to conservative methods like Getis-Ord Gi* (17 hotspots).
- Pattern Divergence: Low Jaccard similarity (0.01-0.05) with all baselines indicates the GNN identifies different spatial patterns than classical methods. This may reflect learned spatial-contextual relationships, though it could also indicate overfitting or parameter sensitivity. Further validation would be needed to confirm generalizability.
- Spatial Coherence Tradeoff: Moderate spatial coherence (Moran's I = 0.2376) suggests meaningful spatial clustering without being overly conservative. Getis-Ord Gi* achieves higher coherence (0.75) but identifies far fewer hotspots (17), illustrating the tradeoff between coverage and spatial coherence.
- Graph Structure Justification: Graph structure enables modeling spatial dependencies that point-based methods (KDE, thresholding) cannot capture. However, this requires careful graph construction (KNN k=15 chosen empirically) and comes with computational overhead. The necessity claim is supported by low baseline Jaccard similarity, but alternative graph constructions (e.g., radius-based) were not exhaustively explored due to 24-hour datathon constraints.
- Model Performance: Transformer achieves R² = 0.8372 on test set, higher than LSTM (0.7893) and XGBoost (0.1008). The improvement over LSTM is modest (~6% relative), while XGBoost's poor performance (R² = 0.10) suggests it fails to capture temporal dependencies. The Transformer's attention mechanism may better model long-range dependencies in the sequence, though this comes with increased model complexity (~71K parameters vs. simpler baselines).
- Multi-step Forecasting: Forecasts 3 steps ahead with 55.56% threshold accuracy (±15 points on accessibility score scale). This indicates the model captures general trends but has limited precision for point predictions. The model is appropriate for identifying neighborhoods at risk rather than exact score prediction.
- Feature Engineering Impact: Temporal lags, rolling statistics, and cyclical encoding provide the model with historical context and periodic patterns. Ablation studies were not conducted within the 24-hour constraint to quantify each feature's contribution.
- Limitations: Predictions are based on synthetic time bins (derived from `attribute_id` ordering), not true temporal data. This limits the model's ability to capture real-world temporal dynamics. Additionally, the dataset's crowdsourced nature may introduce reporting biases that affect generalizability.
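One way synthetic time bins could be derived from `attribute_id` ordering, followed by a temporal (non-random) split, is sketched below. This is a guess at the general idea rather than the project's procedure: the equal-size binning rule, the function names `assign_time_bins` and `temporal_split`, and the 80/20 fraction are all assumptions, and ids are assumed to be unique.

```python
def assign_time_bins(record_ids, n_bins):
    """Map each id to a bin index in 0..n_bins-1 by its rank in sorted order.

    Bins are (near) equal in size; a lower id is treated as 'earlier'.
    """
    order = sorted(record_ids)
    size = len(order)
    return {rid: rank * n_bins // size for rank, rid in enumerate(order)}


def temporal_split(bins, train_frac=0.8):
    """Use the earliest bins for training and the latest for testing, no shuffling."""
    ordered = sorted(set(bins))
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


# Hypothetical ids; the real dataset column is attribute_id.
ids = [30, 10, 20, 40, 60, 50]
bins = assign_time_bins(ids, n_bins=3)
print(bins)

train_bins, test_bins = temporal_split(range(10), train_frac=0.8)
print(train_bins, test_bins)
```

Because the bin index only reflects id rank, any trend the model learns is a trend over id order. If ids were not assigned chronologically, the forecasts describe that ordering and not calendar time, which is the caveat stated above.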
- Hotspots Detected: 255
- Spatial Coverage: 3.74% of units
- High-Severity Coverage: 100% (3,634 issues)
- Spatial Coherence: Moran's I = 0.2376
- Prediction Accuracy: R² = 0.8372
- Mean Absolute Error: 16.01 points
- Threshold Accuracy: 55.56% (±15 points)
- Forecast Horizon: 3 steps ahead
- Deep Learning: PyTorch, PyTorch Geometric
- Spatial Analysis: scikit-learn, scipy, libpysal
- Visualization: Folium, Plotly, Matplotlib
- Clustering: DBSCAN, HDBSCAN, OPTICS
- Deep Learning: PyTorch (Transformer architecture)
- Time Series: Custom sequence-to-sequence model
- Baselines: LSTM, XGBoost
- Visualization: Matplotlib, Seaborn, Leaflet
- `hotspot_map.html` - Interactive map of detected hotspots
- `results.json` - Evaluation metrics and results
- `baseline_comparison.json` - Comparison with 8 baseline methods
- `embeddings.png` - Learned embedding visualization
- `dashboards/` - Training curves, evaluation dashboards
- `map_data.json` - Data for interactive dashboard
- `model_results.json` - Model performance metrics
- `neighborhood_risk_forecast.csv` - Future accessibility predictions
- `accessibility_scores.csv` - Current neighborhood scores
- `plots/` - Performance comparisons, geospatial analysis
- `map_data.json` - Data for interactive dashboard
- `eval/` - Detailed evaluation metrics and plots
- Evaluates whether graph structure improves spatial dependency modeling compared to point-based methods (baseline comparison suggests it identifies different patterns, though necessity is not definitively proven)
- Implements a hybrid approach (spatial modeling + clustering + graph learning) and compares against 8 baseline methods using Jaccard similarity and spatial coherence metrics
- Provides baseline comparison (8 methods) with interpretable metrics (Jaccard similarity, Moran's I, coverage)
- Achieves 100% coverage of high-severity issues (severity ≥ 4) on test set, though hotspot boundaries depend on DBSCAN clustering parameters (eps, min_samples) which were selected empirically
- Compares Transformer, LSTM, and XGBoost architectures for accessibility forecasting (Transformer shows higher R² on test set, though improvement over LSTM is modest)
- Implements multi-step prediction (3 steps ahead) with 55.56% threshold accuracy (±15 points), indicating trend-level rather than point-prediction accuracy
- Integrates temporal features (lags, rolling stats, cyclical encoding) and demographic equity scores, though feature importance analysis was not conducted within the 24-hour constraint
- Provides forecast outputs that may inform planning, with important caveats: predictions are based on synthetic time bins (not true temporal data) and crowdsourced data may have reporting biases
- GNN Project README - Complete GNN documentation
- GNN Results Summary - Detailed results and analysis
- GNN Dashboard Guide - Dashboard usage guide
- Temporal Project README - Complete temporal documentation
- Temporal Data Loading Guide - Data preparation guide
- GNN: Identify priority areas for immediate infrastructure investment
- Temporal: Forecast which neighborhoods will need future interventions
- GNN: Understand current spatial distribution of accessibility barriers
- Temporal: Plan long-term budget allocation based on predicted trends
- GNN: Explore graph-based spatial analysis methods
- Temporal: Study transformer architectures for urban time series
Both projects are self-contained and can be run independently. For questions or issues:
- Check the respective project README files
- Review the dashboard documentation
- Examine the output files for detailed results
This project is part of a datathon submission analyzing accessibility data for urban planning purposes.
This repository presents two complementary machine learning approaches to accessibility analysis:
- GNN Hotspot Detection identifies where high-risk areas are located using graph-based spatial analysis
- Temporal Forecasting predicts how accessibility will change over time using transformer-based sequence modeling
Together, they provide complementary analyses of current spatial patterns and projected temporal trends. Both approaches have methodological limitations (GNN: parameter sensitivity, graph construction choices; Temporal: synthetic time proxy, limited validation) that should be considered when interpreting results for infrastructure planning decisions.
Start exploring:
- GNN Dashboard - Interactive hotspot visualization
- Temporal Dashboard - Interactive forecasting dashboard
- Swastik Singh - @swassingh - Swastik.Singh@gmail.com
- Navneeth Dhamotharan - @Navneethd8 - nd17@uw.edu