Automated detection of active fault zones in California using high-resolution LiDAR DEMs and transformer-based semantic segmentation. Built as part of the SCEC (Southern California Earthquake Center) seismic hazard research project at SJSU.
Live Dashboard: prithvi-app-88345939641.us-west2.run.app
SegFormer-B2 predictions on the Owens Valley test set. Columns: hillshade input, ground truth, predicted probability, overlay. Best test IoU: 0.486.
- What This Is
- Latest Results
- Setup Instructions
- Full Pipeline
- Repository Structure
- System Design
- Inference Service
- Cloud Data Storage
- Geological Insights
This project trains AI models to automatically detect earthquake fault traces from LiDAR-derived terrain data. Instead of geologists spending months physically surveying terrain, the model learns to identify fault signatures (linear scarps, offset features) directly from 0.5m resolution Digital Elevation Models.
Two research objectives:
- Objective 1B: Detect fault zones from LiDAR-derived DEM (hillshade + slope) using semantic segmentation
- Objective 1A: Detect fault zones from HLS Sentinel-2 satellite imagery using Prithvi-EO-2.0 foundation model
The core idea is that active faults leave physical scars on the landscape, visible as linear shadow boundaries in hillshade images and as abrupt slope changes. A well-trained segmentation model can learn to find these signatures automatically.
Study Regions (6 total):
- Carrizo Plain (San Andreas Fault, strike-slip)
- Owens Valley (ECSZ, normal fault): best performance
- Mojave (ECSZ, oblique-slip)
- Imperial Valley (Southern San Andreas, strike-slip)
- Pauma Valley (Peninsular Ranges, reverse)
- Sierra Pelona (Transverse Ranges, reverse)
| Rank | Region | Test IoU | F1 | Fault Type | System |
|---|---|---|---|---|---|
| 1 | Owens Valley | 0.486 | 0.654 | Normal | ECSZ |
| 2 | Mojave | 0.393 | 0.564 | Oblique | ECSZ |
| 3 | Carrizo (baseline) | 0.369 | n/a | Strike-slip | SAF |
| 4 | Imperial Valley | 0.324 | 0.489 | Strike-slip | SAF |
| 5 | Pauma Valley | 0.238 | 0.384 | Reverse | Peninsular |
| 6 | Sierra Pelona | 0.121 | 0.215 | Reverse | Transverse |
Key finding: Owens Valley achieved IoU 0.486, a +31.7% improvement over the Carrizo baseline.
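The IoU and F1 columns above are consistent with the standard definitions for the positive (fault) class, and the +31.7% figure is a relative improvement over the baseline IoU. The following is a minimal sketch of both calculations, assuming binary masks flattened into sequences of 0/1. The function names `fault_iou_f1` and `relative_improvement` are hypothetical and are not taken from the training scripts, which would more likely operate on NumPy or PyTorch tensors.

```python
from typing import Sequence


def fault_iou_f1(pred: Sequence[int], truth: Sequence[int]) -> tuple[float, float]:
    """Return (IoU, F1) for the fault class of two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    union = tp + fp + fn
    if union == 0:
        # No fault pixels in either mask; returning zeros is a convention
        # chosen for this sketch, not necessarily the project's.
        return 0.0, 0.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage improvement of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Owens Valley (0.486) versus the Carrizo baseline (0.369)
owens_vs_carrizo = relative_improvement(0.486, 0.369)
```

For a single evaluation the two metrics are linked by F1 = 2·IoU / (1 + IoU), which is why the ranking is the same under either column.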
| Approach | IoU | Why it Failed |
|---|---|---|
| Phase 0: All regions combined | 0.032 | Mixed terrain types prevent convergence |
| B5 Multi-region (82M params) | 0.069 | Model too large for available data |
| B5 Carrizo only | 0.012 | Too few fault patches (392) for large model |
| B2 Multi-region | 0.087 | Same terrain mixing problem persists |
Conclusion: Per-region specialist models significantly outperform multi-region approaches.
Python >= 3.10
CUDA-capable GPU (recommended: 16GB+ VRAM for SegFormer-B2 training)
Cloud storage for patches and checkpoints (~10GB per experiment)
QGIS >= 3.x (for fault mask generation; one-time preprocessing only)
Install dependencies: `pip install -r dgx/requirements.txt`
Key packages:
- `transformers` (for SegFormer)
- `torch`, `torchvision`
- `albumentations` (augmentation)
- `rasterio` (LiDAR I/O)
- `segmentation-models-pytorch` (U-Net baseline)
1. Raw LiDAR Data (OpenTopography)
   └─ Download DEM tiles for each study region
        ↓
2. QGIS Preprocessing (one-time per region)
   ├─ Compute hillshade + slope from DEM
   ├─ Load USGS Quaternary Fault Database
   ├─ Extract by extent → Reproject to EPSG:32611
   ├─ Buffer 10 m → Rasterize at 0.5 m resolution
   └─ Output: region_fault_mask.tif (binary, byte type)
        ↓
3. Patch Extraction (Python)
   ├─ Cut hillshade + slope + mask into 256×256 patches
   ├─ Stride = 128 (50% overlap)
   ├─ Hard Negative Mining: keep 10% of background patches
   ├─ 70/15/15 train/val/test split
   └─ Output: train.npz / val.npz / test.npz
        ↓
4. Per-Region Training (GPU)
   ├─ SegFormer-B2 from nvidia/mit-b2 (ADE20K pretrained)
   ├─ Weighted CE + Dice loss (fault_weight=5.0)
   ├─ WeightedRandomSampler (50x fault oversampling)
   ├─ Enhanced augmentation (Elastic + GaussNoise + ShiftScaleRotate)
   ├─ AdamW differential LR (encoder=1e-5, decoder=1e-4)
   ├─ Early stopping (patience=15)
   └─ Output: best.pth + test_results.json
        ↓
5. Threshold Tuning + Test Evaluation
   ├─ Search threshold 0.10 to 0.50 (step 0.02)
   ├─ Maximize IoU_fault on validation set
   └─ Apply best threshold to test set
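Step 5 of the pipeline can be illustrated with a short threshold search. This is a minimal sketch under stated assumptions: per-pixel fault probabilities and ground-truth labels are given as flattened sequences, ties are resolved in favor of the lowest threshold, and the names `iou_at_threshold` and `search_threshold` are hypothetical rather than taken from `segformer_b2_per_region.py`.

```python
from typing import Sequence


def iou_at_threshold(probs: Sequence[float], truth: Sequence[int], threshold: float) -> float:
    """Fault-class IoU after binarizing probabilities at `threshold`."""
    tp = fp = fn = 0
    for p, t in zip(probs, truth):
        predicted_fault = p >= threshold
        if predicted_fault and t == 1:
            tp += 1
        elif predicted_fault:
            fp += 1
        elif t == 1:
            fn += 1
    union = tp + fp + fn
    return tp / union if union else 0.0


def search_threshold(
    val_probs: Sequence[float],
    val_truth: Sequence[int],
    start: float = 0.10,
    stop: float = 0.50,
    step: float = 0.02,
) -> tuple[float, float]:
    """Return (best_threshold, best_val_iou) over an inclusive grid."""
    n_steps = int(round((stop - start) / step))
    best_t, best_iou = start, -1.0
    for i in range(n_steps + 1):
        # Rounding avoids accumulated floating-point drift in the grid values.
        t = round(start + i * step, 10)
        iou = iou_at_threshold(val_probs, val_truth, t)
        if iou > best_iou:  # strict '>' keeps the lowest threshold on ties
            best_t, best_iou = t, iou
    return best_t, best_iou


# Toy validation data, for illustration only
val_probs = [0.05, 0.15, 0.20, 0.35, 0.45, 0.60]
val_truth = [0, 0, 0, 1, 1, 1]
best_t, best_iou = search_threshold(val_probs, val_truth)
```

The selected threshold would then be applied once to the held-out test set, for example `iou_at_threshold(test_probs, test_truth, best_t)`, so that the test metric is not tuned on test data.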
prithvi/
├── dgx/                                  # GPU training scripts
│   ├── segformer_train.py                # B2 single-region (Carrizo baseline)
│   ├── segformer_b5_train.py             # B5 enhanced (experimental)
│   ├── segformer_b2_per_region.py        # B2 per-region (final, best results)
│   ├── per_region_summary.json           # All region test metrics
│   └── requirements.txt
├── notebook/                             # Exploratory + baseline notebooks
│   ├── DEM_Unet.ipynb                    # U-Net ResNet34 baseline
│   ├── DEM_buffer1m.ipynb                # 1m buffer ablation
│   ├── USGS_1m_dataset.ipynb             # Ground truth dataset construction
│   ├── obj1A_256x256.ipynb               # Prithvi HLS approach
│   ├── segformer_fault_detection.ipynb   # SegFormer fine-tuning
│   └── README.md                         # Notebook results overview
└── README.md                             # This file
Three-tier architecture:
┌─────────────────────────────────────────────┐
│ Local Workstation (Preprocessing)           │
│ • QGIS: fault mask generation               │
│ • Python: patch extraction                  │
│ • Output: .npz files                        │
└─────────────────┬───────────────────────────┘
                  │ upload (cloud sync)
                  ▼
┌─────────────────────────────────────────────┐
│ GPU Compute (Training)                      │
│ • SegFormer-B2 fine-tuning                  │
│ • Per-region specialist models              │
│ • Output: model checkpoints + metrics       │
└─────────────────┬───────────────────────────┘
                  │ upload (cloud sync)
                  ▼
┌─────────────────────────────────────────────┐
│ Cloud Inference (Dashboard)                 │
│ • Google Cloud Run (stateless containers)   │
│ • Google Cloud Storage (results CSV)        │
│ • Public Dash app for visualization         │
└─────────────────────────────────────────────┘
Design principles:
- Separation of concerns: Each tier handles one role (data prep / training / serving)
- Stateless inference: Cloud Run scales horizontally, results loaded from GCS at runtime
- Reproducibility: All preprocessing scripts version-controlled, training configs in code
- Portability: Cloud-agnostic data formats (NPZ, JSON, CSV); training script runs on any CUDA GPU
A live Dash dashboard hosted on Google Cloud Run visualizes:
- Per-region experiment progress (IoU comparisons)
- Patch-size ablation results
- Pipeline visualization (QGIS preprocessing steps)
- Architecture comparison (U-Net vs SegFormer)
URL: prithvi-app-88345939641.us-west2.run.app
The dashboard reads experiment results from GCS (cs163class.appspot.com/fault_detection_results.csv) at runtime, so it always reflects the latest experiments without redeployment.
Google Cloud Storage: gs://cs163class.appspot.com/
images/
hillshade_lidar.png # Carrizo LiDAR hillshade example
qgis_usmap.png # USGS Quaternary Faults overview
qgis_faultlines.png # Clipped + buffered fault lines
qgis_combined.png # Final binary mask on basemap
satellite_carrizo.png # HLS Sentinel-2 over Carrizo
prithvi_huggingface.png # Prithvi foundation model demo
phase0_bad_prediction.png # Failed Phase 0 example
unet_test_predictions.png # U-Net DEM test predictions
fault_detection_results.csv # All experiment metrics
All files publicly accessible via https://storage.googleapis.com/cs163class.appspot.com/<file>.
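The public URL pattern above can be wrapped in a small helper. This is a sketch only: `public_url` is a hypothetical name, and the assumption that the PNG files sit under the `images/` prefix is inferred from the listing rather than confirmed.

```python
from urllib.parse import quote

BUCKET = "cs163class.appspot.com"


def public_url(object_name: str, bucket: str = BUCKET) -> str:
    """Build the public HTTPS URL for an object in the bucket.

    `quote` percent-encodes unsafe characters but leaves '/' intact,
    so prefixed object names such as 'images/x.png' keep their path.
    """
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


results_url = public_url("fault_detection_results.csv")
```

With pandas installed and network access available, the metrics file could then be loaded with `pandas.read_csv(results_url)`. The column names are not documented in this README, so inspect the frame before relying on any of them.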
The model's performance correlates strongly with fault kinematics (movement type):
ECSZ (Vertical displacement):
Owens Valley: IoU 0.486
Mojave: IoU 0.393
Average: 0.440
San Andreas (Strike-slip):
Carrizo: IoU 0.369
Imperial: IoU 0.324
Average: 0.347
Reverse fault systems:
Pauma Valley: IoU 0.238
Sierra Pelona: IoU 0.121
Average: 0.180
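The group averages above are plain arithmetic means of the per-region test IoU values. A minimal sketch that reproduces them is shown here. The dictionary layout and the name `group_means` are illustrative assumptions, while the numbers are copied from the results table.

```python
from statistics import mean

# Test IoU per region, grouped by fault system (values from the results table)
REGION_IOU = {
    "ECSZ (vertical displacement)": {"Owens Valley": 0.486, "Mojave": 0.393},
    "San Andreas (strike-slip)": {"Carrizo": 0.369, "Imperial": 0.324},
    "Reverse fault systems": {"Pauma Valley": 0.238, "Sierra Pelona": 0.121},
}


def group_means(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Arithmetic mean of the regional IoU values within each group."""
    return {group: mean(regions.values()) for group, regions in scores.items()}


means = group_means(REGION_IOU)
# Groups sorted from highest to lowest mean IoU
ranking = sorted(means, key=means.get, reverse=True)
```

Each group contains only two regions, so these means are best read as a descriptive summary and not as a statistical estimate.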
Hypothesis: Normal/oblique faults with strong vertical displacement create clear shadow boundaries in LiDAR hillshade data, making them easier for the model to detect. Pure strike-slip faults (horizontal motion only) and reverse faults with high sediment cover produce weaker visual signals.
Implication for future work: ECSZ region appears to be the most promising area for LiDAR-based automated fault mapping. Strike-slip and reverse fault detection may benefit from additional data modalities (InSAR, multi-temporal LiDAR).
- OpenTopography: High-resolution LiDAR data access
- USGS: Quaternary Fault and Fold Database
- NASA HLS: Sentinel-2 satellite imagery
- IBM/NASA Prithvi-EO-2.0: Geospatial foundation model
EarthScope Southern & Eastern California LiDAR Project.
Distributed by OpenTopography. https://doi.org/10.5069/G9G44N6Q
USGS Quaternary Fault and Fold Database of the United States.
https://www.usgs.gov/programs/earthquake-hazards/faults
MIT License. See LICENSE for details.