boseongkang/prithvi

Fault Detection with LiDAR DEM + Deep Learning

Automated detection of active fault zones in California using high-resolution LiDAR DEMs and transformer-based semantic segmentation. Built as part of the SCEC (Southern California Earthquake Center) seismic hazard research project at SJSU.

🌐 Live Dashboard: prithvi-app-88345939641.us-west2.run.app

Figure: Owens Valley predictions.

SegFormer-B2 predictions on the Owens Valley test set. Columns: hillshade input → ground truth → predicted probability → overlay. Best test IoU: 0.486.


Table of Contents

  1. What This Is
  2. Latest Results
  3. Setup Instructions
  4. Full Pipeline
  5. Repository Structure
  6. System Design
  7. Inference Service
  8. Cloud Data Storage
  9. Geological Insights

1. What This Is

This project trains AI models to automatically detect earthquake fault traces from LiDAR-derived terrain data. Instead of geologists spending months physically surveying terrain, the model learns to identify fault signatures (linear scarps, offset features) directly from Digital Elevation Models (DEMs) at 0.5 m resolution.

Two research objectives:

  • Objective 1B: Detect fault zones from LiDAR-derived DEM (hillshade + slope) using semantic segmentation
  • Objective 1A: Detect fault zones from HLS Sentinel-2 satellite imagery using Prithvi-EO-2.0 foundation model

The core idea is that active faults leave physical scars on the landscape, visible as linear shadow boundaries in hillshade images and as abrupt slope changes. A well-trained segmentation model can learn to find these signatures automatically.
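To make the two input channels concrete, here is a minimal NumPy sketch of how slope and hillshade can be derived from a DEM grid. It is an illustration only: the project computes these layers in QGIS, and the function name, the default sun position (azimuth 315°, altitude 45°), and the assumption that row index increases southward are all choices made for this example.

```python
import numpy as np


def slope_and_hillshade(dem, cell_size=0.5, azimuth_deg=315.0, altitude_deg=45.0):
    """Return (slope in degrees, hillshade as uint8) for a 2-D elevation grid.

    Assumptions (not taken from the project): rows run north to south,
    columns run west to east, and cell_size is in the same unit as elevation.
    """
    # np.gradient returns the derivative along rows first, then columns.
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=np.float64), cell_size)
    dz_dx = dz_dcol    # eastward elevation change
    dz_dy = -dz_drow   # northward elevation change (rows grow southward)

    # Slope is the angle between the surface and the horizontal plane.
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

    # Unit surface normal of z = f(x, y).
    norm = np.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
    nx, ny, nz = -dz_dx / norm, -dz_dy / norm, 1.0 / norm

    # Unit vector pointing toward the light (azimuth clockwise from north).
    az, alt = np.radians(azimuth_deg), np.radians(altitude_deg)
    lx, ly, lz = np.sin(az) * np.cos(alt), np.cos(az) * np.cos(alt), np.sin(alt)

    # Lambertian shading: cosine of the angle between normal and light.
    shade = np.clip(nx * lx + ny * ly + nz * lz, 0.0, 1.0)
    return slope_deg, np.rint(255.0 * shade).astype(np.uint8)
```

A fault scarp shows up in this output as a narrow band where slope jumps and hillshade brightness changes abruptly, which is the pattern the segmentation model is trained to pick up.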

Study Regions (6 total):

  • Carrizo Plain (San Andreas Fault, strike-slip)
  • Owens Valley (ECSZ, the Eastern California Shear Zone; normal fault) ⭐ Best performance
  • Mojave (ECSZ, oblique-slip)
  • Imperial Valley (Southern San Andreas, strike-slip)
  • Pauma Valley (Peninsular Ranges, reverse)
  • Sierra Pelona (Transverse Ranges, reverse)

2. Latest Results

Per-Region SegFormer-B2 Specialist Models

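| Rank | Region             | Test IoU | F1    | Fault Type  | System     |
|------|--------------------|----------|-------|-------------|------------|
| 🥇   | Owens Valley       | 0.486    | 0.654 | Normal      | ECSZ       |
| 🥈   | Mojave             | 0.393    | 0.564 | Oblique     | ECSZ       |
| 🥉   | Carrizo (baseline) | 0.369    | -     | Strike-slip | SAF        |
| 4    | Imperial Valley    | 0.324    | 0.489 | Strike-slip | SAF        |
| 5    | Pauma Valley       | 0.238    | 0.384 | Reverse     | Peninsular |
| 6    | Sierra Pelona      | 0.121    | 0.215 | Reverse     | Transverse |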

Key finding: Owens Valley achieved a test IoU of 0.486, a 31.7% relative improvement over the Carrizo baseline: (0.486 − 0.369) / 0.369 ≈ 0.317.

Failed Approaches (Important Negative Results)

| Approach                      | IoU   | Why it Failed                               |
|-------------------------------|-------|---------------------------------------------|
| Phase 0: All regions combined | 0.032 | Mixed terrain types prevent convergence     |
| B5 Multi-region (82M params)  | 0.069 | Model too large for available data          |
| B5 Carrizo only               | 0.012 | Too few fault patches (392) for large model |
| B2 Multi-region               | 0.087 | Same terrain mixing problem persists        |

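Conclusion: Per-region specialist models outperform every multi-region approach tried so far. The best multi-region run reached IoU 0.087, below even the weakest specialist (Sierra Pelona, 0.121).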


3. Setup Instructions

Requirements

Python >= 3.10
CUDA-capable GPU (recommended: 16GB+ VRAM for SegFormer-B2 training)
Cloud storage for patches and checkpoints (~10GB per experiment)
QGIS >= 3.x (for fault mask generation; one-time preprocessing only)

Install Dependencies

pip install -r dgx/requirements.txt

Key packages:

  • transformers (for SegFormer)
  • torch, torchvision
  • albumentations (augmentation)
  • rasterio (LiDAR I/O)
  • segmentation-models-pytorch (U-Net baseline)

4. Full Pipeline

1. Raw LiDAR Data (OpenTopography)
   └─ Download DEM tiles for each study region
        ↓
2. QGIS Preprocessing (one-time per region)
   ├─ Compute hillshade + slope from DEM
   ├─ Load USGS Quaternary Fault Database
   ├─ Extract by extent → Reproject to EPSG:32611
   ├─ Buffer 10 m → Rasterize at 0.5 m resolution
   └─ Output: region_fault_mask.tif (binary, byte type)
        ↓
3. Patch Extraction (Python)
   ├─ Cut hillshade + slope + mask into 256×256 patches
   ├─ Stride = 128 (50% overlap)
   ├─ Hard Negative Mining: keep 10% of background patches
   ├─ 70/15/15 train/val/test split
   └─ Output: train.npz / val.npz / test.npz
        ↓
4. Per-Region Training (GPU)
   ├─ SegFormer-B2 from nvidia/mit-b2 (ADE20K pretrained)
   ├─ Weighted CE + Dice loss (fault_weight=5.0)
   ├─ WeightedRandomSampler (50x fault oversampling)
   ├─ Enhanced augmentation (Elastic + GaussNoise + ShiftScaleRotate)
   ├─ AdamW differential LR (encoder=1e-5, decoder=1e-4)
   ├─ Early stopping (patience=15)
   └─ Output: best.pth + test_results.json
        ↓
5. Threshold Tuning + Test Evaluation
   ├─ Search threshold 0.10 to 0.50 (step 0.02)
   ├─ Maximize IoU_fault on validation set
   └─ Apply best threshold to test set
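Step 5 of the pipeline can be sketched as a small grid search. The code below is a hedged illustration rather than the repository's implementation: the function names and the array layout (per-pixel fault probabilities alongside binary masks of the same shape) are assumptions, while the search range and step follow the values listed above.

```python
import numpy as np


def fault_iou(pred_mask, true_mask):
    """Intersection over union of the fault (positive) class."""
    intersection = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    # Define IoU as 0 when neither prediction nor label contains a fault pixel.
    return float(intersection / union) if union > 0 else 0.0


def tune_threshold(val_probs, val_masks, lo=0.10, hi=0.50, step=0.02):
    """Return (best_threshold, best_iou) found on the validation set."""
    true = np.asarray(val_masks).astype(bool)
    probs = np.asarray(val_probs)
    # Build the grid from an integer count to avoid float drift at the end.
    n_steps = int(round((hi - lo) / step))
    thresholds = lo + step * np.arange(n_steps + 1)
    scores = [fault_iou(probs >= t, true) for t in thresholds]
    best = int(np.argmax(scores))  # first maximum wins on ties
    return float(thresholds[best]), float(scores[best])
```

The threshold is chosen on validation data only and then applied unchanged to the test set, so the reported test IoU is not inflated by tuning on the data it is measured on.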

5. Repository Structure

prithvi/
├── dgx/                                 # GPU training scripts
│   ├── segformer_train.py               # B2 single-region (Carrizo baseline)
│   ├── segformer_b5_train.py            # B5 enhanced (experimental)
│   ├── segformer_b2_per_region.py       # B2 per-region (final, best results)
│   ├── per_region_summary.json          # All region test metrics
│   └── requirements.txt
├── notebook/                            # Exploratory + baseline notebooks
│   ├── DEM_Unet.ipynb                   # U-Net ResNet34 baseline
│   ├── DEM_buffer1m.ipynb               # 1m buffer ablation
│   ├── USGS_1m_dataset.ipynb            # Ground truth dataset construction
│   ├── obj1A_256x256.ipynb              # Prithvi HLS approach
│   ├── segformer_fault_detection.ipynb  # SegFormer fine-tuning
│   └── README.md                        # Notebook results overview
└── README.md                            # This file

6. System Design

Three-tier architecture:

┌─────────────────────────────────────────────┐
│  Local Workstation (Preprocessing)          │
│  • QGIS: fault mask generation              │
│  • Python: patch extraction                 │
│  • Output: .npz files                       │
└─────────────────┬───────────────────────────┘
                  │ upload (cloud sync)
                  ↓
┌─────────────────────────────────────────────┐
│  GPU Compute (Training)                     │
│  • SegFormer-B2 fine-tuning                 │
│  • Per-region specialist models             │
│  • Output: model checkpoints + metrics      │
└─────────────────┬───────────────────────────┘
                  │ upload (cloud sync)
                  ↓
┌─────────────────────────────────────────────┐
│  Cloud Inference (Dashboard)                │
│  • Google Cloud Run (stateless containers)  │
│  • Google Cloud Storage (results CSV)       │
│  • Public Dash app for visualization        │
└─────────────────────────────────────────────┘

Design principles:

  • Separation of concerns: Each tier handles one role (data prep / training / serving)
  • Stateless inference: Cloud Run scales horizontally, results loaded from GCS at runtime
  • Reproducibility: All preprocessing scripts version-controlled, training configs in code
  • Portability: Cloud-agnostic data formats (NPZ, JSON, CSV); training script runs on any CUDA GPU
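As an illustration of the portable NPZ hand-off between the preprocessing and training tiers, the sketch below cuts aligned rasters into overlapping patches and writes them to a single file. It is a minimal example under stated assumptions: the function names and the `images`/`masks` array keys are hypothetical, and because the README does not document how the retained 10% of background patches is selected, this version simply keeps a random 10%.

```python
import numpy as np


def extract_patches(hillshade, slope, mask, patch=256, stride=128,
                    keep_background=0.10, seed=0):
    """Cut aligned 2-D rasters into (N, 2, patch, patch) inputs and (N, patch, patch) masks.

    Patches containing at least one fault pixel are always kept; fault-free
    patches are kept with probability keep_background (an assumed rule).
    """
    rng = np.random.default_rng(seed)
    images, labels = [], []
    rows, cols = mask.shape
    for r in range(0, rows - patch + 1, stride):
        for c in range(0, cols - patch + 1, stride):
            m = mask[r:r + patch, c:c + patch]
            if not m.any() and rng.random() >= keep_background:
                continue  # drop most background-only patches
            x = np.stack([hillshade[r:r + patch, c:c + patch],
                          slope[r:r + patch, c:c + patch]])
            images.append(x)
            labels.append(m)
    return np.asarray(images), np.asarray(labels)


def save_split(path, images, masks):
    """Write one split (for example train.npz) as a compressed NPZ archive."""
    np.savez_compressed(path, images=images, masks=masks)
```

With a stride of 128 and a patch size of 256, neighbouring patches overlap by half, so a fault trace near a patch border still appears closer to the centre of an adjacent patch. The training side can read the file back with `np.load(path)` on any machine, with no dependency on QGIS or rasterio.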

7. Inference Service

A live Dash dashboard hosted on Google Cloud Run visualizes:

  • Per-region experiment progress (IoU comparisons)
  • Patch-size ablation results
  • Pipeline visualization (QGIS preprocessing steps)
  • Architecture comparison (U-Net vs SegFormer)

URL: prithvi-app-88345939641.us-west2.run.app

The dashboard reads experiment results from GCS (cs163class.appspot.com/fault_detection_results.csv) at runtime, so it always reflects the latest experiments without redeployment.


8. Cloud Data Storage

Google Cloud Storage: gs://cs163class.appspot.com/

images/
  hillshade_lidar.png            # Carrizo LiDAR hillshade example
  qgis_usmap.png                 # USGS Quaternary Faults overview
  qgis_faultlines.png            # Clipped + buffered fault lines
  qgis_combined.png              # Final binary mask on basemap
  satellite_carrizo.png          # HLS Sentinel-2 over Carrizo
  prithvi_huggingface.png        # Prithvi foundation model demo
  phase0_bad_prediction.png      # Failed Phase 0 example
  unet_test_predictions.png      # U-Net DEM test predictions
fault_detection_results.csv      # All experiment metrics

All files publicly accessible via https://storage.googleapis.com/cs163class.appspot.com/<file>.
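The URL pattern above can be wrapped in a couple of helpers. This is a sketch using only the Python standard library; the function names are hypothetical, and `fetch_results_rows` assumes the CSV has a header row, which the README does not state.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen

BUCKET = "cs163class.appspot.com"


def public_url(object_name):
    """Build the public HTTPS URL for an object in the bucket."""
    # quote() keeps "/" intact, so prefixes such as "images/" survive.
    return f"https://storage.googleapis.com/{BUCKET}/{quote(object_name)}"


def fetch_results_rows(object_name="fault_detection_results.csv"):
    """Download the results CSV and return its rows as dictionaries.

    Requires network access; column names come from the file's header row.
    """
    with urlopen(public_url(object_name), timeout=30) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

For example, `public_url("images/hillshade_lidar.png")` yields the address of the Carrizo hillshade image listed above.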


9. Geological Insights

Across the six regions, test IoU tracks fault kinematics (movement type):

ECSZ (Vertical displacement):
  Owens Valley:  IoU 0.486
  Mojave:        IoU 0.393
  Average: 0.440

San Andreas (Strike-slip):
  Carrizo:        IoU 0.369
  Imperial:       IoU 0.324
  Average: 0.347

Reverse fault systems:
  Pauma Valley:   IoU 0.238
  Sierra Pelona:  IoU 0.121
  Average: 0.180
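The group averages above, and the relative gain of Owens Valley over the Carrizo baseline, can be reproduced from the per-region IoU values with a few lines of Python. The dictionary layout and variable names are just one convenient way to organise the numbers for this check.

```python
from statistics import mean

# Test IoU per region, grouped as in the listing above.
iou_by_group = {
    "ECSZ": {"Owens Valley": 0.486, "Mojave": 0.393},
    "San Andreas": {"Carrizo": 0.369, "Imperial": 0.324},
    "Reverse": {"Pauma Valley": 0.238, "Sierra Pelona": 0.121},
}

# Unweighted mean of the two regions in each group.
group_mean = {group: mean(regions.values()) for group, regions in iou_by_group.items()}

# Relative improvement of the best region over the Carrizo baseline.
baseline = iou_by_group["San Andreas"]["Carrizo"]
best = iou_by_group["ECSZ"]["Owens Valley"]
relative_gain = (best - baseline) / baseline

for group, value in group_mean.items():
    print(f"{group}: mean IoU {value:.4f}")
print(f"Owens Valley vs Carrizo: {relative_gain:.1%}")
```

Each group contains only two regions, so these averages describe a trend rather than a statistically tested effect.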

Hypothesis: Normal/oblique faults with strong vertical displacement create clear shadow boundaries in LiDAR hillshade data, making them easier for the model to detect. Pure strike-slip faults (horizontal motion only) and reverse faults with high sediment cover produce weaker visual signals.

Implication for future work: The ECSZ appears to be the most promising region for LiDAR-based automated fault mapping. Strike-slip and reverse fault detection may benefit from additional data modalities (InSAR, multi-temporal LiDAR).


Data Sources

  • OpenTopography: High-resolution LiDAR data access
  • USGS: Quaternary Fault and Fold Database
  • NASA HLS: Sentinel-2 satellite imagery
  • IBM/NASA Prithvi-EO-2.0: Geospatial foundation model

Citation

EarthScope Southern & Eastern California LiDAR Project.
Distributed by OpenTopography. https://doi.org/10.5069/G9G44N6Q

USGS Quaternary Fault and Fold Database of the United States.
https://www.usgs.gov/programs/earthquake-hazards/faults

License

MIT License. See LICENSE for details.

About

Active fault detection in California using LiDAR DEM and SegFormer semantic segmentation.
