eruption-forecast

This project is forked from the Eruption forecast model for Whakaari but has been significantly modified and extended for this specific application. It processes raw seismic tremor data, extracts time-series features, trains machine learning models, and predicts the probability of volcanic eruptions from seismic patterns.

References and Acknowledgments

Dempsey, D. E., Cronin, S. J., Mei, S., & Kempa-Liehr, A. W. (2020). Automatic precursor recognition and real-time forecasting of sudden explosive volcanic eruptions at Whakaari, New Zealand. Nature Communications, 11(1), 1–8. https://doi.org/10.1038/s41467-020-17375-2 This model implements a time series feature engineering and classification workflow that issues eruption alerts based on real-time tremor data. https://github.com/ddempsey/whakaari

Ardid, A., Dempsey, D., Caudron, C., Cronin, S., Kennedy, B., Girona, T., Roman, D., Miller, C., Potter, S., Lamb, O. D., Martanto, A., Cubuk-Sabuncu, Y., Cabrera, L., Ruiz, S., Contreras, R., Pacheco, J., Mora, M. M., & De Angelis, S. (2025). Ergodic seismic precursors and transfer learning for short term eruption forecasting at data scarce volcanoes. Nature Communications, 16(1), 1–12. https://doi.org/10.1038/s41467-025-56689-x

Ardid, A., Dempsey, D., Caudron, C., & Cronin, S. (2022). Seismic precursors to the Whakaari 2019 phreatic eruption are transferable to other eruptions and volcanoes. Nature Communications, 13(1), 2002. https://doi.org/10.1038/s41467-022-29681-y

Endo, E. T., & Murray, T. L. (1991). Real-time Seismic Amplitude Measurement (RSAM): a volcano monitoring and prediction tool. Bulletin of Volcanology, 53, 533–545.

Caudron, C., et al. (2019). Change in seismic attenuation as a long-term precursor of gas-driven eruptions. Geology. https://doi.org/10.1130/G46107.1

Rey-Devesa, P., Prudencio, J., Benítez, C., Bretón, M., Plasencia, I., León, Z., Ortigosa, F., Gutiérrez, L., Arámbula-Mendoza, R., & Ibáñez, J. M. (2023). Tracking volcanic explosions using Shannon entropy at Volcán de Colima. Scientific Reports, 13(1), 1–11. https://doi.org/10.1038/s41598-023-36964-x

Christ, M., Braun, N., Neuffer, J., & Kempa-Liehr, A. W. (2018). Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh – A Python package). Neurocomputing, 307, 72–77. https://doi.org/10.1016/j.neucom.2018.03.067

Lei, Y., & Wu, Z. (2020). Time series classification based on statistical features. Eurasip Journal on Wireless Communications and Networking, 2020(1). https://doi.org/10.1186/s13638-020-1661-4

Chardot, L., Jolly, A. D., Kennedy, B. M., Fournier, N., & Sherburn, S. (2015). Using volcanic tremor for eruption forecasting at White Island volcano (Whakaari), New Zealand. Journal of Volcanology and Geothermal Research, 302, 11–23. https://doi.org/10.1016/j.jvolgeores.2015.06.001

Time-series feature analysis and eruption forecasting for volcano data. Successor package to Whakaari. This model implements a time series feature engineering and classification workflow that issues eruption alerts based on real-time tremor data. https://github.com/ddempsey/puia

Important Disclaimers

This software is intended for research purposes only.

Probabilistic Predictions: This eruption forecast model provides probabilistic predictions of future volcanic activity, NOT deterministic guarantees. Predictions should be interpreted as likelihood estimates based on historical seismic patterns.
No Guarantee of Accuracy: This model is not guaranteed to predict every future eruption. Volcanic systems are complex and can exhibit unexpected behavior. False negatives (missed eruptions) and false positives (false alarms) are possible.
Software Limitations: This software is not guaranteed to be free of bugs or errors. Users should validate results independently and use this tool as one component of a comprehensive volcano monitoring strategy.
Not for Operational Use: This package is a research tool and should not be used as the sole basis for public safety decisions, evacuation orders, or emergency response without expert volcanological assessment.
Expert Interpretation Required: Results should always be interpreted by qualified volcanologists familiar with the specific volcano being monitored.

Always consult with local volcano observatories and follow official warnings from government agencies.

Table of Contents

Pipeline Walkthrough: Sections 1–12, per-stage code examples
API Reference: Constructor and method parameter tables
Visualization & Plotting: All plot types and usage
Configuration: notify decorator, pipeline config save/replay, logging
Output Directory Structure: Full directory tree
Architecture: Component details, design principles, key classes

Features

Tremor Calculation: Process raw seismic data (SDS/FDSN) to calculate RSAM, DSAR, and Shannon Entropy metrics across multiple frequency bands
Label Building: Generate training labels from eruption dates with configurable forecast horizons
Feature Extraction: Extract 700+ time-series features using tsfresh for machine learning
Enhanced Feature Selection: Three feature selection methods: tsfresh statistical tests, RandomForest permutation importance, or a combined two-stage approach
Model Training: Train 10 classifier types (Random Forest, Gradient Boosting, XGBoost, SVM, Logistic Regression, Neural Networks, Ensembles) across multiple random seeds
Model Evaluation: Comprehensive evaluation with ROC curves, precision-recall curves, confusion matrices, threshold analysis, calibration curves, feature importance, SHAP explainability, seed stability violin plots, frequency band contribution charts, and learning curve plots (plot_learning_curve_grid) via ModelEvaluator and MultiModelEvaluator; cross-classifier comparison plots and ranking tables via ClassifierComparator
Two Training Workflows: evaluate() for in-sample evaluation (80/20 split), train() for full-dataset training with future-data evaluation via ModelPredictor; fit() as a unified entry point that dispatches between the two
Seed Ensemble Merging: Combine all 500 seed models + their feature lists into a single .pkl file via BaseEnsemble.save() / SeedEnsemble / ClassifierEnsemble / merge_seed_models() / merge_all_classifiers(); eliminates per-seed I/O at prediction time and enables predict_proba() directly on the ensemble
Multi-processing: Parallel processing for faster tremor calculations and model training
Interactive HTML Reports: (beta, not fully functional yet) Generate self-contained Plotly-powered reports for every pipeline stage via ForecastModel.generate_report() or the standalone generate_report() function; the only extra dependency is weasyprint, which is optional and needed only for PDF export
Telegram Notifications: The notify decorator and the send_telegram_notification function send structured Telegram messages (success/error, elapsed time, file attachments)
Modular Architecture: Clean separation of concerns with focused utility modules
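
To make the tremor metrics in the list above more concrete, here is a minimal, dependency-free sketch of an RSAM-style calculation: the mean absolute amplitude over consecutive fixed-length windows. It is an illustration only, not the package's implementation. The function name `rsam`, the 10-minute default window, and the decision to drop a trailing partial window are assumptions, and the band-pass filtering that would normally precede this step for each frequency band is omitted.

```python
from statistics import fmean


def rsam(samples, sampling_rate_hz, window_seconds=600):
    """Return the mean absolute amplitude of each non-overlapping window.

    `samples` is a sequence of (already filtered) amplitudes. A trailing
    partial window is dropped rather than averaged over fewer samples.
    """
    window = int(sampling_rate_hz * window_seconds)
    if window <= 0:
        raise ValueError("window must contain at least one sample")

    values = []
    # Step through the trace one full window at a time.
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        values.append(fmean(abs(x) for x in chunk))
    return values


# Toy trace sampled at 1 Hz with 2-second windows; the last sample is dropped.
print(rsam([1, -1, 2, -2, 3, -3, 9], sampling_rate_hz=1, window_seconds=2))
# → [1.0, 2.0, 3.0]
```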

Package Architecture

eruption-forecast/
├── src/eruption_forecast/
│   ├── data_container.py    # BaseDataContainer; shared ABC for TremorData & LabelData
│   ├── tremor/              # Seismic tremor processing
│   ├── label/               # Training label generation
│   ├── features/            # Feature extraction & selection
│   ├── model/               # ML model training & prediction
│   ├── sources/             # SDS and FDSN data source adapters
│   ├── plots/               # Visualization utilities
│   ├── report/              # (beta) Interactive HTML report generation
│   ├── utils/               # Focused utility modules
│   └── decorators/          # Function decorators
└── tests/                   # Unit tests

Full directory tree, design principles, and per-component details: wiki/Architecture.md

Pipeline Overview

Raw Seismic Data (SDS / FDSN)
           │
           ▼
┌─────────────────────┐
│   CalculateTremor   │  RSAM + DSAR + Entropy → tremor.csv
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│    LabelBuilder     │  Binary labels → label_*.csv
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ TremorMatrixBuilder │  Windowed matrix → tremor_matrix_*.csv
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   FeaturesBuilder   │  700+ features → all_features_*.csv
└──────────┬──────────┘
           │
           ▼
┌─────────────────────────────────────────────┐
│                 ModelTrainer                │
│  ┌─────────────┐   ┌─────────────────────┐  │
│  │FeatureSelect│   │   ClassifierModel   │  │
│  │   or        │   │ (10 classifiers,    │  │
│  │  combined   │   │  3 CV strategies)   │  │
│  └─────────────┘   └─────────────────────┘  │
│         ↓  evaluate()  ↓ train()            │
│    80/20 split + metrics   Full dataset     │
└─────────┬───────────────────────────────────┘
          │  trained_model_*.csv  +  *.pkl
          │
          │  (optional) trainer.merge_models()
          │  → merged_model_*.pkl  (SeedEnsemble)
          ▼
┌─────────────────────────────────────────────┐
│               ModelPredictor                │
│  ┌──────────────────────────────────────┐   │
│  │ predict_proba()                      │   │
│  │ (forecast mode; no labels needed)    │   │
│  └──────────────────────────────────────┘   │
│  Single model or multi-model consensus      │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│           Report (beta, optional)           │
│   generate_report() / fm.generate_report()  │
│   → self-contained HTML (Plotly, CDN JS)    │
│   Tremor · Labels · Features · Training     │
│   Comparator · Prediction · Pipeline        │
└─────────────────────────────────────────────┘

Installation

This project uses uv as the package manager.

# Clone the repository
git clone https://github.com/martanto/eruption-forecast.git
cd eruption-forecast

# Install dependencies
uv sync

# Install with dev dependencies
uv sync --group dev

Data Sources

The package reads seismic data from two sources, both routed through CalculateTremor.

SDS: SeisComP Data Structure

SDS is a standardized directory and file layout used by SeisComP to store waveform data portably across data servers and analysis tools. See the official SDS specification for full details.

Directory layout:

<sds_dir>/
└── YEAR/
    └── NET/
        └── STA/
            └── CHAN.TYPE/
                └── NET.STA.LOC.CHAN.TYPE.YEAR.DAY

Example for network VG, station OJN, channel EHZ, day 075 of 2025:

/data/
└── 2025/
    └── VG/
        └── OJN/
            └── EHZ.D/
                └── VG.OJN.00.EHZ.D.2025.075

Field reference:

Field	Description	Example
`YEAR`	Four-digit year	`2025`
`NET`	Network code (≤ 8 chars)	`VG`
`STA`	Station code (≤ 8 chars)	`OJN`
`CHAN`	Channel code (≤ 8 chars)	`EHZ`
`LOC`	Location code (≤ 8 chars, may be empty)	`00`
`TYPE`	Data type; `D` = waveform data (most common)	`D`
`DAY`	Three-digit day-of-year, zero-padded	`075`

Files are miniSEED format. Periods in filenames are always present even when a field is empty.
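
The layout and field reference above can be sketched as a small path builder. This is a hedged illustration using only the standard library; the helper name `sds_path` is hypothetical and is not part of the package API.

```python
from datetime import date
from pathlib import PurePosixPath


def sds_path(sds_dir, net, sta, loc, chan, day, data_type="D"):
    """Build the SDS file path for one station-channel-day.

    `loc` may be an empty string; the surrounding periods are kept,
    as the layout above requires.
    """
    year = f"{day.year:04d}"
    doy = f"{day.timetuple().tm_yday:03d}"  # zero-padded day-of-year
    filename = f"{net}.{sta}.{loc}.{chan}.{data_type}.{year}.{doy}"
    return PurePosixPath(sds_dir) / year / net / sta / f"{chan}.{data_type}" / filename


# 16 March 2025 is day 075 of the year.
print(sds_path("/data", "VG", "OJN", "00", "EHZ", date(2025, 3, 16)))
# → /data/2025/VG/OJN/EHZ.D/VG.OJN.00.EHZ.D.2025.075
```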

Usage:

from eruption_forecast import CalculateTremor

tremor = CalculateTremor(
    station="OJN",
    channel="EHZ",
    start_date="2025-01-01",
    end_date="2025-01-31",
    n_jobs=4,
).from_sds(sds_dir="/data/sds").run()

FDSN: Web Service

FDSN downloads waveform data from any FDSN-compatible web service (IRIS, GEOFON, etc.) and caches it locally as SDS miniSEED so subsequent runs skip the network.

tremor = CalculateTremor(
    station="OJN",
    channel="EHZ",
    start_date="2025-01-01",
    end_date="2025-01-31",
).from_fdsn(client_url="https://service.iris.edu").run()

Quick Start: Complete Pipeline

Here's a complete end-to-end example from raw seismic data to trained models and eruption forecasting (adapted from main.py):

from eruption_forecast import ForecastModel

# Initialize the forecast model with station and configuration
fm = ForecastModel(
    root_dir="output",
    station="OJN",
    channel="EHZ",
    network="VG",
    location="00",
    window_size=2,
    volcano_id="Lewotobi Laki-laki",
    n_jobs=4,
    verbose=True,
)

# Complete pipeline with method chaining
fm.calculate(
    source="sds",
    sds_dir="/path/to/sds/data",
    methods=["rsam", "dsar", "entropy"],
    plot_daily=True,
    save_plot=True,
    remove_outlier_method="maximum",
).build_label(
    start_date="2025-01-01",
    end_date="2025-07-24",
    day_to_forecast=2,
    window_step=6,
    window_step_unit="hours",
    eruption_dates=[
        "2025-03-20",
        "2025-04-22",
        "2025-05-18",
        "2025-06-17",
        "2025-07-07",
    ],
).extract_features(
    select_tremor_columns=["rsam_f2", "rsam_f3", "rsam_f4", "dsar_f3-f4", "entropy"],
    save_tremor_matrix_per_method=True,
    exclude_features=["agg_linear_trend", "linear_trend_timewise", "length"],
    use_relevant_features=True,
).train(
    classifier="rf",
    cv_strategy="stratified",
    random_state=0,
    total_seed=500,
    with_evaluation=False,
    number_of_significant_features=20,
    sampling_strategy=0.75,
    save_all_features=True,
    plot_significant_features=True,
).forecast(
    start_date="2025-07-28",
    end_date="2025-08-04",
    window_step=10,
    window_step_unit="minutes",
)

What this pipeline does:

Calculate tremor (RSAM, DSAR, Shannon Entropy) from raw seismic data with outlier removal
Build labels from known eruption dates (training period: Jan 1 – Jul 24)
Extract features using tsfresh (700+ features) and select top 20
Train models using Random Forest with 500 random seeds for robust predictions
Forecast future eruptions (Jul 28 – Aug 4) using the trained ensemble
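
The label-building step can be illustrated with a short standard-library sketch. It assumes one plausible labelling rule: a window is positive when an eruption falls within `day_to_forecast` days after the window's end time. The package's actual rule may differ, and the function name `build_labels` and the `step_hours` parameter are hypothetical.

```python
from datetime import datetime, timedelta


def build_labels(start, end, eruption_dates, step_hours=6, day_to_forecast=2):
    """Return (window_end, label) pairs stepping from `start` to `end`.

    Assumed rule: label is 1 when any eruption satisfies
    window_end < eruption <= window_end + day_to_forecast days.
    """
    eruptions = [datetime.fromisoformat(d) for d in eruption_dates]
    horizon = timedelta(days=day_to_forecast)
    step = timedelta(hours=step_hours)

    labels = []
    t = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    while t <= stop:
        erupts_soon = any(t < e <= t + horizon for e in eruptions)
        labels.append((t, int(erupts_soon)))
        t += step
    return labels


labels = build_labels("2025-03-18", "2025-03-21", ["2025-03-20"])
positives = sum(label for _, label in labels)
print(f"{positives} of {len(labels)} windows are labelled positive")
# → 8 of 13 windows are labelled positive
```

Even in this toy case most windows are negative, which is why the class imbalance handling and threshold scoring described later matter.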

See main.py in the repository for the complete working example.

Full per-stage guide with code examples: wiki/Pipeline-Walkthrough.md

Reports (beta)

The report/ package generates self-contained, interactive HTML reports (powered by Plotly, loaded from CDN) for every pipeline stage. No image files are produced; each report is a single .html file you can open in any browser or share by email.

Integrated: chain after any pipeline stage

fm.calculate(...).build_label(...).train(...).generate_report()
# → output/VG.OJN.00.EHZ/reports/pipeline_report.html

# Select specific sections only
fm.generate_report(sections=["tremor", "label"])

# Export to PDF (requires: uv add weasyprint)
fm.generate_report(fmt="pdf")

Standalone: from an existing output directory

from eruption_forecast.report import generate_report

path = generate_report("output/VG.OJN.00.EHZ")
print(f"Report saved to {path}")

# Specific sections
path = generate_report("output/VG.OJN.00.EHZ", sections=["tremor", "training"])

Individual section reports

Each report class can be used directly for a focused view:

from eruption_forecast.report import (
    TremorReport,
    LabelReport,
    FeaturesReport,
    TrainingReport,
    ComparatorReport,
    PredictionReport,
    PipelineReport,
)

# Tremor: data completeness, band stats, full-range + daily detail chart
path = TremorReport("output/.../tremor.csv", station="OJN").save()

# Labels: window config, class distribution, eruption timeline
path = LabelReport(
    "output/.../label_2025-01-01_....csv",
    eruption_dates=["2025-03-20", "2025-04-22"],
).save()

# Features: feature counts, top-N bar, band contribution
path = FeaturesReport(
    features_csv="output/.../all_extracted_features.csv",
    significant_features_dir="output/.../significant_features/",
    selection_method="combined",
).save()

# Training: per-seed <details>, aggregate mean±std, stability, threshold analysis
path = TrainingReport(
    metrics_dir="output/.../metrics/",
    classifier_name="RandomForestClassifier",
).save()

# Classifier comparison: grouped bar + aggregate table (ComparatorReport is imported above)
path = ComparatorReport(
    classifier_registry={
        "rf":  "output/.../trainings/evaluations/rf/.../trained_model_rf.csv",
        "xgb": "output/.../trainings/evaluations/xgb/.../trained_model_xgb.csv",
    }
).save()

# Forecast probabilities: consensus line, uncertainty band, eruption markers
path = PredictionReport(
    prediction_df=fm.prediction_df,
    eruption_dates=["2025-03-20"],
    threshold=0.7,
).save()

What each report contains:

Report	Charts	Tables
`TremorReport`	Full-range overview, daily detail with date dropdown	Completeness, band stats
`LabelReport`	Class distribution bar, eruption timeline	Window config, class counts
`FeaturesReport`	Top-N features (horizontal bar), band contribution	Feature counts summary
`TrainingReport`	Seed stability, threshold analysis	Per-seed rows + aggregate mean±std
`ComparatorReport`	Grouped bar per metric × classifier	Aggregate metrics table
`PredictionReport`	Probability lines + uncertainty band + eruption markers	Forecast config
`PipelineReport`	All of the above + executive summary	Pipeline stage availability

Advanced Usage

Use FDSN instead of a local SDS archive

fm.calculate(
    source="fdsn",
    client_url="https://service.iris.edu",
).build_label(...).train(...)
# Downloaded miniSEED files are cached locally so subsequent runs skip the network.

Skip tremor calculation if data already exists

fm.load_tremor_data(
    tremor_csv="output/VG.OJN.00.EHZ/tremor/tremor_2025-01-01_2025-12-31.csv"
).build_label(...).extract_features(...).train(...).forecast(...)

Change feature selection method

fm.set_feature_selection_method("combined").train(
    classifier="rf",
    cv_strategy="timeseries",
    total_seed=200,
)

Train with evaluation (80/20 split) for in-sample testing

fm.train(
    classifier="xgb",
    with_evaluation=True,
    total_seed=100,
)

GPU acceleration for XGBoost

XGBoost (xgb) and the voting ensemble (voting) support GPU training via use_gpu=True. Use gpu_id to select a specific device on multi-GPU machines.

# Train on the first GPU (default)
fm.train(
    classifier="xgb",
    cv_strategy="stratified",
    total_seed=500,
    use_gpu=True,          # enable CUDA
    gpu_id=0,              # first GPU (default)
)

# Train on the second GPU
fm.train(
    classifier="xgb",
    use_gpu=True,
    gpu_id=1,              # second GPU
)

Parallelism architecture:

ModelTrainer.fit()
│
├── [outer] n_jobs  → Parallel(loky backend)
│   Each worker runs one full seed independently:
│   resample → feature selection → GridSearchCV → evaluate → save
│   GPU: forced to 1 (seeds share one GPU device)
│
└── [inner, per seed] grid_search_n_jobs
    │
    ├── FeatureSelector  → tsfresh/RandomForest, CPU-only
    │   GPU: unchanged; safe to parallelise
    │
    └── GridSearchCV  → runs XGBoost CV folds
        GPU: forced to 1 (fold workers share one GPU device)

Parallelism rules when use_gpu=True:

Parameter	Normal (CPU)	GPU (`use_gpu=True`)
`n_jobs` (outer seed workers)	Configurable	Forced to `1`; multiple seeds sharing one GPU causes VRAM contention
`grid_search_n_jobs` in `GridSearchCV`	Configurable	Forced to `1`; parallel CV fold workers each try to use the GPU simultaneously
`grid_search_n_jobs` in `FeatureSelector`	Configurable	Unchanged; feature selection is CPU-only (tsfresh/RandomForest) and is safe to parallelise

use_gpu=True has no effect on non-XGBoost classifiers (rf, gb, svm, etc.). Passing it with those classifiers emits a warning and training proceeds on CPU as normal.
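
The rules in the table can be summarised as a small resolver. This is a sketch of the documented behaviour under stated assumptions, not the package's code: the names `resolve_parallelism`, `GPU_CAPABLE`, and the `feature_selector_n_jobs` key are hypothetical.

```python
import warnings

# Per the text above, only XGBoost and the voting ensemble can use the GPU.
GPU_CAPABLE = {"xgb", "voting"}


def resolve_parallelism(classifier, use_gpu, n_jobs, grid_search_n_jobs):
    """Return the effective worker counts for one training run."""
    if use_gpu and classifier not in GPU_CAPABLE:
        # Documented behaviour: warn, then continue on CPU.
        warnings.warn(f"use_gpu=True has no effect on '{classifier}'; training on CPU")
        use_gpu = False

    return {
        "use_gpu": use_gpu,
        # Outer seed workers and CV fold workers are forced to 1 on GPU.
        "n_jobs": 1 if use_gpu else n_jobs,
        "grid_search_n_jobs": 1 if use_gpu else grid_search_n_jobs,
        # Feature selection is CPU-only, so its worker count is unchanged.
        "feature_selector_n_jobs": grid_search_n_jobs,
    }


print(resolve_parallelism("xgb", use_gpu=True, n_jobs=8, grid_search_n_jobs=4))
```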

Train multiple classifiers and run consensus forecast

Pass a list[str] or comma-separated string to classifier. Each classifier is trained sequentially; all trained model registries are available for multi-model consensus forecasting.

fm.train(
    classifier=["rf", "xgb", "gb"],
    cv_strategy="stratified",
    total_seed=500,
    with_evaluation=False,
).forecast(
    start_date="2025-07-28",
    end_date="2025-08-04",
    window_step=10,
    window_step_unit="minutes",
)
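
Because `classifier` accepts either a list or a comma-separated string, the two forms have to be normalised somewhere before training. The following is a hypothetical helper showing one way to do that; the name `normalize_classifiers` and the choice to drop duplicates are assumptions, not package behaviour.

```python
def normalize_classifiers(classifier):
    """Turn "rf, xgb,gb" or ["rf", "xgb", "gb"] into a clean list of keys."""
    if isinstance(classifier, str):
        names = classifier.split(",")
    else:
        names = list(classifier)

    # Strip whitespace and skip empty entries such as a trailing comma.
    cleaned = [name.strip() for name in names if name.strip()]
    # Remove duplicates while keeping the original training order.
    return list(dict.fromkeys(cleaned))


print(normalize_classifiers("rf, xgb,gb"))
# → ['rf', 'xgb', 'gb']
```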

Compare multiple classifiers side-by-side

After training several classifiers with evaluate(), use ClassifierComparator to rank them and produce comparison plots:

from eruption_forecast.model import ClassifierComparator

# From a dict
comparator = ClassifierComparator(
    classifiers={
        "rf": "output/.../trainings/evaluations/rf/stratified/trained_model_rf_...csv",
        "xgb": "output/.../trainings/evaluations/xgb/stratified/trained_model_xgb_...csv",
        "gb": "output/.../trainings/evaluations/gb/stratified/trained_model_gb_...csv",
    },
    metrics=["f1_score", "roc_auc", "recall"],  # or None for all DEFAULT_METRICS
)

# Or from a JSON file  {"ClassifierName": "/path/to/trained_model_*.csv", ...}
comparator = ClassifierComparator.from_json(
    "output/VG.OJN.00.EHZ/evaluations_trained_models.json",
    metrics=["f1_score", "roc_auc", "recall"],
)

# Ranked by recall (default), saved to metrics/ranking_recall.csv
ranking = comparator.get_ranking()

# All plots; saved to figures/
results = comparator.plot_all()
# results["metric_bar"]      → dict[str, Figure]  (one per metric + "all" overview)
# results["seed_stability"]  → dict[str, Figure]  (one per metric + "all" overview)
# results["comparison_grid"] → Figure
# results["roc"]             → Figure
# results["ranking"]         → DataFrame

Merge 500 seed models into one file

After training, collapse all seed models into a single .pkl to remove per-seed I/O overhead:

from eruption_forecast.model.seed_ensemble import SeedEnsemble
from eruption_forecast.utils.ml import merge_seed_models, merge_all_classifiers

# Single classifier
merged_path = trainer.merge_models()
# → .../merged_model_RandomForestClassifier-StratifiedKFold_rs-0_ts-500_top-20.pkl

# Load and predict directly
ensemble = SeedEnsemble.load(merged_path)
mean_p, std, conf, pred = ensemble.predict_with_uncertainty(features_df)

# sklearn-compatible interface
proba = ensemble.predict_proba(features_df)   # shape (n_samples, 2)

# Multi-classifier bundle
bundle_path = trainer.merge_classifier_models({"rf": rf_csv, "xgb": xgb_csv})

# Pass merged pkl directly to ModelPredictor
predictor = ModelPredictor(
    start_date="2025-07-28", end_date="2025-08-04",
    trained_models=merged_path,    # single merged pkl
    # or: trained_models=bundle_path  (multi-classifier bundle)
)
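
To show what aggregating across seeds can look like, here is a standard-library sketch that reduces per-seed eruption probabilities to a mean, a spread, a confidence, and a class prediction, mirroring the four values returned by `predict_with_uncertainty` above. The exact definitions used by `SeedEnsemble` are not stated here, so everything below is an assumption: in particular, "confidence" is taken to be the share of seeds that agree with the majority vote, and `aggregate_seed_probabilities` is a hypothetical name.

```python
from statistics import fmean, pstdev


def aggregate_seed_probabilities(seed_probas, threshold=0.5):
    """Combine per-seed P(eruption) values into ensemble statistics.

    `seed_probas` is a list with one inner list per seed, each holding
    one probability per sample.
    """
    per_sample = list(zip(*seed_probas))  # transpose to sample-major order

    mean_p = [fmean(p) for p in per_sample]
    std = [pstdev(p) for p in per_sample]

    # Fraction of seeds voting "eruption" for each sample.
    votes = [fmean(1.0 if x >= threshold else 0.0 for x in p) for p in per_sample]
    # Assumed definition: agreement with whichever class holds the majority.
    confidence = [max(v, 1.0 - v) for v in votes]

    prediction = [int(m >= threshold) for m in mean_p]
    return mean_p, std, confidence, prediction


# Three seeds, two samples.
mean_p, std, conf, pred = aggregate_seed_probabilities(
    [[0.9, 0.2], [0.7, 0.4], [0.8, 0.6]]
)
print(pred)
# → [1, 0]
```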

Disable logging

Silence all console and file output; useful for batch jobs or when embedding the package:

from eruption_forecast import disable_logging, enable_logging

disable_logging()
fm.calculate(...).build_label(...).train(...)  # Silent; no output
enable_logging()   # Restore output

# Fine-grained control
from eruption_forecast.logger import set_log_level, set_log_directory
set_log_level("WARNING")              # Only warnings and errors to console
set_log_directory("output/logs")      # Write log files to a custom directory

Save and replay pipeline configuration

fm.save_config()          # YAML → {station_dir}/config.yaml
fm.save_model()           # joblib → {station_dir}/forecast_model.pkl

# Replay the full pipeline from a saved config
fm2 = ForecastModel.from_config("output/VG.OJN.00.EHZ/config.yaml")
fm2.run()

# Resume from a saved model (skip re-training)
fm3 = ForecastModel.load_model("output/VG.OJN.00.EHZ/forecast_model.pkl")
fm3.forecast(start_date="2025-04-01", end_date="2025-04-07",
             window_step=12, window_step_unit="hours")

Full configuration reference: wiki/Configuration.md

Supported Classifiers

Classifier	Description	Imbalance Handling
`rf`	Random Forest (balanced, robust, default)	`class_weight="balanced"`
`gb`	Gradient Boosting (handles imbalance natively)	None (natural)
`xgb`	XGBoost (excellent for imbalanced data, GPU-capable)	`scale_pos_weight` grid search
`svm`	Support Vector Machine (high-dimensional)	`class_weight="balanced"`
`lr`	Logistic Regression (interpretable, fast)	`class_weight="balanced"`
`nn`	Neural Network MLP (complex patterns)	None
`dt`	Decision Tree (interpretable baseline)	`class_weight="balanced"`
`knn`	K-Nearest Neighbors (simple baseline)	None
`nb`	Gaussian Naive Bayes (fast baseline)	None
`voting`	Soft VotingClassifier (RF + XGBoost ensemble, GPU-capable)	Combined
`lite-rf`	Random Forest with a smaller grid for faster training	`class_weight="balanced"`

Hyperparameter grids and overriding them: wiki/Training-Workflows.md

Cross-Validation Strategies

Strategy	Class	Best For
`shuffle`	`StratifiedShuffleSplit`	Random splits with stratification (default)
`stratified`	`StratifiedKFold`	Preserves class distribution across folds
`timeseries`	`TimeSeriesSplit`	Temporal data, strict no-future-leakage
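
The "no-future-leakage" property of the `timeseries` strategy can be illustrated with a simplified expanding-window splitter. This is a plain-Python sketch in the spirit of scikit-learn's `TimeSeriesSplit`, not a replacement for it; the function name `expanding_window_splits` is hypothetical.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with training data always earlier.

    Each fold tests on the next block of samples and trains on everything
    before it, so the model never sees data from the future.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")

    first_test = n_samples - n_splits * test_size
    for test_start in range(first_test, n_samples, test_size):
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in expanding_window_splits(n_samples=10, n_splits=3):
    print(f"train 0..{train[-1]}  test {test}")
# → train 0..3  test [4, 5]
# → train 0..5  test [6, 7]
# → train 0..7  test [8, 9]
```

The shuffled and stratified strategies, by contrast, may place later windows in the training set and earlier ones in the test set, which is acceptable for some comparisons but can overstate performance on temporally correlated tremor data.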

Threshold Optimization & Scoring

Volcanic eruption datasets are severely imbalanced: eruptions are rare events, and eruption-labelled windows may make up fewer than 5–10% of all windows. Choosing the right optimization criterion for the decision threshold is therefore critical.

Scoring Criterion Comparison

Criterion	Formula	Penalizes Missed Eruptions	Penalizes False Alarms	Robust to Imbalance
Accuracy	(TP + TN) / N	No; dominated by majority class	No	No
F1 Score	2·P·R / (P + R)	Partially	Partially	No
Recall	TP / (TP + FN)	Yes	No; can hit 1.0 by predicting all positive	No
Precision	TP / (TP + FP)	No	Yes	No
Balanced Accuracy	(Sensitivity + Specificity) / 2	Partially	Partially	Yes
G-mean	√(Sensitivity × Specificity)	Yes	Yes	Yes
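
The formulas in the table can be checked with a short worked example. The sketch below computes each criterion from confusion-matrix counts; the function name `threshold_scores` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
from math import sqrt


def threshold_scores(tp, fp, tn, fn):
    """Compute the scoring criteria from the table for one threshold."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "f1": f1,
        "recall": recall,
        "precision": precision,
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": sqrt(recall * specificity),
    }


# 1000 windows, 50 of them eruption windows (5%).
# A model that never raises an alert:
never_alert = threshold_scores(tp=0, fp=0, tn=950, fn=50)
# A model that catches 40 of 50 eruptions at the cost of 95 false alarms:
useful = threshold_scores(tp=40, fp=95, tn=855, fn=10)

for name in ("accuracy", "balanced_accuracy", "g_mean"):
    print(f"{name}: never_alert={never_alert[name]:.3f}  useful={useful[name]:.3f}")
```

The model that never raises an alert scores 0.95 accuracy while missing every eruption; its balanced accuracy is 0.5 and its G-mean is 0, which is the behaviour the "Robust to Imbalance" column describes.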

Output Directory Structure

All outputs are organized under {output_dir}/{network}.{station}.{location}.{channel}/ (e.g., output/VG.OJN.00.EHZ/).

output/
└── VG.OJN.00.EHZ/
    ├── tremor/
    │   ├── tremor_*.csv           # Merged tremor data
    │   └── matrix/                # Tremor matrix outputs (TremorMatrixBuilder)
    │       ├── tremor_matrix_unified_*.csv
    │       └── per_method/        # Per-column matrices with date-stamped filenames
    ├── features/
    │   ├── extracted/
    │   │   ├── train/             # Per-column tsfresh CSVs (training mode)
    │   │   └── forecast/          # Per-column tsfresh CSVs (prediction mode)
    │   ├── all_features_*.csv     # Concatenated features
    │   └── label-features_*.csv   # Labels aligned with features
    └── trainings/
        ├── evaluations/
        │   ├── features/          # Shared feature selection outputs
        │   └── classifiers/       # Per-classifier model outputs (evaluations)
        └── predictions/
            ├── features/          # Shared feature selection outputs
            └── classifiers/       # Per-classifier model outputs (predictions)

Full directory tree with all sub-paths: wiki/Output-Structure.md

Requirements

Core Dependencies

Python >= 3.11
pandas >= 3.0.0
numpy
scipy
obspy (seismic data processing)
tsfresh (time-series feature extraction)
scikit-learn
imbalanced-learn
xgboost
joblib
matplotlib
seaborn
loguru
python-dotenv (Telegram notification credential loading)

Development Dependencies

ruff (linting)
ty (type checking)
pytest (testing)

Development

Code Quality Tools

# Lint and auto-fix
uv run ruff check --fix src/

# Type checking
uvx ty check src/

Running Tests

# Run all tests
pytest tests/

# Run with coverage
pytest --cov=src/eruption_forecast tests/

# Run specific test
pytest tests/test_train_model.py

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/my-feature)
Make changes with tests
Ensure code passes linting and type checks (uv run ruff check --fix src/)
Update documentation
Submit a pull request

Commit convention: Use fix/ for bug fixes, ft/ for new features, dev/ as default.

Code style: PEP 8, Google-style docstrings with explicit types, type hints on all functions.

License

MIT License; see LICENSE file for details.

Disclaimer of Liability: This software is provided "as is" without warranty of any kind, express or implied. The authors and contributors shall not be liable for any damages or losses arising from the use of this software. Volcanic eruption forecasting is inherently uncertain, and this software should be used only as a research tool, not for operational volcano monitoring or public safety decisions.

Version: 0.1.0
Status: Active Development
Last Updated: 2026-04-26 (docs/ removed; all links now point to wiki/)
