Binary file modified .DS_Store
Binary file not shown.
27 changes: 15 additions & 12 deletions README.md
@@ -1,13 +1,13 @@
# das-anomaly
[![DOI](https://zenodo.org/badge/823391484.svg)](https://doi.org/10.5281/zenodo.12747212)
[![Licence](https://www.gnu.org/graphics/lgplv3-88x31.png)](https://www.gnu.org/licenses/lgpl.html)
[![codecov](https://codecov.io/gh/ahmadtourei/das-anomaly/branch/main/graph/badge.svg)](https://codecov.io/gh/ahmadtourei/das-anomaly)
[![codecov](https://codecov.io/gh/dasdae/das-anomaly/branch/main/graph/badge.svg)](https://codecov.io/gh/dasdae/das-anomaly)

_das-anomaly_ is an open-source Python package for unsupervised anomaly detection in distributed acoustic sensing (DAS) datasets using an autoencoder-based deep learning algorithm. It is being developed by Ahmad Tourei under the supervision of Dr. Eileen R. Martin at Colorado School of Mines.

If you use _das-anomaly_ in your work, please cite the following:

> Ahmad Tourei. (2025). ahmadtourei/das-anomaly: latest (Concept). Zenodo. http://doi.org/10.5281/zenodo.12747212
> Ahmad Tourei. (2025). DASDAE/das-anomaly: latest (Concept). Zenodo. http://doi.org/10.5281/zenodo.12747212


## Installation
@@ -77,7 +77,7 @@ The overall workflow for using the package is illustrated below:
The main steps are:
1. Define constants and create a Spool of data:

Using the _config_user_ script in the das_anomaly directory, define the constants and directory paths for data, power spectral density (PSD) images, detected anomaly results, etc. You would complete adding the values as you go over the steps mentioned below. Then, using DASCore, create an index file for the [spool](https://dascore.org/tutorial/spool.html) of data first time reading the DAS data directory:
Using the _config_user_ script in the das_anomaly directory, define the constants and directory paths for the data, power spectral density (PSD) images, detected anomaly results, etc. Fill in the remaining values and paths as you work through the steps below. Then, using DASCore, create an index file for the [spool](https://dascore.org/tutorial/spool.html) of data the first time you read the DAS data directory:

### Example
```python
@@ -99,17 +99,17 @@ To ensure all PSD images share the same colorbar scale (in RGB), determine an ap
from das_anomaly.psd import PSDConfig, PSDGenerator
from das_anomaly.settings import SETTINGS

# path to one or a few background noise data
# path to one or a few background noise patches
bn_data_path = SETTINGS.BN_DATA_PATH
cfg = PSDConfig(data_path=bn_data_path)
gen = PSDGenerator(cfg)
percentile = 90 # data dependent
percentile = 90  # data dependent; needs visual inspection
clip_val = gen.run_get_psd_val(percentile=percentile)
print(f"Mean {percentile}-percentile amplitude across all patches: {clip_val:.3e}")
```
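
As a rough illustration of what such a clipping value represents, the sketch below takes the chosen percentile of the PSD amplitudes in each patch and averages the results across patches, matching the wording of the print statement above. It uses NumPy on synthetic arrays; `mean_percentile_clip` is a hypothetical helper, not part of das-anomaly, and `run_get_psd_val` may differ in detail.

```python
import numpy as np


def mean_percentile_clip(psd_patches, percentile=90):
    """Average the per-patch percentile of PSD amplitudes (hypothetical helper)."""
    per_patch = [np.percentile(np.abs(p), percentile) for p in psd_patches]
    return float(np.mean(per_patch))


# Synthetic stand-ins for PSD amplitude arrays (frequency x channel)
rng = np.random.default_rng(0)
patches = [rng.random((64, 32)) for _ in range(3)]
clip_val = mean_percentile_clip(patches, percentile=90)
print(f"Mean 90-percentile amplitude across all patches: {clip_val:.3e}")
```

A higher percentile keeps more of the dynamic range but lets strong events dominate the color scale, which is why the value is worth checking visually on a few background-noise patches.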
3. Generate PSD plots:

Use the `das_anomaly.psd` module and create PSD plots in RGB format and in plain mode (with no axes or colorbar). The `das_anomaly.psd.PSDGenerator reads DAS data, creates a spool using DASCore library, applies a detrend function to each patch of the chunked spool, and then average the energy over a desired time window and stack all channels together to create a spatial PSD with channels on the X-axis and frequency on the Y-axis. You can use MPI to distribute reading data and plotting PSDs over CPUs.
Use the `das_anomaly.psd` module to create PSD plots in RGB format and in plain mode (with no axes or colorbar). `das_anomaly.psd.PSDGenerator` reads DAS data, creates a spool using the DASCore library, applies a detrend function to each patch of the chunked spool, averages the energy over a desired time window, and stacks all channels together to create a spatial PSD image with channels on the X-axis and frequency on the Y-axis. Because the workload is embarrassingly parallel, you can use MPI to distribute reading data and plotting PSDs over CPUs.
### Example
```python
from das_anomaly.psd import PSDConfig, PSDGenerator
@@ -121,23 +121,25 @@ PSDGenerator(cfg).run()
PSDGenerator(cfg).run_parallel()
```
Note: To use PSDs for purposes other than training the model, set `hide_axes=False` to plot the PSD with axes and a colorbar (the default is `True`).

### Example
```python
from das_anomaly.psd import PSDConfig, PSDGenerator

cfg = PSDConfig(hide_axes=False)
# serial processing with single processor:
PSDGenerator(cfg).run()
# parallel processing with multiple processors using MPI:
# parallel processing with multiple processors using MPI (first, make sure you've installed the package with all dependencies explained above):
PSDGenerator(cfg).run_parallel()
```
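
The processing described in step 3 (detrend, per-channel spectrum, energy averaged over time windows, channels stacked side by side) can be sketched with NumPy alone. This is an illustrative sketch under assumptions, not the package's implementation: `spatial_psd`, the non-overlapping windows, the linear detrend, and the periodogram scaling are all choices made here for the example.

```python
import numpy as np


def spatial_psd(data, fs, win_len):
    """Return (freqs, psd) with psd shaped (n_freqs, n_channels).

    data: array shaped (n_time, n_channels); fs: sampling rate in Hz;
    win_len: samples per averaging window. Hypothetical helper.
    """
    n_time, n_chan = data.shape
    n_win = n_time // win_len
    # Split into non-overlapping windows: (n_win, win_len, n_chan)
    windows = data[: n_win * win_len].reshape(n_win, win_len, n_chan)
    # Least-squares linear detrend of each window and channel
    t_c = np.arange(win_len) - (win_len - 1) / 2.0
    slope = (t_c[None, :, None] * windows).sum(axis=1) / (t_c**2).sum()
    detrended = (
        windows
        - windows.mean(axis=1, keepdims=True)
        - slope[:, None, :] * t_c[None, :, None]
    )
    # Periodogram per window, then average the energy over windows
    spectrum = np.fft.rfft(detrended, axis=1)
    power = np.abs(spectrum) ** 2 / (fs * win_len)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return freqs, power.mean(axis=0)


# Synthetic test signal: a 50 Hz sine plus a slow drift on 4 channels
fs = 1000.0
n = np.arange(2000)
data = np.sin(2 * np.pi * 50.0 * n / fs)[:, None] + 0.002 * n[:, None] * np.ones((1, 4))
freqs, psd = spatial_psd(data, fs=fs, win_len=500)
peak_hz = freqs[psd[:, 0].argmax()]
print(psd.shape, peak_hz)
```

Plotting `psd` with channels on the X-axis and frequency on the Y-axis gives an image of the kind described above.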
4. Select and copy known anomaly PSD plots:

From the generated PSD plots, identify and copy examples of known anomalies to the ANOMALY_IMAGES_PATH specified in the _config_user_ input script. These anomalies can include events such as earthquakes from an existing catalog, instrument noise, anthropogenic disturbances, etc. Including these examples helps improve thresholding during the detection process.
From the generated PSD plots, visually identify and then copy examples of known anomalies to the ANOMALY_IMAGES_PATH specified in the _config_user_ input script. These anomalies can include events such as earthquakes from an existing catalog, instrument noise, anthropogenic disturbances, etc. Including these examples helps improve thresholding during the detection process.

5. Train:

The `das_anomaly.train` module helps with randomly selecting train and test PSD images and training the model (with CPU or GPU) on anomaly-free PSD images.

### Example
```python
from das_anomaly.settings import SETTINGS
@@ -151,7 +153,7 @@ ImageSplitter(cfg).run()
cfg = TrainAEConfig()
AutoencoderTrainer(cfg).run()
```
Note: Since the `TrainSplitConfig()` function randomly selects PSD images from the generated plots, you must ensure the training and testing datasets do not include obvious anomalies. If you have an excel sheet with time stamp of anomalies (such as a catalog), use the "exclude_known_events_from_training" in examples directory to exclude them. Or, manually inspect both the training and testing sets to ensure they do not contain apparent anomalies. Review their time- and frequency-domain representations, and remove any suspicious samples to maintain the quality of training.
Note: Since the `TrainSplitConfig()` function randomly selects PSD images from the generated plots, you must ensure the training and testing datasets do not include obvious anomalies. If you have an Excel sheet with the timestamps of anomalies (such as a catalog), use "exclude_known_events_from_training" in the examples directory to exclude them. Otherwise, manually inspect both the training and testing sets to ensure they do not contain apparent anomalies. Review their time- and frequency-domain plots, and remove any suspicious samples to maintain the quality of training.
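
The idea of excluding cataloged events can be sketched as follows. This is a minimal standard-library sketch, not the code in the examples directory: the file-name pattern (a `YYYYMMDD_HHMMSS` start time embedded in the name), the 60-second image duration, and the helper `drop_known_events` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
import re

# Hypothetical file-name pattern: a start time such as 20230101_120000
STAMP = re.compile(r"(\d{8}_\d{6})")


def drop_known_events(image_names, event_times, window_s=60):
    """Keep only images whose time window contains no cataloged event."""
    span = timedelta(seconds=window_s)
    kept = []
    for name in image_names:
        match = STAMP.search(name)
        if match is None:
            continue  # names without a parseable time stamp are left out
        start = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
        if not any(start <= ev < start + span for ev in event_times):
            kept.append(name)
    return kept


names = [
    "psd_20230101_120000.png",
    "psd_20230101_120100.png",
    "psd_20230101_120200.png",
]
events = [datetime(2023, 1, 1, 12, 1, 30)]  # one cataloged event
clean = drop_known_events(names, events)
print(clean)
```

Here the second image spans 12:01:00 to 12:02:00 and contains the event, so only the first and third names remain.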

6. Test and set thresholds:

@@ -160,6 +162,7 @@ Using the _validate_and_plot_density_ and _thresholding_f_score_ jupyter noteboo
7. Run the trained model:

The `das_anomaly.detect` module uses the trained model to detect anomalies in the PSD images and writes their information (e.g., time stamp). It also copies the detected anomaly to the RESULTS_PATH. MPI can be used to distribute PSDs over CPUs. Then, using the `das_anomaly.count` module, count the number of detected anomalies and display their details and file paths.

### Example
```python
from das_anomaly.count.counter import CounterConfig, AnomalyCounter
@@ -181,6 +184,6 @@ print(anomalies) # prints info on number of anomalies and path to them
Still under development. Use with caution.

## Contact
Ahmad Tourei, Colorado School of Mines

tourei@mines.edu | ahmadtourei@gmail.com
Ahmad Tourei
Colorado School of Mines
ahmadtourei@gmail.com
9 changes: 5 additions & 4 deletions das_anomaly/utils.py
@@ -1,5 +1,5 @@
"""
Utility functions for anomaly detection in DAS datasets using autoencoders.
Utility functions for the package.
"""

from __future__ import annotations
@@ -10,13 +10,14 @@
import matplotlib

matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np
import scipy.fftpack as ft
import tensorflow as tf
from PIL import Image
from matplotlib import gridspec
from matplotlib.colors import LinearSegmentedColormap
from PIL import Image
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Sequential

@@ -151,7 +152,7 @@ def decoder(


def density(encoder_model, batch_images, kde):
"""Caulculate the density score."""
"""Calculate the density score for a batch of PSDs."""
# Flatten the encoder output because KDE from sklearn takes 1D vectors as input
encoder_output_shape = encoder_model.output_shape
out_vector_shape = (
@@ -269,7 +270,7 @@ def plot_spec(
hide_axes=True,
save_fig=True,
):
"""Save the power spectral density (Channel-Frequency-Amplitude) plot."""
"""Plot and/or save the spatial power spectral density (Channel-Frequency-Amplitude) image."""
# Get the data
strain_rate = patch_strain.transpose("time", "distance").data # pragma: no cover
# Get coords info
22 changes: 22 additions & 0 deletions examples/bash_jobs/detect.sh
@@ -0,0 +1,22 @@
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH -t 24:00:00
#SBATCH -A YOUR_ACCOUNT
#SBATCH --mem-per-cpu=128G

# Print start time
echo "Job started at: $(date)"

# Load modules and environment
source activate dasanomaly

# Run the script
python << EOF
from das_anomaly.detect import AnomalyDetector, DetectConfig

cfg = DetectConfig()
AnomalyDetector(cfg).run()
EOF

# Print end time
echo "Job ended at: $(date)"
2 changes: 1 addition & 1 deletion examples/bash_jobs/detect_mpi.sh
@@ -11,7 +11,7 @@ echo "Job started at: $(date)"
module load openmpi/gcc/64/4.1.5
source activate dasanomaly

# Recommended in a SLURM environment: use srun, not mpirun
# Run with MPI
mpirun -n $SLURM_NTASKS python -u detect_parallel.py

# Print end time
2 changes: 1 addition & 1 deletion examples/bash_jobs/psd_mpi.sh
@@ -11,7 +11,7 @@ echo "Job started at: $(date)"
module load openmpi/gcc/64/4.1.5
source activate dasanomaly

# Recommended in a SLURM environment: use srun, not mpirun
# Run with MPI
mpirun -n $SLURM_NTASKS python -u psd_parallel.py

# Print end time
@@ -11,9 +11,15 @@
"from pathlib import Path\n",
"import re\n",
"\n",
"from das_anomaly.settings import SETTINGS\n",
"\n",
"\n",
"from das_anomaly.settings import SETTINGS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Replace 'events.csv' with the path to your CSV file\n",
"file_path = 'events.csv'\n",
"\n",
350 changes: 350 additions & 0 deletions examples/hyperparameter_tuning.ipynb

Large diffs are not rendered by default.

128 changes: 128 additions & 0 deletions examples/plot_psd.ipynb

Large diffs are not rendered by default.
