UW-MLGEO/ML2026_Orrand

Mary Orrand - Machine Learning Class Project Repo

This folder contains the complete workflow for preparing oceanographic satellite and buoy data for machine learning model training. The project combines satellite sea surface temperature (SST) and chlorophyll-a (chl-a) data with buoy-measured water chemistry (pCO2) observations from 7 coastal monitoring locations.

Project Overview

  • Goal: Generate cleaned, ML-ready training datasets from periods of continuous, measured (non-interpolated) buoy data
  • Data Sources:
    • Satellite SST: JPL MUR (0.042° resolution, ~4.6 km)
    • Satellite Chl-a: MODIS-Aqua
    • Buoys: NOAA water chemistry (7 locations, 2013-2025)
  • Key Output: Training tables of satellite and buoy observations matched within a 4 km radius of each buoy

Data Source Credits

  • Chlorophyll-a (chl-a): NASA Goddard Space Flight Center, Ocean Ecology Laboratory, Ocean Biology Processing Group (2022). MODIS-Aqua Ocean Color Data. NASA OB.DAAC. https://oceancolor.gsfc.nasa.gov/
  • Sea Surface Temperature (SST): JPL Multi-scale Ultra-high Resolution (MUR) SST, NASA PO.DAAC. https://podaac.jpl.nasa.gov/Multi-scale_Ultra-high_Resolution_MUR-SST
  • Buoy data: NOAA National Data Buoy Center. https://www.ndbc.noaa.gov/

Directory Structure

ML2026_Orrand/
├── README.md
├── requirements.txt
├── create_presentation.py
├── pCO2_ML_Presentation.pptx
├── data/
│   ├── DATA_ANALYSIS_WORKFLOW.md
│   ├── DATA_REGENERATION_GUIDE.md
│   ├── processed/
│   │   ├── buoy_continuous_data_periods.csv
│   │   ├── buoy_daily_agg.csv
│   │   ├── buoy_data_cleaned.csv
│   │   ├── combined_satellite_buoy.csv
│   │   ├── DATASET_PRESENTATION_SUMMARY.txt
│   │   ├── ml_data_clean_unscaled.csv
│   │   ├── satellite_sst_cleaned.csv
│   │   ├── satellite_sst_daily.csv
│   │   ├── sat_chla_pixels_test_70rows.csv
│   │   ├── sat_sst_pixels_test_70rows.csv
│   │   ├── training_data_700.csv
│   │   ├── training_data_700_ml_ready.csv
│   │   ├── train_anchors_sample_2000.csv
│   │   └── mur/
│   ├── raw/
│   │   ├── buoy_sources/
│   │   ├── chla/
│   │   ├── mur/
│   │   └── satellite_sources/
│   └── training/
│       ├── ml_data_minmax_scaled.csv
│       └── ml_data_standardized.csv
├── docs/
│   ├── data_exploration_report.md
│   └── NOAA_BUOY_DATA_README.md
├── final project deliverables/
│   ├── Mary Orrand multi panel figure ESS 469.pdf
│   ├── pco2_ml_training_summary.md
│   └── summary_figure_6panel.png
├── notebooks/
│   ├── 00_data_download/
│   │   └── RECOVER_DATA_FROM_NOAA.ipynb
│   ├── 01_data_exploration/
│   │   ├── API attempt SST ERDDAP Orrand.ipynb
│   │   ├── Daily data exploration.ipynb
│   │   ├── Data/
│   │   ├── explore_data.ipynb
│   │   └── NOAA buoy data.ipynb
│   ├── 02_data_preparation/
│   │   ├── data/
│   │   ├── ML_DataPrep_SST_pCO2.ipynb
│   │   └── shrink_raw_sat_data.ipynb
│   ├── 03_model_training/
│   │   ├── ML_Training_Continuous_Data.ipynb
│   │   ├── pco2_ml_training.ipynb
│   │   └── pco2_ml_training_summary.md
│   ├── 04_summary_figure.ipynb
│   └── data/
│       └── processed/
├── plots/
│   ├── 01_satellite_sst_overview.png
│   ├── 02_buoy_data_overview.png
│   ├── 03_ml_dataset_dashboard.png
│   ├── 04_geographic_correlation_analysis.png
│   ├── 05_pco2_data_availability_timeline.png
│   ├── ERDDAP SST.png
│   ├── pmel_carbonuptake.jpg
│   ├── presentation_records_per_location.png
│   ├── presentation_sst_vs_pco2_scatter.png
│   ├── summary_figure_6panel.png
│   ├── analysis/
│   ├── exploration/
│   ├── ml_results/
│   └── training_data_eda/
└── .gitignore

Workflow Overview

Phase 1: Data Exploration (01_data_exploration/)

  • Load satellite (SST, chl-a) and NOAA buoy data
  • Inspect data availability and quality across locations
  • Identify continuous measurement windows
  • Generate exploratory plots

Phase 2: Data Preparation (02_data_preparation/)

  • Clean and standardize data (convert -999 fill values to nulls, normalize date formats)
  • Create master files combining all locations
  • Scale features (StandardScaler and MinMaxScaler options)
  • Generate quality reports
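The cleaning and scaling steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: it assumes raw fields arrive as strings, treats -999 and blank fields as missing, and tries a few hypothetical date formats (`DATE_FORMATS`, `clean_value`, and the other names are illustrative). The two scaling helpers compute what scikit-learn's StandardScaler (population standard deviation) and MinMaxScaler (default [0, 1] range) produce for a single feature column.

```python
import math
from datetime import datetime

FILL_VALUE = -999.0  # missing-value sentinel in the raw files

# Hypothetical set of source date formats; extend to match the actual files.
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M", "%Y-%m-%dT%H:%M:%SZ")


def clean_value(raw):
    """Convert a raw string field to float; -999 fill values and blanks become None."""
    if raw is None or raw.strip() == "":
        return None
    value = float(raw)
    return None if value == FILL_VALUE else value


def parse_datetime(raw, formats=DATE_FORMATS):
    """Return a datetime for the first format in `formats` that parses `raw`."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")


def standardize(values):
    """z-score: (x - mean) / std, using the population std (ddof=0) like StandardScaler.
    A constant column maps to 0.0."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def min_max_scale(values):
    """Rescale to [0, 1] like MinMaxScaler's default feature_range.
    A constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in values]


# Example: clean a raw pCO2 column, then scale only the valid values.
raw_pco2 = ["412.5", "-999", "", "398.0"]
pco2 = [v for v in (clean_value(r) for r in raw_pco2) if v is not None]
pco2_z = standardize(pco2)
pco2_01 = min_max_scale(pco2)
```

When scaled features feed a model, the mean/std or min/max are normally computed on the training split only and then reused for validation and test rows, so no information leaks from held-out data.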

Phase 3: Training Data Creation (03_model_training/)

  • Filter to continuous data periods only (no interpolation)
  • Create 4 km spatial grid around each buoy location
  • Match satellite observations to buoy dates/locations within grid
  • Export ML-ready training tables with features and target variable
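The matching step above can be illustrated with a great-circle (haversine) distance test. This sketch assumes observations are already reduced to daily records stored as dicts with hypothetical fields (`date`, `lat`, `lon`, `pco2`, `sst`), and it assumes that multiple satellite pixels inside the radius are averaged; the notebook may instead take the nearest pixel or keep pixels separately.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
GRID_RADIUS_KM = 4.0      # the 4 km radius described above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_pixels(buoy_obs, pixels, radius_km=GRID_RADIUS_KM):
    """Attach the mean SST of same-day pixels within radius_km to each buoy observation.

    Observations with no valid pixel in range are dropped rather than filled,
    so every returned row pairs a measured buoy value with measured satellite data.
    """
    rows = []
    for obs in buoy_obs:
        nearby = [
            px["sst"]
            for px in pixels
            if px["date"] == obs["date"]
            and px["sst"] is not None
            and haversine_km(obs["lat"], obs["lon"], px["lat"], px["lon"]) <= radius_km
        ]
        if nearby:
            rows.append({**obs, "sst_mean": sum(nearby) / len(nearby), "n_pixels": len(nearby)})
    return rows


# Example with made-up coordinates and values.
buoy_obs = [
    {"date": date(2020, 7, 1), "lat": 47.0, "lon": -124.5, "pco2": 400.0},
    {"date": date(2020, 7, 2), "lat": 47.0, "lon": -124.5, "pco2": 405.0},
]
pixels = [
    {"date": date(2020, 7, 1), "lat": 47.02, "lon": -124.5, "sst": 14.0},  # ~2.2 km away
    {"date": date(2020, 7, 1), "lat": 47.10, "lon": -124.5, "sst": 16.0},  # ~11 km away
]
matched = match_pixels(buoy_obs, pixels)  # one row; the 2020-07-02 observation is dropped
```

The brute-force loop is O(observations × pixels), which is fine for a few hundred rows; for full pixel tables, grouping pixels into a dict keyed on date first avoids scanning every pixel for every observation.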

Quick Start

Setup Environment

pip install -r requirements.txt

Run Notebooks in Order

  1. Exploration - Start with notebooks/01_data_exploration/explore_data.ipynb
  2. Preparation - Run notebooks/02_data_preparation/ML_DataPrep_SST_pCO2.ipynb
  3. Training Data - Execute notebooks/03_model_training/ML_Training_Continuous_Data.ipynb

Key Configuration

In ML_Training_Continuous_Data.ipynb, edit the configuration section:

  • APPROACH: Choose 'single', 'multi', or 'all' locations
  • SELECTED_LOCATIONS: List specific buoys to include
  • GRID_RADIUS_KM: Currently set to 4 km (close to the ~4.6 km satellite pixel size)
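A configuration cell along these lines shows how the three settings might interact. The location names and the `resolve_locations` helper are hypothetical, and the meaning given to each APPROACH value ('single' = exactly one listed buoy, 'multi' = the listed buoys, 'all' = every available buoy) is an assumption inferred from the option names.

```python
# Hypothetical configuration values; real buoy names and defaults may differ.
APPROACH = "multi"                     # 'single', 'multi', or 'all'
SELECTED_LOCATIONS = ["LOC_A", "LOC_B"]
GRID_RADIUS_KM = 4.0


def resolve_locations(approach, selected, available):
    """Return the buoy locations to use (assumed meaning of each APPROACH value)."""
    if approach == "all":
        return sorted(available)
    if approach not in ("single", "multi"):
        raise ValueError(f"APPROACH must be 'single', 'multi', or 'all', got {approach!r}")
    unknown = [loc for loc in selected if loc not in available]
    if unknown:
        raise ValueError(f"Unknown locations: {unknown}")
    if approach == "single" and len(selected) != 1:
        raise ValueError("APPROACH='single' expects exactly one location")
    if approach == "multi" and not selected:
        raise ValueError("APPROACH='multi' expects at least one location")
    return list(selected)


AVAILABLE = {"LOC_A", "LOC_B", "LOC_C"}  # placeholder names for the buoy sites
locations = resolve_locations(APPROACH, SELECTED_LOCATIONS, AVAILABLE)  # ['LOC_A', 'LOC_B']
```

Validating the settings up front means a typo in SELECTED_LOCATIONS fails immediately instead of silently producing an empty training table.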

Data Files Guide

Master Files (Output of Phase 2)

  • buoy_data_cleaned.csv: All 7 buoys, all dates, cleaned values
    • Columns: datetime, latitude, longitude, sst_celsius, pco2_sw_sat, xco2_sw_dry, location
    • ~26,000 records
  • satellite_sst_cleaned.csv: All 6 locations, all dates
    • Columns: datetime, latitude, longitude, sst_celsius, location
    • ~61,500 records

Analysis Files

  • buoy_continuous_data_periods.csv: Data availability windows per location
    • Identifies continuous measurement periods suitable for training
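One way to derive windows like those in buoy_continuous_data_periods.csv is to sort each location's measurement dates and split wherever the gap between consecutive dates exceeds a threshold. The `continuous_periods` helper and its `max_gap_days` and `min_length_days` thresholds are assumptions; the criteria actually used to build the file are not documented here.

```python
from datetime import date, timedelta


def continuous_periods(dates, max_gap_days=1, min_length_days=1):
    """Group measurement dates into continuous runs.

    A new run starts whenever consecutive dates are more than max_gap_days apart.
    Returns (start, end, n_measured_days) tuples; runs with fewer than
    min_length_days measured days are dropped.
    """
    unique = sorted(set(dates))
    if not unique:
        return []
    periods = []
    start = prev = unique[0]
    count = 1
    for d in unique[1:]:
        if (d - prev).days <= max_gap_days:
            count += 1
        else:
            if count >= min_length_days:
                periods.append((start, prev, count))
            start, count = d, 1
        prev = d
    if count >= min_length_days:
        periods.append((start, prev, count))
    return periods


# Example: five consecutive days, a gap, then two more days (made-up dates).
days = [date(2020, 1, 1) + timedelta(days=i) for i in range(5)]
days += [date(2020, 1, 10), date(2020, 1, 11)]
windows = continuous_periods(days, min_length_days=3)
# windows == [(date(2020, 1, 1), date(2020, 1, 5), 5)]
```

Running this per location on days with a valid pCO2 value gives one row per window, which can then be used to filter the training data without interpolating across gaps.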

Training Outputs (Phase 3)

  • ml_training_continuous_data_YYYYMMDD.csv: Final training table
    • One row per buoy measurement with matched satellite data
    • No NaN values, all measured (no interpolation)
    • Ready for ML model training
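The properties listed for the final table (one row per buoy measurement, no NaN values) can be checked before training with a small validator. This sketch assumes the table has `datetime` and `location` columns that together identify a buoy measurement (as in buoy_data_cleaned.csv) and reads the CSV with the standard-library csv module, so every value is a string; `load_rows` and `validate_training_rows` are hypothetical names.

```python
import csv


def load_rows(path):
    """Read a CSV into a list of dicts (all values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def validate_training_rows(rows, key_cols=("datetime", "location")):
    """Return a list of problems: duplicate measurements or missing/NaN values."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(row.get(col) for col in key_cols)
        if key in seen:
            problems.append(f"row {i}: duplicate measurement {key}")
        seen.add(key)
        for col, value in row.items():
            if value is None or value.strip().lower() in ("", "nan"):
                problems.append(f"row {i}: missing value in {col!r}")
    return problems


# Example with in-memory rows (a real check would call load_rows on the exported CSV).
rows = [
    {"datetime": "2020-07-01", "location": "LOC_A", "sst_celsius": "14.0", "pco2_sw_sat": "400.1"},
    {"datetime": "2020-07-01", "location": "LOC_A", "sst_celsius": "14.2", "pco2_sw_sat": "401.0"},
    {"datetime": "2020-07-02", "location": "LOC_A", "sst_celsius": "NaN", "pco2_sw_sat": "399.0"},
]
problems = validate_training_rows(rows)
# problems flags the duplicate in row 1 and the NaN sst_celsius in row 2
```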

Important Notes

  • Data Quality: All training data uses only measured values; no interpolation or estimation
  • Spatial Resolution: 4 km grid radius chosen to match ~4.6 km satellite resolution
  • Continuous Periods: Training filtered to date windows where buoys had continuous measurements
  • File Paths: Notebooks assume the directory structure shown above; update paths if you reorganize the files
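For the spatial-resolution note, it helps to remember that a grid defined in degrees does not have a fixed width in kilometers. Using a spherical-Earth approximation of about 111.32 km per degree, a 0.042° pixel (about 1/24°) is roughly 4.6 km north-south everywhere, but its east-west width scales with cos(latitude), so away from the equator pixels are narrower east-west (about 3.3 km at 45°). The `pixel_size_km` helper is illustrative.

```python
import math

KM_PER_DEG_LAT = 111.32  # approx. km per degree of latitude on a spherical Earth


def pixel_size_km(deg, lat):
    """Approximate (north-south, east-west) size in km of a deg x deg grid cell at latitude lat."""
    north_south = deg * KM_PER_DEG_LAT
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west


for lat in (0, 30, 45):
    ns, ew = pixel_size_km(1 / 24, lat)
    print(f"lat {lat:>2}°: {ns:.2f} km N-S x {ew:.2f} km E-W")
```

As a result, a fixed 4 km radius can contain more than one pixel in the east-west direction at mid-latitudes.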

Team Members

  • Mary Orrand

Questions & Support

For data documentation details, see docs/NOAA_BUOY_DATA_README.md. For the full data processing workflow, see data/DATA_ANALYSIS_WORKFLOW.md.

About

Mary Orrand class repo
