A collection of Python tools for downloading, subsetting, and analyzing GPM IMERG precipitation data.
This project is aimed at satellite-precipitation workflows where a user needs to collect NASA GPM IMERG files, clip them to a study boundary, and produce quick precipitation summaries for hydrology or drought/flood screening.
- NASA Earthdata/GPM IMERG download workflows
- HDF5 and NetCDF precipitation data handling
- Shapefile-based spatial subsetting
- Reproducible plotting and timestamped analysis outputs
This repository contains tools to:
- Download GPM IMERG data from NASA servers for a specific date range
- Subset the data using a shapefile to focus on a specific geographical area
- Analyze and visualize the precipitation data, with results organized in timestamped folders
- Python 3.7 or higher
- NASA Earthdata login credentials (register with NASA Earthdata to obtain them)
- Dependencies listed in requirements.txt
- Clone this repository:
  git clone https://github.com/ShaliniBalaram/GPM_IMERG_Tools.git
  cd GPM_IMERG_Tools
- Install the required dependencies:
  pip install -r requirements.txt
The workflow consists of three main steps:
Use download_gpm_data.py to download GPM IMERG data for a specified date range from NASA servers.
python download_gpm_data.py \
--start-date 2023-01-01 \
--end-date 2023-01-31 \
--username "$NASA_EARTHDATA_USERNAME" \
--password "$NASA_EARTHDATA_PASSWORD" \
--download-dir "GPM_IMERG_Data" \
--include-monthly
Required arguments:
- --start-date: Start date (YYYY-MM-DD)
- --end-date: End date (YYYY-MM-DD)
- --username: NASA Earthdata username, preferably supplied from an environment variable
- --password: NASA Earthdata password, preferably supplied from an environment variable
Optional arguments:
- --download-dir: Directory to store downloaded files (default: "GPM_IMERG_Data")
- --url-file: File to save generated URLs (default: "gpm_urls.txt")
- --include-monthly: Include monthly files in addition to half-hourly files
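Because the credentials are best supplied from environment variables, a small wrapper can fail early when they are missing. The sketch below is not part of the repository: build_download_command is a hypothetical helper that only assembles the flags documented above, and it assumes the variable names NASA_EARTHDATA_USERNAME and NASA_EARTHDATA_PASSWORD from the example invocation.

```python
import os
import sys


def build_download_command(start_date, end_date, env=None, include_monthly=False):
    """Assemble the argv for download_gpm_data.py (hypothetical helper).

    Credentials are read from NASA_EARTHDATA_USERNAME and
    NASA_EARTHDATA_PASSWORD, the names used in the example invocation.
    """
    env = os.environ if env is None else env
    required = ("NASA_EARTHDATA_USERNAME", "NASA_EARTHDATA_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    command = [
        sys.executable, "download_gpm_data.py",
        "--start-date", start_date,
        "--end-date", end_date,
        "--username", env["NASA_EARTHDATA_USERNAME"],
        "--password", env["NASA_EARTHDATA_PASSWORD"],
    ]
    if include_monthly:
        command.append("--include-monthly")
    return command
```

The returned list can be passed to subprocess.run(command, check=True), which avoids shell quoting issues with passwords that contain special characters.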
Use subset_gpm_data.py to spatially subset the downloaded data using a shapefile.
python subset_gpm_data.py \
--input-dir "GPM_IMERG_Data" \
--output-dir "GPM_IMERG_Subset" \
--shapefile "path/to/your/shapefile.shp"
Required arguments:
- --input-dir: Directory containing downloaded HDF5 files
- --output-dir: Directory to save subset NetCDF files
- --shapefile: Path to shapefile for subsetting
Optional arguments:
- --pattern: File pattern to match (default: "*.HDF5")
- --max-files: Maximum number of files to process (for testing)
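To illustrate what spatial subsetting involves, the sketch below converts a boundary's bounding box into index ranges on the precipitation grid. It is not the repository's implementation. It assumes a regular 0.1 degree global grid whose cell edges start at -180 degrees longitude and -90 degrees latitude, and bbox_to_index_slices is a hypothetical name. It covers only the bounding-box step; clipping to the exact polygon would still need a mask.

```python
import math


def bbox_to_index_slices(bounds, resolution=0.1):
    """Map (min_lon, min_lat, max_lon, max_lat) to (lon_slice, lat_slice).

    Assumes a regular global grid with cell edges starting at
    -180 degrees longitude and -90 degrees latitude.
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    n_lon = round(360 / resolution)
    n_lat = round(180 / resolution)

    def cell(value, origin):
        # Fractional cell position; rounded to suppress floating-point noise.
        return round((value - origin) / resolution, 6)

    # Floor the lower bounds and ceil the upper bounds so that every cell
    # touched by the bounding box is included, then clamp to the grid.
    lon_start = max(0, math.floor(cell(min_lon, -180)))
    lon_stop = min(n_lon, math.ceil(cell(max_lon, -180)))
    lat_start = max(0, math.floor(cell(min_lat, -90)))
    lat_stop = min(n_lat, math.ceil(cell(max_lat, -90)))
    return slice(lon_start, lon_stop), slice(lat_start, lat_stop)
```

The bounds tuple could come from any shapefile reader, for example the total_bounds attribute of a geopandas GeoDataFrame; which reader the script actually uses is not stated here.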
Use analyze_gpm_data.py to analyze and visualize the subset data.
python analyze_gpm_data.py \
--input-dir "GPM_IMERG_Subset" \
--results-dir "Analysis_Results" \
--num-samples 100 \
--threshold 0.5 \
--top-events 5
Required arguments:
- --input-dir: Directory containing subset NetCDF files
Optional arguments:
- --results-dir: Base directory for results (default: "analysis_results")
- --num-samples: Number of files to sample (default: 100)
- --threshold: Minimum precipitation threshold in mm/hr (default: 0.1)
- --top-events: Number of top precipitation events to analyze (default: 5)
- --comparison-count: Number of events to include in comparison plots (default: 3)
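The interaction between --threshold and --top-events can be pictured with a short sketch. How analyze_gpm_data.py applies them internally is not documented here, so this is an assumption: cells below the threshold are ignored, files with no remaining cells are dropped, and the rest are ranked by their maximum rate. The function names are hypothetical.

```python
from statistics import mean


def summarize_file(values, threshold=0.1):
    """Summarize one file's precipitation rates (mm/hr).

    Cells below the threshold are ignored. Returns None when no cell
    reaches the threshold.
    """
    wet = [v for v in values if v >= threshold]
    if not wet:
        return None
    return {"wet_cells": len(wet), "mean": mean(wet), "max": max(wet)}


def top_events(values_by_file, threshold=0.1, count=5):
    """Rank files by maximum rate and return the top `count` (name, summary) pairs."""
    summaries = {}
    for name, values in values_by_file.items():
        summary = summarize_file(values, threshold)
        if summary is not None:
            summaries[name] = summary
    ranked = sorted(summaries.items(), key=lambda item: item[1]["max"], reverse=True)
    return ranked[:count]
```

Raising the threshold shrinks the set of candidate files before ranking, which is why a stricter value such as 0.5 can return fewer events than requested.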
The analysis results are organized in timestamped folders with the following structure:
analysis_results/
└── gpm_analysis_YYYYMMDD_HHMMSS/
    ├── analysis_summary.txt
    ├── single_events/
    │   ├── precip_3B-HHR.MS.MRG.3IMERG.YYYYMMDD-SHHMMSS-EHHMMSS.XXXX.V07B_subset.png
    │   └── precip_3B-HHR.MS.MRG.3IMERG.YYYYMMDD-SHHMMSS-EHHMMSS.XXXX.V07B_subset_hires.png
    └── comparisons/
        ├── precipitation_comparison_YYYYMMDD_HHMMSS.png
        └── precipitation_comparison_YYYYMMDD_HHMMSS_hires.png
This organized structure helps keep track of different analysis runs and their outputs.
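A layout like the one above can be produced with a few lines of standard-library code. This is a minimal sketch rather than the script's actual code; create_run_directory is a hypothetical name, and only the folder names are taken from the tree shown.

```python
from datetime import datetime
from pathlib import Path


def create_run_directory(results_dir="analysis_results", now=None):
    """Create gpm_analysis_YYYYMMDD_HHMMSS/ with its two subfolders.

    `now` can be injected for reproducible names; it defaults to the
    current local time.
    """
    now = datetime.now() if now is None else now
    run_dir = Path(results_dir) / f"gpm_analysis_{now:%Y%m%d_%H%M%S}"
    for subfolder in ("single_events", "comparisons"):
        (run_dir / subfolder).mkdir(parents=True, exist_ok=True)
    return run_dir
```

Because the timestamp has one-second resolution, two runs started within the same second would share a folder.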
- GPM IMERG HDF5 Files: The original data downloaded from NASA servers
  - Format: 3B-HHR.MS.MRG.3IMERG.YYYYMMDD-SHHMMSS-EHHMMSS.XXXX.V07B.HDF5
  - Available at: NASA GES DISC
- Subset NetCDF Files: Spatially clipped precipitation data for the specific region
  - Format: 3B-HHR.MS.MRG.3IMERG.YYYYMMDD-SHHMMSS-EHHMMSS.XXXX.V07B_subset.nc
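The file naming pattern encodes the observation window, so it can be parsed without opening the file. The sketch below is illustrative only: parse_imerg_name is a hypothetical helper, and it assumes the XXXX field is four digits and the version tag has the form V, two digits, one letter (as in V07B).

```python
import re
from datetime import datetime

# Matches both the original .HDF5 names and the _subset.nc names.
IMERG_NAME = re.compile(
    r"3B-HHR\.MS\.MRG\.3IMERG\."
    r"(?P<date>\d{8})-S(?P<start>\d{6})-E(?P<end>\d{6})"
    r"\.(?P<sequence>\d{4})\.(?P<version>V\d{2}[A-Z])"
    r"(?P<subset>_subset)?\.(?:HDF5|nc)$"
)


def parse_imerg_name(filename):
    """Extract start/end time, sequence field and version from a file name."""
    match = IMERG_NAME.search(filename)
    if match is None:
        raise ValueError(f"Not a half-hourly IMERG filename: {filename}")
    start = datetime.strptime(match["date"] + match["start"], "%Y%m%d%H%M%S")
    end = datetime.strptime(match["date"] + match["end"], "%Y%m%d%H%M%S")
    return {
        "start": start,
        "end": end,
        "sequence": match["sequence"],
        "version": match["version"],
        "is_subset": match["subset"] is not None,
    }
```

Sorting files by the parsed start time is more robust than sorting by name when several product versions sit in the same directory.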
- Standard and high-resolution precipitation maps
- Comparison plots of top precipitation events
- Summary text file with key statistics
To use these tools, you'll need a shapefile defining your area of interest. The shapefile should:
- Be in the WGS84 (EPSG:4326) coordinate system; otherwise the script will attempt to convert it
- Define the boundary of the area you want to extract precipitation data for
- Include all required shapefile components (.shp, .shx, .dbf, etc.)
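A missing sidecar file is a common cause of shapefile read errors, so it can help to check for the components before running the subsetting step. This is a minimal sketch under the assumption that the sidecar files share the .shp file's base name and use lowercase extensions; missing_shapefile_parts is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# The three components named in the requirements above.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shapefile_path):
    """Return the required extensions that have no file next to the .shp."""
    base = Path(shapefile_path)
    return [
        extension
        for extension in REQUIRED_EXTENSIONS
        if not base.with_suffix(extension).exists()
    ]
```

An empty list means all three components are present. A .prj file, which carries the coordinate system definition, could be added to the tuple if conversion to WGS84 is expected.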
All scripts include detailed logging that is saved to a timestamped log file in the logs directory.
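For readers who want the same behaviour in their own scripts, timestamped log files can be set up with the standard logging module. The sketch below is an assumption about how such a setup might look, not a copy of the repository's code; setup_logging and the log file naming pattern are hypothetical.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logging(script_name, log_dir="logs", now=None):
    """Log to the console and to logs/<script_name>_YYYYMMDD_HHMMSS.log."""
    now = datetime.now() if now is None else now
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    log_path = Path(log_dir) / f"{script_name}_{now:%Y%m%d_%H%M%S}.log"

    logger = logging.getLogger(script_name)
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    # One handler writes the file, the other echoes to the console.
    for handler in (logging.FileHandler(log_path), logging.StreamHandler()):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger, log_path
```

Calling setup_logging more than once with the same script name would attach duplicate handlers, so it is intended to be called once at start-up.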
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- NASA GES DISC for providing the GPM IMERG data
- The GPM mission for producing high-quality precipitation measurements