DFW Pipeline - Python Package
A comprehensive Python package for collecting, processing, and analyzing hail events in the Dallas-Fort Worth (DFW) area.
- Data Collection: Downloads hail event data from the NOAA Storm Events Database
- Geocoding: Reverse geocodes events to add city/neighborhood information
- Property Matching: Matches properties with hail events using geospatial proximity
- Unified API: Clean, object-oriented interface for all pipeline operations
- PowerShell Compatible: Includes PowerShell scripts for easy execution
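The proximity matching can be sketched in plain Python. This is a minimal illustration, not `PropertyMatcher`'s actual code: it assumes the search radius around each event grows linearly with hail size (`base_radius_mi + radius_per_inch_mi * size`), is capped at `max_radius_mi`, and that events smaller than `min_hail_size_in` are ignored, mirroring the constructor parameters shown later in this README. Distances use the haversine great-circle formula; the helper names and tuple layouts are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

# Hypothetical record layouts: (id, lat, lon) for properties,
# (id, lat, lon, hail_size_in) for hail events.
Property = Tuple[str, float, float]
Event = Tuple[str, float, float, float]


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def impact_radius_mi(size_in: float, base_radius_mi: float = 1.0,
                     radius_per_inch_mi: float = 1.0, max_radius_mi: float = 5.0) -> float:
    """Assumed rule: radius grows linearly with hail size, capped at max_radius_mi."""
    return min(base_radius_mi + radius_per_inch_mi * size_in, max_radius_mi)


def match_within_radius(properties: Iterable[Property], events: Iterable[Event],
                        min_hail_size_in: float = 1.0,
                        **radius_kwargs) -> List[Tuple[str, str, float]]:
    """Return (property_id, event_id, distance_mi) for each property inside an event's radius."""
    props = list(properties)  # materialized because it is scanned once per event
    matches = []
    for ev_id, ev_lat, ev_lon, size_in in events:
        if size_in < min_hail_size_in:
            continue  # ignore hail below the size threshold
        radius = impact_radius_mi(size_in, **radius_kwargs)
        for prop_id, p_lat, p_lon in props:
            dist = haversine_mi(ev_lat, ev_lon, p_lat, p_lon)
            if dist <= radius:
                matches.append((prop_id, ev_id, dist))
    return matches
```

With the default parameters, a 1.75-inch event gets a 2.75-mile radius. The brute-force double loop is adequate for modest datasets; larger property lists would typically call for a spatial index.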
The package is already included in this project. No additional installation is needed if you are using the project's virtual environment.
from dfw_pipeline import DFWPipelineRunner
# Initialize runner
runner = DFWPipelineRunner("config.yaml")
# Run full pipeline
results = runner.run_full_pipeline()
# Or run individual steps
runner.collect_data() # Step 1: Collect from NOAA
runner.geocode_data() # Step 2: Add geocoding
runner.match_properties() # Step 3: Match properties
# Run full pipeline
python run_dfw_pipeline.py
# Only collect data
python run_dfw_pipeline.py --collect-only
# Skip geocoding
python run_dfw_pipeline.py --no-geocode
# Only match properties
python run_dfw_pipeline.py --match-only
# Run full pipeline
.\run_dfw_pipeline.ps1
# Only collect data
.\run_dfw_pipeline.ps1 -CollectOnly
# Skip geocoding
.\run_dfw_pipeline.ps1 -NoGeocode
# Only match properties
.\run_dfw_pipeline.ps1 -MatchOnly
dfw_pipeline/
├── __init__.py # Package initialization
├── core/
│ ├── __init__.py
│ ├── data_collector.py # NOAA data collection
│ ├── geocoder.py # Reverse geocoding
│ ├── property_matcher.py # Property matching
│ └── pipeline_runner.py # Unified runner
└── README.md # This file
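One plausible way `run_dfw_pipeline.py` could translate the command-line flags into `DFWPipelineRunner` calls is sketched below. It is hypothetical: the flag names and the `collect_data`, `geocode_data`, and `match_properties` methods come from this README, but the precedence rules (`--collect-only` and `--match-only` short-circuit, `--no-geocode` drops the middle step) and the `plan_steps` helper are assumptions, not the script's actual source.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run the DFW hail pipeline.")
    parser.add_argument("--collect-only", action="store_true", help="only collect NOAA data")
    parser.add_argument("--no-geocode", action="store_true", help="skip reverse geocoding")
    parser.add_argument("--match-only", action="store_true", help="only match properties")
    return parser


def plan_steps(args: argparse.Namespace) -> List[str]:
    """Return the runner method names to call, in order (assumed precedence)."""
    if args.collect_only:
        return ["collect_data"]
    if args.match_only:
        return ["match_properties"]
    steps = ["collect_data"]
    if not args.no_geocode:
        steps.append("geocode_data")
    steps.append("match_properties")
    return steps


def main(argv: Optional[List[str]] = None) -> dict:
    args = build_parser().parse_args(argv)
    # Imported lazily so the flag logic above can be exercised without the package.
    from dfw_pipeline import DFWPipelineRunner
    runner = DFWPipelineRunner("config.yaml")
    # Call each selected step on the runner and collect its return value.
    return {step: getattr(runner, step)() for step in plan_steps(args)}


if __name__ == "__main__":
    main()
```

Keeping the flag-to-step mapping in a pure function makes it easy to test without touching NOAA or a geocoding service.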
Downloads and processes hail event data from NOAA.
from dfw_pipeline.core.data_collector import HailDataCollector
collector = HailDataCollector("config.yaml")
df = collector.process_and_save("output.csv")
Adds city/neighborhood information to hail events.
from dfw_pipeline.core.geocoder import HailGeocoder
geocoder = HailGeocoder(email="your-email@example.com", delay=1.2)
enriched_df = geocoder.geocode_dataframe(df)
Matches properties with hail events based on proximity.
from dfw_pipeline.core.property_matcher import PropertyMatcher
matcher = PropertyMatcher(
min_hail_size_in=1.0,
base_radius_mi=1.0,
radius_per_inch_mi=1.0,
max_radius_mi=5.0
)
matched_df = matcher.process("output.csv")
Unified interface for running the complete pipeline.
from dfw_pipeline import DFWPipelineRunner
runner = DFWPipelineRunner("config.yaml")
results = runner.run_full_pipeline()
The pipeline uses config.yaml for configuration. Key settings:
years:
  - 2024
  - 2025
state_fips: '48'
dfw_counties:
  - Collin
  - Dallas
  - Denton
  # ... more counties
output_csv: hail_events_dfw_2024_2025.csv
geocode_output_csv: hail_events_dfw_2024_2025_neighborhoods.csv
reverse_geocode:
  enabled: true
  email_contact: your-email@example.com
  sleep_seconds: 1.2
The pipeline generates several output files:
- hail_events_dfw_2024_2025.csv - Raw hail events data
- hail_events_dfw_2024_2025_neighborhoods.csv - Geocoded events (if enabled)
- hail_damaged_properties.csv - Properties matched with hail events
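The `state_fips` and `dfw_counties` settings drive the filtering done during data collection. A dependency-free sketch of such a filter follows. It assumes NOAA's yearly `StormEvents_details` CSV layout (columns such as `EVENT_ID`, `EVENT_TYPE`, `STATE_FIPS`, `CZ_TYPE`, `CZ_NAME`, `MAGNITUDE`, `BEGIN_LAT`, `BEGIN_LON`, with county names in upper case and hail size in inches in `MAGNITUDE`). The function names and output columns are illustrative, not what `HailDataCollector` actually writes to `output_csv`. The package lists pandas as a dependency; the standard `csv` module is used here only to keep the example self-contained.

```python
import csv
from typing import Dict, Iterable, List

OUTPUT_FIELDS = ["event_id", "county", "size_in", "lat", "lon"]  # hypothetical columns


def filter_dfw_hail(rows: Iterable[Dict[str, str]], state_fips: str = "48",
                    counties: Iterable[str] = ("Collin", "Dallas", "Denton")) -> List[Dict]:
    """Keep county-level hail reports in the given state and counties."""
    wanted = {c.upper() for c in counties}  # NOAA stores county names in upper case
    events = []
    for row in rows:
        if row["EVENT_TYPE"] != "Hail" or row["CZ_TYPE"] != "C":
            continue  # "C" marks county/parish reports (vs. zones or marine areas)
        if int(row["STATE_FIPS"]) != int(state_fips) or row["CZ_NAME"].upper() not in wanted:
            continue
        if not row["BEGIN_LAT"] or not row["BEGIN_LON"]:
            continue  # proximity matching needs coordinates
        events.append({
            "event_id": row["EVENT_ID"],
            "county": row["CZ_NAME"].title(),
            "size_in": float(row["MAGNITUDE"]),
            "lat": float(row["BEGIN_LAT"]),
            "lon": float(row["BEGIN_LON"]),
        })
    return events


def write_events(events: List[Dict], path: str) -> None:
    """Write filtered events to a CSV such as the configured output_csv."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=OUTPUT_FIELDS)
        writer.writeheader()
        writer.writerows(events)
```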
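The pipeline can be integrated into the Flask application:
from flask import Flask, jsonify
from dfw_pipeline import DFWPipelineRunner
app = Flask(__name__)  # or reuse the project's existing Flask app
@app.route('/api/run-pipeline', methods=['POST'])
def run_pipeline():
    # Runs every step synchronously within the request; assumes `results` is JSON-serializable
    runner = DFWPipelineRunner()
    results = runner.run_full_pipeline()
    return jsonify(results)
- Python 3.8+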
- pandas
- requests
- pyyaml
- geopy (for geocoding)
- tqdm (optional, for progress bars)
All dependencies are included in requirements.txt.
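To confirm that an environment satisfies the list above, a quick check such as the following can help. It is a sketch, not part of the package: it only tests whether each module can be located (pyyaml is imported as `yaml`), not which versions are installed, and the module-name lists are assumptions derived from the requirements above.

```python
import importlib.util
import sys
from typing import Dict, List

REQUIRED = ["pandas", "requests", "yaml", "geopy"]  # pyyaml's import name is "yaml"
OPTIONAL = ["tqdm"]


def check_requirements(required: List[str] = REQUIRED,
                       optional: List[str] = OPTIONAL) -> Dict[str, object]:
    """Report the Python version check and any modules that cannot be found."""
    def missing(names: List[str]) -> List[str]:
        # find_spec returns None when a top-level module is not importable.
        return [n for n in names if importlib.util.find_spec(n) is None]

    return {
        "python_ok": sys.version_info >= (3, 8),
        "missing_required": missing(required),
        "missing_optional": missing(optional),
    }


if __name__ == "__main__":
    report = check_requirements()
    print(report)
    if not report["python_ok"] or report["missing_required"]:
        sys.exit("Install the packages listed in requirements.txt")
```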
Part of the StormBuster project.