EY Data & AI Challenge 2026 - Optimizing Clean Water Supply

This repository contains my feature engineering and modeling workflow for the EY challenge focused on forecasting river water quality in South Africa.

My final score (average $R^2$ across the three targets) is 0.4079.

Challenge Description

The goal of the challenge was to predict three water quality indicators:

total alkalinity,
electrical conductance (salinity proxy),
dissolved reactive phosphorus.

The core training data covers 2011-2015 and contains river sampling records (latitude, longitude, sample date, and observed target values) from roughly 200 locations in South Africa. The validation set contains coordinates and dates from different regions, so strong solutions must generalize spatially rather than memorize individual sites.
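Because validation sites come from different regions, a random train/test split can overstate performance: samples from the same site or neighbouring sites leak into both sides. One common remedy is spatially grouped cross-validation. The sketch below is an illustration, not necessarily what this repository does: it buckets samples into latitude/longitude grid cells (the 1-degree `cell_deg` default is an assumption, roughly 111 km in latitude) and assigns whole cells to folds, so no cell appears in both training and validation. The function names `grid_cell` and `spatial_folds` are hypothetical.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, cell_deg=1.0):
    """Map a coordinate to an integer grid cell of size cell_deg x cell_deg degrees."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def spatial_folds(coords, k=5, cell_deg=1.0):
    """Return k (train_idx, val_idx) pairs where each grid cell lands in exactly one fold.

    coords: sequence of (lat, lon) pairs, one per sample (hypothetical input layout).
    """
    cells = defaultdict(list)
    for i, (lat, lon) in enumerate(coords):
        cells[grid_cell(lat, lon, cell_deg)].append(i)
    if len(cells) < k:
        raise ValueError(f"only {len(cells)} grid cells for {k} folds; use smaller cells or fewer folds")

    # Greedy balancing: place the largest cells first, each into the currently smallest fold.
    folds = [[] for _ in range(k)]
    for cell in sorted(cells, key=lambda c: (-len(cells[c]), c)):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(cells[cell])

    all_idx = set(range(len(coords)))
    splits = []
    for fold in folds:
        val = sorted(fold)
        train = sorted(all_idx - set(val))
        splits.append((train, val))
    return splits
```

For example, `spatial_folds([(-25.1, 28.2), (-33.9, 18.4), (-29.8, 31.0)], k=3)` puts each of the three sites in its own validation fold. scikit-learn's `GroupKFold` offers the same behaviour if the grid cells are passed as `groups`.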

Beyond accuracy, the challenge asks participants to identify which environmental factors drive water quality variation. Final ranking is based on average $R^2$ across the three targets.
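The ranking metric can be reproduced locally as the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, computed per target and then averaged. This is a minimal sketch assuming a plain unweighted mean over the three targets, as described above; the official scorer's exact implementation is not shown here, and the target keys used in the usage example are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true has zero variance")
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


def mean_r2(true_by_target, pred_by_target):
    """Average R^2 over targets; returns (mean, per-target scores)."""
    scores = {t: r2_score(true_by_target[t], pred_by_target[t]) for t in true_by_target}
    return sum(scores.values()) / len(scores), scores


# Usage with hypothetical target keys and toy values.
truth = {"alkalinity": [1, 2, 3], "conductance": [1, 2, 3], "phosphorus": [1, 2, 3, 4]}
preds = {"alkalinity": [1, 2, 3], "conductance": [2, 2, 2], "phosphorus": [1, 2, 3, 5]}
avg, per_target = mean_r2(truth, preds)
```

Here the three per-target scores are 1.0, 0.0 and 0.8, so `avg` is about 0.6. Note that a model predicting the mean scores 0, and a single poorly predicted target (phosphorus is often the hardest) pulls the average down substantially.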

Official page: 2026 Optimizing Clean Water Supply

Solution Overview & Approach Summary

I combined geospatial feature engineering with a practical modeling workflow to improve water quality prediction and spatial generalization across South African river regions.

At a high level, the pipeline extracts:

Surface/spectral information from Landsat imagery (visible, NIR, SWIR, thermal, and derived water indices).
Topographic context from Copernicus DEM (elevation, slope, aspect).
Monthly hydro-climate context from TerraClimate (precipitation, evapotranspiration, drought/soil moisture, temperature and related variables).
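The derived water indices mentioned above are normalized differences of band reflectances. The sketch below assumes Landsat 8/9 Collection 2 Level-2 surface reflectance, where raw digital numbers are rescaled as `DN * 0.0000275 - 0.2`, and uses the standard definitions NDWI = (Green - NIR) / (Green + NIR) and MNDWI = (Green - SWIR1) / (Green + SWIR1) (bands B3, B5 and B6 on OLI). Which indices the extraction scripts actually compute is not stated here, and the helper names are hypothetical.

```python
# Assumed Landsat Collection 2 Level-2 surface reflectance scaling.
SR_SCALE = 2.75e-05
SR_OFFSET = -0.2


def scale_sr(dn):
    """Convert a Collection 2 Level-2 SR digital number to reflectance."""
    return dn * SR_SCALE + SR_OFFSET


def normalized_difference(a, b):
    """(a - b) / (a + b); NaN when the denominator is zero."""
    denom = a + b
    if denom == 0:
        return float("nan")
    return (a - b) / denom


def water_indices(green, nir, swir1):
    """Water indices from reflectance values (not raw DNs)."""
    return {
        "NDWI": normalized_difference(green, nir),     # open water is typically > 0
        "MNDWI": normalized_difference(green, swir1),  # less sensitive to built-up areas
    }


# Usage on hypothetical raw DNs for one pixel (green, NIR, SWIR1).
pixel = water_indices(scale_sr(11000), scale_sr(9000), scale_sr(8000))
```

Scaling before taking ratios matters: because of the -0.2 offset, indices computed on raw DNs differ from indices computed on reflectance.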

It then adds temporal and derived features and organizes the outputs into reproducible datasets.
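Since TerraClimate is monthly while samples carry exact dates, each sample has to be matched to a (year, month) record, and lagged months can capture antecedent rainfall. The sketch below shows one way to build such temporal features, under stated assumptions: `monthly_precip` is a hypothetical per-site dict keyed by `(year, month)` (for example holding TerraClimate's `ppt` values), and the cyclic month encoding is an illustrative choice rather than a confirmed part of this pipeline.

```python
import math
from datetime import date


def prev_month(year, month):
    """Return the (year, month) immediately before the given one."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def temporal_features(sample_date, monthly_precip):
    """Calendar features plus current and previous-month precipitation for one sample.

    monthly_precip: hypothetical dict {(year, month): value} for the sample's site.
    Missing months come back as None so they can be imputed later.
    """
    year, month = sample_date.year, sample_date.month
    angle = 2 * math.pi * (month - 1) / 12  # December sits next to January on the circle
    return {
        "month_sin": math.sin(angle),
        "month_cos": math.cos(angle),
        "day_of_year": sample_date.timetuple().tm_yday,
        "precip_month": monthly_precip.get((year, month)),
        "precip_prev_month": monthly_precip.get(prev_month(year, month)),
    }


# Usage with toy values.
feats = temporal_features(date(2014, 1, 15), {(2014, 1): 120.0, (2013, 12): 80.0})
```

The sine/cosine pair avoids the artificial jump between month 12 and month 1 that a raw month number would introduce into tree or linear models.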

Repository Tree

EY_data_challenge/
├── README.md
├── requirements.txt
├── data/
│   ├── extraction/
│   │   ├── copernicusDEM_data_extraction.py
│   │   ├── landsat_data_extraction.py
│   │   └── terraclimate_data_extraction.py
│   ├── preprocessing/
│   │   ├── preprocess_landsat.py
│   │   └── preprocess_terraclimate.py
│   └── old/
│       └── EDA.ipynb
└── model/
    ├── model.ipynb
    └── tries.md
