
OndrejKutil/EY_data_challenge


EY Data & AI Challenge 2026 - Optimizing Clean Water Supply

This repository contains my feature engineering and modeling workflow for the EY challenge focused on forecasting river water quality in South Africa.

My final score ($R^2$) is 0.4079.

Challenge Description

The goal of the challenge was to predict three target variables that describe water quality:

  • total alkalinity,
  • electrical conductance (salinity proxy),
  • dissolved reactive phosphorus.

The core training data spans 2011-2015 and contains river sampling records with latitude, longitude, sample date, and observed targets from roughly 200 locations in South Africa. The validation set includes coordinates and dates from different regions, so strong solutions must generalize spatially, not just memorize sites.
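One way to check spatial generalization locally is to hold out whole sites (or whole grid cells) during cross-validation, so that no location contributes to both training and validation. The sketch below is an illustration only, not necessarily the scheme used in this repository: the `lat`/`lon` field names, the rounding precision, and the fold count are all assumptions. In practice, `GroupKFold` from scikit-learn does the same job.

```python
from collections import defaultdict


def site_key(lat, lon, ndigits=4):
    """Group key for a sample. Fewer digits means coarser spatial blocks
    (ndigits=0 groups samples into roughly one-degree cells)."""
    return (round(lat, ndigits), round(lon, ndigits))


def grouped_folds(records, n_folds=5, ndigits=4):
    """Return (train_idx, val_idx) pairs in which every site is kept whole.

    `records` is a list of dicts with hypothetical keys "lat" and "lon".
    """
    site_to_rows = defaultdict(list)
    for idx, rec in enumerate(records):
        site_to_rows[site_key(rec["lat"], rec["lon"], ndigits)].append(idx)

    # Greedy balancing: place the largest sites first, each into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(n_folds)]
    for rows in sorted(site_to_rows.values(), key=len, reverse=True):
        min(folds, key=len).extend(rows)

    splits = []
    for k, val_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((sorted(train_idx), sorted(val_idx)))
    return splits
```

A score computed on such held-out sites is usually a more honest estimate of performance on unseen regions than a random row-wise split.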

Beyond accuracy, the challenge asks participants to identify which environmental factors drive water quality variation. Final ranking is based on average $R^2$ across the three targets.
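The ranking metric can be written as a short function. This is a minimal sketch that assumes an unweighted mean over the three targets and uses hypothetical target names as dictionary keys. It mirrors the standard definition $R^2 = 1 - SS_{res}/SS_{tot}$, the same quantity that `sklearn.metrics.r2_score` computes.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) if y_true is constant.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_r2(y_true_by_target, y_pred_by_target):
    """Return (average R^2, per-target R^2) for dicts keyed by target name."""
    scores = {
        name: r2_score(y_true_by_target[name], y_pred_by_target[name])
        for name in y_true_by_target
    }
    return sum(scores.values()) / len(scores), scores
```

Because the average is unweighted, a target that is hard to predict pulls the final score down as much as an easy one pulls it up.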

Official page: 2026 Optimizing Clean Water Supply

Solution Overview & Approach Summary

I combined geospatial feature engineering with a practical modeling workflow to improve water quality prediction and spatial generalization across South African river regions.

At a high level, the pipeline extracts:

  • Surface/spectral information from Landsat imagery (visible, NIR, SWIR, thermal, and derived water indices).
  • Topographic context from Copernicus DEM (elevation, slope, aspect).
  • Monthly hydro-climate context from TerraClimate (precipitation, evapotranspiration, drought/soil moisture, temperature and related variables).
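As an illustration of the derived water indices mentioned above, the sketch below computes two widely used normalized-difference indices, NDWI and MNDWI, from surface reflectance values. Which indices the extraction scripts actually compute is not stated here, so the choice of indices and the argument names (`green`, `nir`, `swir1`) are assumptions.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), or None when the denominator is zero."""
    denom = a + b
    if denom == 0:
        return None
    return (a - b) / denom


def water_indices(green, nir, swir1):
    """Water indices from surface reflectance values (hypothetical inputs)."""
    return {
        # NDWI: green vs. near-infrared
        "ndwi": normalized_difference(green, nir),
        # MNDWI: green vs. shortwave infrared
        "mndwi": normalized_difference(green, swir1),
    }
```

Both indices lie in [-1, 1] for non-negative reflectance, and open water tends to produce positive values.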

It then adds temporal and derived features and organizes the outputs into reproducible datasets.
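A minimal sketch of what temporal features from the sample date could look like is shown below. The feature names and the cyclical month encoding are assumptions for illustration, not a description of the repository's preprocessing scripts.

```python
import math
from datetime import date


def temporal_features(sample_date):
    """Calendar features plus a cyclical encoding of the month.

    The sine/cosine pair places December next to January, which a plain
    month number (12 vs. 1) does not.
    """
    angle = 2 * math.pi * (sample_date.month - 1) / 12
    return {
        "year": sample_date.year,
        "month": sample_date.month,
        "day_of_year": sample_date.timetuple().tm_yday,
        "month_sin": math.sin(angle),
        "month_cos": math.cos(angle),
    }


features = temporal_features(date(2013, 4, 1))
```

The `(year, month)` pair can also serve as a join key when attaching monthly climate variables to each sample.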

Repository Tree

EY_data_challenge/
├── README.md
├── requirements.txt
├── data/
│   ├── extraction/
│   │   ├── copernicusDEM_data_extraction.py
│   │   ├── landsat_data_extraction.py
│   │   └── terraclimate_data_extraction.py
│   ├── preprocessing/
│   │   ├── preprocess_landsat.py
│   │   └── preprocess_terraclimate.py
│   └── old/
│       └── EDA.ipynb
└── model/
    ├── model.ipynb
    └── tries.md

