Golib Sanaev — November 2025
esg-classification explores the relationship between Environmental, Social, and Governance (ESG) indicators and firm-level financial performance using modern machine learning techniques.
The project uses the ESG and Financial Performance Dataset from Kaggle (shared by Shriyash Jagtap), a structured dataset simulating corporate ESG metrics and financial outcomes.
It applies algorithms such as Logistic Regression, Random Forests, and Gradient Boosting (XGBoost) to classify firms according to sustainability-driven performance patterns.
This work serves both as:
- a learning framework for applied machine learning classification, and
- an analytical study of how ESG pillars interact with firm-level financial data.
| Notebook | Title | Description |
|---|---|---|
| 01 — Data Exploration and Preparation | Data simulation and preprocessing | Generates or loads the ESG dataset, handles feature engineering, adds noise to mimic real-world data variability, and constructs the target label (ESG_Class). |
| 02 — Modeling and Classification | Model training & evaluation | Trains Logistic Regression, Random Forest, and XGBoost classifiers. Evaluates accuracy, precision, recall, F1, and ROC-AUC. |
| 02b — Realistic Modeling Refinement | Controlled noise introduction | Adds measurement noise and inter-pillar correlations to improve model realism. |
| 03 — Model Evaluation and Interpretation | Threshold tuning, calibration, explainability | Includes cross-validation, threshold trade-offs, reliability curves, and SHAP-based explainability. |
| 04 — ESG Insights and Reporting | Scenario analysis & interpretability | Analyzes feature-level and grouped SHAP values to understand what drives ESG classifications. |
| 05 — ESG Reporting & Visualization Dashboard | Interactive analytics | Builds a live interactive dashboard using ipywidgets and plotly, allowing scenario-based experimentation. |
- Examine how ESG indicators relate to firm financial outcomes.
- Benchmark several classification algorithms on ESG features.
- Identify which ESG dimensions (E, S, or G) most strongly influence firm classification.
- Provide interpretable, reproducible, and interactive model evaluation results.
- Introduced realistic measurement noise and correlation between ESG pillars.
- Simulated imperfections to mimic real-world ESG ratings behavior.
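The noise and inter-pillar correlation described above can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the function name, the shared-factor approach, and the parameters `shared_weight` and `noise_sd` are all assumptions made for this example.

```python
import numpy as np


def add_noise_and_correlation(pillars, shared_weight=0.6, noise_sd=5.0, seed=0):
    """Return a noisy, positively correlated copy of an (n, 3) array of E, S, G scores.

    Hypothetical helper: a shared latent factor induces correlation between
    pillars, and independent Gaussian noise mimics measurement error.
    """
    rng = np.random.default_rng(seed)
    pillars = np.asarray(pillars, dtype=float)
    n_rows = pillars.shape[0]

    # One latent draw per firm, scaled to each pillar's spread and added to all pillars.
    common = rng.normal(size=(n_rows, 1))
    spread = pillars.std(axis=0, keepdims=True)
    correlated = pillars + shared_weight * spread * common

    # Independent measurement noise per firm and pillar.
    return correlated + rng.normal(scale=noise_sd, size=pillars.shape)


# Toy input: independent scores on an assumed 0-100 scale.
base = np.random.default_rng(1).uniform(0, 100, size=(5000, 3))
noisy = add_noise_and_correlation(base, seed=2)
print(np.corrcoef(noisy, rowvar=False).round(2))
```

The shared factor raises the pairwise correlations between pillars, while `noise_sd` controls how much signal is blurred. Scores are not clipped back to their original range here.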
| Model | Mean ROC-AUC (CV) | Test ROC-AUC | Remarks |
|---|---|---|---|
| Logistic Regression | 0.93 | 0.93 | Interpretable and stable under moderate noise |
| Random Forest | 0.92 | 0.92 | Handles non-linearities well |
| XGBoost | 0.93+ | 0.92–0.93 | Best-performing, well-calibrated model |
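A cross-validated ROC-AUC comparison like the one in the table can be sketched with scikit-learn. The example below is illustrative only: it runs on synthetic data from `make_classification` rather than the Kaggle dataset, and it substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost so that it has no extra dependency. Its scores will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the ESG feature matrix and ESG_Class label.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)

models = {
    # Scaling matters for logistic regression, so it is wrapped in a pipeline.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    # Stand-in for XGBoost in this sketch.
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

cv_auc = {}
for name, model in models.items():
    # 5-fold cross-validation, scored by area under the ROC curve.
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    cv_auc[name] = scores.mean()
    print(f"{name}: mean ROC-AUC = {cv_auc[name]:.3f}")
```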
- Used cross-validation to estimate how well the models generalize.
- Tuned the classification threshold (best F1 balance at ~0.37).
- Applied probability calibration (Brier score ~0.11) to align predicted probabilities with observed outcomes.
- Used Partial Dependence Plots (PDPs) to show marginal feature effects.
- SHAP analysis ranked Governance and Social as the most influential ESG pillars.
- Grouped SHAP impacts provided aggregated ESG vs. financial dimension importance.
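The threshold tuning and Brier score steps listed above can be sketched as below. This is an assumption-laden illustration on synthetic data with a plain logistic regression, not the project's notebook code. The threshold is chosen on a held-out validation split so that the choice does not leak into a final test score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced stand-in for the ESG data.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.7, 0.3], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_val)[:, 1]

# precision and recall have one more entry than thresholds; drop the last point.
precision, recall, thresholds = precision_recall_curve(y_val, proba)
denom = precision[:-1] + recall[:-1]
f1 = np.divide(
    2 * precision[:-1] * recall[:-1],
    denom,
    out=np.zeros_like(denom),
    where=denom > 0,
)
best_threshold = thresholds[np.argmax(f1)]

# Brier score: mean squared gap between predicted probability and outcome (lower is better).
brier = brier_score_loss(y_val, proba)

print(f"F1-optimal threshold: {best_threshold:.2f}")
print(f"Brier score: {brier:.3f}")
```

With imbalanced classes, the F1-optimal threshold often falls below the default 0.5, which is consistent with a tuned value such as ~0.37.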
| Dimension | Mean \|SHAP value\| |
|:---|---:|
| Governance | 2.5e-01 |
| Social | 2.4e-01 |
| Environmental | 2.4e-02 |
| Financial | 1.3e-02 |
Key insight: Governance and Social pillars contribute the most to High ESG classification.
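One way to aggregate per-feature SHAP values into dimension-level importances like those in the table is sketched below. The aggregation rule is an assumption: contributions are summed within each group per sample, then averaged in absolute value. The notebooks may instead average per-feature mean |SHAP| values. The feature names and toy numbers are hypothetical, and the SHAP matrix is a plain NumPy array so that the example runs without the `shap` package.

```python
import numpy as np


def grouped_mean_abs_shap(shap_values, feature_names, groups):
    """Mean absolute group-level SHAP contribution.

    shap_values: (n_samples, n_features) array of per-feature SHAP values.
    groups: mapping of group name to the list of feature names it contains.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    importance = {}
    for group, members in groups.items():
        idx = [feature_names.index(name) for name in members]
        # SHAP values are additive, so a group's contribution is the sum of its features.
        group_contrib = shap_values[:, idx].sum(axis=1)
        importance[group] = float(np.abs(group_contrib).mean())
    return importance


# Hypothetical feature names and toy SHAP values, for illustration only.
feature_names = ["env_score", "soc_score", "gov_score", "revenue", "profit_margin"]
groups = {
    "Environmental": ["env_score"],
    "Social": ["soc_score"],
    "Governance": ["gov_score"],
    "Financial": ["revenue", "profit_margin"],
}
toy_shap = np.array(
    [
        [0.1, -0.2, 0.3, 0.05, -0.05],
        [-0.1, 0.4, -0.1, 0.2, 0.1],
    ]
)

importance = grouped_mean_abs_shap(toy_shap, feature_names, groups)
for group, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {value:.3f}")
```

Summing before taking the absolute value lets opposing effects within a group cancel, so a group whose features offset each other scores lower than it would if per-feature importances were averaged.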
| Category | Libraries / Tools |
|---|---|
| Core | Python 3.13, Pandas, NumPy |
| Modeling | scikit-learn, XGBoost |
| Visualization | Matplotlib, Seaborn, Plotly |
| Interpretability | SHAP, PartialDependenceDisplay |
| Interactivity | ipywidgets, Plotly Dash |
| Environment | Jupyter Notebook, uv / venv |
The base ESG dataset originates from:
📘 ESG and Financial Performance Dataset
by Shriyash Jagtap on Kaggle.
This dataset provided the foundation for simulated ESG data exploration and classification modeling.
All preprocessing, noise generation, and label creation were performed for educational purposes.
We gratefully acknowledge Shriyash Jagtap for sharing this valuable dataset.
# Clone the repository
git clone https://github.com/gsanaev/esg-classification.git
cd esg-classification
# Sync environment (installs Python and dependencies)
uv sync
# Retrieve Kaggle data
uv run python -m src.esg.kaggle
Then, execute notebooks in sequence:
# 1. notebooks/01-notebook.ipynb
# 2. notebooks/02-notebook.ipynb
# 3. notebooks/02b-notebook.ipynb
# 4. notebooks/03-notebook.ipynb
# 5. notebooks/04-notebook.ipynb
# 6. notebooks/05-notebook.ipynb
Sanaev, G. (2025). Classifying Corporate ESG Performance Using Machine Learning
GitHub Repository: github.com/gsanaev/esg-classification
GitHub: @gsanaev
Email: gsanaev@gmail.com
LinkedIn: golib-sanaev
⭐ If you find this project insightful, please give it a star!
Author: Golib Sanaev
Dataset Author: Shriyash Jagtap
Version: v1.0
License: MIT