An interactive machine learning explainability dashboard built with Streamlit, scikit-learn, and SHAP.
Trains a RandomForestClassifier and explains its predictions through global and local SHAP attributions.
Machine learning models are increasingly used to support high-stakes decisions in healthcare, finance, and beyond. Without explainability, a model is a black box — its predictions cannot be audited, trusted, or safely deployed.
Explainable AI (XAI) bridges this gap by answering:
- Which features drove this prediction?
- How much did each feature contribute — positively or negatively?
- Is the model relying on spurious patterns or meaningful signals?
SHAP (SHapley Additive exPlanations) is a widely used framework for feature attribution. It is grounded in Shapley values from cooperative game theory and can be applied to any model type.
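To make the idea of feature attribution concrete, here is a minimal sketch that computes exact Shapley values for a tiny model by enumerating every feature coalition. The model `toy_model`, the baseline, and the input are hypothetical and chosen only for illustration. The SHAP library itself relies on much more efficient algorithms (such as TreeExplainer for tree ensembles) rather than brute force.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical model: two linear terms plus one interaction term."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this only works for a handful of features.
    """
    n = len(x)

    def value(coalition):
        masked = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Additivity: the attributions sum to f(x) - f(baseline).
gap = toy_model(x) - toy_model(baseline)
print("attributions:", phi)
print("sum:", sum(phi), "f(x) - f(baseline):", gap)
```

The interaction term is shared equally between the two features involved, which is why the attributions add up to the difference between the prediction and the baseline output.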
| Feature | Description |
|---|---|
| 📊 Model metrics | Accuracy, Precision, Recall, F1-score, ROC-AUC displayed as rich metric cards |
| 🌍 Global explanations | Mean absolute SHAP values ranked across all test samples |
| 🔍 Local explanations | Per-sample SHAP waterfall showing top positive/negative contributions |
| 🎛️ Interactive explorer | Slider to select any test sample and inspect the model's reasoning |
| 🖼️ Static chart export | SHAP summary bar chart saved to outputs/shap_summary.png |
| 📋 Dataset overview | Feature matrix preview and class distribution |
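The global explanation listed in the table comes down to averaging absolute SHAP values over samples and sorting the result. Below is a minimal sketch, assuming the SHAP values are available as a samples-by-features matrix. The function name `rank_global_importance` and the numbers are made up for illustration and are not taken from src/explain.py; a real implementation would likely operate on NumPy arrays.

```python
def rank_global_importance(shap_matrix, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    n_samples = len(shap_matrix)
    mean_abs = [
        sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, mean_abs), key=lambda p: p[1], reverse=True)


# Illustrative values only: 3 samples x 3 features.
feature_names = ["mean radius", "mean texture", "worst area"]
shap_matrix = [
    [0.10, -0.02, 0.30],
    [-0.20, 0.01, 0.25],
    [0.15, -0.03, -0.35],
]

ranking = rank_global_importance(shap_matrix, feature_names)
for name, score in ranking:
    print(f"{name:<14} {score:.3f}")
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling each other out across samples.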
model-explainability-dashboard/
│
├── app.py # Streamlit dashboard (main entry point)
│
├── src/
│ ├── __init__.py
│ ├── data.py # Dataset loading & splitting
│ ├── train.py # Model training, evaluation & persistence
│ ├── explain.py # SHAP explainability utilities
│ └── utils.py # Shared helpers
│
├── models/
│ ├── .gitkeep
│ └── model.joblib # Trained model (generated — git-ignored)
│
├── outputs/
│ ├── .gitkeep
│ ├── metrics.json # Evaluation metrics (generated — git-ignored)
│ └── shap_summary.png # Static SHAP chart (generated — git-ignored)
│
├── requirements.txt
├── .gitignore
└── README.md
git clone https://github.com/<your-username>/model-explainability-dashboard.git
cd model-explainability-dashboard

python -m venv .venv
source .venv/bin/activate # macOS / Linux
# .venv\Scripts\activate # Windows

pip install -r requirements.txt

python -m src.train

This will:
- Load the Breast Cancer Wisconsin dataset from sklearn.datasets.
- Split it into train (80%) and test (20%) sets.
- Train a RandomForestClassifier with 200 trees.
- Evaluate accuracy, precision, recall, F1-score, and ROC-AUC.
- Save the trained model → models/model.joblib
- Save evaluation metrics → outputs/metrics.json
- Print a full training report to the console.
Expected output:
🔬 Model Explainability Dashboard — Training Pipeline
=======================================================
[1/4] Loading dataset …
Samples: 569 | Features: 30
[2/4] Splitting into train / test sets …
Train: 455 | Test: 114
[3/4] Training RandomForestClassifier …
Trees: 200
[4/4] Evaluating & saving artefacts …
✓ Model saved → models/model.joblib
✓ Metrics saved → outputs/metrics.json
📊 Test-set results:
accuracy 0.9737
precision 0.9722
recall 0.9859
f1_score 0.9790
roc_auc 0.9974
n_test_samples 114
n_estimators 200
✅ Done! Run `streamlit run app.py` to open the dashboard.
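The training steps listed above could be sketched roughly as follows. This is not the contents of src/train.py: the function names `train_and_evaluate` and `save_artifacts`, the `random_state=42` seed, and the stratified split are assumptions made for illustration, so the exact metric values may differ from the report shown above. The sketch writes to a temporary directory instead of the repository's models/ and outputs/ folders.

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def train_and_evaluate(random_state=42):
    """Train a 200-tree random forest and compute test-set metrics."""
    X, y = load_breast_cancer(return_X_y=True)
    # 80/20 split; stratify and random_state are assumptions, not from the project.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    model = RandomForestClassifier(n_estimators=200, random_state=random_state)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1
    metrics = {
        "accuracy": float(accuracy_score(y_test, y_pred)),
        "precision": float(precision_score(y_test, y_pred)),
        "recall": float(recall_score(y_test, y_pred)),
        "f1_score": float(f1_score(y_test, y_pred)),
        "roc_auc": float(roc_auc_score(y_test, y_prob)),
        "n_test_samples": len(X_test),
        "n_estimators": model.n_estimators,
    }
    return model, metrics


def save_artifacts(model, metrics, root):
    """Persist the model and metrics under <root>/models and <root>/outputs."""
    root = Path(root)
    (root / "models").mkdir(parents=True, exist_ok=True)
    (root / "outputs").mkdir(parents=True, exist_ok=True)
    joblib.dump(model, root / "models" / "model.joblib")
    (root / "outputs" / "metrics.json").write_text(json.dumps(metrics, indent=2))


model, metrics = train_and_evaluate()
# A temporary directory keeps the sketch from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(model, metrics, tmp)
    saved = json.loads((Path(tmp) / "outputs" / "metrics.json").read_text())
print(json.dumps(saved, indent=2))
```

Fixing the seed for both the split and the forest makes repeated runs comparable, which matters when the metrics file is read back by a dashboard.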
streamlit run app.py

The app opens at http://localhost:8501 and provides:
- Dataset overview and class distribution
- Model performance metric cards
- Global feature importance bar chart (SHAP)
- Per-sample local explanation with interactive slider
- Positive/negative SHAP contribution breakdown
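The positive/negative breakdown for a single sample can be illustrated with a short sketch. It assumes one row of SHAP values and a base value are already available for the selected sample. The helper names `split_contributions` and `reconstruct_output` and all numbers are hypothetical, not the dashboard's actual code.

```python
def split_contributions(shap_row, feature_names, top_k=3):
    """Split one sample's SHAP values into its top positive and negative drivers."""
    pairs = list(zip(feature_names, shap_row))
    positive = sorted((p for p in pairs if p[1] > 0), key=lambda p: p[1], reverse=True)
    negative = sorted((p for p in pairs if p[1] < 0), key=lambda p: p[1])
    return positive[:top_k], negative[:top_k]


def reconstruct_output(base_value, shap_row):
    """SHAP additivity: base value plus all contributions gives the model output."""
    return base_value + sum(shap_row)


# Illustrative values only, for a single test sample.
feature_names = ["mean radius", "mean texture", "worst area", "mean smoothness"]
shap_row = [0.12, -0.05, 0.30, -0.20]
base_value = 0.60

positive, negative = split_contributions(shap_row, feature_names)
output = reconstruct_output(base_value, shap_row)

print("pushes the prediction up:  ", positive)
print("pushes the prediction down:", negative)
print("reconstructed model output:", round(output, 4))
```

Because the contributions sum to the gap between the base value and the model output, a waterfall view can start at the base value and apply each contribution in turn to arrive at the prediction.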
| Library | Purpose |
|---|---|
| scikit-learn | RandomForestClassifier, dataset, metrics |
| SHAP | TreeExplainer, global & local attributions |
| Streamlit | Interactive web dashboard |
| pandas | Data manipulation |
| NumPy | Numerical computation |
| Altair | Interactive Vega-Lite charts |
| matplotlib | Static SHAP chart export |
| joblib | Model serialisation |
This is an educational demo only.
The model and SHAP explanations are provided for learning and portfolio purposes.
They should not be used for any clinical, diagnostic, or medical decision-making.
The Breast Cancer Wisconsin dataset is a public benchmark dataset widely used in ML research.
MIT License — feel free to fork, extend, and learn from this project.