RecovAI

AI-powered decision support for copper flotation operations

Hindustan Copper Limited · Malanjkhand Copper Project

RecovAI is a full-stack application that helps plant operators and metallurgists predict copper recovery, tune reagent dosing, detect unusual shifts, monitor model drift, and review shift performance. It combines machine learning models with explainability tools and an optional natural-language assistant backed by Groq (Llama 3).

Overview

Copper flotation recovery depends on many interacting variables: head grade, feed rate, pH, pulp density, reagent doses, and more. Manual tracking is slow and often reactive. RecovAI supports day-to-day decisions by:

- Predicting recovery (%) for a shift before or after it runs
- Recommending reagent doses (SIPX, frother, lime, depressant) toward a target recovery
- Flagging anomalous shifts with Isolation Forest scoring
- Explaining why a prediction looks high or low (SHAP)
- Watching for data drift with Population Stability Index (PSI)
- Generating shift summaries and answering plant questions via Groq when configured

The web UI is a single-page application (hcl_recovai_v3.html). The API layer is FastAPI; persistence uses SQLite through SQLAlchemy.

Dataset Collection and Preprocessing

The dataset was collected from handwritten shift logs kept on paper at the HCL MCP concentrator plant. Because the logs follow a structured operational format, only minimal preprocessing was required. This included:

- Digitization of handwritten shift log records into machine-readable format
- Date, numerical value, and format standardization
- Handling missing or inconsistent entries where required
- Organizing and structuring features for model training
- Converting processed data into a format suitable for machine learning and system integration

This approach ensured that the original operational characteristics of the plant data were preserved while preparing the dataset for analysis and AI model development.
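
The standardization and missing-value steps listed above could look roughly like the following pandas sketch. The column names (`Date`, `Head Grade (%Cu)`), the day-first date format, and the median fill are illustrative assumptions, not details taken from the project's scripts.

```python
import pandas as pd

def standardize_shift_log(df: pd.DataFrame, date_col: str, numeric_cols: list[str]) -> pd.DataFrame:
    """Standardize dates and numeric fields in a digitized shift log."""
    out = df.copy()
    # Assumed day-first dates; values that cannot be parsed become NaT.
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce", dayfirst=True)
    for col in numeric_cols:
        # Strip whitespace and thousands separators, then coerce to float.
        cleaned = out[col].astype(str).str.replace(",", "", regex=False).str.strip()
        out[col] = pd.to_numeric(cleaned, errors="coerce")
        # Assumed policy: fill unreadable entries with the column median.
        out[col] = out[col].fillna(out[col].median())
    # Drop rows without a usable date and keep chronological order.
    return out.dropna(subset=[date_col]).sort_values(date_col).reset_index(drop=True)

# Tiny made-up sample, only to show the call.
raw = pd.DataFrame({
    "Date": ["02/10/2024", "01/10/2024", "not a date"],
    "Head Grade (%Cu)": ["1.10", " 1.30 ", "n/a"],
})
clean = standardize_shift_log(raw, "Date", ["Head Grade (%Cu)"])
```

Whether gaps should be filled, dropped, or flagged for manual review is a plant-specific choice; the median fill here is only a placeholder.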

Screenshots

Dashboard

Recovery prediction

SHAP explainability

Anomaly detection

Shift report

PSI drift monitor

System Architecture

flowchart TB
    subgraph Client["Client layer"]
        UI["hcl_recovai_v3.html\n(Vanilla HTML / CSS / JS)"]
        Assets["assets/\nlogo, shift_data.js"]
    end

    subgraph API["Application layer — FastAPI :8000"]
        Main["main.py\nREST endpoints"]
        DBLayer["database.py\nSQLAlchemy ORM"]
    end

    subgraph Engines["Intelligence engines"]
        E1["Engine 1\nReagent optimizer"]
        E2["Engine 2\nAnomaly detector"]
        E3["Engine 3\nSHAP explainer"]
        E4["Engine 4\nPSI monitor"]
        E5["Engine 5\nNLP / Groq"]
    end

    subgraph ML["Model artifacts — models/"]
        XGB["xgb_model.pkl"]
        RF["rf_model.pkl"]
        LIN["linear_model.pkl"]
        ISO["iso_forest.pkl"]
        SCL["scaler.pkl"]
    end

    subgraph Store["Persistence"]
        SQLite[("recovai.db\nSQLite")]
        CSV["shifts_dataset.csv"]
    end

    UI -->|HTTP JSON| Main
    Main --> E1 & E2 & E3 & E4 & E5
    Main --> XGB & RF & LIN & ISO & SCL
    Main --> DBLayer
    DBLayer --> SQLite
    Main --> CSV
    UI --> Assets

Request Flow

Typical path when an operator submits shift data from the UI:

sequenceDiagram
    participant Op as Operator UI
    participant API as FastAPI main.py
    participant ML as XGBoost / scaler
    participant Eng as Engines 2–4
    participant DB as SQLite

    Op->>API: POST /predict (process + shift metadata)
    API->>ML: Scale features, predict recovery
    API->>Eng: Anomaly score, SHAP top drivers, drift check
    API->>DB: Save shift_predictions row
    API-->>Op: predicted_recovery, anomaly, shap_top5, db_id

    Op->>API: GET /history/predictions
    API->>DB: Query recent rows
    API-->>Op: shift_date, submitted_at, metrics

For chat and AI shift narratives, the UI calls POST /report with either a message (chat) or shift + use_ai (report). Engine 5 uses the Groq API when GROQ_API_KEY is set in .env.
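
As a rough illustration of the two request shapes for POST /report, the sketch below builds either a chat body or a report body. Only `message`, `shift`, and `use_ai` come from the description above; the helper name `build_report_payload` and the keys inside the example `shift` dictionary are hypothetical.

```python
import json
from typing import Any, Optional

def build_report_payload(message: Optional[str] = None,
                         shift: Optional[dict[str, Any]] = None,
                         use_ai: bool = True) -> dict[str, Any]:
    """Build a JSON body for POST /report: chat (message) or report (shift + use_ai)."""
    # Exactly one of the two modes must be selected.
    if (message is None) == (shift is None):
        raise ValueError("Provide exactly one of message or shift")
    if message is not None:
        return {"message": message}
    return {"shift": shift, "use_ai": use_ai}

# Chat request body.
chat_body = json.dumps(build_report_payload(message="Why did recovery drop on the night shift?"))
# Report request body; the shift keys below are made up for illustration.
report_body = json.dumps(build_report_payload(shift={"head_grade": 1.1, "ph": 10.5}))
```

The actual request schema is defined by the Pydantic models in main.py and is visible in the Swagger UI at /docs.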

AI Engines

| Engine | Module | Role |
|---|---|---|
| 1 | `engine1_reagent.py` | SciPy-based reagent dose optimization (SIPX, frother, lime, depressant) |
| 2 | `engine2_anomaly.py` | Isolation Forest anomaly scoring on shift feature vectors |
| 3 | `engine3_shap.py` | SHAP TreeExplainer for per-prediction and global importance |
| 4 | `engine4_psi.py` | PSI drift monitoring vs training distribution |
| 5 | `engine5_nlp.py` | Groq (Llama 3) shift reports and plant Q&A; rule-based fallback |

Engine 1 — Reagent optimization

Uses numerical optimization (e.g. L-BFGS-B / differential evolution) to search reagent doses that move predicted recovery toward a target while penalizing excessive chemical use.
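
A minimal sketch of this idea with `scipy.optimize.minimize` and L-BFGS-B is shown below. The surrogate recovery function, the dose limits in `MAX_DOSE`, the response strengths in `GAINS`, and the penalty weight are all invented for illustration; in the real engine the prediction would presumably come from the trained model rather than a closed-form toy.

```python
import numpy as np
from scipy.optimize import minimize

REAGENTS = ["SIPX", "frother", "lime", "depressant"]
MAX_DOSE = np.array([100.0, 50.0, 2000.0, 60.0])  # hypothetical upper limits, g/t
GAINS = np.array([4.0, 2.0, 3.0, 1.0])            # hypothetical response strengths

def surrogate_recovery(x: np.ndarray) -> float:
    """Toy stand-in for the trained model; x is each dose as a fraction of MAX_DOSE."""
    return 80.0 + float(np.sum(GAINS * (1.0 - np.exp(-3.0 * x))))

def recommend_doses(target: float, penalty: float = 0.05) -> dict[str, float]:
    """Search doses that bring predicted recovery near target with little reagent."""
    def objective(x: np.ndarray) -> float:
        gap = surrogate_recovery(x) - target
        # Squared miss from the target plus a linear cost on chemical use.
        return gap ** 2 + penalty * float(np.sum(x))

    result = minimize(objective, x0=np.full(4, 0.5), method="L-BFGS-B",
                      bounds=[(0.0, 1.0)] * 4)
    return dict(zip(REAGENTS, (result.x * MAX_DOSE).round(1).tolist()))

doses = recommend_doses(target=86.0)
achieved = surrogate_recovery(np.array([doses[r] for r in REAGENTS]) / MAX_DOSE)
```

Working in normalized dose units keeps the four variables on a similar scale, which tends to help gradient-based solvers. A global method such as `scipy.optimize.differential_evolution` is an alternative when the model response is not smooth, as is typical for tree ensembles.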

Engine 2 — Anomaly detection

An Isolation Forest trained on historical shifts assigns an anomaly score to each submission; lower scores indicate more unusual shifts. Example severity bands:

| Score (typical) | Severity | Suggested action |
|---|---|---|
| Above warning threshold | Normal | Continue standard monitoring |
| Between warning and alert | Warning | Review recent process changes |
| Below alert threshold | Critical | Investigate before next shift |
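
The banding above can be sketched with scikit-learn's `IsolationForest`, using `decision_function` as the score (positive for typical points, negative for outliers). The threshold values, the three synthetic features, and the function name `severity` are assumptions for illustration; the project's actual thresholds are not stated here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

WARNING_THRESHOLD = 0.0   # hypothetical
ALERT_THRESHOLD = -0.10   # hypothetical

def severity(score: float) -> str:
    """Map an anomaly score to a severity band (lower score = more unusual)."""
    if score >= WARNING_THRESHOLD:
        return "Normal"
    if score >= ALERT_THRESHOLD:
        return "Warning"
    return "Critical"

# Synthetic "historical shifts": head grade, pH, pulp density (made-up values).
rng = np.random.default_rng(0)
history = rng.normal(loc=[1.1, 10.5, 32.0], scale=[0.05, 0.2, 1.0], size=(500, 3))
forest = IsolationForest(n_estimators=100, random_state=0).fit(history)

# Score one typical shift and one far outside the historical range.
typical, unusual = forest.decision_function([[1.1, 10.5, 32.0], [0.4, 8.0, 45.0]])
labels = (severity(typical), severity(unusual))
```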

Engine 3 — SHAP explainability

SHAP values show which inputs pushed recovery up or down for a given prediction. The API returns a compact top-five list for the UI; training scripts can emit full waterfall/beeswarm plots under recovai_output/.
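
One way to reduce a per-prediction SHAP vector to a top-five list is sketched below. The function name `top_drivers`, the output keys, the `Flotation pH` column name, and the example SHAP values are hypothetical; in practice the values would come from something like `shap.TreeExplainer(model).shap_values(X)`.

```python
import numpy as np

def top_drivers(feature_names: list[str], shap_row: np.ndarray, k: int = 5) -> list[dict]:
    """Rank features by absolute SHAP value and keep the sign as a direction."""
    order = np.argsort(-np.abs(shap_row))[:k]
    return [
        {
            "feature": feature_names[i],
            "shap": float(shap_row[i]),
            # Positive SHAP pushes predicted recovery up, negative pushes it down.
            "direction": "up" if shap_row[i] > 0 else "down",
        }
        for i in order
    ]

names = ["Feed_Condition_Num", "Prev_Recovery (%)", "Conc. Mass Pull (%)",
         "Tails Grade (%Cu)", "Roll7_Recovery (%)", "Flotation pH"]
# Made-up SHAP values for a single prediction.
example_shap = np.array([0.42, -0.10, 0.05, -0.31, 0.02, 0.01])
shap_top5 = top_drivers(names, example_shap)
```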

Engine 4 — PSI drift

Population Stability Index compares live feature distributions to the training baseline:

| PSI | Interpretation | Action |
|---|---|---|
| < 0.10 | Stable | Safe to use current models |
| 0.10 – 0.25 | Moderate drift | Monitor; plan validation |
| > 0.25 | Strong drift | Consider retraining or recalibration |
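
For reference, PSI over binned distributions is commonly computed as the sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the baseline and live fractions in bin i. The sketch below is one plain NumPy version; the quantile binning, the bin count, the `eps` floor, and the function names are assumptions, not necessarily what `engine4_psi.py` does.

```python
import numpy as np

def population_stability_index(expected, actual, bins: int = 10, eps: float = 1e-6) -> float:
    """PSI of a live sample (actual) against a training baseline (expected)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges come from quantiles of the training baseline.
    inner_edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))[1:-1]
    e = np.bincount(np.searchsorted(inner_edges, expected), minlength=bins) / expected.size
    a = np.bincount(np.searchsorted(inner_edges, actual), minlength=bins) / actual.size
    # Floor the fractions so empty bins do not produce log(0).
    e, a = np.clip(e, eps, None), np.clip(a, eps, None)
    return float(np.sum((a - e) * np.log(a / e)))

def interpret_psi(value: float) -> str:
    """Apply the bands from the table above."""
    if value < 0.10:
        return "Stable"
    if value <= 0.25:
        return "Moderate drift"
    return "Strong drift"

# Synthetic pH readings: one live sample like the baseline, one shifted upward.
rng = np.random.default_rng(42)
baseline = rng.normal(10.5, 0.3, 5000)
same = rng.normal(10.5, 0.3, 2000)
shifted = rng.normal(10.8, 0.3, 2000)
```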

Engine 5 — NLP (Groq)

When GROQ_API_KEY is present in .env, Engine 5 calls Groq (llama-3.1-8b-instant by default) for shift narratives and chat. If the key is missing or the call fails, the backend returns a structured rule-based response so the UI remains usable.
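
The fallback behaviour can be pictured with the sketch below. Here `llm_call` is a placeholder for whatever function wraps the Groq client; the function names, the response keys, and the default target of 85.0 are hypothetical.

```python
from typing import Callable, Optional

def rule_based_summary(shift: dict) -> str:
    """Deterministic summary used when no LLM response is available."""
    rec = shift["predicted_recovery"]
    status = "on target" if rec >= shift.get("target", 85.0) else "below target"
    return f"Predicted recovery {rec:.1f}% ({status})."

def shift_narrative(shift: dict,
                    llm_call: Optional[Callable[[dict], str]] = None,
                    api_key: Optional[str] = None) -> dict:
    """Try the LLM when a key is configured; otherwise fall back to rules."""
    if api_key and llm_call is not None:
        try:
            return {"source": "groq", "text": llm_call(shift)}
        except Exception:
            # Network or API failure: fall through to the rule-based path.
            pass
    return {"source": "rule_based", "text": rule_based_summary(shift)}

example = {"predicted_recovery": 88.4}
without_key = shift_narrative(example)
```

Returning the source alongside the text lets the UI show whether a narrative came from the model or from the fallback.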

Models and Data

Training summary

| Item | Detail |
|---|---|
| Shifts | 1,778 (Malanjkhand plant data) |
| Train / test | ~80% / ~20% temporal split |
| Train window | Oct 2024 – Dec 2025 |
| Test window | Jan 2026 – May 2026 |
| Target | `Recovery (%)` |
| Features | 26 process variables (plus engineered lags/rolls where used) |
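
A temporal split of this kind holds out the most recent shifts rather than a random sample, so the test set never precedes the training data. A minimal pandas sketch follows; the `Date` column name and the synthetic one-row-per-day frame are assumptions, and the cutoff mirrors the windows in the table.

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, date_col: str, test_start: str):
    """Split chronologically: rows before test_start train, the rest test."""
    ordered = df.sort_values(date_col)
    cutoff = pd.Timestamp(test_start)
    return ordered[ordered[date_col] < cutoff], ordered[ordered[date_col] >= cutoff]

# Synthetic frame with one row per day, standing in for the shift dataset.
shifts = pd.DataFrame({"Date": pd.date_range("2024-10-01", "2026-05-31", freq="D")})
shifts["Recovery (%)"] = 88.0
train, test = temporal_split(shifts, "Date", "2026-01-01")
```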

Model comparison (recovery prediction)

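| Model | R² | RMSE | MAE | Meets target (R² > 0.85) |
|---|---|---|---|---|
| XGBoost (primary) | 0.972 | 0.139 | 0.104 | Yes |
| Random Forest | 0.910 | 0.248 | 0.197 | Yes |
| Linear baseline | not reported | not reported | not reported | Reference only |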

Target thresholds used in project documentation: R² > 0.85, RMSE < 0.50, MAE < 0.50.
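
The three metrics and the threshold check can be computed as in the NumPy sketch below. The function name `evaluate` and the five example values are made up; only the thresholds come from the line above.

```python
import numpy as np

TARGETS = {"r2_min": 0.85, "rmse_max": 0.50, "mae_max": 0.50}

def evaluate(y_true, y_pred) -> dict:
    """Compute R², RMSE, and MAE, and check them against the project targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    meets = r2 > TARGETS["r2_min"] and rmse < TARGETS["rmse_max"] and mae < TARGETS["mae_max"]
    return {"r2": r2, "rmse": rmse, "mae": mae, "meets_targets": bool(meets)}

# Made-up recoveries (%) for five test shifts.
metrics = evaluate([88.0, 89.0, 90.0, 91.0, 92.0], [88.1, 88.9, 90.2, 90.9, 92.0])
```

Because RMSE and MAE are in recovery percentage points, the 0.50 limits correspond to an average error of under half a point.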

Top XGBoost drivers (global importance)

| Feature | Importance |
|---|---|
| Feed_Condition_Num | 0.598 |
| Prev_Recovery (%) | 0.105 |
| Conc. Mass Pull (%) | 0.057 |
| Tails Grade (%Cu) | 0.050 |
| Roll7_Recovery (%) | 0.046 |

Columns excluded from training to avoid leakage include concentrate/tailings mass flows derived from the same shift outcome (e.g. COPPER IN CONCENTRATE (MT), COPPER IN TAILINGS (MT)).
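
The sketch below shows one leakage-safe way to drop those columns and build history features such as `Prev_Recovery (%)` and `Roll7_Recovery (%)`. The exact definitions used in training are not given here, so the one-shift lag and the seven-shift rolling mean over past shifts only are assumptions, as is the function name.

```python
import pandas as pd

LEAKY_COLUMNS = ["COPPER IN CONCENTRATE (MT)", "COPPER IN TAILINGS (MT)"]

def add_history_features(df: pd.DataFrame, target: str = "Recovery (%)") -> pd.DataFrame:
    """Drop outcome-derived columns and add lag/rolling features from past shifts."""
    # Assumes df is already sorted chronologically.
    out = df.drop(columns=LEAKY_COLUMNS, errors="ignore").copy()
    # Shift first so each row only sees recoveries from earlier shifts.
    prev = out[target].shift(1)
    out["Prev_Recovery (%)"] = prev
    out["Roll7_Recovery (%)"] = prev.rolling(7, min_periods=1).mean()
    return out

# Made-up values for four consecutive shifts.
sample = pd.DataFrame({
    "Recovery (%)": [90.0, 91.0, 92.0, 93.0],
    "COPPER IN TAILINGS (MT)": [1.2, 1.1, 1.0, 0.9],
})
features = add_history_features(sample)
```

Computing the rolling mean on the shifted series is what keeps the current shift's own recovery out of its features.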

Key process inputs (runtime / predict)

Head grade, feed rate, flotation pH, pulp density, air flow, SIPX, frother, lime, depressant, particle size, water recovery, rougher concentrate grade — aligned with POST /predict in main.py.

Technology Stack

| Layer | Technologies |
|---|---|
| API | FastAPI, Uvicorn, Pydantic v2 |
| ML | XGBoost, scikit-learn (RF, Isolation Forest, scalers) |
| Explainability | SHAP |
| Optimization | SciPy |
| LLM | Groq API (Llama 3 family) |
| Data | Pandas, NumPy |
| Database | SQLite, SQLAlchemy 2.x |
| Frontend | HTML, CSS, JavaScript, Chart.js |
| Config | python-dotenv (`.env`) |
| Export | openpyxl (Excel shift reports) |

Project Structure

HCL-MCP-AI-project-/
├── main.py                 # FastAPI application (v2.0)
├── database.py             # ORM models, migrations, CRUD
├── hcl_recovai_v3.html     # Main UI
├── shifts_dataset.csv      # Shift data for dashboard / anomalies
├── requirements.txt
│
├── engines/
│   ├── engine1_reagent.py
│   ├── engine2_anomaly.py
│   ├── engine3_shap.py
│   ├── engine4_psi.py
│   └── engine5_nlp.py
│
├── models/                 # Serialized models (gitignored *.pkl in some setups)
├── data/processed/         # Training CSVs
├── recovai_output/         # Training plots and extra artifacts
├── outputs/                # Comparison charts, training_report.txt
├── assets/                 # Static UI assets
│
├── train_all.py            # Train and export models
├── recovai_train.py
├── recov_train_rf.py
└── recov_train_linear.py

Runtime files (not committed): .env, recovai.db, venv/, contacts.json, feedback.json.

Setup

Prerequisites

- Python 3.10 or newer
- pip
- Groq API key (optional; needed only for live AI chat and reports), available from console.groq.com

Installation

git clone https://github.com/dikshadamahe/HCL-MCP-AI-project-.git
cd HCL-MCP-AI-project-

python -m venv venv

# Windows
venv\Scripts\activate

# macOS / Linux
source venv/bin/activate

pip install -r requirements.txt

Create .env in the project root (never commit this file):

GROQ_API_KEY=gsk-your-key-here

Optional: override the database URL (default is local SQLite):

DATABASE_URL=sqlite:///./recovai.db
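
How these two settings might be read at startup is sketched below. The function name `load_settings` and the returned keys are hypothetical; in the application, python-dotenv's `load_dotenv()` would typically be called first so that values from .env appear in `os.environ`.

```python
import os
from typing import Mapping, Optional

DEFAULT_DATABASE_URL = "sqlite:///./recovai.db"

def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Read configuration, falling back to local SQLite and no LLM key."""
    return {
        "database_url": env.get("DATABASE_URL", DEFAULT_DATABASE_URL),
        # An empty string is treated the same as a missing key.
        "groq_api_key": env.get("GROQ_API_KEY") or None,
    }

defaults = load_settings({})
```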

Train models if models/*.pkl are not already present:

python train_all.py

Running the Application

Terminal 1 — backend

uvicorn main:app --reload --host 127.0.0.1 --port 8000

| Check | URL |
|---|---|
| Health | http://localhost:8000/health |
| Readiness | http://localhost:8000/api/test |
| Swagger UI | http://localhost:8000/docs |

Terminal 2 — frontend (recommended)

python -m http.server 5500

Open http://localhost:5500/hcl_recovai_v3.html

The UI expects the API at http://localhost:8000. CORS is enabled on the backend for local development.

API Reference

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Service health and model load count |
| GET | `/api/test` | Detailed readiness checklist |
| POST | `/predict` | Predict recovery; persists submission |
| POST | `/optimize` | Reagent dose recommendations |
| GET | `/api/anomalies` | Recent shifts with anomaly scores |
| GET | `/api/importance` | Global feature importance |
| GET | `/api/heatmap` | Feature correlation matrix |
| GET | `/api/pdp` | Partial dependence curve data |
| GET | `/api/dashboard` | Filtered dashboard aggregates |
| POST | `/report` | Shift report or chat (`message` / `shift`) |
| GET | `/api/report/download` | Excel shift report |
| POST | `/api/forecast` | Multi-day recovery forecast |
| GET | `/history/predictions` | Submission history |
| GET | `/history/predictions/{id}` | Single submission detail |
| POST | `/api/contact` | Contact form |
| POST | `/api/feedback` | User feedback |

Interactive documentation: http://localhost:8000/docs

Database

SQLite database file: recovai.db (created on first backend start).

| Table | Purpose |
|---|---|
| `shift_predictions` | Each `/predict` call: inputs, prediction, anomaly/drift summary, optional `shift_date`, `shift`, `operator_name`, `notes` |
| `shift_reports` | NLP shift reports linked to predictions |

History in the UI uses GET /history/predictions, which returns both `shift_date` (the operator-entered shift date) and `submitted_at` (the server timestamp).

Team

VIT Bhopal University · Intern project with Hindustan Copper Limited (HCL)
Project: Malanjkhand Copper Project — flotation plant optimization

| Name | Role |
|---|---|
| Diksha Damahe | Frontend UI development; FastAPI backend implementation; ML model integration; image-based dataset generation; full-stack system integration (connecting frontend, backend, database, and AI modules); project documentation, including the GitHub README, installation guide, and usage instructions |
| Bhavya Jaiprakash Khatri | Project documentation, technical report, and submission materials |
| Hiya Porwal | Data manipulation: cleaning, transformation, and supporting data work for training |
| Ritica Awasthi | Database layer: SQLite integration |

Acknowledgements

We thank Hindustan Copper Limited for the opportunity to work on this project as interns. Their domain guidance and support were essential to building a system grounded in real plant operations.

RecovAI — internal decision-support prototype for educational and demonstration purposes at HCL Malanjkhand.
