AI-powered decision support for copper flotation operations
Hindustan Copper Limited · Malanjkhand Copper Project
RecovAI is a full-stack application that helps plant operators and metallurgists predict copper recovery, tune reagent dosing, detect unusual shifts, monitor model drift, and review shift performance. It combines machine learning models with explainability tools and an optional natural-language assistant backed by Groq (Llama 3).
- Overview
- Screenshots
- System Architecture
- Request Flow
- AI Engines
- Models and Data
- Technology Stack
- Project Structure
- Setup
- Running the Application
- API Reference
- Database
- Team
- Acknowledgements
Copper flotation recovery depends on many interacting variables: head grade, feed rate, pH, pulp density, reagent doses, and more. Manual tracking is slow and often reactive. RecovAI supports day-to-day decisions by:
- Predicting recovery (%) for a shift before or after it runs
- Recommending reagent doses (SIPX, frother, lime, depressant) toward a target recovery
- Flagging anomalous shifts with Isolation Forest scoring
- Explaining why a prediction looks high or low (SHAP)
- Watching for data drift with Population Stability Index (PSI)
- Generating shift summaries and answering plant questions via Groq when configured
The web UI is a single-page application (hcl_recovai_v3.html). The API layer is FastAPI; persistence uses SQLite through SQLAlchemy.
The dataset was digitized from the handwritten, pen-and-paper shift logs of the HCL MCP concentrator plant. Because the logs followed a structured operational format, only minimal preprocessing was required:
- Digitization of handwritten shift log records into machine-readable format
- Date, numerical value, and format standardization
- Handling missing or inconsistent entries where required
- Organizing and structuring features for model training
- Converting processed data into a format suitable for machine learning and system integration
This approach ensured that the original operational characteristics of the plant data were preserved while preparing the dataset for analysis and AI model development.
flowchart TB
subgraph Client["Client layer"]
UI["hcl_recovai_v3.html\n(Vanilla HTML / CSS / JS)"]
Assets["assets/\nlogo, shift_data.js"]
end
subgraph API["Application layer — FastAPI :8000"]
Main["main.py\nREST endpoints"]
DBLayer["database.py\nSQLAlchemy ORM"]
end
subgraph Engines["Intelligence engines"]
E1["Engine 1\nReagent optimizer"]
E2["Engine 2\nAnomaly detector"]
E3["Engine 3\nSHAP explainer"]
E4["Engine 4\nPSI monitor"]
E5["Engine 5\nNLP / Groq"]
end
subgraph ML["Model artifacts — models/"]
XGB["xgb_model.pkl"]
RF["rf_model.pkl"]
LIN["linear_model.pkl"]
ISO["iso_forest.pkl"]
SCL["scaler.pkl"]
end
subgraph Store["Persistence"]
SQLite[("recovai.db\nSQLite")]
CSV["shifts_dataset.csv"]
end
UI -->|HTTP JSON| Main
Main --> E1 & E2 & E3 & E4 & E5
Main --> XGB & RF & LIN & ISO & SCL
Main --> DBLayer
DBLayer --> SQLite
Main --> CSV
UI --> Assets
Typical path when an operator submits shift data from the UI:
sequenceDiagram
participant Op as Operator UI
participant API as FastAPI main.py
participant ML as XGBoost / scaler
participant Eng as Engines 2–4
participant DB as SQLite
Op->>API: POST /predict (process + shift metadata)
API->>ML: Scale features, predict recovery
API->>Eng: Anomaly score, SHAP top drivers, drift check
API->>DB: Save shift_predictions row
API-->>Op: predicted_recovery, anomaly, shap_top5, db_id
Op->>API: GET /history/predictions
API->>DB: Query recent rows
API-->>Op: shift_date, submitted_at, metrics
For chat and AI shift narratives, the UI calls POST /report with either a message (chat) or shift + use_ai (report). Engine 5 uses the Groq API when GROQ_API_KEY is set in .env.
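The two request shapes for POST /report can be sketched as a small payload builder. This is a minimal illustration, assuming the body carries only the fields named above (`message`, or `shift` plus `use_ai`); the helper name `build_report_payload` and the keys inside the sample `shift` dict are hypothetical and may not match the real Pydantic schema in main.py.

```python
import json


def build_report_payload(message=None, shift=None, use_ai=True):
    """Build a JSON body for POST /report.

    Exactly one of `message` (chat) or `shift` (report) must be given.
    """
    if (message is None) == (shift is None):
        raise ValueError("Provide exactly one of message or shift")
    if message is not None:
        return {"message": message}
    return {"shift": shift, "use_ai": use_ai}


# Chat request: a free-text question for the assistant.
chat_body = build_report_payload(message="Why did recovery drop last night?")

# Report request: shift data plus the use_ai flag.
# The keys in this dict are placeholders, not the project's real field names.
report_body = build_report_payload(shift={"head_grade": 1.1, "ph": 10.5}, use_ai=True)

print(json.dumps(chat_body))
print(json.dumps(report_body))
```

Either body can then be sent to http://localhost:8000/report with any HTTP client.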
| Engine | Module | Role |
|---|---|---|
| 1 | engine1_reagent.py | SciPy-based reagent dose optimization (SIPX, frother, lime, depressant) |
| 2 | engine2_anomaly.py | Isolation Forest anomaly scoring on shift feature vectors |
| 3 | engine3_shap.py | SHAP TreeExplainer for per-prediction and global importance |
| 4 | engine4_psi.py | PSI drift monitoring vs training distribution |
| 5 | engine5_nlp.py | Groq (Llama 3) shift reports and plant Q&A; rule-based fallback |
Engine 1 uses numerical optimization (e.g. L-BFGS-B or differential evolution) to search for reagent doses that move predicted recovery toward a target while penalizing excessive chemical use.
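The idea of a penalized objective over bounded doses can be sketched without any dependencies. Everything numeric here is an assumption for illustration: the dose bounds, the `toy_recovery` response surface (a stand-in for the trained model), and the penalty weight. A plain coordinate search stands in for SciPy's optimizers so the sketch runs on the standard library alone; the same objective and bounds could be handed to `scipy.optimize.minimize` with `method="L-BFGS-B"` or to `scipy.optimize.differential_evolution`.

```python
# Hypothetical dose bounds; the real limits belong to engine1_reagent.py.
BOUNDS = {
    "sipx": (10.0, 60.0),
    "frother": (5.0, 30.0),
    "lime": (200.0, 1200.0),
    "depressant": (0.0, 50.0),
}


def toy_recovery(d):
    """Made-up smooth response surface standing in for the trained model."""
    return (80.0
            + 0.20 * d["sipx"] - 0.002 * d["sipx"] ** 2
            + 0.10 * d["frother"] - 0.002 * d["frother"] ** 2
            + 0.004 * d["lime"] - 0.000002 * d["lime"] ** 2
            - 0.01 * d["depressant"])


def objective(d, target, penalty=0.5):
    """Squared miss from the target plus a penalty on normalized chemical use."""
    miss = (toy_recovery(d) - target) ** 2
    usage = sum((d[k] - lo) / (hi - lo) for k, (lo, hi) in BOUNDS.items())
    return miss + penalty * usage


def optimize(start, target, rounds=200):
    """Coordinate search: nudge one dose at a time, halve steps when stuck."""
    best = dict(start)
    best_cost = objective(best, target)
    step = {k: (hi - lo) / 4 for k, (lo, hi) in BOUNDS.items()}
    for _ in range(rounds):
        improved = False
        for k, (lo, hi) in BOUNDS.items():
            for delta in (step[k], -step[k]):
                trial = dict(best)
                trial[k] = min(hi, max(lo, best[k] + delta))  # stay in bounds
                cost = objective(trial, target)
                if cost < best_cost:
                    best, best_cost, improved = trial, cost, True
        if not improved:
            step = {k: s / 2 for k, s in step.items()}
    return best, best_cost


start = {k: lo for k, (lo, hi) in BOUNDS.items()}
best, best_cost = optimize(start, target=86.0)
print({k: round(v, 1) for k, v in best.items()}, round(toy_recovery(best), 2))
```

The penalty weight sets the trade-off: a larger value accepts a bigger miss from the target in exchange for lower reagent consumption.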
An Isolation Forest trained on historical shifts assigns an anomaly score to each submission; lower scores indicate more unusual shifts. Example severity bands:
| Score (typical) | Severity | Suggested action |
|---|---|---|
| Above warning threshold | Normal | Continue standard monitoring |
| Between warning and alert | Warning | Review recent process changes |
| Below alert threshold | Critical | Investigate before next shift |
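The banding in the table can be expressed as a small function. The threshold values below are placeholders, not the project's tuned settings; the sketch assumes the score follows the convention of scikit-learn's `IsolationForest.decision_function`, where lower (more negative) values are more anomalous.

```python
def severity(score, warning=-0.05, alert=-0.15):
    """Map an anomaly score to a severity band.

    `warning` and `alert` are hypothetical thresholds; lower scores
    mean a more unusual shift.
    """
    if score > warning:
        return "Normal"
    if score >= alert:
        return "Warning"
    return "Critical"


for s in (0.08, -0.10, -0.30):
    print(s, severity(s))
```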
SHAP values show which inputs pushed recovery up or down for a given prediction. The API returns a compact top-five list for the UI; training scripts can emit full waterfall/beeswarm plots under recovai_output/.
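Reducing a full set of SHAP contributions to a top-five list might look like the sketch below. The contribution values are invented for illustration, and the output shape (`feature`, `shap`, `direction`) is an assumption rather than the actual `shap_top5` schema returned by the API.

```python
def top_drivers(contributions, k=5):
    """Return the k features with the largest absolute SHAP contribution."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [
        {"feature": name, "shap": value, "direction": "up" if value > 0 else "down"}
        for name, value in ranked[:k]
    ]


# Invented per-prediction contributions (in recovery percentage points).
contribs = {
    "Feed_Condition_Num": 0.42,
    "Prev_Recovery (%)": -0.18,
    "Conc. Mass Pull (%)": 0.07,
    "Tails Grade (%Cu)": -0.31,
    "Roll7_Recovery (%)": 0.02,
    "pH": -0.01,
}

top = top_drivers(contribs)
for row in top:
    print(row)
```

Ranking by absolute value keeps strong negative drivers in the list alongside strong positive ones.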
Population Stability Index compares live feature distributions to the training baseline:
| PSI | Interpretation | Action |
|---|---|---|
| < 0.10 | Stable | Safe to use current models |
| 0.10 – 0.25 | Moderate drift | Monitor; plan validation |
| > 0.25 | Strong drift | Consider retraining or recalibration |
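For reference, PSI over binned shares is the sum of (actual − expected) × ln(actual / expected) across bins. The sketch below computes it for one feature with equal-width bins taken from the baseline; the bin count, the epsilon floor for empty bins, and the function names are assumptions, and engine4_psi.py may bin differently (for example by quantiles).

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(bins - 1, max(0, idx))] += 1  # clamp out-of-range values
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def interpret_psi(value):
    """Apply the bands from the table above."""
    if value < 0.10:
        return "Stable"
    if value <= 0.25:
        return "Moderate drift"
    return "Strong drift"


baseline = [i / 100 for i in range(100)]      # training-like sample
shifted = [0.5 + i / 200 for i in range(100)]  # live sample pushed upward
print(interpret_psi(psi(baseline, baseline)), interpret_psi(psi(baseline, shifted)))
```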
When GROQ_API_KEY is present in .env, Engine 5 calls Groq (llama-3.1-8b-instant by default) for shift narratives and chat. If the key is missing or the call fails, the backend returns a structured rule-based response so the UI remains usable.
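The fallback behaviour can be sketched as follows. The LLM call is passed in as `call_llm`, a hypothetical callable standing in for whatever Groq client code engine5_nlp.py uses; the rule-based wording, the response keys, and the default 90% target are likewise assumptions.

```python
import os


def rule_based_summary(shift):
    """Deterministic summary used when no LLM response is available."""
    rec = shift["predicted_recovery"]
    target = shift.get("target", 90.0)  # hypothetical default target
    status = "on target" if rec >= target else "below target"
    return {"source": "rules", "text": f"Predicted recovery {rec:.1f}% is {status}."}


def shift_narrative(shift, call_llm=None):
    """Use the LLM when a key is configured and the call succeeds; else fall back."""
    if os.environ.get("GROQ_API_KEY") and call_llm is not None:
        try:
            return {"source": "groq", "text": call_llm(shift)}
        except Exception:
            pass  # any API failure falls through to the rule-based path
    return rule_based_summary(shift)


print(shift_narrative({"predicted_recovery": 88.4, "target": 90.0}))
```

Tagging each response with its source lets the UI show whether a narrative came from the model or from the rules.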
| Item | Detail |
|---|---|
| Shifts | 1,778 (Malanjkhand plant data) |
| Train / test | ~80% / ~20% temporal split |
| Train window | Oct 2024 – Dec 2025 |
| Test window | Jan 2026 – May 2026 |
| Target | Recovery (%) |
| Features | 26 process variables (plus engineered lags/rolls where used) |
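A temporal split keeps every training shift strictly earlier than every test shift, so the model is never evaluated on data older than what it was trained on. A minimal sketch, assuming each record carries a `shift_date` and using 1 Jan 2026 as the cutoff implied by the windows above (the record layout and sample values are illustrative only):

```python
from datetime import date


def temporal_split(rows, cutoff):
    """Split records by date: train before `cutoff`, test on or after it."""
    rows = sorted(rows, key=lambda r: r["shift_date"])
    train = [r for r in rows if r["shift_date"] < cutoff]
    test = [r for r in rows if r["shift_date"] >= cutoff]
    return train, test


rows = [
    {"shift_date": date(2025, 12, 31), "recovery": 90.1},
    {"shift_date": date(2026, 1, 1), "recovery": 89.6},
    {"shift_date": date(2024, 10, 5), "recovery": 91.0},
]
train, test = temporal_split(rows, cutoff=date(2026, 1, 1))
print(len(train), len(test))
```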
| Model | R² | RMSE | MAE | Meets target (R² > 0.85) |
|---|---|---|---|---|
| XGBoost (primary) | 0.972 | 0.139 | 0.104 | Yes |
| Random Forest | 0.910 | 0.248 | 0.197 | Yes |
| Linear baseline | — | — | — | Reference only |
Target thresholds used in project documentation: R² > 0.85, RMSE < 0.50, MAE < 0.50.
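The three metrics and the threshold check can be written out directly. This is a dependency-free sketch using the standard definitions; the sample values are invented, and the training scripts presumably rely on scikit-learn's equivalents.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for paired observations and predictions."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # zero if y_true is constant
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"r2": 1 - ss_res / ss_tot, "rmse": math.sqrt(ss_res / n), "mae": mae}


def meets_targets(m):
    """Check the documented thresholds: R² > 0.85, RMSE < 0.50, MAE < 0.50."""
    return m["r2"] > 0.85 and m["rmse"] < 0.50 and m["mae"] < 0.50


metrics = regression_metrics([90.0, 91.0, 92.0, 93.0], [90.1, 90.9, 92.1, 92.9])
print(metrics, meets_targets(metrics))
```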
| Feature | Importance |
|---|---|
| Feed_Condition_Num | 0.598 |
| Prev_Recovery (%) | 0.105 |
| Conc. Mass Pull (%) | 0.057 |
| Tails Grade (%Cu) | 0.050 |
| Roll7_Recovery (%) | 0.046 |
Columns excluded from training to avoid leakage include concentrate/tailings mass flows derived from the same shift outcome (e.g. COPPER IN CONCENTRATE (MT), COPPER IN TAILINGS (MT)).
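The same leakage concern applies to history features such as Prev_Recovery and Roll7_Recovery: they stay safe only if each row sees earlier shifts and never its own outcome. The sketch below assumes Roll7_Recovery is the mean of up to seven preceding shifts; the source does not define it, so treat that definition and the lowercase key names as illustrative.

```python
def add_history_features(recoveries, window=7):
    """Build lag and rolling-mean features from strictly earlier shifts."""
    rows = []
    for i, rec in enumerate(recoveries):
        past = recoveries[max(0, i - window):i]  # excludes the current shift
        rows.append({
            "recovery": rec,
            "prev_recovery": recoveries[i - 1] if i > 0 else None,
            "roll7_recovery": sum(past) / len(past) if past else None,
        })
    return rows


history = add_history_features([90.0, 91.0, 92.0])
for row in history:
    print(row)
```

The first shift has no history, so both features are left empty there rather than filled with its own value.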
Operator-entered inputs, aligned with POST /predict in main.py: head grade, feed rate, flotation pH, pulp density, air flow, SIPX, frother, lime, depressant, particle size, water recovery, and rougher concentrate grade.
| Layer | Technologies |
|---|---|
| API | FastAPI, Uvicorn, Pydantic v2 |
| ML | XGBoost, scikit-learn (RF, Isolation Forest, scalers) |
| Explainability | SHAP |
| Optimization | SciPy |
| LLM | Groq API (Llama 3 family) |
| Data | Pandas, NumPy |
| Database | SQLite, SQLAlchemy 2.x |
| Frontend | HTML, CSS, JavaScript, Chart.js |
| Config | python-dotenv (.env) |
| Export | openpyxl (Excel shift reports) |
HCL-MCP-AI-project-/
├── main.py # FastAPI application (v2.0)
├── database.py # ORM models, migrations, CRUD
├── hcl_recovai_v3.html # Main UI
├── shifts_dataset.csv # Shift data for dashboard / anomalies
├── requirements.txt
│
├── engines/
│ ├── engine1_reagent.py
│ ├── engine2_anomaly.py
│ ├── engine3_shap.py
│ ├── engine4_psi.py
│ └── engine5_nlp.py
│
├── models/ # Serialized models (gitignored *.pkl in some setups)
├── data/processed/ # Training CSVs
├── recovai_output/ # Training plots and extra artifacts
├── outputs/ # Comparison charts, training_report.txt
├── assets/ # Static UI assets
│
├── train_all.py # Train and export models
├── recovai_train.py
├── recov_train_rf.py
└── recov_train_linear.py
Runtime files (not committed): .env, recovai.db, venv/, contacts.json, feedback.json.
- Python 3.10 or newer
- pip
- Groq API key from console.groq.com (optional; needed only for live AI chat and reports)
git clone https://github.com/dikshadamahe/HCL-MCP-AI-project-.git
cd HCL-MCP-AI-project-
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
pip install -r requirements.txt
Create .env in the project root (never commit this file):
GROQ_API_KEY=gsk-your-key-here
Optional: override the database URL (default is local SQLite):
DATABASE_URL=sqlite:///./recovai.db
Train models if models/*.pkl are not already present:
python train_all.py
Terminal 1: backend
uvicorn main:app --reload --host 127.0.0.1 --port 8000

| Check | URL |
|---|---|
| Health | http://localhost:8000/health |
| Readiness | http://localhost:8000/api/test |
| Swagger UI | http://localhost:8000/docs |

Terminal 2: frontend (recommended)
python -m http.server 5500
Open http://localhost:5500/hcl_recovai_v3.html
The UI expects the API at http://localhost:8000. CORS is enabled on the backend for local development.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Service health and model load count |
| GET | /api/test | Detailed readiness checklist |
| POST | /predict | Predict recovery; persists submission |
| POST | /optimize | Reagent dose recommendations |
| GET | /api/anomalies | Recent shifts with anomaly scores |
| GET | /api/importance | Global feature importance |
| GET | /api/heatmap | Feature correlation matrix |
| GET | /api/pdp | Partial dependence curve data |
| GET | /api/dashboard | Filtered dashboard aggregates |
| POST | /report | Shift report or chat (message / shift) |
| GET | /api/report/download | Excel shift report |
| POST | /api/forecast | Multi-day recovery forecast |
| GET | /history/predictions | Submission history |
| GET | /history/predictions/{id} | Single submission detail |
| POST | /api/contact | Contact form |
| POST | /api/feedback | User feedback |
Interactive documentation: http://localhost:8000/docs
SQLite database file: recovai.db (created on first backend start).
| Table | Purpose |
|---|---|
| shift_predictions | Each /predict call: inputs, prediction, anomaly/drift summary, optional shift_date, shift, operator_name, notes |
| shift_reports | NLP shift reports linked to predictions |
History in the UI uses GET /history/predictions, which returns both shift date (operator-entered) and submitted at (server timestamp).
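The distinction between the operator-entered shift date and the server timestamp can be illustrated with the standard-library `sqlite3` module. This is not the project's SQLAlchemy model: the column set is a reduced guess based on the table description above, and the function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn):
    """Create a reduced, illustrative version of shift_predictions."""
    conn.execute(
        """CREATE TABLE shift_predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               shift_date TEXT,            -- operator-entered, optional
               shift TEXT,
               operator_name TEXT,
               notes TEXT,
               predicted_recovery REAL,
               submitted_at TEXT NOT NULL  -- server timestamp
           )"""
    )


def save_prediction(conn, predicted_recovery, shift_date=None, shift=None,
                    operator_name=None, notes=None):
    """Insert one submission and return its row id."""
    cur = conn.execute(
        "INSERT INTO shift_predictions "
        "(shift_date, shift, operator_name, notes, predicted_recovery, submitted_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (shift_date, shift, operator_name, notes, predicted_recovery,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def recent_predictions(conn, limit=20):
    """Return the newest submissions first, as dictionaries."""
    cur = conn.execute(
        "SELECT id, shift_date, submitted_at, predicted_recovery "
        "FROM shift_predictions ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
create_schema(conn)
save_prediction(conn, 91.2, shift_date="2026-01-03", shift="A")
print(recent_predictions(conn))
```

Storing both values lets the history view sort by submission time while still showing the shift the operator was reporting on.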
VIT Bhopal University · Intern project with Hindustan Copper Limited (HCL)
Project: Malanjkhand Copper Project — flotation plant optimization
| Name | Role |
|---|---|
| Diksha Damahe | Frontend UI development; FastAPI backend implementation; ML model integration; image-based dataset generation; full-stack system integration (connecting frontend, backend, database, and AI modules); project documentation, including the GitHub README, installation guide, and usage instructions |
| Bhavya Jaiprakash Khatri | Project documentation, technical report, and submission materials |
| Hiya Porwal | Data manipulation — cleaning, transformation, and supporting data work for training |
| Ritica Awasthi | Database layer: SQLite integration |
We thank Hindustan Copper Limited for the opportunity to work on this project as interns. Their domain guidance and support were essential to building a system grounded in real plant operations.
RecovAI — internal decision-support prototype for educational and demonstration purposes at HCL Malanjkhand.