TalentMiner is an AI-powered resume screening platform that matches a candidate resume against a job description and returns:
- Match score (0-100)
- Match status (Good Match, Average Match, Poor Match)
- Predicted resume category
- Matched and missing skills
- Actionable improvement suggestions
It includes a Python Flask backend, a React + TypeScript frontend, and a machine learning pipeline for resume category classification.
- Resume and job description analysis via API (/api/analysis)
- Resume upload flow with text extraction support in frontend
- ATS-style skill gap analysis (matched vs missing keywords)
- Weighted scoring from semantic similarity and skill-priority matching
- Resume category prediction using trained ML artifacts
- Optional semantic engine using SentenceTransformers (all-MiniLM-L6-v2)
- Auth flow: signup, login, logout, profile update, password change
- Frontend: React, TypeScript, Vite, Tailwind CSS, shadcn/ui, React Router
- Backend: Flask, Flask-CORS, scikit-learn, joblib, Werkzeug auth
- Optional NLP/ATS extras: pypdf, sentence-transformers
- Model training: pandas, numpy, scikit-learn, nltk
TalentMiner/
  backend/                      # Flask API and auth/session logic
  dataset/                      # Resume.csv dataset
  frontend/
    hassan-code-canvas-main/    # React + Vite client app
  model/                        # Training scripts, inference check, model artifacts
  notebook/                     # Phase notebooks and reports
  PROJECT_REPORT.md             # Detailed project report
Final score combines:
- Semantic similarity between resume and JD
- Priority-weighted skill coverage from JD keyword ranking
Formula used in backend:
final_score = 0.55 * semantic_score + 0.45 * skill_score
score = clamp(final_score, 0, 1) * 100
Where skill priority is inferred from JD wording (for example: required, mandatory, preferred, bonus).
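As a rough illustration of the formula above, the following Python sketch computes a priority-weighted skill score and combines it with a semantic score. The priority weights in PRIORITY_WEIGHTS, the function names, and the input shapes are assumptions made for this example; the backend may rank and weight JD keywords differently.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical weights per priority cue found in the JD wording.
# These values are illustrative only, not taken from the backend.
PRIORITY_WEIGHTS: Dict[str, float] = {
    "required": 1.0,
    "mandatory": 1.0,
    "preferred": 0.6,
    "bonus": 0.3,
}
DEFAULT_WEIGHT = 0.5  # assumed weight for skills with no recognised cue


def skill_gap(
    jd_skills: Dict[str, str], resume_skills: Iterable[str]
) -> Tuple[List[str], List[str], float]:
    """Return (matched, missing, skill_score), with skill_score in [0, 1].

    jd_skills maps each JD skill to its priority cue,
    for example {"python": "required", "aws": "bonus"}.
    """
    have = {skill.lower() for skill in resume_skills}
    matched = [s for s in jd_skills if s.lower() in have]
    missing = [s for s in jd_skills if s.lower() not in have]
    total = sum(PRIORITY_WEIGHTS.get(p, DEFAULT_WEIGHT) for p in jd_skills.values())
    covered = sum(PRIORITY_WEIGHTS.get(jd_skills[s], DEFAULT_WEIGHT) for s in matched)
    return matched, missing, (covered / total if total else 0.0)


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def match_score(semantic_score: float, skill_score: float) -> float:
    """Combine both signals with the documented 0.55 / 0.45 weights (0-100)."""
    final_score = 0.55 * semantic_score + 0.45 * skill_score
    return clamp(final_score, 0.0, 1.0) * 100


jd = {"python": "required", "flask": "required", "docker": "preferred", "aws": "bonus"}
matched, missing, skill_score = skill_gap(jd, ["Python", "Flask", "React"])
score = match_score(0.8, skill_score)  # 0.8 stands in for a semantic similarity
```

Here matched is ["python", "flask"] and missing is ["docker", "aws"]; a missing "required" skill lowers the score more than a missing "bonus" skill because it carries a larger weight.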
Current model metadata (model/metadata.json):
- Best model: linear_svc
- Dataset rows used: 2484
- Classes: 24
- Vectorizer: TF-IDF (1,2)-grams, max_features=5000
- Dimensionality reduction: TruncatedSVD (100 components)
- Best macro F1: ~0.601
Model artifacts:
- best_model.pkl
- tfidf_vectorizer.pkl
- label_encoder.pkl
- svd.pkl
- metadata.json
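The configuration recorded in the metadata (TF-IDF with (1,2)-grams and max_features=5000, TruncatedSVD with 100 components, a linear SVC) can be sketched with scikit-learn as follows. This is a minimal sketch, not the project's training script: wrapping the steps in a single Pipeline, the build_pipeline helper, the random_state values, and the toy texts are all assumptions for illustration. The project exports the vectorizer, SVD, classifier, and label encoder as separate files.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC


def build_pipeline(n_components: int = 100, max_features: int = 5000) -> Pipeline:
    """TF-IDF (1,2)-grams -> TruncatedSVD -> LinearSVC (hypothetical helper)."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=max_features)),
        ("svd", TruncatedSVD(n_components=n_components, random_state=42)),
        ("clf", LinearSVC(random_state=42)),
    ])


# Toy stand-in for the resume dataset; the real data has 2484 rows and 24 classes.
texts = [
    "python developer building rest apis with flask",
    "machine learning engineer training scikit-learn models",
    "registered nurse providing patient care in hospital wards",
    "clinical nurse administering medication and patient monitoring",
]
labels = ["ENGINEERING", "ENGINEERING", "HEALTHCARE", "HEALTHCARE"]

encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# The toy corpus is tiny, so use 2 SVD components instead of 100.
pipeline = build_pipeline(n_components=2)
pipeline.fit(texts, y)

predicted = encoder.inverse_transform(
    pipeline.predict(["backend developer writing python services"])
)
```

The SVD step needs fewer components than there are TF-IDF features, which is why the toy run overrides the default of 100.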
From project root:
python -m venv .venv
.venv\Scripts\activate
pip install -r backend/requirements.txt
Optional advanced ATS features:
pip install -r backend/requirements-ats-advanced.txt
Run backend:
python backend/app.py
Backend runs on:
http://127.0.0.1:5000
Health check:
GET http://127.0.0.1:5000/api/health
cd frontend/hassan-code-canvas-main
npm install
npm run dev
Frontend default dev URL:
http://localhost:5173
API base URL is read from VITE_API_BASE_URL and defaults to:
http://127.0.0.1:5000
You can create a .env in frontend/hassan-code-canvas-main:
VITE_API_BASE_URL=http://127.0.0.1:5000
API endpoints:
- GET /api/health
- POST /api/analysis
  - Body: { "resumeText": string, "jobDescription": string }
- POST /api/analysis-upload
  - Body: multipart form with resume file + job description
- POST /api/auth/signup
- POST /api/auth/login
- POST /api/auth/logout
- GET /api/auth/me
- PUT /api/profile
- POST /api/change-password
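For a quick check of the analysis endpoint from Python, a minimal client might look like the sketch below. It uses only the standard library and the JSON body documented for POST /api/analysis. The function names are hypothetical, the response is returned as parsed JSON without assuming its field names, and the sketch assumes the endpoint accepts the request without an auth header, which may not hold for your setup.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "http://127.0.0.1:5000"  # backend default from this README


def build_analysis_request(
    resume_text: str,
    job_description: str,
    base_url: str = DEFAULT_BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/analysis request."""
    payload = json.dumps(
        {"resumeText": resume_text, "jobDescription": job_description}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/analysis",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(
    resume_text: str,
    job_description: str,
    base_url: str = DEFAULT_BASE_URL,
    timeout: float = 30.0,
) -> dict:
    """Send the request to a running backend and return the parsed JSON body."""
    request = build_analysis_request(resume_text, job_description, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server; analyze(...) does.
example_request = build_analysis_request(
    "Python developer with Flask experience",
    "Looking for a backend engineer; Python required, Docker preferred",
)
```

With the backend running, analyze(resume_text, job_description) returns the analysis as a dictionary.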
Install model dependencies:
pip install -r model/requirements.txt
Train and export artifacts:
python model/train_and_export.py
Run inference smoke check:
python model/inference_check.py
Optional custom text:
python model/inference_check.py --text "Experienced Python developer building REST APIs and ML pipelines"
Notes:
- Sessions/tokens are in-memory in backend and reset when server restarts.
- User persistence is JSON-file based (backend/users.json) and intended for demo/small-scale usage.
- BERT loading can increase startup time when enabled.
- For production, use a real database and a production WSGI server.
Planned improvements:
- JWT-based auth with refresh tokens and expiry
- Persistent database for users and analysis history
- Exportable PDF/CSV analysis reports
- Better confidence calibration and per-category explainability
- Automated regression and integration testing
- Full report: PROJECT_REPORT.md
- Model docs: model/README.md
- Phase notebooks: notebook/