TalentMiner

TalentMiner is an AI-powered resume screening platform that matches a candidate resume against a job description and returns:

Match score (0-100)
Match status (Good Match, Average Match, Poor Match)
Predicted resume category
Matched and missing skills
Actionable improvement suggestions

It includes a Python Flask backend, a React + TypeScript frontend, and a machine learning pipeline for resume category classification.

Features

Resume and job description analysis via API (/api/analysis)
Resume upload flow with text extraction support in the frontend
ATS-style skill gap analysis (matched vs missing keywords)
Weighted scoring from semantic similarity and skill-priority matching
Resume category prediction using trained ML artifacts
Optional semantic engine using SentenceTransformers (all-MiniLM-L6-v2)
Auth flow: signup, login, logout, profile update, password change

Tech Stack

Frontend: React, TypeScript, Vite, Tailwind CSS, shadcn/ui, React Router
Backend: Flask, Flask-CORS, scikit-learn, joblib, Werkzeug auth
Optional NLP/ATS extras: pypdf, sentence-transformers
Model training: pandas, numpy, scikit-learn, nltk

Project Structure

TalentMiner/
  backend/                  # Flask API and auth/session logic
  dataset/                  # Resume.csv dataset
  frontend/
    hassan-code-canvas-main/  # React + Vite client app
  model/                    # Training scripts, inference check, model artifacts
  notebook/                 # Phase notebooks and reports
  PROJECT_REPORT.md         # Detailed project report

How Scoring Works

Final score combines:

Semantic similarity between resume and JD
Priority-weighted skill coverage from JD keyword ranking

Formula used in backend:

final_score = 0.55 * semantic_score + 0.45 * skill_score
score = clamp(final_score, 0, 1) * 100

Where skill priority is inferred from JD wording (for example: required, mandatory, preferred, bonus).
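The formula above can be sketched in Python as follows. This is a minimal illustration, not the backend's code: the function names and the `PRIORITY_WEIGHTS` values are hypothetical (the README names the priority wording but not the weights), and both input scores are assumed to lie in the 0 to 1 range.

```python
# Hypothetical weights per priority label; the backend's real values are not
# documented in this README.
PRIORITY_WEIGHTS = {"required": 3.0, "mandatory": 3.0, "preferred": 2.0, "bonus": 1.0}


def skill_coverage(jd_skills, resume_skills):
    """Return (skill_score, matched, missing) for a priority-weighted skill match.

    jd_skills maps each JD skill to a priority label; resume_skills is any
    iterable of skills found in the resume. Matching is case-insensitive.
    """
    resume = {skill.lower() for skill in resume_skills}
    matched, missing = [], []
    total_weight = matched_weight = 0.0
    for skill, priority in jd_skills.items():
        weight = PRIORITY_WEIGHTS.get(priority, 1.0)
        total_weight += weight
        if skill.lower() in resume:
            matched_weight += weight
            matched.append(skill)
        else:
            missing.append(skill)
    score = matched_weight / total_weight if total_weight else 0.0
    return score, matched, missing


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def final_score(semantic_score, skill_score):
    """Combine the two scores with the 0.55 / 0.45 weights and scale to 0-100."""
    combined = 0.55 * semantic_score + 0.45 * skill_score
    return clamp(combined, 0.0, 1.0) * 100
```

For example, with a semantic score of 0.8 and a JD whose required skills are matched but whose preferred and bonus skills are not:

```python
jd = {"python": "required", "flask": "required", "docker": "preferred", "aws": "bonus"}
skill_score, matched, missing = skill_coverage(jd, ["Python", "Flask", "SQL"])
print(matched, missing)                         # ['python', 'flask'] ['docker', 'aws']
print(round(final_score(0.8, skill_score), 1))  # 74.0
```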

Model Details

Current model metadata (model/metadata.json):

Best model: linear_svc
Dataset rows used: 2484
Classes: 24
Vectorizer: TF-IDF (1,2)-grams, max_features=5000
Dimensionality reduction: TruncatedSVD (100 components)
Best macro F1: ~0.601
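The metadata above describes a TF-IDF, TruncatedSVD, LinearSVC chain. A rough scikit-learn equivalent might look like the sketch below. It is not the contents of model/train_and_export.py: the `build_pipeline` name, the `random_state` value, and the use of a single `Pipeline` object are assumptions (the project stores the vectorizer, SVD, and label encoder as separate artifacts).

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_pipeline(max_features=5000, n_components=100, random_state=42):
    """Build a text classifier mirroring the settings listed in metadata.json."""
    return make_pipeline(
        # Unigrams and bigrams, capped at max_features vocabulary terms.
        TfidfVectorizer(ngram_range=(1, 2), max_features=max_features),
        # Project the sparse TF-IDF matrix onto n_components dense dimensions.
        TruncatedSVD(n_components=n_components, random_state=random_state),
        # Linear support vector classifier (the "linear_svc" best model).
        LinearSVC(random_state=random_state),
    )
```

Note that `n_components` must not exceed the number of TF-IDF features, so on a toy corpus you would call `build_pipeline(n_components=2)` or similar before `fit(texts, labels)`.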

Artifacts expected in `model/`

best_model.pkl
tfidf_vectorizer.pkl
label_encoder.pkl
svd.pkl
metadata.json
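Before starting the backend, it can help to confirm that all five artifacts exist. The helper below is a small, hypothetical preflight check (the `missing_artifacts` name is not part of the project); it only tests for file presence and does not load or validate the pickles.

```python
from pathlib import Path

# File names taken from the artifact list above.
EXPECTED_ARTIFACTS = (
    "best_model.pkl",
    "tfidf_vectorizer.pkl",
    "label_encoder.pkl",
    "svd.pkl",
    "metadata.json",
)


def missing_artifacts(model_dir="model"):
    """Return the expected artifact file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

Calling `missing_artifacts()` from the project root returns an empty list when everything is in place; a non-empty list suggests re-running the training script.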

Setup and Run

1) Backend (Flask)

From project root:

python -m venv .venv
.venv\Scripts\activate            # Windows
source .venv/bin/activate         # macOS/Linux (use instead of the Windows line)
pip install -r backend/requirements.txt

Optional advanced ATS features:

pip install -r backend/requirements-ats-advanced.txt

Run backend:

python backend/app.py

Backend runs on:

http://127.0.0.1:5000

Health check:

GET http://127.0.0.1:5000/api/health

2) Frontend (React + Vite)

cd frontend/hassan-code-canvas-main
npm install
npm run dev

Frontend default dev URL:

http://localhost:5173

API base URL is read from VITE_API_BASE_URL and defaults to:

http://127.0.0.1:5000

You can create a .env in frontend/hassan-code-canvas-main:

VITE_API_BASE_URL=http://127.0.0.1:5000

API Endpoints

Health and analysis

GET /api/health
POST /api/analysis
- Body: { "resumeText": string, "jobDescription": string }
POST /api/analysis-upload
- Multipart with resume file + job description
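As an illustration of the text-based endpoint, the sketch below builds and sends a POST /api/analysis request using only the Python standard library. It assumes the backend accepts a JSON body with the two documented fields and returns JSON; the helper names and the timeout are hypothetical, and the response keys are left uninterpreted because this README does not list them.

```python
import json
from urllib import request

API_BASE_URL = "http://127.0.0.1:5000"  # default backend address from this README


def build_analysis_request(resume_text, job_description, base_url=API_BASE_URL):
    """Create a POST request for /api/analysis with the documented body fields."""
    payload = json.dumps(
        {"resumeText": resume_text, "jobDescription": job_description}
    ).encode("utf-8")
    return request.Request(
        f"{base_url}/api/analysis",
        data=payload,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def analyze(resume_text, job_description, base_url=API_BASE_URL, timeout=30):
    """Send the request and return the decoded JSON response as a Python object."""
    req = build_analysis_request(resume_text, job_description, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running locally, `analyze("Python developer with Flask experience", "Looking for a Python backend engineer")` should return the analysis payload as a dictionary.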

Auth and profile

POST /api/auth/signup
POST /api/auth/login
POST /api/auth/logout
GET /api/auth/me
PUT /api/profile
POST /api/change-password

Training and Inference

Install model dependencies:

pip install -r model/requirements.txt

Train and export artifacts:

python model/train_and_export.py

Run inference smoke check:

python model/inference_check.py

Optional custom text:

python model/inference_check.py --text "Experienced Python developer building REST APIs and ML pipelines"

Notes and Limitations

Sessions and tokens are kept in memory in the backend and are lost when the server restarts.
User persistence is JSON-file based (backend/users.json) and intended for demo or small-scale usage.
Loading the SentenceTransformers model (all-MiniLM-L6-v2) can increase startup time when the optional semantic engine is enabled.
For production, use a real database and a production WSGI server.

Roadmap Ideas

JWT-based auth with refresh tokens and expiry
Persistent database for users and analysis history
Exportable PDF/CSV analysis reports
Better confidence calibration and per-category explainability
Automated regression and integration testing

References

Full report: PROJECT_REPORT.md
Model docs: model/README.md
Phase notebooks: notebook/
