This architecture was specifically designed and deployed for a client needing an automated, highly scalable Applicant Tracking System (ATS).
The Production Flow:
- Webhook Ingestion: CVs arrive from external sources and are piped directly into the system via webhooks.
- Database Persistence: each resume is immediately stored in the company database for caching and record retrieval.
- Automated ML Screening: the multi-model pipeline described below screens each candidate batch, matching backgrounds against the target criteria.
- Slack Integration: a detailed breakdown (explicit matches, critical missing skills, and predicted HIRE/REJECT probabilities) is automatically broadcast to the recruiting team via Slack, eliminating hundreds of manual review hours.
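The ingestion and persistence steps above can be sketched as a minimal handler. The payload fields, the `handle_cv_webhook` name, and the record shape are illustrative assumptions, not the client's actual schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def handle_cv_webhook(raw_body: bytes) -> dict:
    """Parse an incoming CV webhook, build a persistence record,
    and return a screening job for the ML pipeline.
    All field names here are assumed, not the real schema."""
    payload = json.loads(raw_body)
    record = {
        # Deterministic ID so duplicate webhook deliveries stay idempotent.
        "resume_id": hashlib.sha256(payload["resume_text"].encode()).hexdigest()[:16],
        "candidate_name": payload.get("name", "unknown"),
        "resume_text": payload["resume_text"],
        "received_at": datetime.now(timezone.utc).isoformat(),
    }
    # In production this record would be INSERTed into the company database
    # before the screening job is enqueued for the pipeline.
    return {"db_record": record, "job": {"resume_id": record["resume_id"], "status": "queued"}}
```

The deterministic hash-based ID is one way to absorb webhook retries without creating duplicate rows.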
SmartHire uses a cascaded multi-model architecture, a pattern common in production recommendation and ranking systems at large-scale platforms (e.g., LinkedIn, Google Search). The cascade has three stages:
- Stage 1 (Fast Filter): classical TF-IDF scoring filters massive candidate pools cheaply.
- Stage 2 (Deep Ranker): transformer embeddings capture the semantics of the survivors.
- Stage 3 (Final Decision): an XGBoost ensemble combines the scores from the earlier stages with manual heuristics to produce the final decision.
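The three-stage flow can be sketched end to end. The scoring functions below are dependency-free placeholders (lexical overlap standing in for real TF-IDF and embeddings, a threshold rule standing in for the trained XGBoost model); the cutoffs and weights are made-up illustrations:

```python
def tfidf_score(resume: str, jd: str) -> float:
    # Placeholder for Stage 1: fraction of JD terms present in the resume.
    jd_terms = set(jd.lower().split())
    return len(jd_terms & set(resume.lower().split())) / max(len(jd_terms), 1)

def semantic_score(resume: str, jd: str) -> float:
    # Placeholder for Stage 2: a real system computes cosine similarity
    # between transformer embeddings; we reuse lexical overlap here.
    return tfidf_score(resume, jd)

def final_decision(s1: float, s2: float) -> str:
    # Placeholder for Stage 3: a real system feeds s1, s2, and heuristics
    # into an XGBoost classifier; a weighted threshold stands in.
    combined = 0.4 * s1 + 0.6 * s2
    return "HIRE" if combined > 0.6 else "MAYBE" if combined > 0.3 else "REJECT"

def screen(candidates, jd, stage1_cutoff=0.2):
    results = []
    for resume in candidates:
        s1 = tfidf_score(resume, jd)
        if s1 < stage1_cutoff:               # Stage 1: cheap early rejection
            results.append("REJECT")
            continue
        s2 = semantic_score(resume, jd)      # Stage 2: only for survivors
        results.append(final_decision(s1, s2))  # Stage 3: final call
    return results
```

The key structural point is the early `continue`: Stage 2 never runs on candidates Stage 1 already rejected, which is where the cascade's speedup comes from.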
Extremely fast. Transforms JDs and candidate resumes into weighted term-frequency vectors (Term Frequency–Inverse Document Frequency) and scores keyword overlap, cutting non-matching candidates in ~5ms.
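Stage 1 can be sketched with a hand-rolled TF-IDF; the production system presumably uses a library such as scikit-learn, and the smoothed-IDF formula below mirrors scikit-learn's default so shared terms don't vanish in tiny corpora:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weight vectors (term -> weight) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: how many docs each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF (as in scikit-learn): log((1+n)/(1+df)) + 1.
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A JD with keyword overlap against a resume gets a nonzero score; a resume sharing no terms scores exactly zero, which is the behavior the fast cutoff relies on.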
Based on all-MiniLM-L6-v2, a 22-million-parameter model that converts paragraphs into 384-dimensional dense vectors. Cosine similarity between vectors captures contextual equivalence (e.g., 'software engineer' scores highly against 'developer' even though the exact strings don't match).
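The similarity computation can be shown in isolation. The tiny 4-dimensional vectors below are invented stand-ins so the sketch runs without the model; in production the 384-dimensional vectors would come from all-MiniLM-L6-v2 (e.g., via the sentence-transformers library, as noted in the comment):

```python
import math

# In production the vectors come from the real model, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   vec = model.encode("software engineer")   # 384-dim dense vector
# The toy 4-dim vectors below are fabricated for illustration only.

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical embeddings: semantically close phrases point in similar directions.
emb = {
    "software engineer": [0.9, 0.3, 0.1, 0.0],
    "developer":         [0.8, 0.4, 0.2, 0.1],
    "pastry chef":       [0.1, 0.0, 0.9, 0.4],
}
```

With these vectors, 'software engineer' vs. 'developer' scores near 1.0 while 'software engineer' vs. 'pastry chef' scores near 0.2, which is exactly the contextual matching TF-IDF cannot provide.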
An ensemble of gradient-boosted decision trees. Incorporates the scores returned by Models 1 and 2 along with user-guided heuristics (`has_required_degree`, `years_experience_estimated`) and classifies the applicant as HIRE, MAYBE, or REJECT.
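The feature assembly for Stage 3 can be sketched as follows. The candidate dict fields and the linear scoring rule are illustrative assumptions; in production the features would go to the trained XGBoost booster's predict call instead:

```python
def build_features(stage1_score, stage2_score, candidate):
    """Assemble the feature vector for the final classifier.
    Candidate dict fields mirror the heuristics named above but
    their exact shape is an assumption."""
    return [
        stage1_score,                                         # TF-IDF overlap (Stage 1)
        stage2_score,                                         # embedding similarity (Stage 2)
        1.0 if candidate.get("has_required_degree") else 0.0,
        float(candidate.get("years_experience_estimated", 0)),
    ]

def classify(features):
    # Stand-in for the trained XGBoost ensemble (booster.predict over
    # HIRE/MAYBE/REJECT); a hand-weighted linear score is used here.
    score = (0.3 * features[0] + 0.5 * features[1]
             + 0.1 * features[2] + 0.02 * min(features[3], 5))
    return "HIRE" if score >= 0.6 else "MAYBE" if score >= 0.35 else "REJECT"
```

Capping the experience feature (here at 5 years) is a common guard against a single unbounded feature dominating the decision.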
"A single powerful model would miss the efficiency gains of early filtering. Running a transformer on 1000 resumes costs 50 seconds. Running TF-IDF first reduces the transformer workload to 300 resumes — 15 seconds total. This cascade architecture is used in production search systems."
| Metric | TF-IDF (Stage 1) | Semantics (Stage 2) | XGBoost (Stage 3) |
|---|---|---|---|
| Speed/Candidate | 2-5ms | 50ms | 1-2ms |
| Accuracy / Quality | Low (lexical overlap only) | High (semantic, contextual) | High (feature-based decision) |
| Cost / 1000 cands | Negligible CPU | Significant GPU/CPU | Negligible CPU |
Run `uvicorn src.api.main:app --reload` to start the service.
- POST /api/v1/screen: Supply a JD (>100 words) and a list of candidate dictionaries (<50 per batch).
- GET /api/v1/pipeline/explain: Get architecture details.
- GET /api/v1/models/{id}/info: Explore detailed model metadata.
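A client call to the screening endpoint might look like the sketch below, which enforces the documented limits before sending. The field names `job_description` and `candidates` are assumptions about the request schema, and the host/port is the uvicorn default:

```python
import json
import urllib.request

def build_screen_request(jd: str, candidates: list) -> urllib.request.Request:
    """Build a POST /api/v1/screen request, enforcing the documented
    limits: JD longer than 100 words, fewer than 50 candidates per batch."""
    if len(jd.split()) <= 100:
        raise ValueError("job description must be longer than 100 words")
    if len(candidates) >= 50:
        raise ValueError("fewer than 50 candidates per batch")
    # "job_description" / "candidates" are assumed field names.
    body = json.dumps({"job_description": jd, "candidates": candidates}).encode()
    return urllib.request.Request(
        "http://localhost:8000/api/v1/screen",  # default uvicorn host/port
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)` against a running service; validating batch limits client-side avoids a round trip for requests the API would reject anyway.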