This architecture was specifically designed and deployed for a client needing an automated, highly scalable Applicant Tracking System (ATS).
The Production Flow:
- Webhook Ingestion: CVs arrive from external sources and are piped directly into the system via webhooks.
- Database Persistence: each resume is immediately stored in the company database for caching and record retrieval.
- Automated ML Screening: the multi-model pipeline described below screens each candidate batch, matching backgrounds against the target criteria.
- Slack Integration: a detailed breakdown (explicit matches, critical missing skills, and predicted HIRE/REJECT probabilities) is automatically broadcast to the recruiting team via Slack, eliminating hundreds of manual review hours.
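The ingestion and persistence steps above can be sketched as a minimal handler. The payload fields, the `handle_cv_webhook` name, and the record shape are illustrative assumptions, not the client's actual schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def handle_cv_webhook(raw_body: bytes) -> dict:
    """Parse an incoming CV webhook, build a persistence record,
    and return a screening job for the ML pipeline.
    All field names here are assumed, not the real schema."""
    payload = json.loads(raw_body)
    record = {
        # Deterministic ID so duplicate webhook deliveries stay idempotent.
        "resume_id": hashlib.sha256(payload["resume_text"].encode()).hexdigest()[:16],
        "candidate_name": payload.get("name", "unknown"),
        "resume_text": payload["resume_text"],
        "received_at": datetime.now(timezone.utc).isoformat(),
    }
    # In production this record would be INSERTed into the company database
    # before the screening job is enqueued for the pipeline.
    return {"db_record": record, "job": {"resume_id": record["resume_id"], "status": "queued"}}
```

The deterministic hash-based ID is one way to absorb webhook retries without creating duplicate rows.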
SmartHire uses a cascaded multi-model architecture, a pattern common in production recommendation and ranking systems at large-scale platforms (e.g., LinkedIn, Google Search). The cascade has three stages:
- Stage 1 (Fast Filter): classical TF-IDF scoring filters massive candidate pools cheaply.
- Stage 2 (Deep Ranker): transformer embeddings capture the semantics of the survivors.
- Stage 3 (Final Decision): an XGBoost ensemble combines the scores from the earlier stages with manual heuristics to produce the final decision.
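The three-stage flow can be sketched end to end. The scoring functions below are dependency-free placeholders (lexical overlap standing in for real TF-IDF and embeddings, a threshold rule standing in for the trained XGBoost model); the cutoffs and weights are made-up illustrations:

```python
def tfidf_score(resume: str, jd: str) -> float:
    # Placeholder for Stage 1: fraction of JD terms present in the resume.
    jd_terms = set(jd.lower().split())
    return len(jd_terms & set(resume.lower().split())) / max(len(jd_terms), 1)

def semantic_score(resume: str, jd: str) -> float:
    # Placeholder for Stage 2: a real system computes cosine similarity
    # between transformer embeddings; we reuse lexical overlap here.
    return tfidf_score(resume, jd)

def final_decision(s1: float, s2: float) -> str:
    # Placeholder for Stage 3: a real system feeds s1, s2, and heuristics
    # into an XGBoost classifier; a weighted threshold stands in.
    combined = 0.4 * s1 + 0.6 * s2
    return "HIRE" if combined > 0.6 else "MAYBE" if combined > 0.3 else "REJECT"

def screen(candidates, jd, stage1_cutoff=0.2):
    results = []
    for resume in candidates:
        s1 = tfidf_score(resume, jd)
        if s1 < stage1_cutoff:               # Stage 1: cheap early rejection
            results.append("REJECT")
            continue
        s2 = semantic_score(resume, jd)      # Stage 2: only for survivors
        results.append(final_decision(s1, s2))  # Stage 3: final call
    return results
```

The key structural point is the early `continue`: Stage 2 never runs on candidates Stage 1 already rejected, which is where the cascade's speedup comes from.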
Extremely fast. Transforms JDs and candidate resumes into weighted term-frequency vectors (Term Frequency–Inverse Document Frequency) and scores keyword overlap, cutting non-matching candidates in ~5ms.
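Stage 1 can be sketched with a hand-rolled TF-IDF; the production system presumably uses a library such as scikit-learn, and the smoothed-IDF formula below mirrors scikit-learn's default so shared terms don't vanish in tiny corpora:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weight vectors (term -> weight) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: how many docs each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF (as in scikit-learn): log((1+n)/(1+df)) + 1.
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A JD with keyword overlap against a resume gets a nonzero score; a resume sharing no terms scores exactly zero, which is the behavior the fast cutoff relies on.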
Based on all-MiniLM-L6-v2, a 22-million-parameter model that converts paragraphs into 384-dimensional dense vectors. Cosine similarity between vectors captures contextual equivalence (e.g., 'software engineer' scores highly against 'developer' even though the exact strings don't match).
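The similarity computation can be shown in isolation. The tiny 4-dimensional vectors below are invented stand-ins so the sketch runs without the model; in production the 384-dimensional vectors would come from all-MiniLM-L6-v2 (e.g., via the sentence-transformers library, as noted in the comment):

```python
import math

# In production the vectors come from the real model, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   vec = model.encode("software engineer")   # 384-dim dense vector
# The toy 4-dim vectors below are fabricated for illustration only.

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical embeddings: semantically close phrases point in similar directions.
emb = {
    "software engineer": [0.9, 0.3, 0.1, 0.0],
    "developer":         [0.8, 0.4, 0.2, 0.1],
    "pastry chef":       [0.1, 0.0, 0.9, 0.4],
}
```

With these vectors, 'software engineer' vs. 'developer' scores near 1.0 while 'software engineer' vs. 'pastry chef' scores near 0.2, which is exactly the contextual matching TF-IDF cannot provide.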
An ensemble of gradient-boosted decision trees. Incorporates the scores returned by Models 1 and 2 along with user-guided heuristics (`has_required_degree`, `years_experience_estimated`) and classifies the applicant as HIRE, MAYBE, or REJECT.
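The feature assembly for Stage 3 can be sketched as follows. The candidate dict fields and the linear scoring rule are illustrative assumptions; in production the features would go to the trained XGBoost booster's predict call instead:

```python
def build_features(stage1_score, stage2_score, candidate):
    """Assemble the feature vector for the final classifier.
    Candidate dict fields mirror the heuristics named above but
    their exact shape is an assumption."""
    return [
        stage1_score,                                         # TF-IDF overlap (Stage 1)
        stage2_score,                                         # embedding similarity (Stage 2)
        1.0 if candidate.get("has_required_degree") else 0.0,
        float(candidate.get("years_experience_estimated", 0)),
    ]

def classify(features):
    # Stand-in for the trained XGBoost ensemble (booster.predict over
    # HIRE/MAYBE/REJECT); a hand-weighted linear score is used here.
    score = (0.3 * features[0] + 0.5 * features[1]
             + 0.1 * features[2] + 0.02 * min(features[3], 5))
    return "HIRE" if score >= 0.6 else "MAYBE" if score >= 0.35 else "REJECT"
```

Capping the experience feature (here at 5 years) is a common guard against a single unbounded feature dominating the decision.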
"A single powerful model would miss the efficiency gains of early filtering. Running a transformer on 1000 resumes costs 50 seconds. Running TF-IDF first reduces the transformer workload to 300 resumes — 15 seconds total. This cascade architecture is used in production search systems."
| Metric | TF-IDF (Stage 1) | Semantics (Stage 2) | XGBoost (Stage 3) |
|---|---|---|---|
| Speed/Candidate | 2-5ms | 50ms | 1-2ms |
| Accuracy / Quality | Low (lexical overlap only) | High (semantic, contextual) | High (feature-based decision) |
| Cost / 1000 cands | Negligible CPU | Significant GPU/CPU | Negligible CPU |
Run `uvicorn src.api.main:app --reload` to start the service.
- POST /api/v1/screen: Supply a JD (>100 words) and a list of candidate dictionaries (<50 per batch).
- GET /api/v1/pipeline/explain: Get architecture details.
- GET /api/v1/models/{id}/info: Explore detailed model metadata.
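A client call to the screening endpoint might look like the sketch below, which enforces the documented limits before sending. The field names `job_description` and `candidates` are assumptions about the request schema, and the host/port is the uvicorn default:

```python
import json
import urllib.request

def build_screen_request(jd: str, candidates: list) -> urllib.request.Request:
    """Build a POST /api/v1/screen request, enforcing the documented
    limits: JD longer than 100 words, fewer than 50 candidates per batch."""
    if len(jd.split()) <= 100:
        raise ValueError("job description must be longer than 100 words")
    if len(candidates) >= 50:
        raise ValueError("fewer than 50 candidates per batch")
    # "job_description" / "candidates" are assumed field names.
    body = json.dumps({"job_description": jd, "candidates": candidates}).encode()
    return urllib.request.Request(
        "http://localhost:8000/api/v1/screen",  # default uvicorn host/port
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)` against a running service; validating batch limits client-side avoids a round trip for requests the API would reject anyway.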