🔍 Search & Ranking Engine

Production-grade search engine with Learning-to-Rank, neural re-ranking, hybrid retrieval (BM25 + ANN), and real-time query understanding.

📊 System Architecture

```mermaid
graph TD
    Q[User Query] --> QU[Query Understanding]
    QU -->|Intent + Entities| BM25[BM25 Retriever]
    QU -->|Expanded Query| ANN[ANN Retriever]

    BM25 --> Hybrid[Hybrid Fusion RRF]
    ANN --> Hybrid

    Hybrid --> FE[Feature Extractor]
    FE --> LTR[Learning-to-Rank]
    LTR --> RR[Reranker Cross-Encoder]
    RR --> Results[Ranked Results]

    Cache[Search Cache] -.->|TTL Lookup| Q
    Q -.->|Cache Miss| QU

    style Q fill:#4a90d9,color:#fff
    style Results fill:#27ae60,color:#fff
    style Cache fill:#f39c12,color:#fff
```
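The Hybrid Fusion node merges the BM25 and ANN result lists with Reciprocal Rank Fusion, which scores each document as the sum over lists of `w / (k + rank)`. A minimal sketch, assuming each retriever returns doc IDs ordered best-first; `k=60` is the constant commonly used with RRF and the optional `weights` mirror the "hybrid RRF weights" mentioned under Configuration, but neither is confirmed as this repo's default. The function name `rrf_fuse` is hypothetical.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, weights=None):
    """Fuse several best-first lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(w_i / (k + rank_i)) over the lists it appears in,
    with ranks starting at 1. Only ranks are used, so BM25 scores and vector
    similarities never have to be put on a common scale.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for weight, ranking in zip(weights, ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = [3, 1, 7]
ann_hits = [1, 5, 3]
fused = rrf_fuse([bm25_hits, ann_hits])
```

Here document 1 ranks first because it appears near the top of both lists, even though BM25 alone put document 3 first. Because RRF ignores raw scores, it is robust to the very different score distributions of lexical and dense retrievers.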

✨ Features

| Stage | Technology | Purpose |
|---|---|---|
| Query Understanding | Intent classifier + Entity extractor + Query expansion | Infer what the user is looking for |
| BM25 Retrieval | Custom inverted index (Okapi BM25) | Keyword-based matching |
| ANN Retrieval | FAISS IVF-PQ + Sentence-BERT | Semantic vector search |
| Hybrid Fusion | RRF (Reciprocal Rank Fusion) | Merge BM25 + ANN result lists by rank (no score normalization needed) |
| Feature Extraction | 10+ ranking signals | Build feature vectors for LTR |
| Learning-to-Rank | LightGBM LambdaRank | Listwise (NDCG-driven) ranking optimization |
| Neural Reranking | Cross-encoder (MS MARCO MiniLM) | Fine-grained relevance scoring |
| Caching | Redis / In-memory (TTL-based) | Low-latency repeated queries |
| Monitoring | Prometheus metrics + Health checks | Production observability |
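The BM25 stage is described as a custom Okapi BM25 inverted index. The sketch below shows the scoring such an index computes, assuming lowercase whitespace tokenization and the common defaults `k1=1.2`, `b=0.75` (the README says these live in `config/search_config.yaml` but not what values are used). The class name `BM25Index` is hypothetical, and the IDF is the Lucene-style `log(1 + ...)` variant, which never goes negative.

```python
import math
from collections import Counter, defaultdict


class BM25Index:
    """Toy in-memory inverted index with Okapi BM25 scoring."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1 = k1
        self.b = b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def add(self, doc_id, text):
        tokens = self.tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def idf(self, term):
        n = len(self.doc_len)
        df = len(self.postings.get(term, {}))
        # Lucene-style BM25 IDF; the 1 + ... inside the log keeps it non-negative.
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def search(self, query, top_k=10):
        avgdl = sum(self.doc_len.values()) / max(len(self.doc_len), 1)
        scores = defaultdict(float)
        for term in self.tokenize(query):
            idf = self.idf(term)
            # Only documents in this term's posting list can gain score.
            for doc_id, tf in self.postings.get(term, {}).items():
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / avgdl
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


idx = BM25Index()
idx.add(1, "Learn Python programming")
idx.add(2, "Rust systems programming")
hits = idx.search("python programming")
```

Walking only the posting lists of the query terms is what makes an inverted index fast: documents that share no term with the query are never touched.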

🚀 Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Start the API (--reload restarts the server on code changes; omit it in production)
uvicorn serving.search_api:app --host 0.0.0.0 --port 8000 --reload
```

🧪 Usage

Index Documents

```bash
curl -X POST http://localhost:8000/index \
  -H "Content-Type: application/json" \
  -d '{"doc_id": 1, "title": "Python tutorials", "body": "Learn Python programming"}'
```

Search

```bash
curl "http://localhost:8000/search?q=python+tutorial&top_k=5"
```

Batch Index

```bash
curl -X POST http://localhost:8000/index/batch \
  -H "Content-Type: application/json" \
  -d '[{"doc_id": 1, "title": "Doc 1", "body": "Content 1"}, {"doc_id": 2, "title": "Doc 2", "body": "Content 2"}]'
```
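The same three calls can be made from Python using only the standard library. This sketch assumes the endpoints and JSON shapes from the curl examples; the helper names `search_url` and `index_request` are hypothetical, and the requests are only built, not sent, so the snippet runs without a server.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def search_url(query, top_k=5, base=BASE_URL):
    # urlencode uses quote_plus, so spaces become "+" as in the curl example.
    return f"{base}/search?" + urllib.parse.urlencode({"q": query, "top_k": top_k})


def index_request(docs, base=BASE_URL):
    """Build a POST to /index for one document, or to /index/batch for a list."""
    path = "/index/batch" if isinstance(docs, list) else "/index"
    return urllib.request.Request(
        base + path,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send them (requires the API to be running):
# doc = {"doc_id": 1, "title": "Python tutorials", "body": "Learn Python programming"}
# with urllib.request.urlopen(index_request(doc)) as resp:
#     print(json.load(resp))
# with urllib.request.urlopen(search_url("python tutorial")) as resp:
#     print(json.load(resp))
```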

🏋️ Training

LTR Model

```bash
python scripts/train_ltr.py
```

Produces models/ltr_model.txt (LightGBM LambdaRank).
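LightGBM's ranking objectives, such as `lambdarank`, expect training rows to be contiguous per query, plus a `group` array giving how many rows belong to each query. The training script itself is not shown in this README; the sketch below only illustrates that data-preparation step, assuming labeled rows of `(query_id, relevance, features)`. The function name `to_lambdarank_inputs` is hypothetical.

```python
from itertools import groupby


def to_lambdarank_inputs(rows):
    """Sort rows by query and split them into (X, y, group) for a ranking objective.

    rows: iterable of (query_id, relevance_label, feature_list) tuples.
    group[i] is the number of consecutive rows that belong to the i-th query.
    """
    ordered = sorted(rows, key=lambda row: row[0])  # stable: keeps within-query order
    X, y, group = [], [], []
    for _, query_rows in groupby(ordered, key=lambda row: row[0]):
        query_rows = list(query_rows)
        group.append(len(query_rows))
        for _, label, features in query_rows:
            X.append(features)
            y.append(label)
    return X, y, group


# The outputs would then be passed to LightGBM, e.g. (not run here):
# import lightgbm as lgb
# ranker = lgb.LGBMRanker(objective="lambdarank")
# ranker.fit(X, y, group=group)
```

Getting the grouping wrong is a silent failure: LightGBM will still train, but LambdaRank will compare documents across unrelated queries.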

Metrics Evaluated

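| Metric | Description |
|---|---|
| NDCG@1/3/5/10 | Normalized Discounted Cumulative Gain at cutoffs 1, 3, 5, and 10 |
| MAP@5/10 | Mean Average Precision at cutoffs 5 and 10 |
| Recall@10 | Fraction of relevant documents retrieved in the top 10 |
| MRR@10 | Mean Reciprocal Rank of the first relevant result within the top 10 |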
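For reference, the per-query versions of these metrics can be computed as below; MAP and MRR are then the means over all queries. A minimal sketch, assuming graded relevance labels in a dict keyed by doc ID (0 means not relevant), exponential gain `2^rel - 1` for NDCG, and MAP@k normalized by `min(k, number of relevant docs)`. Toolkits differ on these choices, so the `evaluation/` module may not match exactly; all function names are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    # Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, qrels, k):
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def average_precision_at_k(ranked_ids, qrels, k):
    relevant = {d for d, rel in qrels.items() if rel > 0}
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    denom = min(k, len(relevant))
    return precision_sum / denom if denom else 0.0


def recall_at_k(ranked_ids, qrels, k):
    relevant = {d for d, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked_ids[:k])) / len(relevant)


def reciprocal_rank_at_k(ranked_ids, qrels, k):
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0
```

For example, with ranking `["a", "b", "c", "d"]` and relevant docs `b`, `d`, `e`, the first relevant hit is at rank 2 (reciprocal rank 0.5) and two of three relevant docs appear (Recall@10 of 2/3).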

☸️ Kubernetes Deployment

```bash
kubectl create namespace search
kubectl apply -f k8s/
```

Includes: a Deployment (3 replicas), an HPA (scales up to 20 replicas), a ConfigMap, an Ingress with TLS, and PVCs (10Gi for models, 50Gi for the index).
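For capacity planning, the HPA's scaling decision follows the Kubernetes formula `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default). This sketch reproduces that arithmetic; the upper bound of 20 comes from this README, while a minimum of 3 (matching the Deployment) and the metric being scaled on are assumptions, since the manifests are not shown here. Stabilization windows and pod-readiness handling are ignored.

```python
import math


def desired_replicas(current, current_metric, target_metric,
                     min_replicas=3, max_replicas=20, tolerance=0.1):
    """Replica count the HPA algorithm would request, clamped to [min, max].

    Ignores stabilization windows, pod readiness, and missing-metric handling.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # within tolerance: no scaling action
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 pods averaging 90% CPU against a 60% target yields `ceil(3 * 1.5) = 5` replicas.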

🐳 Docker

```bash
docker build -t search-ranking .
docker run -p 8000:8000 search-ranking
```

⚙️ Configuration

`config/search_config.yaml` controls all pipeline stages:

- Retrieval parameters (BM25 `k1`/`b`, ANN `ef_search`, hybrid RRF weights)
- LTR hyperparameters (`num_leaves`, `learning_rate`, `n_estimators`)
- Reranker model selection
- Caching TTL and backend
- Evaluation metrics
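The in-memory cache backend stores results per query and expires them after the configured TTL. A minimal sketch of that idea, assuming the cache key is the normalized query plus `top_k`; the class name `TTLCache`, the key scheme, and the injectable clock are hypothetical. A Redis backend would instead rely on Redis's own per-key expiry (`SET key value EX seconds`).

```python
import time


class TTLCache:
    """Minimal in-memory search-result cache with per-entry time-to-live."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expires_at, value)

    @staticmethod
    def make_key(query, top_k):
        # Normalize case and whitespace so trivially different queries share an entry.
        return (" ".join(query.lower().split()), top_k)

    def get(self, query, top_k):
        key = self.make_key(query, top_k)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # lazily evict stale entries on read
            return None
        return value

    def set(self, query, top_k, value):
        self._store[self.make_key(query, top_k)] = (self.clock() + self.ttl, value)
```

Using `time.monotonic` rather than wall-clock time keeps expiry correct if the system clock is adjusted; lazy eviction keeps the sketch short but lets expired entries linger until they are read.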

📁 Project Structure

```text
search_ranking/
├── indexing/         # Inverted index + FAISS vector index
├── retrieval/        # BM25, ANN, Hybrid retrievers
├── query/            # Intent classifier, entity extraction, expansion
├── ranking/          # Feature extraction, LTR, neural reranker
├── serving/          # FastAPI app + search pipeline orchestrator
├── evaluation/       # NDCG, MAP, MRR, Recall metrics
├── caching/          # Redis/in-memory result cache
├── k8s/              # Kubernetes manifests
├── scripts/          # LTR training pipeline
├── tests/            # Test suite
└── config/           # YAML configuration
```
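The orchestrator in `serving/` chains these stages. The sketch below shows only the control flow implied by the architecture diagram: cache lookup, retrieval from each retriever, fusion, LTR ordering, then cross-encoder reranking of the head of the list. Every stage is a hypothetical callable passed in, not this repo's actual classes, and the candidate-pool sizes `candidates` and `rerank_depth` are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class SearchPipeline:
    """Retrieve -> fuse -> LTR -> rerank control flow over hypothetical stage callables."""

    retrievers: Sequence[Callable[[str, int], List[int]]]  # (query, n) -> best-first doc IDs
    fuse: Callable[[List[List[int]]], List[int]]           # e.g. RRF over the result lists
    ltr_score: Callable[[str, int], float]                 # feature extraction + LTR model
    rerank_score: Callable[[str, int], float]              # cross-encoder relevance
    cache: Optional[Dict] = None
    candidates: int = 100   # docs pulled from each retriever (assumed)
    rerank_depth: int = 20  # docs sent to the expensive cross-encoder (assumed)

    def search(self, query: str, top_k: int = 10) -> List[int]:
        key = (query, top_k)
        if self.cache is not None and key in self.cache:
            return self.cache[key]  # cache hit skips the whole pipeline
        fused = self.fuse([r(query, self.candidates) for r in self.retrievers])
        ltr_ranked = sorted(fused, key=lambda d: self.ltr_score(query, d), reverse=True)
        # Only the head of the LTR ranking is rescored by the cross-encoder.
        head = ltr_ranked[: self.rerank_depth]
        reranked = sorted(head, key=lambda d: self.rerank_score(query, d), reverse=True)
        results = (reranked + ltr_ranked[self.rerank_depth:])[:top_k]
        if self.cache is not None:
            self.cache[key] = results
        return results
```

Each stage narrows the candidate set so that the most expensive model, the cross-encoder, only sees a few dozen documents per query.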
