A modular recommendation engine combining:
- Behavioral Collaborative Filtering (CF)
- Semantic Content Similarity
- ML-based ranking filters
The system is designed to handle noisy real-world interaction data by filtering accidental co-occurrences and blending behavioral confidence with semantic relevance.
- Item-Item Collaborative Filtering using cosine similarity
- Noise suppression using similarity thresholds and support counts
- Logistic Regression-based veto/ranking model
- Hybrid recommendation blending: Final Score = (CF Score × 0.7) + (Semantic Score × 0.3)
- Modular architecture for experimentation and scaling
cf-recommender/
│
├── data/
│   └── interactions.json
│
├── models/
│   └── cf_ranker.pkl
│
├── src/
│   ├── generate_data.py
│   ├── train_ranker.py
│   ├── cf_engine.py
│   ├── cf_recommender.py
│   └── hybrid_engine.py
│
├── .gitignore
└── README.md
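Create and activate a virtual environment (Windows):

python -m venv .venv
.\.venv\Scripts\activate

Create and activate a virtual environment (macOS/Linux):

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install numpy pandas scikit-learn joblib

Generate data. Creates clustered and noisy interaction logs.

python src/generate_data.py

Train the ranker. Trains the Logistic Regression veto model.

python src/train_ranker.py

Run the CF recommender:

python src/cf_recommender.py

Example output: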
Target: [iPhone 17]
-> AirPods
-> Apple Pencil
-> Magsafe Charger
Run the hybrid engine:

python src/hybrid_engine.py

Example output:
Hybrid Results for [iPhone 17]:
-> Magsafe Charger
Hybrid: 0.9850
-> AirPods
Hybrid: 0.8500
The collaborative filtering stage builds item-item similarity by computing cosine similarity over the interaction matrix.
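The item-item similarity step can be illustrated with a minimal NumPy sketch. It assumes a dense users × items matrix of binary interactions. The function name `item_item_cosine` and the toy data are hypothetical and are not taken from `cf_engine.py`.

```python
import numpy as np

def item_item_cosine(interactions: np.ndarray) -> np.ndarray:
    """Cosine similarity between the item columns of a users x items matrix."""
    norms = np.linalg.norm(interactions, axis=0)
    norms[norms == 0] = 1.0  # items with no interactions: avoid division by zero
    normalized = interactions / norms
    sim = normalized.T @ normalized
    np.fill_diagonal(sim, 0.0)  # an item should not recommend itself
    return sim

# Toy interaction matrix (hypothetical): rows are users, columns are items.
items = ["iPhone 17", "AirPods", "Apple Pencil"]
interactions = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
], dtype=float)

similarity = item_item_cosine(interactions)
for name, score in zip(items[1:], similarity[0, 1:]):
    print(f"{items[0]} ~ {name}: {score:.4f}")
```

For larger catalogs a sparse matrix would likely be a better fit than the dense array used here.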
Recommendations are filtered using:
- Similarity thresholds
- Minimum support counts
This removes weak or accidental correlations.
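As a rough sketch of the two filters above, the snippet below keeps a candidate only when it clears both a similarity threshold and a minimum co-occurrence count. The function name `filter_candidates`, the default values `min_similarity=0.3` and `min_support=2`, and the matrices are assumptions chosen for illustration. The project's actual thresholds are not stated here.

```python
import numpy as np

def filter_candidates(similarity, co_counts, target,
                      min_similarity=0.3, min_support=2):
    """Return (item_index, similarity) pairs that pass both filters, best first."""
    keep = (similarity[target] >= min_similarity) & (co_counts[target] >= min_support)
    keep[target] = False  # never recommend the target item itself
    idx = np.flatnonzero(keep)
    order = idx[np.argsort(-similarity[target, idx])]
    return [(int(i), float(similarity[target, i])) for i in order]

# Hypothetical item-item similarity and co-occurrence counts for four items.
similarity = np.array([
    [1.00, 0.82, 0.41, 0.90],
    [0.82, 1.00, 0.00, 0.10],
    [0.41, 0.00, 1.00, 0.20],
    [0.90, 0.10, 0.20, 1.00],
])
co_counts = np.array([
    [3, 2, 1, 1],
    [2, 2, 0, 0],
    [1, 0, 2, 0],
    [1, 0, 0, 1],
])

# Item 3 has high similarity to item 0 but only one shared interaction,
# so the support filter drops it as a likely accidental co-occurrence.
print(filter_candidates(similarity, co_counts, target=0))
```

The support count matters because cosine similarity alone can be high for item pairs that share only one or two users.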
A Logistic Regression model scores each candidate recommendation using:
- Similarity strength
- Shared interaction support
- Item popularity
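A veto model over the three features listed above might look like the following sketch. The training data is synthetic, and the labeling rule, the scaling step, the helper `keep_probability`, and the `VETO_THRESHOLD` of 0.5 are all assumptions made for illustration. None of it is taken from `train_ranker.py`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic candidate pairs (hypothetical, for illustration only).
similarity = rng.uniform(0.0, 1.0, size=n)   # similarity strength
support = rng.integers(1, 21, size=n)        # shared interaction support
popularity = rng.integers(1, 1001, size=n)   # item popularity

X = np.column_stack([similarity, support, popularity]).astype(float)
# Assumed labeling rule: a pair is "good" when it is both similar and well supported.
y = ((similarity > 0.5) & (support >= 5)).astype(int)

# Scale features so popularity's large range does not dominate the fit.
ranker = make_pipeline(StandardScaler(), LogisticRegression())
ranker.fit(X, y)

def keep_probability(sim, sup, pop):
    """Probability that a candidate should be kept, per the fitted model."""
    return float(ranker.predict_proba([[sim, sup, pop]])[0, 1])

VETO_THRESHOLD = 0.5  # assumed cut-off; candidates below it are vetoed
p_strong = keep_probability(0.9, 15, 300)
p_weak = keep_probability(0.1, 1, 300)
print(f"strong candidate kept: {p_strong >= VETO_THRESHOLD}")
print(f"weak candidate kept:   {p_weak >= VETO_THRESHOLD}")
```

The predicted probability can serve both as a veto (drop anything below the threshold) and as a ranking signal among the candidates that survive.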
Final recommendations combine:
- Behavioral relevance
- Semantic similarity
This improves contextual recommendation quality.
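The blending step follows the weighted sum given earlier: CF Score × 0.7 plus Semantic Score × 0.3. The sketch below assumes both scores are already normalized to the range 0 to 1. The function names and the candidate scores are hypothetical.

```python
CF_WEIGHT = 0.7
SEMANTIC_WEIGHT = 0.3

def hybrid_score(cf_score: float, semantic_score: float) -> float:
    """Weighted blend of behavioral (CF) and semantic scores."""
    return CF_WEIGHT * cf_score + SEMANTIC_WEIGHT * semantic_score

def rank_hybrid(candidates: dict) -> list:
    """Sort candidates by blended score, highest first."""
    scored = [(name, hybrid_score(cf, sem)) for name, (cf, sem) in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical (cf_score, semantic_score) pairs for two candidates.
candidates = {
    "Apple Pencil": (0.5, 0.8),
    "AirPods": (0.9, 0.6),
}

ranked = rank_hybrid(candidates)
for name, score in ranked:
    print(f"-> {name}\n   Hybrid: {score:.4f}")
```

Because the CF weight is larger, a candidate with strong behavioral evidence outranks one that is only semantically close, while the semantic term can still break ties between behaviorally similar items.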
- Python
- NumPy
- Pandas
- Scikit-Learn
- Joblib
Possible future improvements:
- FAISS / ANN retrieval
- Transformer embeddings
- Online learning
- Time-decay ranking
- XGBoost rankers
License: MIT