feat(rag): add hybrid search using RRF score fusion by nancysangani · Pull Request #492 · param20h/PDF-Assistant-RAG

nancysangani · 2026-06-06T06:40:20Z

🔗 Related Issue

Closes #440

📝 What does this PR do?

Replaces the fake RRF approximation in retriever.py with a correct
Reciprocal Rank Fusion implementation and removes the EnsembleRetriever
dependency.

backend/app/rag/retriever.py:

- Adds rrf_merge(vector_results, bm25_results, k): implements the standard
  RRF formula score(d) = Σ 1/(k + rank(d)), summed over the ranked lists in
  which chunk d appears, deduplicates by content key, and returns chunks
  sorted by descending RRF score.
- Removes the EnsembleRetriever / CustomVectorRetriever / CustomBM25Retriever
  LangChain wrapper classes. query_chunks and query_bm25 are now called
  directly, giving full control over each ranked list before fusion.
- retrieve() now calls embed_query → query_chunks → query_bm25 →
  rrf_merge per query variant, then promotes rrf_score → score before
  passing candidates to the cross-encoder reranker. Existing reranking and
  confidence normalisation logic is unchanged.
- Falls back to vector-only retrieval when USE_HYBRID_SEARCH=False or when
  the BM25 call raises.
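For reviewers who want the fusion step spelled out, here is a minimal sketch of what a function like rrf_merge could look like. It is not the code from this PR: the dict-based chunk shape and the "content" dedup key are assumptions made for illustration, and only the formula, the deduplication, and the descending sort come from the description above.

```python
from collections import defaultdict
from typing import Any


def rrf_merge(
    vector_results: list[dict[str, Any]],
    bm25_results: list[dict[str, Any]],
    k: int = 60,
) -> list[dict[str, Any]]:
    """Fuse two ranked lists with Reciprocal Rank Fusion (illustrative sketch)."""
    scores: dict[str, float] = defaultdict(float)
    chunks: dict[str, dict[str, Any]] = {}

    for ranked in (vector_results, bm25_results):
        seen: set[str] = set()
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, chunk in enumerate(ranked, start=1):
            key = chunk["content"]  # assumed dedup key, not confirmed by the PR
            if key in seen:
                continue  # a chunk repeated within one list counts only once
            seen.add(key)
            scores[key] += 1.0 / (k + rank)
            chunks.setdefault(key, chunk)  # keep the first copy we saw

    fused = [{**chunks[key], "rrf_score": score} for key, score in scores.items()]
    fused.sort(key=lambda c: c["rrf_score"], reverse=True)
    return fused


# "b" appears in both lists, so it should outrank the single-list chunks.
vector_hits = [{"content": "a"}, {"content": "b"}]
bm25_hits = [{"content": "b"}, {"content": "c"}]
merged = rrf_merge(vector_hits, bm25_hits, k=60)
```

With k = 60, chunk "b" scores 1/62 + 1/61, while "a" and "c" score 1/61 and 1/62 respectively, so the fused order is b, a, c.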

backend/app/config.py:

- Adds USE_HYBRID_SEARCH: bool = True, a toggle that enables or disables
  hybrid search without redeploying.
- Adds RRF_K: int = 60, which exposes the RRF smoothing constant; 60 is the
  value used in the original RRF paper and a common default in practice.
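As a rough illustration of how two such settings might be read from the environment, the sketch below uses only the standard library. The project's actual settings class is not shown in this PR, so the RetrievalSettings name, the from_env helper, and the accepted truthy strings are all hypothetical.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable; unset means use the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RetrievalSettings:  # hypothetical stand-in for the project's config class
    USE_HYBRID_SEARCH: bool = True  # default taken from the PR description
    RRF_K: int = 60                 # default taken from the PR description

    @classmethod
    def from_env(cls) -> "RetrievalSettings":
        return cls(
            USE_HYBRID_SEARCH=_env_bool("USE_HYBRID_SEARCH", True),
            RRF_K=int(os.environ.get("RRF_K", "60")),
        )
```

Under these assumptions, exporting USE_HYBRID_SEARCH=False before starting the backend would select the vector-only path, and leaving RRF_K unset would keep the default of 60.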

🗂️ Type of Change

✨ New feature
🔧 Refactor / code cleanup

🧪 How was this tested?

- Ran the backend locally (uvicorn app.main:app --reload).
- Queried a multi-document collection; confirmed that RRF scores are present
  on returned chunks and that chunks appearing in both lists score higher
  than single-list results.
- Set USE_HYBRID_SEARCH=False; confirmed that the vector-only path runs and
  query_bm25 is never called.
- Uninstalled rank_bm25 from the environment; confirmed graceful fallback to
  vector-only retrieval via the except guard.
- Confirmed that the reranker and confidence normalisation are unaffected.
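The toggle and fallback checks above were done manually. One way they might be automated is sketched below with stub search functions. The gather_candidates helper, its parameters, and the stubs are hypothetical and do not mirror the actual retrieve() signature; the merge stub simply concatenates and stands in for the real fusion step.

```python
from typing import Any, Callable

Chunk = dict[str, Any]


def gather_candidates(
    query: str,
    vector_search: Callable[[str], list[Chunk]],
    bm25_search: Callable[[str], list[Chunk]],
    merge: Callable[[list[Chunk], list[Chunk]], list[Chunk]],
    use_hybrid: bool = True,
) -> list[Chunk]:
    """Return fused candidates, or vector-only hits when hybrid is off or BM25 fails."""
    vector_hits = vector_search(query)
    if not use_hybrid:
        return vector_hits  # BM25 is never invoked on this path
    try:
        bm25_hits = bm25_search(query)
    except Exception:
        return vector_hits  # e.g. rank_bm25 missing: degrade to vector-only
    return merge(vector_hits, bm25_hits)


bm25_calls: list[str] = []


def fake_vector(query: str) -> list[Chunk]:
    return [{"content": "a"}]


def fake_bm25(query: str) -> list[Chunk]:
    bm25_calls.append(query)  # record the call so the test can inspect it
    return [{"content": "b"}]


def broken_bm25(query: str) -> list[Chunk]:
    raise ImportError("rank_bm25 is not installed")


def concat_merge(vector_hits: list[Chunk], bm25_hits: list[Chunk]) -> list[Chunk]:
    return vector_hits + bm25_hits  # placeholder for the fusion step, not RRF


vector_only = gather_candidates("q", fake_vector, fake_bm25, concat_merge, use_hybrid=False)
calls_when_disabled = len(bm25_calls)
fallback = gather_candidates("q", fake_vector, broken_bm25, concat_merge)
hybrid = gather_candidates("q", fake_vector, fake_bm25, concat_merge)
```

Passing the search functions in as arguments keeps the sketch free of any vector store or BM25 dependency, which is what makes the disabled and failing paths easy to exercise.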

✅ Self-Review Checklist

My branch is based on dev, not main
I have not added any secrets / API keys
I have not modified main branch or any HuggingFace deployment config
My code follows the existing style (no unnecessary formatting changes)
I have updated relevant docs / comments if needed

nancysangani · 2026-06-06T06:42:37Z

Hi @param20h, I have opened this PR to fix issue #440. Please review it when you get a chance. Thanks!

nancysangani · 2026-06-11T07:52:27Z

@param20h please review this PR. All checks have passed and the merge conflicts are resolved. Thanks!

github-actions · 2026-06-11T18:16:07Z

🎉 Congratulations on getting your Pull Request merged! 🎉

Thank you for contributing to PDF-Assistant-RAG as part of GSSoC '26! 🚀

Keep up the great work! ✨

feat(rag): add hybrid search using RRF score fusion

f3269ed

nancysangani requested a review from param20h as a code owner June 6, 2026 06:40

nancysangani added 3 commits June 11, 2026 12:49

Merge branch 'dev' into feat/hybrid-search-rrf

458cba2

fix: merge conflicts

dae8a83

fix: merge conflicts

7950804

param20h approved these changes Jun 11, 2026

param20h merged commit c25e6cb into param20h:dev Jun 11, 2026
6 checks passed

github-actions Bot added labels Jun 11, 2026: gssoc (GirlScript Summer of Code 2026 issue/PR), gssoc:approved (Approved for GSSoC base points, +50 pts), level:advanced (+55 pts), mentor:param20h (Mentor for this PR), type:backend (Backend API)

param20h merged 4 commits into param20h:dev from nancysangani:feat/hybrid-search-rrf
