Add adaptive page-aware retrieval pipeline with graded evaluation by Shrey1306 · Pull Request #115 · georgia-tech-db/TokenSmith

Shrey1306 · 2026-04-15T07:21:17Z

Replace static chunk retrieval with adaptive query routing, hierarchical section-to-chunk retrieval, page-aware reranking, and a graded benchmark suite. Measured on 21 judged textbook questions: chunk nDCG@10 = 0.359, chunk Recall@10 = 0.698, page-hit@10 = 0.857.
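For readers unfamiliar with the metrics quoted above, here is a minimal sketch of how nDCG@k, Recall@k, and page-hit@k are typically computed from graded labels. It is not the benchmark runner's code: the function names are hypothetical, and the linear gain (grade used directly, rather than 2^grade - 1) is an assumption.

```python
import math

def dcg_at_k(grades, k):
    """Discounted cumulative gain over the first k graded results."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))

def ndcg_at_k(ranked_ids, gold_grades, k=10):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [gold_grades.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(gold_grades.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(ranked_ids, gold_grades, k=10):
    """Fraction of relevant items (grade > 0) found in the top k."""
    relevant = {d for d, g in gold_grades.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)

def page_hit_at_k(ranked_pages, gold_pages, k=10):
    """1.0 if any gold page appears among the top-k retrieved pages."""
    return 1.0 if set(ranked_pages[:k]) & set(gold_pages) else 0.0
```

Averaging each function over the judged questions gives one number per metric, which is the form in which the figures above are reported.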

Key changes:

Manifest-backed artifact bundle with SHA256 validation (src/artifacts.py)
Adaptive query planner with 5-type routing (src/retrieval_pipeline.py)
Hierarchical section-to-chunk retrieval with section-prior boosting
Page-aware reranking (base score + lexical overlap + page specificity)
Multi-part query decomposition with coverage-aware result merging
Hardened GGUF embedder: single-input default, no zero-vector fallback
Hardened query enhancement: try/except fallbacks on all LLM calls
build_index() decomposed into extract, embed, persist stages
get_answer() split into adaptive and legacy retrieval helpers
21 graded benchmarks with 3-level chunk/page relevance labels
Ranked IR metrics: nDCG, Recall, MRR, MAP, page-hit variants
Benchmark runner with --mode baseline for fair comparison
Ruff lint, docstrings, 36 unit tests passing
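The page-aware reranking item in the list above combines three signals. The sketch below shows one way such a combination could look. The weights, the `Candidate` type, and the definition of page specificity (chunks confined to fewer pages score higher) are all assumptions for illustration, not values or definitions taken from src/retrieval_pipeline.py.

```python
from dataclasses import dataclass

# Hypothetical weights; the PR does not state the actual values.
W_BASE, W_LEXICAL, W_PAGE = 1.0, 0.3, 0.1

@dataclass
class Candidate:
    chunk_id: str
    text: str
    base_score: float  # score from the first-stage retriever
    pages: tuple       # page numbers the chunk spans

def lexical_overlap(query, text):
    """Fraction of distinct query terms that also appear in the chunk."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & set(text.lower().split())) / len(query_terms)

def page_specificity(pages):
    """Assumed definition: 1 / number of distinct pages the chunk spans."""
    return 1.0 / len(set(pages)) if pages else 0.0

def rerank(query, candidates):
    """Sort candidates by the weighted sum of the three signals."""
    def score(c):
        return (W_BASE * c.base_score
                + W_LEXICAL * lexical_overlap(query, c.text)
                + W_PAGE * page_specificity(c.pages))
    return sorted(candidates, key=score, reverse=True)
```

With these assumed weights, a chunk with a slightly lower base score can still move ahead when it matches more query terms and sits on a single page.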

Update 04/25/2026 (FINAL REPORT)

Consolidates adaptive routing policy, tightens the retrieval + artifact stack, and adds a reproducible local eval path, plus benchmarks and tests aligned to the current textbook index.

What changed

src/planning/rules.py (new)
Single place for query-type classification, follow-up reference patterns, heuristic decomposition, and shared QUERY_TYPE_* constants. HeuristicQueryPlanner, query_enhancement, and retrieval_pipeline all route through the same policy.
src/retrieval_pipeline.py & src/query_enhancement.py
Refactor to use shared rules: named score weights, multi-part merge and anchor rerank behavior, confidence widening traced on RetrievalTrace, and deterministic/consistent follow-up handling.
src/retriever.py & src/config.py
Stricter bundle loading (e.g. manifest / artifact version expectations), type and config cleanups in support of hierarchical + page-aware behavior.
scripts/run_evals.sh (new)
One script to run preflight, optional index build, artifact validation, retrieval benchmark, pytest, and make lint, with timestamped logs under eval_runs/.
Index metadata checked in
index/sections/textbook_index_manifest.json plus updated textbook_index_page_to_chunk_map.json for the current textbook_index build (large diff is the page map).
tests/benchmarks.yaml
Graded retrieval_gold (chunks/pages + grades) and query-type fields updated for the current artifact set.
Tests
tests/test_planning_rules.py (new), extended test_retrieval_pipeline, test_artifacts, and other suites for the above behavior.
Tooling
Makefile Ruff list includes rules.py and test_planning_rules.py. .gitignore: local conda dirs, eval_runs/, generated index/sections/*.npy and *_info.json, and * 2.{py,yaml,yml} to avoid duplicate-file accidents.
config/config.yaml
Default generator model path adjusted (e.g. smaller/faster local instruct model); teams should still align paths with their own models/.
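The manifest-backed bundle loading mentioned under src/retriever.py & src/config.py can be pictured with the sketch below. The manifest layout (`artifact_version` plus a `files` mapping of name to SHA256 hex digest), the expected version number, and the function names are hypothetical. The real schema lives in src/artifacts.py and may differ.

```python
import hashlib
import json
from pathlib import Path

EXPECTED_ARTIFACT_VERSION = 1  # hypothetical value

def sha256_of(path, block_size=1 << 20):
    """Stream the file through SHA256 so large artifacts are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def validate_bundle(manifest_path):
    """Return a list of problems; an empty list means the bundle checks out."""
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text())
    problems = []
    version = manifest.get("artifact_version")
    if version != EXPECTED_ARTIFACT_VERSION:
        problems.append(f"unexpected artifact_version: {version}")
    # Assumed layout: file names are relative to the manifest's directory.
    for name, expected in manifest.get("files", {}).items():
        path = manifest_path.parent / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a preflight step report every stale or missing artifact in a single pass.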

How to verify

make lint
pytest tests/ -q
# Optional full local run (needs models + data per config):
# bash scripts/run_evals.sh

Resolve conflicts keeping improved retrieval pipeline while incorporating main's new fields (embedding_model_context_window, enable_topic_extraction, uuid4, flash_attn, LlamaRAMCache). Fix duplicate numpy import and unused deepcopy import introduced during merge.

Shrey1306 added 2 commits April 15, 2026 03:17

Shrey1306 marked this pull request as draft April 15, 2026 07:35

Shrey1306 added 2 commits April 25, 2026 21:34

Merge remote-tracking branch 'origin/main' into feature/sgupta736-ret…

016cfad

…rieval-quality # Conflicts: # Makefile # config/config.yaml # src/api_server.py # src/config.py # src/index_builder.py # src/main.py # tests/conftest.py

Add planning rules module, run_evals.sh, gitignore local artifacts; d…

f8e0b4d

…rop duplicate files

Shrey1306 wants to merge 4 commits into georgia-tech-db:main from Shrey1306:feature/sgupta736-retrieval-quality
