3. Summary Tree & Retriever Integration #105
Open
santo0 wants to merge 15 commits into
…etrieval script for enhanced query evaluation
…un_kg_pipeline.py
I'm trying to simplify this PR. I will notify you when I'm done.
This was referenced Apr 15, 2026
…ctor summary index building
PRs structure
Each PR depends on the previous ones.
`SummaryEntry` + `summary_tree.py`
Builds LLM-generated summaries bottom-up across the section tree. Leaf nodes summarize sliding windows of adjacent chunks; internal nodes summarize their children's summaries. All summaries are embedded and persisted as a FAISS index (`summary_index.faiss` + `summary_meta.json`) under the run directory.

`SectionSummaryRetriever` (`name="section_summary"`)
At query time, embeds the query and searches the summary FAISS index. Each matching summary distributes its cosine similarity score to every chunk it covers; a chunk's final score is the max across all hits. Lazy-loads the embedding model on the first call.
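A minimal sketch of the two ideas above, under stated assumptions: the `SummaryEntry` fields (`text`, `chunk_ids`, `embedding`), the `SectionNode` type, the `summarize`/`embed` callables, and the window/stride defaults are hypothetical, and a brute-force cosine scan over a Python list stands in for the FAISS index. It is not the PR's code; it only shows how each summary can record the chunks it covers, and how a hit's score is propagated to those chunks with a per-chunk max.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SummaryEntry:
    # Hypothetical fields: summary text, ids of the chunks it covers, its embedding.
    text: str
    chunk_ids: List[int]
    embedding: List[float]


@dataclass
class SectionNode:
    # Hypothetical section-tree node: a leaf lists its chunk ids, an internal node its children.
    chunk_ids: List[int] = field(default_factory=list)
    children: List["SectionNode"] = field(default_factory=list)


def sliding_windows(ids: List[int], window: int, stride: int) -> List[List[int]]:
    """Overlapping windows of adjacent chunk ids; the last window reaches the end."""
    windows = []
    for start in range(0, len(ids), stride):
        windows.append(ids[start:start + window])
        if start + window >= len(ids):
            break
    return windows


def build_summaries(root, chunks, summarize, embed, window=3, stride=2):
    """Bottom-up (post-order) pass: children are summarized before their parent."""
    entries: List[SummaryEntry] = []

    def visit(node: SectionNode) -> List[SummaryEntry]:
        if not node.children:
            # Leaf: one summary per sliding window of adjacent chunks.
            level = []
            for ids in sliding_windows(node.chunk_ids, window, stride):
                text = summarize([chunks[i] for i in ids])
                level.append(SummaryEntry(text, list(ids), embed(text)))
        else:
            # Internal node: summarize the children's summaries; cover the union of their chunks.
            child_entries = [e for child in node.children for e in visit(child)]
            text = summarize([e.text for e in child_entries])
            covered = sorted({i for e in child_entries for i in e.chunk_ids})
            level = [SummaryEntry(text, covered, embed(text))]
        entries.extend(level)
        return level

    visit(root)
    return entries


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_chunks(query_vec, entries, top_k=10) -> Dict[int, float]:
    """Each top-k summary hit passes its cosine score to every chunk it covers; keep the max."""
    hits = sorted(((cosine(query_vec, e.embedding), e) for e in entries),
                  key=lambda pair: pair[0], reverse=True)[:top_k]
    scores: Dict[int, float] = {}
    for score, entry in hits:
        for cid in entry.chunk_ids:
            scores[cid] = max(scores.get(cid, score), score)
    return scores


# Toy usage: "summaries" are concatenations, "embeddings" are keyword counts.
VOCAB = ["cat", "dog", "fish"]


def toy_embed(text: str) -> List[float]:
    return [float(text.split().count(w)) for w in VOCAB]


def toy_summarize(texts: List[str]) -> str:
    return " ".join(texts)


chunks = ["cat", "cat dog", "dog", "fish"]
tree = SectionNode(children=[SectionNode(chunk_ids=[0, 1, 2]), SectionNode(chunk_ids=[3])])
entries = build_summaries(tree, chunks, toy_summarize, toy_embed, window=2, stride=1)
scores = score_chunks(toy_embed("cat"), entries, top_k=2)
```

In the toy run, chunks 0 and 1 inherit the higher score of the leaf window they share, while chunks 2 and 3 are reached only through the root summary and get its lower score. In a FAISS-backed version, cosine similarity is commonly obtained by L2-normalizing vectors and searching an inner-product index.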
Update `benchmark_retrieval.py`
Evaluates all three KG retrievers (`kg_node`, `section_tree`, `section_summary`) plus the existing FAISS/BM25 retrievers against a query set from `tests/benchmarks.yaml`. Supports optional LLM relevance grading via OpenRouter.

KG and Section retrievers integration in `main.py`
Each of `kg_node`, `section_tree`, and `section_summary` is loaded only if its weight in `ranker_weights` is non-zero. `CanonicalLookup` is built once and shared across the KG retrievers.
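A rough sketch of the weight-gated loading pattern, assuming `ranker_weights` is a dict keyed by retriever name and that each retriever is built by a hypothetical factory that receives the shared lookup. `CanonicalLookupStub` stands in for `CanonicalLookup`, whose real constructor is not shown in this PR description. Treating a missing weight as zero, and deferring the lookup until the first enabled retriever needs it, are assumptions of this sketch.

```python
from typing import Any, Callable, Dict, Optional


class CanonicalLookupStub:
    """Stand-in for CanonicalLookup; counts how often it is constructed."""
    builds = 0

    def __init__(self) -> None:
        CanonicalLookupStub.builds += 1


def load_kg_retrievers(
    ranker_weights: Dict[str, float],
    factories: Dict[str, Callable[[CanonicalLookupStub], Any]],
) -> Dict[str, Any]:
    """Build only retrievers with a non-zero weight, sharing one lookup among them."""
    retrievers: Dict[str, Any] = {}
    lookup: Optional[CanonicalLookupStub] = None
    for name, factory in factories.items():
        if ranker_weights.get(name, 0.0) == 0:
            continue  # zero (or, by assumption, missing) weight: never constructed
        if lookup is None:
            lookup = CanonicalLookupStub()  # built once, on the first enabled retriever
        retrievers[name] = factory(lookup)
    return retrievers


# Hypothetical factories; each receives the same shared lookup instance.
factories = {
    "kg_node": lambda lookup: ("kg_node", lookup),
    "section_tree": lambda lookup: ("section_tree", lookup),
    "section_summary": lambda lookup: ("section_summary", lookup),
}
weights = {"kg_node": 0.5, "section_tree": 0.0, "section_summary": 0.3, "bm25": 0.2}
loaded = load_kg_retrievers(weights, factories)
```

With `section_tree` weighted 0.0, its factory is never called, and the stub lookup is constructed exactly once for the two enabled retrievers.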