Title: [FEAT] Integrate GraphRAG (Knowledge Graph RAG) Support Alongside Vector Search
Is your feature request related to a problem? Please describe.
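Standard vector search (ChromaDB) excels at fetching specific local text chunks (e.g., "What was the revenue in Q3?"). However, it struggles with global/thematic queries that span the entire document (e.g., "What are the primary recurring themes in this legal contract?"). Vector search returns only the top-k chunks most similar to the query. The answer to a thematic question, by contrast, is spread across many chunks, and each of those chunks is only weakly similar to the query.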
Describe the solution you'd like
Build a GraphRAG extension in the RAG pipeline:
- In backend/app/rag/, implement a node/relation extraction pipeline that parses the PDF pages, extracts entity relationships using the LLM, and builds a Knowledge Graph (using NetworkX or an in-memory graph store).
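- Run community detection (e.g., Leiden/Louvain clustering) on the graph to group related nodes and build a summary for each community. Clustering and LLM summarization are expensive, so these summaries can be computed once at indexing time and reused during queries.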
- Blend the graph-summarized context with standard semantic vector retrieval results before generating the final answer.
- Compare vector-only and hybrid (graph + vector) retrieval using the existing RAGAS evaluation script (e.g., faithfulness, answer relevancy, and context precision).
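Here is a minimal sketch of the graph-building step. It assumes the LLM extraction prompt has already returned (subject, relation, object) triples. The triples are made up for illustration, and build_knowledge_graph is a hypothetical helper, not existing code in backend/app/rag/. It uses NetworkX, as proposed above. It stores an undirected weighted graph to keep a later clustering step simple, and it keeps each relation's direction as fact text on the edge.

```python
import networkx as nx

# Hypothetical triples; in the proposed pipeline an LLM prompt over each
# parsed PDF page would produce these. Hard-coded so the sketch runs offline.
TRIPLES = [
    ("Acme Corp", "licenses software to", "Beta LLC"),
    ("Beta LLC", "pays fees to", "Acme Corp"),
    ("Acme Corp", "indemnifies", "Beta LLC"),
    ("Beta LLC", "must comply with", "GDPR"),
]


def build_knowledge_graph(triples: list[tuple[str, str, str]]) -> nx.Graph:
    """Build an undirected entity graph; repeated links raise the edge weight."""
    graph = nx.Graph()
    for subject, relation, obj in triples:
        # Normalise names so "Acme Corp" and "acme corp" map to one node.
        s, o = subject.strip().lower(), obj.strip().lower()
        fact = f"{s} {relation} {o}"  # keeps the original direction as text
        if graph.has_edge(s, o):
            graph[s][o]["weight"] += 1
            graph[s][o]["facts"].append(fact)
        else:
            graph.add_edge(s, o, weight=1, facts=[fact])
    return graph


kg = build_knowledge_graph(TRIPLES)
print(kg.number_of_nodes(), kg.number_of_edges())  # 3 2
print(kg["acme corp"]["beta llc"]["weight"])        # 3
```

A nx.MultiDiGraph would preserve direction and keep one edge per relation. The trade-off is that clustering then has to handle parallel, directed edges.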
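The community step could look roughly like this. It assumes NetworkX 2.8 or later, where nx.community.louvain_communities is available. The sketch uses Louvain because it ships with NetworkX. Leiden would need an extra dependency (for example, the leidenalg package), which is not shown here. describe_community is a hypothetical stand-in for the LLM summarization call: it only joins the facts inside each community, which is the text an LLM would then summarize. The example graph is invented. It contains two small clusters (payment terms and data protection) joined by one weak edge.

```python
import networkx as nx


def detect_communities(graph: nx.Graph, seed: int = 42) -> list[set[str]]:
    """Cluster entities with Louvain; a fixed seed makes the result reproducible."""
    communities = nx.community.louvain_communities(graph, weight="weight", seed=seed)
    # Largest communities first.
    return sorted(communities, key=len, reverse=True)


def describe_community(graph: nx.Graph, members: set[str]) -> str:
    """Stand-in for an LLM summary: joins the facts stored on intra-community edges.

    In the proposed pipeline, this text would be sent to the LLM with a
    summarisation prompt (hypothetical) instead of being returned directly.
    """
    facts = [
        fact
        for _, _, data in graph.subgraph(members).edges(data=True)
        for fact in data["facts"]
    ]
    return "; ".join(sorted(facts))


# Invented example: two small clusters joined by one low-weight edge.
g = nx.Graph()
g.add_edge("buyer", "seller", weight=3, facts=["buyer pays seller"])
g.add_edge("seller", "invoice", weight=2, facts=["seller issues invoice"])
g.add_edge("buyer", "invoice", weight=2, facts=["buyer receives invoice"])
g.add_edge("processor", "gdpr", weight=3, facts=["processor complies with gdpr"])
g.add_edge("processor", "personal data", weight=2, facts=["processor handles personal data"])
g.add_edge("gdpr", "personal data", weight=2, facts=["gdpr protects personal data"])
g.add_edge("seller", "processor", weight=1, facts=["seller appoints processor"])

# Computed once after indexing; queries would read from this cache.
communities = detect_communities(g)
community_summaries = {i: describe_community(g, m) for i, m in enumerate(communities)}
for cid, summary in community_summaries.items():
    print(cid, summary)
```

The summaries depend only on the graph, not on the query. They could therefore be computed once per document and stored next to its ChromaDB collection, rather than recomputed for every question.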
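The following sketch shows one way to blend the two retrieval paths, using only the standard library. Chunk, rank_summaries, and build_context are hypothetical names. The word-overlap scorer stands in for embedding the community summaries, and the character budget stands in for a real token budget. In this design, a few global theme summaries go ahead of the local excerpts, and the combined context is capped. The cap keeps graph context from recreating the context-window problem that raising TOP_K_RETRIEVAL causes.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # similarity from the vector store; higher means closer


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def rank_summaries(query: str, summaries: list[str], top_n: int = 2) -> list[str]:
    """Keep the top_n summaries that share at least one word with the query."""
    q = tokens(query)
    scored = sorted(summaries, key=lambda s: len(q & tokens(s)), reverse=True)
    return [s for s in scored[:top_n] if q & tokens(s)]


def build_context(query: str, chunks: list[Chunk], summaries: list[str],
                  max_chars: int = 2000) -> str:
    """Put global theme summaries before local excerpts, within max_chars."""
    parts = [f"[Theme] {s}" for s in rank_summaries(query, summaries)]
    parts += [f"[Excerpt] {c.text}"
              for c in sorted(chunks, key=lambda c: c.score, reverse=True)]
    kept, used = [], 0
    for part in parts:
        extra = len(part) + (1 if kept else 0)  # +1 for the joining newline
        if used + extra > max_chars:
            break  # stop at the first part that would exceed the budget
        kept.append(part)
        used += extra
    return "\n".join(kept)


chunks = [
    Chunk("Q3 revenue was reported in section 4.", 0.62),
    Chunk("Either party may terminate with 30 days notice.", 0.71),
]
summaries = [
    "Payment obligations between buyer and seller",
    "Termination rights and notice periods",
    "Data protection duties under GDPR",
]
print(build_context("What are the termination and payment themes?", chunks, summaries))
```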
Describe alternatives you've considered
Increasing the TOP_K_RETRIEVAL limit. However, this can overflow the LLM context window and adds significant inference cost, and it still does not capture the document's thematic structure.
Additional Context
- GSSoC '26: Yes, I am participating in GirlScript Summer of Code and would like to build this.
- Level: Advanced
- Affected Files: backend/app/rag/retriever.py, backend/app/rag/chunker.py, backend/requirements.txt