
[FEAT] Integrate GraphRAG (Knowledge Graph RAG) Support Alongside Vector Search #564

@knoxiboy

Description


Is your feature request related to a problem? Please describe.

Standard vector search (ChromaDB) excels at fetching specific, local text chunks (e.g., "What was the revenue in Q3?"). However, because it returns only the top-k chunks most similar to the query, it struggles with global or thematic questions whose answer is spread across the entire document (e.g., "What are the primary recurring themes in this legal contract?").

Describe the solution you'd like

Build a GraphRAG extension in the RAG pipeline:

  1. In backend/app/rag/, implement a node/relation extraction pipeline that parses the PDF pages, extracts entity relationships using the LLM, and builds a Knowledge Graph (using NetworkX or an in-memory graph store).
  2. During queries, run community detection (e.g., Leiden or Louvain clustering) on the graph to group related nodes, then summarize each resulting community.
  3. Blend the graph-summarized context with standard semantic vector retrieval results before generating the final answer.
  4. Compare the GraphRAG pipeline against the vector-only baseline using the existing RAGAS evaluation script.
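The following is a minimal sketch of step 1. It assumes NetworkX is available and that the LLM is exposed as a plain `llm(prompt) -> str` callable that returns JSON. The prompt wording, the JSON schema, and the names `extract_triples` and `build_knowledge_graph` are hypothetical and do not correspond to existing code in backend/app/rag/. When the same pair of entities appears on several pages, the edge `weight` increases, and community detection can use that weight later.

```python
import json
from typing import Callable, Iterable

import networkx as nx

# Hypothetical extraction prompt; the real wording would live in backend/app/rag/.
EXTRACTION_PROMPT = (
    "Extract entity relationships from the text below. Reply only with a JSON list "
    'of objects with the keys "source", "relation" and "target".\n\nText:\n{page}'
)
REQUIRED_KEYS = ("source", "relation", "target")


def extract_triples(page_text: str, llm: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Ask the LLM for (source, relation, target) triples found on one page."""
    try:
        items = json.loads(llm(EXTRACTION_PROMPT.format(page=page_text)))
    except json.JSONDecodeError:
        return []  # skip pages where the model did not return valid JSON
    if not isinstance(items, list):
        return []
    triples = []
    for item in items:
        if not isinstance(item, dict) or not all(key in item for key in REQUIRED_KEYS):
            continue  # ignore malformed entries instead of failing the whole page
        source, relation, target = (str(item[key]).strip() for key in REQUIRED_KEYS)
        if source and target:
            triples.append((source, relation, target))
    return triples


def build_knowledge_graph(pages: Iterable[str], llm: Callable[[str], str]) -> nx.Graph:
    """Merge the triples from every page into one undirected, weighted graph."""
    graph = nx.Graph()
    for page_no, page_text in enumerate(pages, start=1):
        for source, relation, target in extract_triples(page_text, llm):
            if graph.has_edge(source, target):
                # Repeated relations strengthen the edge for later clustering.
                edge = graph[source][target]
                edge["weight"] += 1
                edge["relations"].add(relation)
                edge["pages"].add(page_no)
            else:
                graph.add_edge(source, target, weight=1, relations={relation}, pages={page_no})
    return graph
```

In the pipeline, `pages` would be the text of each PDF page. How chunker.py would expose that text is an assumption and is not shown here.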
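Step 2 could look roughly like the sketch below. It uses NetworkX's built-in `louvain_communities`. Leiden is not part of NetworkX and would need an extra package, so Louvain is swapped in here. The `summarize` callable stands in for an LLM call, and `community_contexts` and `min_size` are hypothetical names.

```python
from typing import Callable

import networkx as nx


def community_contexts(
    graph: nx.Graph,
    summarize: Callable[[str], str],
    min_size: int = 2,
    seed: int = 42,
) -> list[str]:
    """Group related entities with Louvain clustering and summarize each group."""
    # Louvain ships with NetworkX; Leiden would need an extra dependency.
    # A fixed seed keeps the partition reproducible between runs.
    communities = nx.community.louvain_communities(graph, weight="weight", seed=seed)
    summaries = []
    # Largest communities first, so the broadest themes lead the context.
    for nodes in sorted(communities, key=len, reverse=True):
        if len(nodes) < min_size:
            continue  # isolated entities do not describe a cross-document theme
        facts = sorted(
            f"{u} -[{', '.join(sorted(data.get('relations', {'related to'})))}]- {v}"
            for u, v, data in graph.subgraph(nodes).edges(data=True)
        )
        prompt = "Summarize the common theme of these relationships:\n" + "\n".join(facts)
        summaries.append(summarize(prompt))
    return summaries
```

The function never looks at the query text, so these summaries could be computed once per document and cached instead of being recomputed on every query.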
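For step 3, the plain-Python sketch below shows one simple blending approach. The community summaries get a fixed share of a context budget. Any budget they leave unused is handed to the vector-search chunks, and exact duplicates are dropped. Character counts serve as a rough proxy for tokens. The names `blend_context`, `max_chars`, and `graph_share` are hypothetical, and the defaults are illustrative rather than tuned.

```python
def blend_context(
    vector_chunks: list[str],
    graph_summaries: list[str],
    max_chars: int = 6000,
    graph_share: float = 0.4,
) -> str:
    """Merge community summaries and vector-retrieved chunks under one size budget."""
    graph_budget = int(max_chars * graph_share)

    def take(items: list[str], budget: int) -> list[str]:
        # Keep items in ranked order until the (approximate) budget is exhausted.
        picked, used = [], 0
        for text in items:
            text = text.strip()
            if not text or text in picked:
                continue  # drop empty strings and exact duplicates
            if used + len(text) > budget:
                break  # stop at the first item that no longer fits
            picked.append(text)
            used += len(text)
        return picked

    graph_part = take(graph_summaries, graph_budget)
    # Whatever the summaries did not use is handed to the vector chunks.
    vector_part = take(vector_chunks, max_chars - sum(len(t) for t in graph_part))
    sections = []
    if graph_part:
        sections.append("Global themes (knowledge graph):\n" + "\n".join(f"- {t}" for t in graph_part))
    if vector_part:
        sections.append("Relevant passages (vector search):\n" + "\n\n".join(vector_part))
    return "\n\n".join(sections)
```

Capping the summaries' share prevents them from crowding out the local evidence that vector search is good at. This is also the context-window concern raised under alternatives below.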
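For step 4, this issue does not describe the output format of the existing RAGAS script. The sketch below assumes that script can produce per-sample scores for each metric (e.g., `faithfulness` or `context_recall`) for both runs. Under that assumption, a small hypothetical helper like `compare_runs` could report how the GraphRAG run shifts each metric relative to the vector-only baseline.

```python
from statistics import mean


def compare_runs(
    baseline: dict[str, list[float]], graphrag: dict[str, list[float]]
) -> dict[str, float]:
    """Return the mean-score change per metric (GraphRAG minus vector-only baseline).

    Only metrics present in both runs are compared; score lists must be non-empty.
    """
    shared = sorted(baseline.keys() & graphrag.keys())
    return {metric: round(mean(graphrag[metric]) - mean(baseline[metric]), 4) for metric in shared}
```

Scoring both runs on the same question set keeps the deltas meaningful. Separating global/thematic questions from local ones would also show whether any gain comes from the queries GraphRAG targets.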

Describe alternatives you've considered

Increasing the TOP_K_RETRIEVAL limit. However, this risks overflowing the LLM context window and significantly increases inference cost, while still not capturing the document's thematic structure.

Additional Context

  • GSSoC '26: Yes, I am participating in GirlScript Summer of Code and would like to build this.
  • Level: Advanced
  • Affected Files: backend/app/rag/retriever.py, backend/app/rag/chunker.py, backend/requirements.txt

Metadata


Assignees: No one assigned
Labels: gssoc (GirlScript Summer of Code 2026 issue/PR)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
