
[BUGFIX] : Persistent Storage Leak of Knowledge Graphs (GraphRAG)#546

Open
hrshjswniii wants to merge 8 commits into param20h:dev from hrshjswniii:bugfix/Persistent-Storage-Leak


@hrshjswniii

Contributor

🔗 Related Issue


Closes #539



📝 What does this PR do?


This PR implements multi-document chat capabilities, introduces a Recycle Bin trash/restore flow, and resolves a critical persistent storage leak in GraphRAG background cleanup:

1. GraphRAG Persistent Storage Leak Fix (Bug Fix)

  • Background auto-cleanup in main.py: added a delete_graph call, wrapped in a try-except block, inside document_cleanup_job() so that the persisted knowledge-graph JSON files of expired or inactive documents are removed automatically. Previously these files were never deleted and accumulated on the server's disk.
  • Unit test coverage: added test_cleanup_old_deleted_documents_purges_graph and test_document_cleanup_job_purges_graph in test_documents.py to assert correct file removal, database purging, vector cleanup, and graph cleanup.
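For reviewers, a minimal sketch of the cleanup pattern described above, assuming one JSON file per document under a graph directory. GRAPH_DIR, the record shape, and the bodies of delete_graph and document_cleanup_job shown here are hypothetical stand-ins, not the code in main.py; the point is that a failed graph deletion is logged and the rest of the cleanup still runs.

```python
import json
import logging
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

logger = logging.getLogger(__name__)

# Hypothetical graph storage: one JSON file per document ID.
GRAPH_DIR = Path(tempfile.mkdtemp(prefix="graphs_"))


def delete_graph(document_id: str) -> None:
    """Hypothetical stand-in: raises FileNotFoundError if no graph was ever built."""
    (GRAPH_DIR / f"{document_id}.json").unlink()


def document_cleanup_job(documents: list, max_age: timedelta) -> list:
    """Purge inactive documents, including their persisted knowledge graphs.

    `documents` is a hypothetical list of {"id", "last_active"} dicts standing in
    for the database query the real job runs.
    """
    now = datetime.now(timezone.utc)
    purged = []
    for doc in documents:
        if now - doc["last_active"] < max_age:
            continue  # still active, keep it
        # (the real job also removes the uploaded file, vectors, and DB row here)
        try:
            delete_graph(doc["id"])
        except Exception as exc:  # best effort: a missing graph must not abort cleanup
            logger.warning("Could not delete graph for %s: %s", doc["id"], exc)
        purged.append(doc["id"])
    return purged


# Usage: one expired document with a graph, one without, one still active.
(GRAPH_DIR / "old.json").write_text(json.dumps({"nodes": [], "edges": []}))
stale = datetime.now(timezone.utc) - timedelta(days=30)
purged_ids = document_cleanup_job(
    [
        {"id": "old", "last_active": stale},
        {"id": "nograph", "last_active": stale},
        {"id": "fresh", "last_active": datetime.now(timezone.utc)},
    ],
    max_age=timedelta(days=7),
)
print(purged_ids)  # ['old', 'nograph']
```

Catching Exception matches the best-effort intent of the try-except; narrowing it to FileNotFoundError would let unexpected failures (for example, permission errors) surface instead of only being logged.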

2. Multi-Document Selection Chat (New Feature)

  • RAG & retrieval pipeline (Backend): Extended the signatures of retrieve(), PDFSearchTool, _candidate_graphs, get_entity_context, and the agent execution helpers to accept document_ids: List[str], verifying the user's access to each document. Updated the tracing decorator signatures to handle keyword arguments dynamically.
  • WebSocket & SSE handlers (Backend): Updated the endpoints to parse document_ids, validate that every selected document is ready, save multi-document messages with document_id = None, and cache query results under a key made of the sorted, comma-joined document IDs.
  • Interactive checkboxes & indicators (Frontend): Rendered checkbox inputs in the sidebar document list, added a badge showing how many documents are selected, updated the textarea placeholders, and included document_ids in the request payloads.
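A minimal sketch of the backend rules listed above. The helpers authorize_documents, cache_key, and message_document_id, the AccessDenied error, and the owners mapping (standing in for the real RBAC lookup) are hypothetical names, not the project's API. The sketch checks access for every requested ID, builds an order-independent cache key by sorting and comma-joining the IDs, and stores multi-document messages under document_id = None; keeping the ID for single-document chats is an assumption.

```python
from typing import Dict, List, Optional


class AccessDenied(Exception):
    """Hypothetical error for a document the user may not read."""


def authorize_documents(user_id: str, document_ids: List[str], owners: Dict[str, str]) -> List[str]:
    """Check access for *every* requested document, not only the first one."""
    if not document_ids:
        raise ValueError("document_ids must not be empty")
    for doc_id in document_ids:
        # `owners` (document_id -> owner_id) is a stand-in for the real RBAC check.
        if owners.get(doc_id) != user_id:
            raise AccessDenied(doc_id)
    return document_ids


def cache_key(document_ids: List[str]) -> str:
    """Sorted, comma-joined key, so ["b", "a"] and ["a", "b"] share a cache entry."""
    return ",".join(sorted(document_ids))


def message_document_id(document_ids: List[str]) -> Optional[str]:
    """Multi-document messages are saved with document_id = None."""
    return document_ids[0] if len(document_ids) == 1 else None


owners = {"a": "u1", "b": "u1", "c": "u2"}
ids = authorize_documents("u1", ["b", "a"], owners)
print(cache_key(ids))            # a,b
print(message_document_id(ids))  # None
```

Sorting before joining is what makes the cache hit independent of the order in which the user ticked the checkboxes.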

3. Recycle Bin / Trash Modal (New Feature)

  • Added TrashModal.tsx and custom routing to list, restore, or immediately purge soft-deleted files.
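The trash flow can be sketched as a soft-delete timestamp plus list, restore, and purge operations. This in-memory DocumentStore and its method names are hypothetical and only illustrate the semantics; TrashModal.tsx and the real routes may be structured differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Document:
    id: str
    name: str
    deleted_at: Optional[datetime] = None  # None means "not in the trash"


@dataclass
class DocumentStore:
    """Hypothetical in-memory store showing soft delete, restore, and purge."""

    docs: Dict[str, Document] = field(default_factory=dict)

    def soft_delete(self, doc_id: str) -> None:
        # Move to the trash by timestamping; nothing is removed yet.
        self.docs[doc_id].deleted_at = datetime.now(timezone.utc)

    def list_trash(self) -> List[Document]:
        return [d for d in self.docs.values() if d.deleted_at is not None]

    def restore(self, doc_id: str) -> None:
        doc = self.docs[doc_id]
        if doc.deleted_at is None:
            raise ValueError(f"{doc_id} is not in the trash")
        doc.deleted_at = None

    def purge(self, doc_id: str) -> None:
        # Permanent delete, only for items already in the trash. A real purge would
        # also remove the stored file, vectors, and knowledge graph.
        if self.docs[doc_id].deleted_at is None:
            raise ValueError(f"{doc_id} must be in the trash before purging")
        del self.docs[doc_id]


store = DocumentStore({"d1": Document("d1", "a.pdf"), "d2": Document("d2", "b.pdf")})
store.soft_delete("d1")
store.soft_delete("d2")
store.restore("d1")   # back in the document list
store.purge("d2")     # gone for good
print([d.id for d in store.list_trash()])  # []
```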


🗂️ Type of Change


  • 🐛 Bug fix
  • ✨ New feature
  • 🔧 Refactor / code cleanup
  • 📝 Documentation update
  • 🎨 UI / styling change
  • ⚙️ CI / tooling / config change
  • 🧪 Tests


🧪 How was this tested?


  • Tested the affected API endpoints manually
  • Added / updated tests
    • Added test_cleanup_old_deleted_documents_purges_graph and test_document_cleanup_job_purges_graph in test_documents.py.
    • Added test_retrieve_with_document_ids_list_and_rbac_checks in test_retriever.py.
    • Added test_chat_ask_success_with_document_ids in test_chat.py.
    • Verified that all 164 tests pass with: backend\.venv\Scripts\python -m pytest backend/tests
  • Ran the frontend type check locally (npx tsc --noEmit completed with no errors)


⚠️ Anything to flag for reviewers?


  • In main.py, the delete_graph import and call are wrapped in a try-except block, so that a document without an associated graph (yet) does not cause the surrounding database updates or file deletions to fail.
  • In test_documents.py, MockDbSessionContext has been updated to commit transactions on context exit so that database purging side effects are accurately tracked by the tests.
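A minimal sketch of why committing on exit matters for these tests, assuming a fake session that stages deletes until commit(). MockDbSession and this MockDbSessionContext body are hypothetical, not the code in test_documents.py; rolling back when the block raises is an added assumption.

```python
class MockDbSession:
    """Hypothetical fake session: staged deletes take effect only on commit()."""

    def __init__(self, rows: dict):
        self.rows = rows      # committed state that the assertions inspect
        self._pending = []    # row IDs staged for deletion

    def delete(self, row_id: str) -> None:
        self._pending.append(row_id)

    def commit(self) -> None:
        for row_id in self._pending:
            self.rows.pop(row_id, None)
        self._pending.clear()

    def rollback(self) -> None:
        self._pending.clear()


class MockDbSessionContext:
    """Commits on a clean exit; rolls back (assumption) if the block raised."""

    def __init__(self, session: MockDbSession):
        self.session = session

    def __enter__(self) -> MockDbSession:
        return self.session

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.session.commit()
        else:
            self.session.rollback()
        return False  # never swallow the exception


rows = {"doc-1": {"name": "a.pdf"}, "doc-2": {"name": "b.pdf"}}

# Clean exit: the staged purge is committed and visible to the test.
with MockDbSessionContext(MockDbSession(rows)) as db:
    db.delete("doc-1")

# Failing block: the staged purge is rolled back.
try:
    with MockDbSessionContext(MockDbSession(rows)) as db:
        db.delete("doc-2")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(sorted(rows))  # ['doc-2']
```

Without the commit in __exit__, the delete of "doc-1" would stay pending and any "row was purged" assertion would fail even though the cleanup code behaved correctly.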

@hrshjswniii requested a review from param20h as a code owner on June 9, 2026, 20:19

Development

Successfully merging this pull request may close these issues.

[BUG] : Persistent Storage Leak of Knowledge Graphs (GraphRAG)
