Bugfix/unimplemented workspaces #521

Open
hrshjswniii wants to merge 5 commits into param20h:dev from hrshjswniii:bugfix/Unimplemented-Workspaces

Conversation

@hrshjswniii

Contributor

🔗 Related Issue


Closes #483



📝 What does this PR do?


This PR includes the following bug fixes, feature implementations, and optimizations:

1. Leaking Soft-Deleted Documents in Global Chat RAG Retrieval (Bug Fix)

Resolves a security/data-isolation bug where soft-deleted documents (where is_deleted = True in SQLite) were leaking into global/unscoped chatbot retrieval:

  • Active Document ID Retrieval: Modified the global retrieve function in [retriever.py] to check the SQLite database for active document IDs. If no specific document_id is supplied, it queries for a whitelist of all non-deleted (is_deleted=False) document IDs belonging to the querying user or their workspaces.
  • ChromaDB Filtering: Passed the active document ID whitelist into CustomVectorRetriever as document_ids, which filters vector store queries via where_filter = {"document_id": {"$in": document_ids}}.
  • BM25 Filtering: Passed the active document ID whitelist into CustomBM25Retriever as document_ids and updated query_bm25 in [bm25.py] to skip index .pkl files that do not match the active list.
  • Unit Testing: Added test_retrieve_excludes_soft_deleted_documents to [test_retriever.py] to assert that retrieval ignores soft-deleted documents.
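
The whitelist flow above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: the `documents` table layout (`id`, `owner_id`, `is_deleted`), the helper names, and the assumption that each BM25 index file is named `<document_id>.pkl` are all hypothetical. Workspace membership is left out for brevity. Only the shape of `where_filter` is taken from the description.

```python
import sqlite3
from pathlib import Path


def active_document_ids(conn, user_id):
    """Return IDs of the user's documents that are not soft-deleted.

    Assumes a hypothetical table: documents(id, owner_id, is_deleted).
    """
    rows = conn.execute(
        "SELECT id FROM documents "
        "WHERE owner_id = ? AND is_deleted = 0 ORDER BY id",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


def build_where_filter(document_ids):
    """Build the metadata filter described in the PR text."""
    return {"document_id": {"$in": document_ids}}


def filter_index_files(paths, document_ids):
    """Keep only index files whose stem is a whitelisted document ID.

    Assumes index files are named '<document_id>.pkl' (hypothetical).
    """
    allowed = set(document_ids)
    return [p for p in paths if Path(p).stem in allowed]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id TEXT PRIMARY KEY, owner_id TEXT, is_deleted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?)",
        [("a", "u1", 0), ("b", "u1", 1), ("c", "u1", 0), ("d", "u2", 0)],
    )
    ids = active_document_ids(conn, "u1")
    return ids, build_where_filter(ids), filter_index_files(
        ["idx/a.pkl", "idx/b.pkl", "idx/d.pkl"], ids
    )
```

In a sketch like this, an empty whitelist is probably best handled before querying at all (returning no results), rather than passing an empty `$in` list to the vector store.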

2. Collaborative Workspaces (New Feature)

Fully implements collaborative workspaces to support shared document spaces, invitations, and access isolation:

  • Database Models: Defined Workspace and WorkspaceMembership models in [models.py]. Added workspace_id and relationships to Document and User models.
  • Schema Migrations: Updated _migrate_schema in [database.py] to automatically add the workspace_id column to the documents table on startup.
  • Workspace Endpoints: Added/updated routes in [workspaces.py]:
    • POST /api/v1/workspaces/invite: Allows users to invite other users to a workspace (creating the workspace and membership on the fly if needed).
    • GET /api/v1/workspaces/invite/verify: Verifies the validity of secure invite tokens.
    • POST /api/v1/workspaces/invite/accept: Decodes the token and adds the authenticated user to the workspace.
  • Workspace Document & Chat Filtering:
    • Updated [documents.py] to filter list/upload by the selected workspace (personal, company, or specific UUID).
    • Updated RAG retrieval to isolate searchable documents based on the active workspace context.
  • Frontend UI Integration:
    • Updated dashboard [page.tsx] and [DocumentSidebar.tsx] to pass workspace context.
    • Created an invitation acceptance page at [invite/page.tsx] with glassmorphism styling and session storage handling.
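
The description does not say how invite tokens are built or decoded. One common pattern for the verify/accept steps is an HMAC-signed payload with an expiry, sketched below with the standard library only. Everything here is an assumption for illustration: the `SECRET` value, the payload fields, and the function names are hypothetical and may differ from what [workspaces.py] actually does.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # hypothetical; a real deployment would load this from config


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def create_invite_token(workspace_id, email, ttl_seconds=3600, now=None):
    """Return '<payload>.<signature>', both URL-safe base64."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"workspace_id": workspace_id, "email": email, "exp": issued + ttl_seconds},
        sort_keys=True,
    ).encode("utf-8")
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def verify_invite_token(token, now=None):
    """Return the decoded payload, or None if the token is invalid or expired."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or malformed base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    data = json.loads(payload)
    current = time.time() if now is None else now
    if data["exp"] < current:
        return None
    return data
```

With a scheme like this, the verify endpoint only needs to call `verify_invite_token`, and the accept endpoint can reuse it before creating the membership row for the authenticated user.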

3. PDF Image Parsing Memory Optimizations (OOM Prevention)

Optimizes raw image processing during PDF ingestion to prevent Out-Of-Memory issues on large files:

  • Immediate Page-by-Page Captioning: Captions are now generated immediately inside the per-page loop.
  • On-the-fly Memory Cleanup: Wrapped image reference clearing and byte deletion in a finally block to release memory immediately for each page rather than waiting for the entire document parsing to finish.
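
The per-page pattern described above might look roughly like this. It is a simplified sketch, not the parser in this PR: `caption_pages`, the `(page_number, images)` input shape, and the `caption_image` callback are hypothetical stand-ins for the real PDF and captioning code.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def caption_pages(
    pages: Iterable[Tuple[int, List[bytes]]],
    caption_image: Callable[[bytes], str],
) -> Iterator[Tuple[int, List[str]]]:
    """Caption each page's images as soon as the page is reached.

    Raw image bytes are released in a finally block, so they are dropped
    per page even if captioning raises partway through.
    """
    for page_number, images in pages:
        captions: List[str] = []
        try:
            for image_bytes in images:
                captions.append(caption_image(image_bytes))
        finally:
            # Drop references to the raw bytes before moving to the next page.
            images.clear()
        yield page_number, captions
```

Because this is a generator, at most one page's raw images need to be alive at a time, provided the caller supplies pages lazily.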


🗂️ Type of Change


  • 🐛 Bug fix
  • ✨ New feature
  • 🔧 Refactor / code cleanup
  • 🎨 UI / styling change
  • 🧪 Tests


🧪 How was this tested?


  • Ran the backend locally (uvicorn app.main:app --reload)
  • Ran the frontend locally (npm run dev inside frontend/)
  • Tested the affected API endpoints manually (upload/delete/chat/workspaces)
  • Added / updated tests (Created test_retrieve_excludes_soft_deleted_documents in backend/tests/test_retriever.py and 4 workspace verification tests in backend/tests/test_workspaces.py)
  • Ran full backend test suite (.venv/Scripts/pytest — all 122 tests passed successfully)


⚠️ Anything to flag for reviewers?

  • The SQLite documents table is now migrated automatically on startup (the workspace_id column is added if needed).
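
For reviewers weighing the startup migration, an idempotent column addition in SQLite usually looks something like the sketch below. The function name and the `TEXT` column type are assumptions; the actual `_migrate_schema` in [database.py] may be written differently.

```python
import sqlite3


def add_workspace_id_column(conn: sqlite3.Connection) -> bool:
    """Add documents.workspace_id if it is missing.

    Returns True when the column was added, False when it already existed,
    so running it on every startup is safe.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(documents)")}
    if "workspace_id" in columns:
        return False
    conn.execute("ALTER TABLE documents ADD COLUMN workspace_id TEXT")
    conn.commit()
    return True
```

Existing rows get NULL for the new column, which a caller could treat as "personal" documents with no workspace.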


✅ Self-Review Checklist


  • My branch is based on dev, not main
  • I have not added any secrets / API keys
  • My code follows the existing style (no unnecessary formatting changes)
  • I have updated relevant docs / comments if needed

@hrshjswniii hrshjswniii requested a review from param20h as a code owner June 7, 2026 14:44
Comment thread PDF-Assistant-RAG Outdated
Development

Successfully merging this pull request may close these issues.

[BUG] : Fix Unimplemented Collaborative Workspaces

2 participants