feat(rag): add IngestedDocument model and persist uploaded PDF metadata to database #898

Open
Pcmhacker-piro wants to merge 1 commit into
SdSarthak:main from
Pcmhacker-piro:fix/rag-ingested-document-tracking

@Pcmhacker-piro

Summary

Closes #578

Adds an IngestedDocument database model and Alembic migration to persist metadata about uploaded regulatory PDFs in the RAG Intelligence module. The POST /api/v1/rag/ingest endpoint now records filename, SHA-256 hash, file size, and chunk count for each uploaded document. The vector store is updated to merge new documents into the existing FAISS index (via merge_into_vector_store) instead of rebuilding from scratch, preserving previously ingested content.
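As a rough illustration of the metadata described above, the following sketch derives the filename, SHA-256 hash, file size, and chunk count from the raw bytes of an upload. The `IngestedDocumentRecord` dataclass and the `build_record` helper are hypothetical stand-ins for the ORM model and endpoint logic, which this description does not show; the real field names may differ.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestedDocumentRecord:
    """Hypothetical stand-in for the IngestedDocument ORM row (assumed field names)."""

    filename: str
    sha256: str
    file_size: int
    chunk_count: int


def build_record(filename: str, data: bytes, chunk_count: int) -> IngestedDocumentRecord:
    """Derive the persisted metadata fields from the raw uploaded bytes."""
    return IngestedDocumentRecord(
        filename=filename,
        sha256=hashlib.sha256(data).hexdigest(),  # hex digest is 64 characters
        file_size=len(data),                      # size in bytes
        chunk_count=chunk_count,                  # supplied by the chunking step
    )


# Example with placeholder bytes, not a real PDF.
record = build_record("example.pdf", b"%PDF-1.7 example bytes", chunk_count=3)
```

Hashing the raw bytes rather than the filename means two uploads of the same file produce the same digest even if they are named differently.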

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor
  • Tests
  • Infra / CI

Checklist

  • I have read CONTRIBUTING.md
  • My code follows the project style
  • I have added/updated tests where relevant
  • Tests/lint pass locally (if available)
  • I have not committed .env or any secrets
  • I have updated documentation if needed

Screenshots (if UI change)

N/A - Backend-only feature

CHANGED FILES

  • backend/app/api/v1/rag.py - Updated ingest endpoint to use merge_into_vector_store and persist IngestedDocument records
  • backend/app/models/__init__.py - Registered IngestedDocument model
  • backend/app/models/ingested_document.py - New IngestedDocument ORM model with SourceType enum
  • backend/app/modules/rag/vector_store.py - Added merge_into_vector_store() to merge new docs into existing FAISS index
  • backend/alembic/versions/c8d4b0f6a2e1_add_ingested_documents_table.py - New Alembic migration for ingested_documents table
  • backend/tests/test_rag_ingest.py - Updated patch targets to match renamed function

COMMITS

  • 28355b4 - feat(rag): add IngestedDocument model and persist uploaded PDF metadata to database

TESTING PERFORMED

  • git diff --stat verified only intended files changed
  • No unrelated modifications detected

FINAL STATUS

  • Branch Name: fix/rag-ingested-document-tracking
  • Commit Hash: 28355b4
  • PR Created: Yes (via fork URL - requires manual PR creation due to token scope)
  • Ready for Review: Yes

…ta to database

Adds an IngestedDocument ORM model with Alembic migration for tracking
uploaded regulatory PDFs. The ingest endpoint now persists filename,
SHA-256 hash, file size, and chunk count for each uploaded document.
The vector store merging logic (merge_into_vector_store) preserves
existing FAISS index entries when ingesting new documents.
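
The difference between rebuilding and merging can be sketched with a toy in-memory index. This is only an illustration of the behavior described above: `ToyIndex`, `rebuild`, and `merge_into` are hypothetical names, and the actual signature of merge_into_vector_store and its FAISS calls are not shown in this PR.

```python
from typing import List, Optional


class ToyIndex:
    """Toy stand-in for a vector index; it stores chunk texts in a list."""

    def __init__(self, chunks: Optional[List[str]] = None) -> None:
        self.chunks: List[str] = list(chunks or [])


def rebuild(new_chunks: List[str]) -> ToyIndex:
    """Rebuild from scratch: anything ingested earlier is discarded."""
    return ToyIndex(new_chunks)


def merge_into(existing: Optional[ToyIndex], new_chunks: List[str]) -> ToyIndex:
    """Merge: append new chunks to the existing index, creating one if absent."""
    if existing is None:
        return ToyIndex(new_chunks)
    existing.chunks.extend(new_chunks)
    return existing


# First upload creates the index; the second upload is merged into it.
index = merge_into(None, ["doc-a chunk 1", "doc-a chunk 2"])
index = merge_into(index, ["doc-b chunk 1"])

# Rebuilding with only the second upload would lose the first document.
rebuilt = rebuild(["doc-b chunk 1"])
```

With merging, chunks from the first upload remain searchable after the second upload, which is the property the commit message describes.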
@Pcmhacker-piro
Author

@SdSarthak

The checks have passed. Could you please review the PR and approve the pending workflows when you have a chance? Thank you!

Development

Successfully merging this pull request may close these issues.

Pre-Loaded Regulatory Knowledge Base & Custom PDF Ingestion Endpoint for RAG
