
Add POST /documents/upload/batch endpoint for issue 435 #528

Open
parthpatidar03 wants to merge 1 commit into param20h:dev from parthpatidar03:feat/435-batch-document-upload

@parthpatidar03

Summary

Closes #435

Adds a new POST /documents/upload/batch endpoint that lets users upload multiple PDF, DOCX, TXT, or MD files in a single request. Each file is validated, saved, and enqueued as an independent Celery task for parallel RAG ingestion, so the API responds without waiting for processing to finish.


Changes

backend/app/routes/documents.py

  • Added POST /documents/upload/batch endpoint
    • Accepts files: List[UploadFile] via multipart/form-data
    • Enforces a max batch size of 20 files per request
    • Each file goes through the existing validate_upload() pipeline (extension → size → MIME → deep parse)
    • Per-file error isolation: one invalid file does not abort the rest of the batch
    • Enqueues a process_document.delay() Celery task per file
    • Falls back to BackgroundTasks if Celery is unavailable (consistent with single upload)
    • Returns 202 Accepted immediately with per-file results
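
The per-file isolation and the fallback described above can be sketched in plain Python. This is a minimal illustration and not the code in this PR: `process_batch`, `FileOutcome`, `BatchTooLarge`, and the `demo_*` callables are hypothetical names, and the conventions that validation raises `ValueError` and that an unreachable broker raises `ConnectionError` are assumptions. The real endpoint uses `validate_upload()`, `process_document.delay()`, and `BackgroundTasks`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_BATCH_SIZE = 20  # batch limit stated in the PR description


class BatchTooLarge(Exception):
    """Hypothetical error; the endpoint is described as answering 400 here."""


@dataclass
class FileOutcome:
    filename: str
    success: bool
    task_id: Optional[str] = None
    error: Optional[str] = None


def process_batch(
    filenames: List[str],
    validate: Callable[[str], None],
    enqueue: Callable[[str], str],
    enqueue_local: Callable[[str], str],
) -> List[FileOutcome]:
    """Validate and enqueue each file independently.

    Assumed conventions: validate raises ValueError for a bad file, and
    enqueue raises ConnectionError when the task broker is unreachable.
    """
    if len(filenames) > MAX_BATCH_SIZE:
        raise BatchTooLarge(f"at most {MAX_BATCH_SIZE} files per request")

    outcomes: List[FileOutcome] = []
    for name in filenames:
        try:
            validate(name)
        except ValueError as exc:
            # Record the failure and move on; the rest of the batch continues.
            outcomes.append(FileOutcome(name, success=False, error=str(exc)))
            continue
        try:
            task_id = enqueue(name)
        except ConnectionError:
            # Fallback path, standing in for BackgroundTasks in the real route.
            task_id = enqueue_local(name)
        outcomes.append(FileOutcome(name, success=True, task_id=task_id))
    return outcomes


# Stand-in callables for demonstration only.
def demo_validate(name: str) -> None:
    if not name.lower().endswith((".pdf", ".docx", ".txt", ".md")):
        raise ValueError(f"unsupported file extension: {name}")


def demo_enqueue_down(name: str) -> str:
    raise ConnectionError("broker unavailable")


def demo_enqueue_local(name: str) -> str:
    return f"local_{name}"


outcomes = process_batch(
    ["a.pdf", "b.exe"], demo_validate, demo_enqueue_down, demo_enqueue_local
)
for outcome in outcomes:
    print(outcome)
```

In this sketch the oversized-batch check runs before any file is touched, so a rejected request leaves nothing saved or enqueued.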

backend/app/schemas.py

  • Added BatchUploadResult schema — per-file outcome with filename, success, optional document, optional error
  • Added BatchUploadResponse schema — wraps results list with total, succeeded, failed counters
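
For orientation, the shape of the two schemas can be sketched as follows. This is an approximation under stated assumptions: it uses standard-library dataclasses so it runs without dependencies, whereas the schemas in `backend/app/schemas.py` are presumably Pydantic models. The field types and the `build_response` helper are hypothetical. Only the field names come from the description above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class BatchUploadResult:
    """Per-file outcome; field names follow the PR description."""
    filename: str
    success: bool
    document: Optional[Dict[str, Any]] = None  # assumed: serialized document record
    error: Optional[str] = None


@dataclass
class BatchUploadResponse:
    """Wraps the per-file results with summary counters."""
    total: int
    succeeded: int
    failed: int
    results: List[BatchUploadResult]


def build_response(results: List[BatchUploadResult]) -> BatchUploadResponse:
    """Hypothetical helper: derive the counters from the results list."""
    succeeded = sum(1 for r in results if r.success)
    return BatchUploadResponse(
        total=len(results),
        succeeded=succeeded,
        failed=len(results) - succeeded,
        results=results,
    )


response = build_response([
    BatchUploadResult("file1.pdf", True, document={"id": "abc123", "status": "pending"}),
    BatchUploadResult("notes.exe", False, error="unsupported file extension"),
])
print(asdict(response))
```

Deriving the counters from the results list, rather than tracking them separately, keeps `total`, `succeeded`, and `failed` consistent with `results` by construction.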

backend/tests/test_batch_upload.py

  • New test file with 7 test cases:
    • Unauthenticated request → 401
    • Missing files field → 422
    • More than 20 files → 400
    • Single file happy path → 202, succeeded: 1
    • Partial failure (one good + one bad file) → 202, succeeded: 1, failed: 1
    • Celery unavailable → falls back, task_id prefixed local_
    • Unsupported file extension counted as failure

API Reference

Request

POST /documents/upload/batch
Content-Type: multipart/form-data
Authorization: Bearer <token>

files: file1.pdf
files: file2.pdf
...

Response 202 Accepted

{
  "total": 2,
  "succeeded": 2,
  "failed": 0,
  "results": [
    {
      "filename": "file1.pdf",
      "success": true,
      "document": {
        "id": "abc123",
        "original_name": "file1.pdf",
        "status": "pending",
        "task_id": "celery-task-uuid",
        ...
      },
      "error": null
    },
    {
      "filename": "file2.pdf",
      "success": true,
      "document": { ... },
      "error": null
    }
  ]
}
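
Because the endpoint returns 202 even when some files are rejected, a client has to inspect `results` rather than rely on the status code. A minimal sketch of that check, assuming the response body has already been parsed into a dict: `failed_files` is a hypothetical helper, and the error string in the sample payload is illustrative, not taken from the implementation.

```python
from typing import Any, Dict, List, Tuple


def failed_files(payload: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (filename, error) pairs for every entry with success == False."""
    return [
        (entry["filename"], entry.get("error") or "unknown error")
        for entry in payload.get("results", [])
        if not entry.get("success")
    ]


# Sample payload in the shape documented above (values are illustrative).
sample = {
    "total": 2,
    "succeeded": 1,
    "failed": 1,
    "results": [
        {"filename": "file1.pdf", "success": True,
         "document": {"id": "abc123", "status": "pending"}, "error": None},
        {"filename": "notes.exe", "success": False,
         "document": None, "error": "unsupported file extension"},
    ],
}

failures = failed_files(sample)
print(failures)
```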

Error cases

| Condition | Status |
|---|---|
| Not authenticated | 401 |
| files field missing | 422 |
| More than 20 files | 400 |
| Individual file invalid | Counted in failed, rest continue |

Testing

cd backend
pytest tests/test_batch_upload.py -v

@parthpatidar03 parthpatidar03 requested a review from param20h as a code owner June 8, 2026 05:58
@parthpatidar03

@param20h please check the PR

Development

Successfully merging this pull request may close these issues.

feat(api): Add batch document upload endpoint
