
fix: resolve memory leak and process crashing during concurrent PDF ingestion #586

Open
knoxiboy wants to merge 1 commit into param20h:dev from knoxiboy:fix/issue-565-ingestion-concurrency


@knoxiboy


📋 PR Checklist


🔗 Related Issue

Closes #565


📝 What does this PR do?

Resolves process crashing and memory leaks during concurrent PDF ingestion:

  1. Adds concurrency throttling via a semaphore, limiting layout parsing and file extraction to at most 3 concurrent tasks.
  2. Ensures all PDF reader file handles and image-extraction buffers are closed and released in try...finally blocks and context managers in AdvancedPDFParser and in tasks.
  3. Calls gc.collect() explicitly after each page's layout analysis/extraction so memory is reclaimed promptly instead of accumulating across pages.
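The three changes above can be combined roughly as in the following minimal sketch. It is not the PR's actual diff. It assumes an asyncio-based backend in which blocking PDF work is pushed to threads with `asyncio.to_thread` (Python 3.9+). `StubPDFReader`, `parse_pdf_blocking`, `ingest_all`, and `MAX_CONCURRENT_INGESTIONS` are hypothetical names: the stub stands in for whatever reader AdvancedPDFParser actually wraps, and `io.BytesIO` stands in for an image-extraction buffer.

```python
import asyncio
import gc
import io
from contextlib import closing

# Hypothetical cap, mirroring the PR's "maximum of 3 concurrent tasks".
MAX_CONCURRENT_INGESTIONS = 3

# Test hook (hypothetical): every reader created, so we can check each was closed.
OPENED_READERS = []


class StubPDFReader:
    """Hypothetical stand-in for a real PDF reader that holds a file handle."""

    def __init__(self, path, num_pages=2):
        self.path = path
        self.pages = [f"{path}:page{i}" for i in range(num_pages)]
        self.closed = False
        OPENED_READERS.append(self)  # list.append is safe to call from threads

    def close(self):
        self.closed = True


def parse_pdf_blocking(path):
    """Blocking per-document parse: always closes the reader and per-page buffers."""
    results = []
    # closing() guarantees reader.close() runs even if a page raises.
    with closing(StubPDFReader(path)) as reader:
        for page in reader.pages:
            buffer = io.BytesIO()  # stand-in for an image-extraction buffer
            try:
                buffer.write(page.encode())
                results.append(len(buffer.getvalue()))
            finally:
                buffer.close()
            # Reclaim objects caught in reference cycles before the next page.
            gc.collect()
    return results


async def ingest_all(paths, limit=MAX_CONCURRENT_INGESTIONS):
    """Parse many PDFs concurrently, never running more than `limit` at once.

    Returns (per-document results, peak number of simultaneous parses).
    """
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def ingest_one(path):
        nonlocal active, peak
        async with semaphore:
            # These counters are only touched on the event-loop thread.
            active += 1
            peak = max(peak, active)
            try:
                # Run the blocking parse in a worker thread, off the event loop.
                return await asyncio.to_thread(parse_pdf_blocking, path)
            finally:
                active -= 1

    results = await asyncio.gather(*(ingest_one(p) for p in paths))
    return results, peak


if __name__ == "__main__":
    docs = [f"doc{i}.pdf" for i in range(10)]
    results, peak = asyncio.run(ingest_all(docs))
    print(f"parsed {len(results)} documents, peak concurrency {peak}")
```

Running it with ten documents should report a peak concurrency no higher than 3, and every reader ends up closed. Note that an `asyncio.Semaphore` only bounds work inside one process, so with several server workers the effective cap is multiplied by the worker count. Calling `gc.collect()` on every page also trades CPU time for lower peak memory.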

🗂️ Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 🔧 Refactor / code cleanup
  • 📝 Documentation update
  • 🎨 UI / styling change
  • ⚙️ CI / tooling / config change
  • 🧪 Tests

🧪 How was this tested?

  • Ran the backend locally
  • Tested document uploads manually

📸 Screenshots (if UI change)


⚠️ Anything to flag for reviewers?

None.


✅ Self-Review Checklist

  • My branch is based on dev, not main
  • I have not added any secrets / API keys
  • I have not modified main branch or any HuggingFace deployment config
  • My code follows the existing style (no unnecessary formatting changes)
  • I have updated relevant docs / comments if needed

knoxiboy requested a review from param20h as a code owner on June 13, 2026 12:22

Development

Successfully merging this pull request may close these issues.

[BUG] Resolve Memory Leak and Process Crashing during Concurrent PDF Ingestion