
Fix consumer stalls from hung network calls #251

Merged: Fillll merged 1 commit into master from fix/consumer-stalls on Apr 9, 2026

@Fillll Fillll commented Apr 9, 2026

Summary

This fixes the long-running consumer stall that causes posting to become delayed and then stop for days.

Root cause on the live process:

  • worker threads were getting stuck in requests.head() inside media detection with no timeout
  • sender instances were creating fresh Mongo clients repeatedly instead of reusing one, which increased PyMongo threads and memory over time
  • after restart, stale SCHEDULED and IN_PROGRESS tasks were not recovered, so the queue could stay wedged

Changes

  • add hard timeouts and request error handling for media HEAD/GET requests
  • add Reddit API request timeout for PRAW
  • reuse a shared Mongo client/database instead of opening new ones per sender
  • recover abandoned SCHEDULED and IN_PROGRESS tasks back to NEW on consumer startup
  • add regression tests for:
    • failed HEAD requests
    • shared Mongo client reuse
    • abandoned task recovery
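
The timeout and error-handling change for media detection can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `head_content_type`, the `session` parameter, and the timeout values are all assumptions chosen for the example.

```python
import requests

# Hypothetical values: (connect timeout, read timeout) in seconds.
DEFAULT_TIMEOUT = (3.05, 10)


def head_content_type(url, session=None, timeout=DEFAULT_TIMEOUT):
    """Return the Content-Type reported by a HEAD request, or None on failure."""
    # Fall back to the module-level requests API when no session is supplied.
    http = session or requests
    try:
        response = http.head(url, timeout=timeout, allow_redirects=True)
        response.raise_for_status()
    except requests.RequestException:
        # Covers timeouts, connection errors, and HTTP error statuses,
        # so a bad URL costs one bounded wait instead of a stuck worker.
        return None
    return response.headers.get("Content-Type")
```

Without an explicit `timeout`, requests waits indefinitely, which matches the stall described above. Note that the read timeout bounds the wait between bytes from the server rather than the total duration of the call.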

Why This Fix

Previously, enough slow or unresponsive URLs could occupy every thread in the worker pool and leave tasks stuck in SCHEDULED, while Mongo client churn slowly degraded the process over time. This change makes network calls fail fast, reduces background thread growth, and allows the consumer to recover cleanly after restart.
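
One way to picture the client-reuse part is a lazily created, process-wide client that every sender shares. The sketch below is an assumption about the general pattern, not the implementation in this PR; `get_shared_client` and the `factory` argument are hypothetical names.

```python
import threading

_client = None
_client_lock = threading.Lock()


def get_shared_client(factory):
    """Return one process-wide client, creating it on first use."""
    global _client
    if _client is None:
        with _client_lock:
            # Re-check inside the lock so concurrent first calls build one client.
            if _client is None:
                _client = factory()
    return _client
```

With PyMongo, the factory could be something like `lambda: MongoClient(uri)`, where `uri` stands in for the real connection string. A `MongoClient` is thread-safe and pools connections internally, and each instance starts its own background monitoring threads, so sharing one keeps the thread count flat.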

Validation

  • inspected the live consumer with py-spy and confirmed workers blocked in requests.head()
  • ./.venv/bin/python -m pytest -q
  • result: 4 passed, 10 skipped

Rollout

After merge, restart the consumer service once so startup recovery can requeue abandoned tasks.
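
The startup recovery mentioned here could look roughly like the sketch below. It assumes tasks are stored as documents with a `status` field, which is a hypothetical name; the function name `recover_abandoned_tasks` is also illustrative and not taken from the PR.

```python
STALE_STATES = ["SCHEDULED", "IN_PROGRESS"]


def recover_abandoned_tasks(tasks):
    """Reset tasks left mid-flight by a previous run back to NEW.

    `tasks` is assumed to be a PyMongo collection (or any object exposing a
    compatible update_many); the "status" field name is hypothetical.
    """
    result = tasks.update_many(
        {"status": {"$in": STALE_STATES}},
        {"$set": {"status": "NEW"}},
    )
    # Number of documents whose status was actually changed.
    return result.modified_count
```

A reset like this is only safe before any worker starts picking up tasks, and it presumes a single consumer process; with several consumers, one instance restarting would otherwise requeue work that another is still handling.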

@Fillll Fillll merged commit 33f1bf7 into master Apr 9, 2026
1 check passed