Fix: add lock-protected reads on _sync_failures dict in scheduler #327

Description

@neuromechanist

Summary

src/api/scheduler.py uses a shared mutable dict _sync_failures across threads (background scheduler jobs run in separate threads via APScheduler). Reads of _sync_failures[key] at line 57 occur outside the lock, creating a race condition with concurrent writes.

Location

  • src/api/scheduler.py:52-66: _track_failure() writes under the lock, but the read at line 57 (count = _sync_failures[key]) happens without holding _sync_failures_lock.
  • Other read sites: _reset_failure() (line 80, pop), _cleanup_mirrors (lines 301-303), _check_community_budgets (lines 361-379).

Fix

Perform every read of _sync_failures while holding the same lock used for writes:

with _sync_failures_lock:
    count = _sync_failures.get(key, 0)

Acceptance Criteria

  • All reads of _sync_failures are protected by _sync_failures_lock
  • New unit test exercises concurrent read/write pattern
  • No race conditions detected under thread sanitizer (if available)

Metadata

Assignees

No one assigned

    Labels

    P2 (Priority 2: Important, fix when possible)
    bug (Something isn't working)
    testing (Testing and quality assurance)
