Skip to content

Fix multiprocessing segfaults on Windows (#727)#774

Open
lanpa wants to merge 1 commit into
masterfrom
fix-issue-727-multiprocessing-segfault
Open

Fix multiprocessing segfaults on Windows (#727)#774
lanpa wants to merge 1 commit into
masterfrom
fix-issue-727-multiprocessing-segfault

Conversation

@lanpa
Copy link
Copy Markdown
Owner

@lanpa lanpa commented Apr 10, 2026

Description

This PR addresses the unpredictable segfaults on Windows reported in issue #727. The root cause was identified as unsafe pickling of SummaryWriter objects and background threads across processes, combined with child processes incorrectly closing shared resources during cleanup.

Changes:

  • Serialization Improvement:
    • Event objects are now serialized to bytes before being put into the multiprocessing.Queue. This avoids complex pickling of Protobuf objects, which is often unstable across process boundaries on Windows.
    • Updated the background logger thread to handle both bytes and Event objects for backward compatibility.
  • Multiprocessing Safety:
    • Added __getstate__ and __setstate__ to SummaryWriter, FileWriter, and EventFileWriter to explicitly exclude non-picklable resources (threads, file handles, and loggers) from being transmitted to child processes.
    • Resources are lazily re-initialized in child processes only if needed.
  • Cleanup Refinement:
    • Added a PID check to the atexit cleanup handler to ensure that only the process that originally created the FileWriter can close it. This prevents child processes from prematurely closing shared queues and file handles.
    • Moved the cleanup handler to a top-level function to ensure it can be correctly pickled in spawn mode.

Verification Results:

  • New Test Case: Added tests/test_issue_727.py which specifically tests SummaryWriter usage in a spawn-based multiprocessing environment (simulating Windows behavior on other platforms).
  • Local Verification: Verified that the changes prevent the reported resource-closing crashes and that data is correctly recorded from worker processes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant