Symbols not visible to child processes, breaking multiprocessing #9717

@akshayka

Describe the bug

Running a notebook-defined function in another process with multiprocessing fails because the child process cannot find the function when it unpickles the work item:

Process SpawnProcess-1:1:
Traceback (most recent call last):
  File "/Users/aagrawal/.local/share/uv/python/cpython-3.14-macos-aarch64-none/lib/python3.14/multiprocessing/process.py", line 320, in _bootstrap
    self.run()
    ~~~~~~~~^^
  File "/Users/aagrawal/.local/share/uv/python/cpython-3.14-macos-aarch64-none/lib/python3.14/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aagrawal/.local/share/uv/python/cpython-3.14-macos-aarch64-none/lib/python3.14/concurrent/futures/process.py", line 242, in _process_worker
    call_item = call_queue.get(block=True)
  File "/Users/aagrawal/.local/share/uv/python/cpython-3.14-macos-aarch64-none/lib/python3.14/multiprocessing/queues.py", line 120, in get
    return _ForkingPickler.loads(res)
           ~~~~~~~~~~~~~~~~~~~~~^^^^^
AttributeError: module '__mp_main__' has no attribute 'compute_square'

Notebook code that triggers the error:

import multiprocessing
import time
import matplotlib.pyplot as plt
from concurrent.futures import ProcessPoolExecutor
import numpy as np

# Embarrassingly parallel workload: calculate square of numbers
def compute_square(n):
    # Simulate some work with a small delay
    time.sleep(0.001)
    return n * n

def run_parallel_workload(num_workers, data_size=100):
    """Run parallel workload with specified number of workers"""
    data = list(range(data_size))
    
    start_time = time.time()
    with ProcessPoolExecutor(max_workers=num_workers) as executor:
        results = list(executor.map(compute_square, data))
    end_time = time.time()
    
    return end_time - start_time

# Test with varying numbers of cores
max_cores = multiprocessing.cpu_count()
core_counts = list(range(1, max_cores + 1))[:4]
times = []

for num_cores in core_counts:
    print("Number of cores")
    elapsed_time = run_parallel_workload(num_cores)
    times.append(elapsed_time)
    print(f"Cores: {num_cores}, Time: {elapsed_time:.2f}s")

# Plot time vs number of cores
plt.figure(figsize=(10, 6))
plt.scatter(core_counts, times, color='blue', s=50)
plt.plot(core_counts, times, color='lightblue', linestyle='-', alpha=0.7)
plt.xlabel('Number of Cores')
plt.ylabel('Execution Time (seconds)')
plt.title('Execution Time vs Number of Cores for Parallel Workload')
plt.grid(True, alpha=0.3)
plt.xticks(core_counts)
plt.gca()
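
The traceback is consistent with how pickle handles functions: it stores a reference (module name plus function name) rather than the function body, and the receiving process has to look that name up again. The sketch below imitates this in a single process, without spawning anything. `fake_mp_main` is a hypothetical stand-in for the module the child sees as `__mp_main__`; it is not how marimo or multiprocessing are implemented internally.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module a spawned child sees as __mp_main__.
fake_main = types.ModuleType("fake_mp_main")
sys.modules["fake_mp_main"] = fake_main

def compute_square(n):
    return n * n

# Make the function look as if it had been defined at the top of fake_mp_main.
compute_square.__module__ = "fake_mp_main"
compute_square.__qualname__ = "compute_square"
fake_main.compute_square = compute_square

# The payload records only the module name and the function name.
payload = pickle.dumps(compute_square)

# While the name exists in the module, unpickling returns the same function.
roundtrip_ok = pickle.loads(payload) is compute_square

# Imitate the child process, whose module never defined the function.
del fake_main.compute_square
try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Clean up the stand-in module.
sys.modules.pop("fake_mp_main", None)

print(roundtrip_ok)
print(error_message)
```

If this reading is right, the child process in the report fails at the same lookup step: the work item names `compute_square` in the main module, but the module the child loaded as `__mp_main__` does not contain the notebook's definitions.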

Will you submit a PR?

  • Yes

Environment

Not provided. The paths in the traceback indicate a uv-managed CPython 3.14 on macOS (aarch64).

Code to reproduce

No response

Labels: bug

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions