
Isolate Parakeet in a subprocess so Metal crashes don't kill Flask (issue #23)#32

Merged: DozaVisuals merged 1 commit into main from fix/issue-23-subprocess-isolation on May 25, 2026

@DozaVisuals
Owner

Summary

  • Move the chunked Parakeet MLX decode into a new parakeet_worker.py script and spawn it via subprocess.Popen([sys.executable, ...]) from _transcribe_parakeet.
  • Parent shepherds the child: pipes stdout for per-chunk progress, raises RuntimeError on any nonzero exit (SIGABRT included), and the existing fall-through to WhisperX/Whisper handles it.
  • Bump version 3.5.4 → 3.5.5.
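The parent/child arrangement in the first two bullets can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `FAKE_WORKER`, `run_worker`, and the `PROGRESS`/`RESULT` line protocol are all hypothetical stand-ins, since the real `parakeet_worker.py` and its output format are not shown here.

```python
import json
import subprocess
import sys
import tempfile

# Hypothetical stand-in for parakeet_worker.py: one PROGRESS line per chunk,
# then a single RESULT line carrying JSON. The real protocol is an assumption.
FAKE_WORKER = """
import json, sys
total = int(sys.argv[1])
for i in range(total):
    print(f"PROGRESS {i + 1}/{total}", flush=True)
print("RESULT " + json.dumps({"segments": [], "engine": "parakeet"}), flush=True)
"""


def run_worker(argv, on_progress=None):
    """Run a Python child, stream its progress lines, and return its result."""
    # stderr goes to a temp file so a chatty child cannot block on a full pipe
    # while the parent is busy reading stdout.
    with tempfile.TemporaryFile(mode="w+") as err:
        with subprocess.Popen(
            [sys.executable, *argv],
            stdout=subprocess.PIPE,
            stderr=err,
            text=True,
        ) as proc:
            result = None
            for line in proc.stdout:
                kind, _, payload = line.rstrip("\n").partition(" ")
                if kind == "PROGRESS" and on_progress is not None:
                    on_progress(payload)
                elif kind == "RESULT":
                    result = json.loads(payload)
        # Leaving the Popen context waits for the child, so returncode is set.
        # A child killed by a signal also shows up here as a nonzero value.
        if proc.returncode != 0:
            err.seek(0)
            raise RuntimeError(
                f"worker exited with code {proc.returncode}: {err.read().strip()}"
            )
    if result is None:
        raise RuntimeError("worker exited cleanly but sent no RESULT line")
    return result
```

Calling `run_worker(["-c", FAKE_WORKER, "3"], on_progress=print)` would print three progress updates and return the result dict. Any nonzero exit surfaces as a `RuntimeError` that an outer fallback can catch.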

Why

The v3.4.1 mitigation (60s chunks + mx.synchronize + mx.clear_cache) made the Metal crash rarer but couldn't eliminate it. mlx::core::gpu::check_error raises a C++ exception inside Metal's addCompletedHandler, which has no catch block, so it unwinds to std::terminate → abort(). Python can't trap that, so the whole interpreter dies, taking Flask with it. The browser then renders the generic "Load failed" page reported in #23.

With process isolation the SIGABRT kills only the worker. Flask stays up, the outer transcribe_file catches the worker error, and the existing fallback chain transcribes the file with WhisperX/Whisper instead.
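To make the isolation argument concrete, here is a small sketch under stated assumptions. The child calls `os.abort()` to stand in for the MLX abort, and `first_successful` and `parakeet_stub` are hypothetical names imitating the fallback chain, not the actual `transcribe_file` logic.

```python
import signal
import subprocess
import sys


def run_aborting_child():
    """Start a child that calls abort() and return the status the parent sees."""
    # os.abort() raises SIGABRT in the child, the same signal abort() raises
    # after std::terminate. Only the child dies; this process keeps running.
    done = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    return done.returncode


def first_successful(engines):
    """Try (name, callable) pairs in order and return the first result.

    Hypothetical stand-in for the existing fallback chain.
    """
    errors = []
    for name, transcribe in engines:
        try:
            return name, transcribe()
        except RuntimeError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))


def parakeet_stub():
    """Pretend Parakeet engine whose worker always aborts."""
    code = run_aborting_child()
    if code != 0:
        raise RuntimeError(f"worker exited with code {code}")
    return "parakeet transcript"
```

On POSIX systems, `subprocess` reports a child killed by a signal as a negative return code, so the parent sees `-signal.SIGABRT` here. Because that value is nonzero, a plain `returncode != 0` check is enough to trigger the fallback.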

Trade-off

Each Parakeet call now reloads the ~600 MB model (~5–15 s of overhead per file) since the parent-side model cache is gone. That is acceptable for the one-file-at-a-time workflow the reporter is using; a long-lived worker with a request pipe is a follow-up if batch throughput matters later.

Test plan

  • Worker runs end-to-end on a 14-min audio (15 chunks, 133 segments, exit 0).
  • _transcribe_parakeet round-trip via the parent returns the same dict shape app.py expects (segments, language, duration, engine, plus per-segment start_formatted/end_formatted/words).
  • Bad-input path: nonexistent audio path → child exits 1, parent raises RuntimeError carrying the child's exception message; outer fallback chain takes over.
  • app.py still imports cleanly.
  • Existing test suite results unchanged (18 pre-existing failures, all unrelated: mocked num_ctx kwarg in chat/editorial_dna tests).
  • On-device verification on a Metal-crash-prone M1 Mac (issue reporter's machine class).
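The dict-shape check in the second bullet could be automated with something like the sketch below. The key names come from that bullet; the helper `missing_keys` and the sample values are made up for illustration and are not part of the project's test suite.

```python
# Keys named in the test plan; everything else in this block is illustrative.
TOP_LEVEL_KEYS = ("segments", "language", "duration", "engine")
SEGMENT_KEYS = ("start_formatted", "end_formatted", "words")


def missing_keys(result):
    """Return a list of expected keys that are absent from a transcript dict."""
    problems = [key for key in TOP_LEVEL_KEYS if key not in result]
    for index, segment in enumerate(result.get("segments", [])):
        problems += [
            f"segments[{index}].{key}" for key in SEGMENT_KEYS if key not in segment
        ]
    return problems


# Invented sample data, only to show the helper in use.
sample = {
    "segments": [
        {"start_formatted": "00:00:00", "end_formatted": "00:00:04", "words": []},
        {"start_formatted": "00:00:04", "words": []},  # end_formatted left out
    ],
    "language": "en",
    "engine": "parakeet",
}
```

With this sample, `missing_keys(sample)` reports the absent top-level `duration` and the second segment's missing `end_formatted`. An empty list would mean the shape matches.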

🤖 Generated with Claude Code

…ssue #23)

The v3.4.1 mitigation (60s chunks + mx.clear_cache) reduces but cannot
eliminate the SIGABRT path: mlx::core::gpu::check_error raises a C++
exception inside Metal's addCompletedHandler, which has no catch block,
so it unwinds straight to std::terminate → abort(). Python can't trap
that. The whole interpreter dies, Flask dies with it, and the user sees
a generic "Load failed" page from the browser.

Move the chunked Parakeet decode into parakeet_worker.py and spawn it
via subprocess.Popen([sys.executable, ...]). When MLX aborts the child,
the parent observes a nonzero returncode, raises RuntimeError, and the
outer transcribe_file falls through to the existing WhisperX / Whisper
fallback path — Flask stays up, the user gets a real transcript via a
slower engine instead of a dead server.

Trade-off: each Parakeet call now reloads the ~600 MB model (~5–15 s of
overhead per file) since there's no parent-side model cache anymore.
That's acceptable for the one-file-at-a-time workflow the issue
reporter is using; a long-lived worker with a request pipe is a
follow-up if batch throughput becomes a concern.

The 60s chunks and mx.clear_cache stay (now inside the worker). Smaller
chunks make the worker itself less likely to crash and need restarting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ❌ Deployment failed | doza-assist | e6ec6fa | May 25 2026, 01:13 PM |

@DozaVisuals DozaVisuals merged commit 99d36db into main May 25, 2026
2 of 3 checks passed
@DozaVisuals DozaVisuals deleted the fix/issue-23-subprocess-isolation branch May 25, 2026 13:14