
fix(whisper): add version guard to GPU probe for ctranslate2 4.7.x + CUDA 12.8#1043

Merged
anandray merged 4 commits into develop from fix/whisper-probe-version-guard on Jun 1, 2026

@anandray
Contributor

@anandray anandray commented Jun 1, 2026

Summary

Follow-up to #1036. The CR fix to _check_gpu_compatible() correctly changed the probe to use StorageView.from_array(), but as a result the probe now actually returns True on CUDA machines, which caused a regression.

Root cause: ctranslate2 4.7.x with cuBLAS 12.8.4 on H200 triggers a SIGABRT ("tcache_thread_shutdown(): unaligned tcache chunk detected") during GPU transcription, not during StorageView creation. Creating a StorageView on the GPU doesn't exercise cuBLAS, so the probe passed but actual inference crashed. Before the CR fix, the probe accidentally returned False (via an AttributeError on the wrong attribute name), which happened to be the correct result for this machine.

Fix: Add an explicit version guard in the probe script: exit(1) when ctranslate2 >= 4.7 and CUDA 12.8.x are detected, forcing CPU fallback. The StorageView.from_array() sanity check is retained for other version combinations. The guard can be removed once ctranslate2 ships a fix for the cuBLAS 12.8.4 heap corruption.
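A minimal sketch of the guard condition described above. The probe script itself is not reproduced on this page, so the helper name guard_trips is hypothetical; inside the probe its inputs would presumably be ctranslate2.__version__ and torch's CUDA version string (torch.version.cuda, which is None on CPU-only builds).

```python
from typing import Optional


def guard_trips(ct2_version: str, cuda_version: Optional[str]) -> bool:
    """Hypothetical helper: True for the known-bad ctranslate2 >= 4.7 + CUDA 12.8.x pair."""
    # Compare (major, minor) as an int tuple, e.g. "4.7.2" -> (4, 7).
    ct2 = tuple(int(x) for x in ct2_version.split(".")[:2])
    # CUDA version may be None (e.g. a CPU-only torch build); treat that as "not 12.8".
    return ct2 >= (4, 7) and (cuda_version or "").startswith("12.8")


# Inside the probe script (assumed wiring, not shown in this PR page):
#     if guard_trips(ctranslate2.__version__, torch.version.cuda):
#         sys.exit(1)  # non-zero exit -> parent falls back to CPU
```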

Test plan

  • ./builder model_server:test — 35 passed, 11 deselected on 8× H200 (ctranslate2 4.7.2 + cuBLAS 12.8.4)
  • No SIGABRT in local or server mode Whisper tests

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved GPU compatibility detection for audio transcription: added a guard for a known-bad ctranslate2/CUDA combination (ctranslate2 4.7.x with CUDA 12.8) so affected environments are detected and fall back to CPU, preventing runtime crashes during transcription.

…CUDA 12.8

The CR fix to _check_gpu_compatible() made the probe correctly return True on
CUDA machines — but ctranslate2 4.7.x + cuBLAS 12.8.4 on H200 causes
tcache_thread_shutdown() SIGABRT during GPU transcription (heap corruption),
so returning True causes regressions in both local and server mode.

Add an explicit version guard: exit non-zero when ctranslate2 >= 4.7 and
CUDA 12.8.x are detected, forcing CPU fallback until ctranslate2 ships a fix.
The StorageView.from_array() sanity check is kept for other CUDA versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Jun 1, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 85c84c38-f634-4674-b727-95fad12df1f1

📥 Commits

Reviewing files that changed from the base of the PR and between 424d943 and f57a565.

📒 Files selected for processing (1)
  • packages/ai/src/ai/common/models/audio/whisper.py

📝 Walkthrough


The PR updates WhisperLoader._check_gpu_compatible()’s subprocess probe script to parse ctranslate2 and torch CUDA versions and force a non-zero exit when ctranslate2 >= 4.7 is paired with CUDA 12.8, preserving prior CUDA compute-type and StorageView.from_array() checks.

Changes

GPU Compatibility Probe Version Guard

Layer: ctranslate2 + CUDA 12.8 version guard in probe script
File(s): packages/ai/src/ai/common/models/audio/whisper.py
Summary: The GPU compatibility probe script adds ctranslate2 and CUDA version parsing with a conditional exit when ctranslate2 is at/after 4.7 and CUDA starts with "12.8", forcing fallback to CPU. The generated script imports sys to support the guarded exit while retaining existing CUDA compute-type and StorageView.from_array() sanity checks.
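The probe runs in a subprocess, so a native crash in the child surfaces as a failed exit status instead of taking down the caller. A minimal sketch of that pattern, assuming a hypothetical run_probe() helper and timeout; the acceptance rule (return code 0 and "ok" in stdout) is the one quoted in the CodeRabbit review on this PR.

```python
import subprocess
import sys


def run_probe(probe_script: str, timeout: float = 60.0) -> bool:
    """Hypothetical helper: run probe_script in a fresh interpreter.

    A crash (e.g. SIGABRT) in the child only yields a failed return code
    here; the calling process keeps running.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", probe_script],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # Accept only a clean exit that also printed "ok".
    return result.returncode == 0 and "ok" in result.stdout
```

On POSIX, a child killed by a signal reports a negative return code (for SIGABRT, typically -6), which this check treats as incompatible. Requiring "ok" as well guards against a script that exits 0 without reaching its final check.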

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • rocketride-org/rocketride-server#1036: Both PRs modify the WhisperLoader._check_gpu_compatible() GPU-probe logic; the related PR introduced the subprocess-based probe and server-mode gating, and this PR adds the ctranslate2/CUDA 12.8 version guard.

Suggested labels

module:ai

Suggested reviewers

  • jmaionchi
  • Rod-Christensen
  • stepmikhaylov
  • kwit75

Poem

🐰 I hopped in code to check the land,
Read ctranslate2 and CUDA by hand,
If four point seven and twelve-dot-eight align,
I'll bow to CPU where GPUs resign,
Sniffed tensors, then tipped my ear—fallback's fine.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and specifically identifies the main change: adding a version guard for ctranslate2 4.7.x + CUDA 12.8 in the Whisper GPU probe.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@github-actions

github-actions Bot commented Jun 1, 2026

No description provided.

@github-actions github-actions Bot added the module:ai AI/ML modules label Jun 1, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Around line 102-105: The probe_script embeds an inline if with semicolons so
the sys.exit(1) guard consumes the rest of the script and prevents printing
"ok", causing _gpu_compatible to be false; update the probe_script in
packages/ai/src/ai/common/models/audio/whisper.py so the conditional is written
on its own line (newline-delimited) with sys.exit(1) only inside the if block,
and have the torch tensor creation, ctranslate2.StorageView.from_array, and
print("ok") on subsequent lines outside the if; this ensures the probe prints
"ok" when compatible and lets _gpu_compatible = result.returncode == 0 and 'ok'
in result.stdout work correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 81ecc428-9f16-4dbd-a656-6cfb3688203d

📥 Commits

Reviewing files that changed from the base of the PR and between 9124b25 and fa2a108.

📒 Files selected for processing (1)
  • packages/ai/src/ai/common/models/audio/whisper.py

Comment thread packages/ai/src/ai/common/models/audio/whisper.py Outdated
…e swallow

Semicolon-separated statements on the same line as an inline `if` are ALL
part of the if-suite in Python. The previous probe had:

  if ct2 >= (4,7) ...: sys.exit(1); t=...; sv=...; print("ok")

so when the version guard was False (compatible machines), the StorageView
creation and print("ok") were both skipped — the script exited 0 without
printing "ok", making _gpu_compatible always False on non-affected machines.

Switch to newline/indent delimiters so sys.exit(1) is the only statement
inside the if-block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
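A small, self-contained reproduction of the pitfall described in the commit message above, running two versions of the same guard through exec(). Here print("exit") stands in for sys.exit(1) so the snippet can run in-process; run_and_capture is a hypothetical helper.

```python
import contextlib
import io


def run_and_capture(src: str) -> str:
    """Execute src with exec() and return whatever it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(src, {})
    return buf.getvalue()


# Guard is False, as on a compatible machine.
one_line = 'if False: print("exit"); print("ok")'
multi_line = 'if False:\n    print("exit")\nprint("ok")'

print(repr(run_and_capture(one_line)))    # '' (print("ok") is part of the if-suite)
print(repr(run_and_capture(multi_line)))  # 'ok\n'
```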
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Line 100: Wrap the version parsing expression ct2 = tuple(int(x) for x in
ctranslate2.__version__.split(".")[:2]) in a try/except block that catches
ValueError (and optionally TypeError) and falls back to the existing safe
default (e.g., ct2 = (0, 0) or whatever the current failure behavior expects);
update the code in packages/ai/src/ai/common/models/audio/whisper.py around the
ctranslate2.__version__ probe to perform the guarded parse so malformed versions
like "4.7rc1" don't raise and the probe continues to select CPU as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bfc0ef5a-9070-4d48-b8c7-15461a8d6246

📥 Commits

Reviewing files that changed from the base of the PR and between fa2a108 and 7e50c7e.

📒 Files selected for processing (1)
  • packages/ai/src/ai/common/models/audio/whisper.py

Comment thread packages/ai/src/ai/common/models/audio/whisper.py Outdated
…ments

Wrap the version tuple parse in try/except so pre-release strings like
'4.7rc1' don't raise ValueError and crash the probe. Falls back to (0,0)
which safely skips the version guard and continues to the StorageView check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
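A sketch of the guarded parse from the commit above, using a hypothetical helper name. It also shows why "4.7rc1" raised in the first place. Note that the (0, 0) fallback here is the behavior the next CodeRabbit review flags as unsafe.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, ...]:
    """Guarded (major, minor) parse with the (0, 0) fallback from this commit."""
    try:
        return tuple(int(x) for x in version.split(".")[:2])
    except ValueError:
        # "4.7rc1".split(".") gives ["4", "7rc1"], and int("7rc1") raises ValueError.
        return (0, 0)


print(parse_major_minor("4.7.2"))   # (4, 7)
print(parse_major_minor("4.7rc1"))  # (0, 0)
```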
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Around line 100-106: The current parse-failure fallback sets ct2 = (0, 0),
which causes the version-guard (ct2 >= (4, 7) and cuda.startswith("12.8")) to be
skipped on unparseable versions; change the fallback so that parse failures bias
toward disabling GPU (e.g. set ct2 to a high sentinel like (999, 999) or set a
flag on failure and treat unknown as >= (4,7)) so that the check in the block
containing ct2, cuda.startswith("12.8") and sys.exit(1) will trip and force the
process to exit (fall back to CPU) when the ctranslate2 version string cannot be
parsed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4b17835d-5b4c-4daf-85a1-52606f9be3c6

📥 Commits

Reviewing files that changed from the base of the PR and between 7e50c7e and 424d943.

📒 Files selected for processing (1)
  • packages/ai/src/ai/common/models/audio/whisper.py

Comment thread packages/ai/src/ai/common/models/audio/whisper.py
… fail-safe

(0,0) fallback skips the cuBLAS 12.8 guard on unparseable versions (e.g.
'4.7rc1'), allowing GPU selection that hits the tcache SIGABRT. A high
sentinel ensures any unrecognised version on CUDA 12.8 still forces CPU.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
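To illustrate why the fallback value matters, here is a sketch that evaluates the same guard condition with both fallbacks. The helper names are hypothetical; the (999, 999) sentinel follows the review suggestion.

```python
from typing import Tuple

SENTINEL = (999, 999)  # high sentinel, per the review suggestion


def parse_major_minor(version: str, fallback: Tuple[int, int]) -> Tuple[int, ...]:
    try:
        return tuple(int(x) for x in version.split(".")[:2])
    except ValueError:
        return fallback


def guard_trips(ct2_version: str, cuda_version: str, fallback: Tuple[int, int]) -> bool:
    # Same condition as the probe: ctranslate2 >= 4.7 on CUDA 12.8.x.
    ct2 = parse_major_minor(ct2_version, fallback)
    return ct2 >= (4, 7) and cuda_version.startswith("12.8")


print(guard_trips("4.7rc1", "12.8", (0, 0)))    # False: guard skipped, GPU would be used
print(guard_trips("4.7rc1", "12.8", SENTINEL))  # True: forced to CPU
print(guard_trips("4.7rc1", "12.4", SENTINEL))  # False: other CUDA versions unaffected
```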
@anandray anandray enabled auto-merge (squash) June 1, 2026 20:20
Collaborator

@kwit75 kwit75 left a comment


Approved — sound fix for the in-scope regression: the guard sys.exit(1)s before any GPU WhisperModel load, so the CTranslate2 inference-time cuBLAS SIGABRT can't fire on the auto-detect paths (closing the hole #1036's StorageView-only probe left), clean CPU fallback, and the version compare is done as int tuples (no "4.10"<"4.7" string bug). CI green.
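The string-comparison bug the reviewer mentions is easy to reproduce. A quick sketch with a hypothetical as_tuple() helper:

```python
from typing import Tuple


def as_tuple(version: str) -> Tuple[int, ...]:
    return tuple(int(x) for x in version.split("."))


print("4.10" < "4.7")                      # True: character-by-character, "1" < "7"
print(as_tuple("4.10") < as_tuple("4.7"))  # False: (4, 10) > (4, 7)
```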

Two non-blocking follow-ups (fine as-is for the hackathon hotfix):

  1. ct2 >= (4,7) is open-ended, so it also traps 4.8/4.9/5.x — when CTranslate2 ships a fix in 4.8+ on CUDA 12.8 it'd still be forced to CPU. Since the plan is to drop this once upstream fixes it, consider bounding the upper end (e.g. (4,7) <= ct2 < (4,8)) or a known-bad set, to match the '4.7.x' wording.
  2. The guard only runs on auto-detect paths (server mode + local-mode device=None); an explicit device='cuda'/'cuda:N' bypasses _check_gpu_compatible and is still exposed to the SIGABRT on the affected H200/cuBLAS-12.8.4 box — gate that path too if it's supported.
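One possible shape for the two follow-ups, sketched with hypothetical helper names rather than taken from the code that later landed in #1052. One design consequence: with an upper bound, a (999, 999) sentinel no longer satisfies (4, 7) <= ct2 < (4, 8), so this sketch treats an unparseable version on CUDA 12.8 as its own fail-safe case.

```python
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Optional[Tuple[int, ...]]:
    try:
        return tuple(int(x) for x in version.split(".")[:2])
    except ValueError:
        return None  # unknown version


def is_known_bad(ct2_version: str, cuda_version: Optional[str]) -> bool:
    # Follow-up 1: only the 4.7.x series on CUDA 12.8.x is trapped.
    if not (cuda_version or "").startswith("12.8"):
        return False
    ct2 = parse_major_minor(ct2_version)
    if ct2 is None:
        return True  # fail safe: unparseable version on CUDA 12.8 -> CPU
    return (4, 7) <= ct2 < (4, 8)


def should_run_gpu_probe(device: Optional[str]) -> bool:
    # Follow-up 2: auto-detect (None) and explicit CUDA devices both go through the probe.
    return device is None or device == "cuda" or device.startswith("cuda:")
```

With this shape, a ctranslate2 4.8+ release is no longer forced to CPU, while an explicit device="cuda:0" still passes through the same probe as auto-detect.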

@anandray anandray merged commit 08558ce into develop Jun 1, 2026
33 of 35 checks passed
@anandray anandray deleted the fix/whisper-probe-version-guard branch June 1, 2026 21:15
kwit75 pushed a commit that referenced this pull request Jun 1, 2026
…ces (#1052)

* fix(whisper): bound ctranslate2 version guard + protect explicit CUDA devices

Addresses two post-merge nits from kwit75 on #1043:

1. Narrow version guard to (4,7) <= ct2 < (4,8) so the restriction
   auto-lifts when ctranslate2 4.8+ ships the cuBLAS 12.8.4 fix,
   rather than trapping 4.8/4.9/5.x indefinitely.

2. Run _check_gpu_compatible() for explicit device='cuda'/'cuda:N' in
   local mode, not just for auto-detect (device=None). The SIGABRT can
   occur regardless of how the CUDA device was selected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: retrigger CI (Windows HuggingFace network flake)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

module:ai AI/ML modules


2 participants