fix(whisper): add version guard to GPU probe for ctranslate2 4.7.x + CUDA 12.8 by anandray · Pull Request #1043 · rocketride-org/rocketride-server

anandray · 2026-06-01T18:19:59Z

Summary

Follow-up to #1036. The CR fix to _check_gpu_compatible() correctly changed the probe to use StorageView.from_array(), but once fixed, the probe started returning True on CUDA machines, and that exposed a regression.

Root cause: with ctranslate2 4.7.x and cuBLAS 12.8.4 on H200, GPU transcription aborts with SIGABRT (tcache_thread_shutdown(): unaligned tcache chunk detected). The crash happens during inference, not during StorageView creation: creating a StorageView on the GPU does not call into cuBLAS, so the probe passed while actual inference crashed. Before the CR fix, the probe accidentally returned False (via an AttributeError on the wrong attribute name), which happened to be the correct behavior for this machine.

Fix: Add an explicit version guard in the probe script: exit(1) when ctranslate2 >= 4.7 and CUDA 12.8.x are detected, forcing CPU fallback. The StorageView.from_array() sanity check is retained for other version combinations. The guard can be removed once ctranslate2 ships a fix for the cuBLAS 12.8.4 heap corruption.
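The guard rule described above can be sketched as a pure function. This is illustrative only: the PR inlines the check in the generated probe script, and `parse_major_minor` / `guard_trips` are hypothetical names. It assumes the CUDA version string comes from `torch.version.cuda` (which is `None` on CPU-only builds) and that ctranslate2 versions are plain `X.Y.Z` strings.

```python
from __future__ import annotations


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) as ints, e.g. "4.7.2" -> (4, 7)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def guard_trips(ct2_version: str, cuda_version: str | None) -> bool:
    """True when the probe should exit(1) and force CPU fallback.

    Rule: ctranslate2 >= 4.7 paired with CUDA 12.8.x.
    """
    if not cuda_version:  # CPU-only torch builds report no CUDA version
        return False
    return parse_major_minor(ct2_version) >= (4, 7) and cuda_version.startswith("12.8")
```

Comparing `(major, minor)` tuples rather than raw strings keeps the check numeric, so `4.10` correctly compares as newer than `4.7`.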

Test plan

./builder model_server:test — 35 passed, 11 deselected on 8× H200 (ctranslate2 4.7.2 + cuBLAS 12.8.4)
No SIGABRT in local or server mode Whisper tests

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved GPU compatibility detection for audio transcription: added an extra compatibility guard for certain CUDA/toolkit combinations so incompatible GPUs are now reliably detected and the app falls back to CPU, preventing runtime failures and ensuring more stable audio processing across diverse environments.

…CUDA 12.8 The CR fix to _check_gpu_compatible() made the probe correctly return True on CUDA machines — but ctranslate2 4.7.x + cuBLAS 12.8.4 on H200 causes tcache_thread_shutdown() SIGABRT during GPU transcription (heap corruption), so returning True causes regressions in both local and server mode. Add an explicit version guard: exit non-zero when ctranslate2 >= 4.7 and CUDA 12.8.x are detected, forcing CPU fallback until ctranslate2 ships a fix. The StorageView.from_array() sanity check is kept for other CUDA versions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-06-01T18:20:13Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 85c84c38-f634-4674-b727-95fad12df1f1

📥 Commits

Reviewing files that changed from the base of the PR and between 424d943 and f57a565.

📒 Files selected for processing (1)

packages/ai/src/ai/common/models/audio/whisper.py

📝 Walkthrough


The PR updates WhisperLoader._check_gpu_compatible()’s subprocess probe script to parse ctranslate2 and torch CUDA versions and force a non-zero exit when ctranslate2 >= 4.7 is paired with CUDA 12.8, preserving prior CUDA compute-type and StorageView.from_array() checks.
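A minimal sketch of the subprocess-probe pattern the walkthrough refers to, assuming the success condition is a zero exit code *and* `ok` on stdout (the condition CodeRabbit quotes for `_gpu_compatible` in its review of this PR). `run_probe` and the timeout are hypothetical; the real `_check_gpu_compatible()` may differ in detail.

```python
import subprocess
import sys


def run_probe(probe_script: str, timeout: float = 60.0) -> bool:
    """Run a probe script in a child interpreter.

    Returns True only on exit code 0 with "ok" on stdout. Because the probe
    runs in a separate process, a hard crash while probing kills the child,
    not the caller. A crash, a timeout, and the guard's sys.exit(1) all read
    as "not GPU-compatible".
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", probe_script],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and "ok" in result.stdout
```

With stand-in scripts, `run_probe("print('ok')")` is True, while both `run_probe("import sys; sys.exit(1)")` and `run_probe("pass")` are False. The last case matters: a script that exits 0 without printing `ok` is still treated as incompatible.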

Changes

GPU Compatibility Probe Version Guard

| Layer / File(s) | Summary |
|---|---|
| ctranslate2 + CUDA 12.8 version guard in probe script (`packages/ai/src/ai/common/models/audio/whisper.py`) | The GPU compatibility probe script adds ctranslate2 and CUDA version parsing with a conditional exit when ctranslate2 is at/after 4.7 and CUDA starts with "12.8", forcing fallback to CPU. The generated script imports `sys` to support the guarded exit while retaining existing CUDA compute-type and `StorageView.from_array()` sanity checks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

rocketride-org/rocketride-server#1036: Both PRs modify the WhisperLoader._check_gpu_compatible() GPU-probe logic; the related PR introduced the subprocess-based probe and server-mode gating, and this PR adds the ctranslate2/CUDA 12.8 version guard.

Suggested labels

module:ai

Suggested reviewers

jmaionchi
Rod-Christensen
stepmikhaylov
kwit75

Poem

🐰 I hopped in code to check the land,
Read ctranslate2 and CUDA by hand,
If four point seven and twelve-dot-eight align,
I'll bow to CPU where GPUs resign,
Sniffed tensors, then tipped my ear—fallback's fine.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically identifies the main change: adding a version guard for ctranslate2 4.7.x + CUDA 12.8 in the Whisper GPU probe. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


github-actions · 2026-06-01T18:20:19Z

No description provided.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Around line 102-105: The probe_script embeds an inline if with semicolons so
the sys.exit(1) guard consumes the rest of the script and prevents printing
"ok", causing _gpu_compatible to be false; update the probe_script in
packages/ai/src/ai/common/models/audio/whisper.py so the conditional is written
on its own line (newline-delimited) with sys.exit(1) only inside the if block,
and have the torch tensor creation, ctranslate2.StorageView.from_array, and
print("ok") on subsequent lines outside the if; this ensures the probe prints
"ok" when compatible and lets _gpu_compatible = result.returncode == 0 and 'ok'
in result.stdout work correctly.
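The pitfall can be reproduced without a GPU. In Python, every semicolon-separated statement after `if cond:` on the same line belongs to the if-suite. The sketch below uses a hypothetical `guard = False` to stand in for a compatible machine and captures what each variant prints.

```python
import contextlib
import io


def run_and_capture(source: str) -> str:
    """Execute source in a fresh namespace and return whatever it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(source, {})
    return buf.getvalue()


# Buggy: everything after the colon, semicolons included, is the if-suite,
# so print('ok') is skipped whenever the guard is False.
BUGGY = "import sys\nguard = False\nif guard: sys.exit(1); print('ok')\n"

# Fixed: only sys.exit(1) is inside the if-block; print('ok') always runs.
FIXED = "import sys\nguard = False\nif guard:\n    sys.exit(1)\nprint('ok')\n"

print(repr(run_and_capture(BUGGY)))  # ''
print(repr(run_and_capture(FIXED)))  # 'ok\n'
```

The buggy variant exits 0 with empty stdout. That is exactly the "returncode 0 but no `ok`" case that made `_gpu_compatible` False on unaffected machines.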


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 81ecc428-9f16-4dbd-a656-6cfb3688203d

📥 Commits

Reviewing files that changed from the base of the PR and between 9124b25 and fa2a108.

📒 Files selected for processing (1)

packages/ai/src/ai/common/models/audio/whisper.py

…e swallow Semicolon-separated statements on the same line as an inline `if` are ALL part of the if-suite in Python. The previous probe had: if ct2 >= (4,7) ...: sys.exit(1); t=...; sv=...; print("ok") so when the version guard was False (compatible machines), the StorageView creation and print("ok") were both skipped — the script exited 0 without printing "ok", making _gpu_compatible always False on non-affected machines. Switch to newline/indent delimiters so sys.exit(1) is the only statement inside the if-block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Line 100: Wrap the version parsing expression ct2 = tuple(int(x) for x in
ctranslate2.__version__.split(".")[:2]) in a try/except block that catches
ValueError (and optionally TypeError) and falls back to the existing safe
default (e.g., ct2 = (0, 0) or whatever the current failure behavior expects);
update the code in packages/ai/src/ai/common/models/audio/whisper.py around the
ctranslate2.__version__ probe to perform the guarded parse so malformed versions
like "4.7rc1" don't raise and the probe continues to select CPU as before.
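Why `"4.7rc1"` breaks the original expression: `"4.7rc1".split(".")[:2]` is `["4", "7rc1"]`, and `int("7rc1")` raises `ValueError`. The sketch below is a guarded parse that returns `None` on failure instead of a concrete default. That is a hypothetical design choice that leaves the fallback policy to the caller; the review comment itself suggests substituting a default tuple.

```python
from __future__ import annotations


def parse_major_minor(version: str) -> tuple[int, int] | None:
    """Parse "X.Y..." into (X, Y); return None if X or Y is not a plain integer."""
    try:
        major, minor = version.split(".")[:2]
        return int(major), int(minor)
    except ValueError:
        # int("7rc1") raises ValueError, and so does unpacking a version
        # string with no "." (only one element to unpack).
        return None
```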


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bfc0ef5a-9070-4d48-b8c7-15461a8d6246

📥 Commits

Reviewing files that changed from the base of the PR and between fa2a108 and 7e50c7e.

📒 Files selected for processing (1)

packages/ai/src/ai/common/models/audio/whisper.py

…ments Wrap the version tuple parse in try/except so pre-release strings like '4.7rc1' don't raise ValueError and crash the probe. Falls back to (0,0) which safely skips the version guard and continues to the StorageView check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Around line 100-106: The current parse-failure fallback sets ct2 = (0, 0),
which causes the version-guard (ct2 >= (4, 7) and cuda.startswith("12.8")) to be
skipped on unparseable versions; change the fallback so that parse failures bias
toward disabling GPU (e.g. set ct2 to a high sentinel like (999, 999) or set a
flag on failure and treat unknown as >= (4,7)) so that the check in the block
containing ct2, cuda.startswith("12.8") and sys.exit(1) will trip and force the
process to exit (fall back to CPU) when the ctranslate2 version string cannot be
parsed.
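The difference between the two fallbacks shows up in the guard's outcome for an unparseable version on CUDA 12.8. The sketch below is illustrative only (`guard_trips` is a hypothetical name). A `(0, 0)` fallback fails open and keeps the GPU selected, while a high sentinel such as `(999, 999)` fails safe and forces CPU.

```python
from __future__ import annotations


def guard_trips(ct2_version: str, cuda_version: str, fallback: tuple[int, int]) -> bool:
    """Apply the ct2 >= 4.7 + CUDA 12.8 guard, using `fallback` if parsing fails."""
    try:
        major, minor = ct2_version.split(".")[:2]
        ct2 = (int(major), int(minor))
    except ValueError:
        ct2 = fallback
    return ct2 >= (4, 7) and cuda_version.startswith("12.8")


FAIL_OPEN = (0, 0)      # unknown version treated as "old": guard skipped
FAIL_SAFE = (999, 999)  # unknown version treated as "affected": guard trips

print(guard_trips("4.7rc1", "12.8", FAIL_OPEN))  # False: GPU stays selected
print(guard_trips("4.7rc1", "12.8", FAIL_SAFE))  # True: CPU fallback forced
```

The sentinel only matters when CUDA is 12.8.x. On other CUDA versions, the `startswith("12.8")` half of the condition still keeps the GPU path open.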


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4b17835d-5b4c-4daf-85a1-52606f9be3c6

📥 Commits

Reviewing files that changed from the base of the PR and between 7e50c7e and 424d943.

📒 Files selected for processing (1)

packages/ai/src/ai/common/models/audio/whisper.py

… fail-safe (0,0) fallback skips the cuBLAS 12.8 guard on unparseable versions (e.g. '4.7rc1'), allowing GPU selection that hits the tcache SIGABRT. A high sentinel ensures any unrecognised version on CUDA 12.8 still forces CPU. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kwit75

Approved — sound fix for the in-scope regression: the guard sys.exit(1)s before any GPU WhisperModel load, so the CTranslate2 inference-time cuBLAS SIGABRT can't fire on the auto-detect paths (closing the hole #1036's StorageView-only probe left), clean CPU fallback, and the version compare is done as int tuples (no "4.10"<"4.7" string bug). CI green.
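The string-comparison bug the reviewer mentions is easy to demonstrate. Strings compare character by character, whereas integer tuples compare each component numerically. `as_tuple` is a hypothetical helper for plain `X.Y[.Z]` versions.

```python
from __future__ import annotations


def as_tuple(version: str) -> tuple[int, ...]:
    """Convert a plain dotted version like "4.10" into (4, 10)."""
    return tuple(int(part) for part in version.split("."))


print("4.10" < "4.7")                      # True: compares "1" vs "7" as characters
print(as_tuple("4.10") < as_tuple("4.7"))  # False: compares 10 vs 7 as integers
```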

Two non-blocking follow-ups (fine as-is for the hackathon hotfix):

ct2 >= (4,7) is open-ended, so it also traps 4.8/4.9/5.x — when CTranslate2 ships a fix in 4.8+ on CUDA 12.8 it'd still be forced to CPU. Since the plan is to drop this once upstream fixes it, consider bounding the upper end (e.g. (4,7) <= ct2 < (4,8)) or a known-bad set, to match the '4.7.x' wording.
The guard only runs on auto-detect paths (server mode + local-mode device=None); an explicit device='cuda'/'cuda:N' bypasses _check_gpu_compatible and is still exposed to the SIGABRT on the affected H200/cuBLAS-12.8.4 box — gate that path too if it's supported.
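The first follow-up (bounding the version range to `(4,7) <= ct2 < (4,8)`) can be sketched as below. `AFFECTED_CT2` and `guard_trips_bounded` are hypothetical names. The sketch assumes the version has already been parsed into a `(major, minor)` tuple and that the CUDA version may be `None` on CPU-only builds.

```python
from __future__ import annotations

# Half-open range [lower, upper): only the 4.7.x series is treated as affected.
AFFECTED_CT2 = ((4, 7), (4, 8))


def guard_trips_bounded(ct2: tuple[int, int], cuda_version: str | None) -> bool:
    """True only for ctranslate2 4.7.x on CUDA 12.8.x."""
    lower, upper = AFFECTED_CT2
    return bool(cuda_version) and lower <= ct2 < upper and cuda_version.startswith("12.8")
```

With an upper bound, the restriction lifts automatically for 4.8+, whereas the open-ended `ct2 >= (4, 7)` would keep trapping later releases.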

…ces (#1052) * fix(whisper): bound ctranslate2 version guard + protect explicit CUDA devices Addresses two post-merge nits from kwit75 on #1043: 1. Narrow version guard to (4,7) <= ct2 < (4,8) so the restriction auto-lifts when ctranslate2 4.8+ ships the cuBLAS 12.8.4 fix, rather than trapping 4.8/4.9/5.x indefinitely. 2. Run _check_gpu_compatible() for explicit device='cuda'/'cuda:N' in local mode, not just for auto-detect (device=None). The SIGABRT can occur regardless of how the CUDA device was selected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci: retrigger CI (Windows HuggingFace network flake) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
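The second change in #1052 (probing explicit `cuda` / `cuda:N` devices as well as auto-detect) amounts to a device-selection rule like the sketch below. Both function names are hypothetical, and so is the assumption that a failed probe falls back to CPU for explicit CUDA requests too. The commit message only states that the probe now runs for those devices.

```python
from __future__ import annotations

from typing import Callable


def needs_gpu_probe(device: str | None) -> bool:
    """Auto-detect (None) and explicit CUDA devices are probed; others are not."""
    if device is None:
        return True
    return device == "cuda" or device.startswith("cuda:")


def resolve_device(device: str | None, probe: Callable[[], bool]) -> str:
    """Pick the device for local mode, running the probe only when needed."""
    if not needs_gpu_probe(device):
        return device  # e.g. "cpu": no probe needed
    # Assumption: a failed probe forces CPU even when CUDA was requested explicitly.
    return (device or "cuda") if probe() else "cpu"
```

Passing the probe as a callable keeps the (subprocess-based) check lazy, so an explicit `device="cpu"` never pays for it.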

anandray requested review from Rod-Christensen, jmaionchi and stepmikhaylov as code owners June 1, 2026 18:20

anandray requested review from asclearuc, dsapandora and kwit75 June 1, 2026 18:20

github-actions Bot added the module:ai AI/ML modules label Jun 1, 2026

coderabbitai Bot reviewed Jun 1, 2026


Comment thread packages/ai/src/ai/common/models/audio/whisper.py Outdated

coderabbitai Bot reviewed Jun 1, 2026


Comment thread packages/ai/src/ai/common/models/audio/whisper.py Outdated

coderabbitai Bot reviewed Jun 1, 2026


Comment thread packages/ai/src/ai/common/models/audio/whisper.py

anandray enabled auto-merge (squash) June 1, 2026 20:20

kwit75 approved these changes Jun 1, 2026


anandray merged commit 08558ce into develop Jun 1, 2026
33 of 35 checks passed

anandray deleted the fix/whisper-probe-version-guard branch June 1, 2026 21:15

anandray mentioned this pull request Jun 1, 2026

fix(whisper): bound version guard to < 4.8 + guard explicit CUDA devices #1052

Merged

1 task

anandray merged 4 commits into develop from fix/whisper-probe-version-guard