fix(whisper): add version guard to GPU probe for ctranslate2 4.7.x + CUDA 12.8#1043
Conversation
…CUDA 12.8 The CR fix to _check_gpu_compatible() made the probe correctly return True on CUDA machines — but ctranslate2 4.7.x + cuBLAS 12.8.4 on H200 causes tcache_thread_shutdown() SIGABRT during GPU transcription (heap corruption), so returning True causes regressions in both local and server mode. Add an explicit version guard: exit non-zero when ctranslate2 >= 4.7 and CUDA 12.8.x are detected, forcing CPU fallback until ctranslate2 ships a fix. The StorageView.from_array() sanity check is kept for other CUDA versions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe PR updates WhisperLoader._check_gpu_compatible()’s subprocess probe script to parse ctranslate2 and torch CUDA versions and force a non-zero exit when ctranslate2 >= 4.7 is paired with CUDA 12.8, preserving prior CUDA compute-type and StorageView.from_array() checks. ChangesGPU Compatibility Probe Version Guard
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No description provided. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Around line 102-105: The probe_script embeds an inline if with semicolons so
the sys.exit(1) guard consumes the rest of the script and prevents printing
"ok", causing _gpu_compatible to be false; update the probe_script in
packages/ai/src/ai/common/models/audio/whisper.py so the conditional is written
on its own line (newline-delimited) with sys.exit(1) only inside the if block,
and have the torch tensor creation, ctranslate2.StorageView.from_array, and
print("ok") on subsequent lines outside the if; this ensures the probe prints
"ok" when compatible and lets _gpu_compatible = result.returncode == 0 and 'ok'
in result.stdout work correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 81ecc428-9f16-4dbd-a656-6cfb3688203d
📒 Files selected for processing (1)
packages/ai/src/ai/common/models/audio/whisper.py
…e swallow
Semicolon-separated statements on the same line as an inline `if` are ALL
part of the if-suite in Python. The previous probe had:
if ct2 >= (4,7) ...: sys.exit(1); t=...; sv=...; print("ok")
so when the version guard was False (compatible machines), the StorageView
creation and print("ok") were both skipped — the script exited 0 without
printing "ok", making _gpu_compatible always False on non-affected machines.
Switch to newline/indent delimiters so sys.exit(1) is the only statement
inside the if-block.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Line 100: Wrap the version parsing expression ct2 = tuple(int(x) for x in
ctranslate2.__version__.split(".")[:2]) in a try/except block that catches
ValueError (and optionally TypeError) and falls back to the existing safe
default (e.g., ct2 = (0, 0) or whatever the current failure behavior expects);
update the code in packages/ai/src/ai/common/models/audio/whisper.py around the
ctranslate2.__version__ probe to perform the guarded parse so malformed versions
like "4.7rc1" don't raise and the probe continues to select CPU as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bfc0ef5a-9070-4d48-b8c7-15461a8d6246
📒 Files selected for processing (1)
packages/ai/src/ai/common/models/audio/whisper.py
…ments Wrap the version tuple parse in try/except so pre-release strings like '4.7rc1' don't raise ValueError and crash the probe. Falls back to (0,0) which safely skips the version guard and continues to the StorageView check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/ai/src/ai/common/models/audio/whisper.py`:
- Around line 100-106: The current parse-failure fallback sets ct2 = (0, 0),
which causes the version-guard (ct2 >= (4, 7) and cuda.startswith("12.8")) to be
skipped on unparseable versions; change the fallback so that parse failures bias
toward disabling GPU (e.g. set ct2 to a high sentinel like (999, 999) or set a
flag on failure and treat unknown as >= (4,7)) so that the check in the block
containing ct2, cuda.startswith("12.8") and sys.exit(1) will trip and force the
process to exit (fall back to CPU) when the ctranslate2 version string cannot be
parsed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4b17835d-5b4c-4daf-85a1-52606f9be3c6
📒 Files selected for processing (1)
packages/ai/src/ai/common/models/audio/whisper.py
… fail-safe (0,0) fallback skips the cuBLAS 12.8 guard on unparseable versions (e.g. '4.7rc1'), allowing GPU selection that hits the tcache SIGABRT. A high sentinel ensures any unrecognised version on CUDA 12.8 still forces CPU. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kwit75
left a comment
There was a problem hiding this comment.
Approved — sound fix for the in-scope regression: the guard sys.exit(1)s before any GPU WhisperModel load, so the CTranslate2 inference-time cuBLAS SIGABRT can't fire on the auto-detect paths (closing the hole #1036's StorageView-only probe left), clean CPU fallback, and the version compare is done as int tuples (no "4.10"<"4.7" string bug). CI green.
Two non-blocking follow-ups (fine as-is for the hackathon hotfix):
ct2 >= (4,7)is open-ended, so it also traps 4.8/4.9/5.x — when CTranslate2 ships a fix in 4.8+ on CUDA 12.8 it'd still be forced to CPU. Since the plan is to drop this once upstream fixes it, consider bounding the upper end (e.g.(4,7) <= ct2 < (4,8)) or a known-bad set, to match the '4.7.x' wording.- The guard only runs on auto-detect paths (server mode + local-mode device=None); an explicit
device='cuda'/'cuda:N'bypasses_check_gpu_compatibleand is still exposed to the SIGABRT on the affected H200/cuBLAS-12.8.4 box — gate that path too if it's supported.
…ces (#1052) * fix(whisper): bound ctranslate2 version guard + protect explicit CUDA devices Addresses two post-merge nits from kwit75 on #1043: 1. Narrow version guard to (4,7) <= ct2 < (4,8) so the restriction auto-lifts when ctranslate2 4.8+ ships the cuBLAS 12.8.4 fix, rather than trapping 4.8/4.9/5.x indefinitely. 2. Run _check_gpu_compatible() for explicit device='cuda'/'cuda:N' in local mode, not just for auto-detect (device=None). The SIGABRT can occur regardless of how the CUDA device was selected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci: retrigger CI (Windows HuggingFace network flake) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Follow-up to #1036. The CR fix to
_check_gpu_compatible()correctly changed the probe to useStorageView.from_array(), but this made the probe actually returnTrueon CUDA machines — which caused a regression.Root cause: ctranslate2 4.7.x + cuBLAS 12.8.4 on H200 causes a
tcache_thread_shutdown(): unaligned tcache chunk detectedSIGABRT during GPU transcription (not during StorageView creation). Creating a StorageView on GPU doesn't trigger cuBLAS, so the probe passed but the actual inference crashed. Before the CR fix, the probe accidentally returnedFalse(viaAttributeErroron the wrong attribute name), which was the correct behavior for this machine.Fix: Add an explicit version guard in the probe script:
exit(1)when ctranslate2 >= 4.7 and CUDA 12.8.x are detected, forcing CPU fallback. TheStorageView.from_array()sanity check is retained for other version combinations. The guard can be removed once ctranslate2 ships a fix for the cuBLAS 12.8.4 heap corruption.Test plan
./builder model_server:test— 35 passed, 11 deselected on 8× H200 (ctranslate2 4.7.2 + cuBLAS 12.8.4)🤖 Generated with Claude Code
Summary by CodeRabbit