fix(whisper): bound version guard to < 4.8 + guard explicit CUDA devices#1052
Conversation
… devices Addresses two post-merge nits from kwit75 on #1043: 1. Narrow version guard to (4,7) <= ct2 < (4,8) so the restriction auto-lifts when ctranslate2 4.8+ ships the cuBLAS 12.8.4 fix, rather than trapping 4.8/4.9/5.x indefinitely. 2. Run _check_gpu_compatible() for explicit device='cuda'/'cuda:N' in local mode, not just for auto-detect (device=None). The SIGABRT can occur regardless of how the CUDA device was selected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughWhisperLoader's GPU compatibility probe is tightened to enforce a specific version range (ctranslate2 4.7–4.8 with CUDA 12.8) rather than a broad ≥4.7 check, with updated documentation. When the probe fails for an explicitly requested non-CPU device, the loader now logs a warning and gracefully falls back to CPU instead of attempting GPU initialization. ChangesGPU Compatibility Handling in WhisperLoader
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No description provided. |
kwit75
left a comment
There was a problem hiding this comment.
LGTM — both nits from #1043 are correctly and completely addressed. Traced it end-to-end:
1. Version bound (4, 7) <= ct2 < (4, 8) — ct2 = tuple(int(x) for x in ctranslate2.__version__.split(".")[:2]) is always a 2-tuple, so this fires on exactly the 4.7.x family and auto-lifts at the first 4.8 release (4.8.0 → (4,8) fails < (4,8); 4.10 → (4,10) and 5.x → (5,0) correctly excluded — no lexical-string pitfall since components are int-cast). Comment now matches behavior. The parse-failure sentinel (999,999) skips the version gate but the real StorageView.from_array check still runs in the isolated subprocess, so an incompatible build still SIGABRTs there → returncode != 0 → CPU. Fail-open is safe.
2. Explicit-device fallback — the new elif device != 'cpu' and not _check_gpu_compatible() in local mode is correct on every case:
- compatible explicit
cuda/cuda:N→ elif isTrue and not True == False→ device unchanged → still uses GPU (no regression); - incompatible explicit
cuda/cuda:N→ falls to CPU with a warning; device='cpu'→ short-circuits ondevice != 'cpu', so the (subprocess) probe isn't even invoked;- downstream
':' in deviceindex parse and thedevice is Noneauto branch are untouched; server mode was already guarded at theallocate_gpubranch.
Probe is cached + subprocess-isolated, so the explicit path reuses the result and the StorageView check is the backstop. The float16 → int8 CPU fallback still applies after the new demotion (separate if device == 'cpu' block runs after).
Approving. Only the Build matrix is still pending — merge once it greens.
One ordering note for downstream: saas #182 currently pins the submodule at 08558ce5 (#1043 only). After this lands, re-point #182 to the post-#1052 develop tip so SaaS picks up the nits too, then merge #182.
Summary
Follow-up to #1043, addressing two non-blocking nits from kwit75's review:
1. Bound the version guard to
(4,7) <= ct2 < (4,8)The previous
ct2 >= (4, 7)check would have kept 4.8/4.9/5.x forced to CPU even after CTranslate2 ships a fix. Adding< (4, 8)as an upper bound means the guard auto-lifts once 4.8+ is installed.2. Guard explicit
device='cuda'/device='cuda:N'in local mode_check_gpu_compatible()was only called on the auto-detect path (device=None). Callers passing an explicit CUDA device bypassed the probe entirely and could still hit the SIGABRT. The fix adds the same probe check for any non-CPU explicit device:Server mode was already fully protected (probe runs before
allocate_gpu, which is the only GPU selection path in that mode).Test plan
./builder model_server:test— 35 passed, 11 deselected on 8× H200🤖 Generated with Claude Code
Summary by CodeRabbit