fix(whisper): bound version guard to < 4.8 + guard explicit CUDA devices #1052

Merged
kwit75 merged 2 commits into develop from fix/whisper-probe-nits on Jun 1, 2026
anandray (Contributor) commented Jun 1, 2026

Summary

Follow-up to #1043, addressing two non-blocking nits from kwit75's review:

1. Bound the version guard to (4,7) <= ct2 < (4,8)

The previous ct2 >= (4, 7) check would have kept 4.8/4.9/5.x forced to CPU even after CTranslate2 ships a fix. Adding < (4, 8) as an upper bound means the guard auto-lifts once 4.8+ is installed.
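
The bounded comparison above can be sketched as follows. This is a minimal illustration, not the code in whisper.py: the helper names parse_major_minor and in_affected_range are hypothetical, and the (999, 999) sentinel mirrors the parse-failure value mentioned in the review of this PR.

```python
def parse_major_minor(version: str, sentinel=(999, 999)):
    """Return (major, minor) as ints, or the sentinel if parsing fails.

    Hypothetical helper; the real loader parses ctranslate2.__version__.
    """
    try:
        return tuple(int(part) for part in version.split(".")[:2])
    except ValueError:
        return sentinel


def in_affected_range(version: str) -> bool:
    """True only for the 4.7.x family, i.e. (4, 7) <= ct2 < (4, 8)."""
    ct2 = parse_major_minor(version)
    return (4, 7) <= ct2 < (4, 8)


if __name__ == "__main__":
    for v in ("4.6.2", "4.7.0", "4.7.3", "4.8.0", "4.10.1", "5.0.0", "garbage"):
        print(v, in_affected_range(v))
    # Comparing raw strings would misorder 4.10 and 4.8:
    print("4.10" < "4.8")      # True (lexical comparison)
    print((4, 10) < (4, 8))    # False (integer tuple comparison)
```

Because the components are cast to int before comparison, 4.10 sorts after 4.8, and an unparseable version falls outside the affected range instead of raising.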

2. Guard explicit device='cuda'/device='cuda:N' in local mode

_check_gpu_compatible() was only called on the auto-detect path (device=None). Callers passing an explicit CUDA device bypassed the probe entirely and could still hit the SIGABRT. The fix adds the same probe check for any non-CPU explicit device:

elif device != 'cpu' and not WhisperLoader._check_gpu_compatible():
    logger.warning('ctranslate2 CUDA probe failed for explicit device=%r — will use CPU', device)
    device = 'cpu'

Server mode was already fully protected (probe runs before allocate_gpu, which is the only GPU selection path in that mode).
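
For readers without the source open, the local-mode device selection described above might look roughly like this. It is a sketch under assumptions: resolve_local_device is a hypothetical name, the probe is passed in as a callable standing in for WhisperLoader._check_gpu_compatible(), and the auto-detect branch is simplified to "CUDA if the probe passes, else CPU".

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def resolve_local_device(device: Optional[str], gpu_ok: Callable[[], bool]) -> str:
    """Pick the device for local mode, demoting to CPU when the probe fails.

    gpu_ok is a stand-in for the real compatibility probe (assumption).
    """
    if device is None:
        # Auto-detect path: this was already guarded before the PR.
        device = "cuda" if gpu_ok() else "cpu"
    elif device != "cpu" and not gpu_ok():
        # Explicit 'cuda' / 'cuda:N' path: the guard this PR adds.
        logger.warning(
            "ctranslate2 CUDA probe failed for explicit device=%r; will use CPU",
            device,
        )
        device = "cpu"
    return device


if __name__ == "__main__":
    print(resolve_local_device("cuda:1", lambda: True))   # cuda:1
    print(resolve_local_device("cuda:1", lambda: False))  # cpu
    print(resolve_local_device("cpu", lambda: False))     # cpu
```

Note that `device != "cpu"` is evaluated first, so an explicit CPU request never pays for the probe.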

Test plan

  • ./builder model_server:test — 35 passed, 11 deselected on 8× H200

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced GPU compatibility checks for audio processing with automatic fallback to CPU when GPU requirements aren't met, ensuring stable operation across different hardware configurations.

… devices

Addresses two post-merge nits from kwit75 on #1043:

1. Narrow version guard to (4,7) <= ct2 < (4,8) so the restriction
   auto-lifts when ctranslate2 4.8+ ships the cuBLAS 12.8.4 fix,
   rather than trapping 4.8/4.9/5.x indefinitely.

2. Run _check_gpu_compatible() for explicit device='cuda'/'cuda:N' in
   local mode, not just for auto-detect (device=None). The SIGABRT can
   occur regardless of how the CUDA device was selected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai Bot (Contributor) commented Jun 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7bcc2c9f-5de3-47f2-92f1-3269e28624e6

📥 Commits

Reviewing files that changed from the base of the PR and between 08558ce and ee9b364.

📒 Files selected for processing (1)
  • packages/ai/src/ai/common/models/audio/whisper.py

📝 Walkthrough

WhisperLoader's GPU compatibility probe is tightened to apply only to a specific version range (ctranslate2 4.7.x, i.e. (4,7) <= ct2 < (4,8), with CUDA 12.8) rather than a broad ≥4.7 check, and the accompanying comment is updated to match. When the probe fails for an explicitly requested non-CPU device, the loader now logs a warning and falls back to CPU instead of attempting GPU initialization.

Changes

GPU Compatibility Handling in WhisperLoader

| Layer / File(s) | Summary |
| --- | --- |
| GPU compatibility version guard (packages/ai/src/ai/common/models/audio/whisper.py) | Comment describing the ctranslate2 CUDA 12.8 compatibility probe is updated. The version check condition changes from ctranslate2 >= 4.7 to the bounded range (4,7) <= ct2 < (4,8) to match upstream fix timing expectations. |
| Device fallback when GPU incompatible (packages/ai/src/ai/common/models/audio/whisper.py) | Local-mode model loading now detects when a non-CPU device is explicitly requested but GPU compatibility fails, logs a warning, and forces the device to CPU rather than proceeding with incompatible GPU initialization. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

module:ai

Suggested reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen

Poem

🐰 When GPUs dance with CUDA's glow,
We bound the versions—four-point-oh,
If compatibility fades to black,
We gracefully fall to CPUs back,
With warnings logged for those who know.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the two main changes: narrowing the CTranslate2 version guard to <4.8 and adding guards for explicit CUDA devices in local mode. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


github-actions Bot added the module:ai (AI/ML modules) label Jun 1, 2026
kwit75 (Collaborator) left a comment

LGTM — both nits from #1043 are correctly and completely addressed. Traced it end-to-end:

1. Version bound (4, 7) <= ct2 < (4, 8): ct2 = tuple(int(x) for x in ctranslate2.__version__.split(".")[:2]) is always a 2-tuple, so the guard fires on exactly the 4.7.x family and auto-lifts at the first 4.8 release. 4.8.0 parses to (4,8), which fails < (4,8); 4.10 parses to (4,10) and 5.x to (5,0), and both are correctly excluded. There is no lexical-string pitfall because the components are int-cast. The comment now matches the behavior. The parse-failure sentinel (999,999) skips the version gate, but the real StorageView.from_array check still runs in the isolated subprocess, so an incompatible build still SIGABRTs there, giving returncode != 0 and a CPU fallback. Fail-open is safe.

2. Explicit-device fallback: the new elif device != 'cpu' and not _check_gpu_compatible() in local mode is correct in every case:

  • compatible explicit cuda/cuda:N → the elif condition is True and not True, i.e. False → device unchanged → still uses GPU (no regression);
  • incompatible explicit cuda/cuda:N → falls back to CPU with a warning;
  • device='cpu' → short-circuits on device != 'cpu', so the (subprocess) probe isn't even invoked;
  • the downstream ':' in device index parse and the device is None auto-detect branch are untouched; server mode was already guarded at the allocate_gpu branch.

Probe is cached + subprocess-isolated, so the explicit path reuses the result and the StorageView check is the backstop. The float16 → int8 CPU fallback still applies after the new demotion (separate if device == 'cpu' block runs after).
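
The "cached + subprocess-isolated" pattern mentioned above can be illustrated with a small sketch. It is not the project's implementation: probe_in_subprocess is a hypothetical name, functools.lru_cache is one assumed way to cache the result, and the snippet passed in stands in for the real StorageView.from_array check. The demo uses os.abort() to simulate a child that dies with SIGABRT.

```python
import functools
import subprocess
import sys


@functools.lru_cache(maxsize=None)
def probe_in_subprocess(snippet: str) -> bool:
    """Run snippet in a child interpreter; True only if it exits with code 0.

    A crash in the child (for example an abort) cannot take down the parent;
    it only shows up as a non-zero return code.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=60,
        )
    except subprocess.TimeoutExpired:
        # Treat a hung probe as a failure so the caller falls back to CPU.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print(probe_in_subprocess("pass"))                   # healthy child
    print(probe_in_subprocess("import os; os.abort()"))  # simulated abort
    print(probe_in_subprocess.cache_info())
```

Caching by snippet means a second caller, such as the explicit-device path, reuses the earlier verdict instead of spawning another interpreter.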

Approving. Only the Build matrix is still pending — merge once it greens.

One ordering note for downstream: saas #182 currently pins the submodule at 08558ce5 (#1043 only). After this lands, re-point #182 to the post-#1052 develop tip so SaaS picks up the nits too, then merge #182.

anandray enabled auto-merge (squash) June 1, 2026 22:47
anandray disabled auto-merge June 1, 2026 22:48
kwit75 merged commit da2a7d2 into develop Jun 1, 2026
20 checks passed
kwit75 deleted the fix/whisper-probe-nits branch June 1, 2026 23:18