fix(desktop): Blackwell GPU (RTX 5000) falls back to CPU silently#240

Merged: thcp merged 1 commit into main from fix/blackwell-cuda-cpu-fallback on Jun 30, 2026

@thcp (Collaborator) commented Jun 29, 2026

Closes #239

Changes

  • Bump cu128 torch 2.7.1 -> 2.8.0 (main.rs line 1098): 2.7.1 shipped with incomplete sm_120 kernels for Blackwell, causing verify_cuda_torch to fail and silently land on CPU. torch==2.8.0+cu128 and torchaudio==2.8.0+cu128 wheels are confirmed available on download.pytorch.org/whl/cu128.

  • Log verify_cuda_torch stderr to logs/setup.log (main.rs): stderr was previously discarded via Stdio::null(), so a failed Blackwell kernel verification left no diagnostics. It now follows the same logging pattern as install_cuda_torch.

  • Visible error in setup.js when GPU found but CUDA unverified: the previous status-line message was easy to miss during setup. Now calls showError() with a message pointing users to logs/setup.log.
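
For reviewers who want the shape of the logging change without opening main.rs, here is a minimal sketch. It is not the code from this PR: `run_logged` is a hypothetical helper name, and appending to the log (rather than truncating it) is an assumption. The only point it illustrates is replacing `Stdio::null()` with a file handle so the verifier's stderr is kept.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};

/// Hypothetical helper: runs `cmd` with stderr written to `log_path`
/// instead of being discarded. Returns Ok(true) if the process exits
/// with a success status.
fn run_logged(mut cmd: Command, log_path: &Path) -> io::Result<bool> {
    // Make sure the log directory (e.g. `logs/`) exists before opening the file.
    if let Some(dir) = log_path.parent() {
        fs::create_dir_all(dir)?;
    }
    // Assumption: append, so earlier setup output in the same log is preserved.
    let log = OpenOptions::new().create(true).append(true).open(log_path)?;

    let status = cmd
        .stdout(Stdio::null())
        // With Stdio::null() here, a failed verification leaves no trace.
        .stderr(Stdio::from(log))
        .status()?;
    Ok(status.success())
}
```

A caller would build the verification `Command` and pass it together with `Path::new("logs/setup.log")`. The actual command that verify_cuda_torch runs is not shown in this description, so it is left out of the sketch.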

Test plan

  • Confirm torch==2.8.0+cu128 wheel resolves on a fresh NVIDIA build install
  • On Blackwell hardware (RTX 50 series, e.g. RTX 5060 Ti): verify CUDA is used for separation
  • Simulate verify failure (point at cu124 wheel on Blackwell): confirm logs/setup.log captures stderr and setup.js shows the error dialog
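
The first check in the plan can be run by hand. The sketch below builds the pip arguments for it, assuming the installer shells out to pip. `cuda_torch_pip_args` and its parameters are hypothetical names, not taken from main.rs. The index URL is the one named in the Changes section above, and `--dry-run` is only available in recent pip releases.

```rust
/// Hypothetical helper: builds `pip` arguments for a CUDA-specific torch build.
/// `version` and `cuda_tag` are illustrative, e.g. "2.8.0" and "cu128".
fn cuda_torch_pip_args(version: &str, cuda_tag: &str, dry_run: bool) -> Vec<String> {
    let mut args = vec![
        "install".to_string(),
        format!("torch=={version}+{cuda_tag}"),
        format!("torchaudio=={version}+{cuda_tag}"),
        "--index-url".to_string(),
        format!("https://download.pytorch.org/whl/{cuda_tag}"),
    ];
    if dry_run {
        // Resolve the wheels without installing them.
        // Assumption: the pip in use is new enough to support this flag.
        args.push("--dry-run".to_string());
    }
    args
}
```

Calling it with `("2.8.0", "cu128", true)` gives the arguments for a resolve-only run. Passing `"cu124"` as the tag is one way to set up the simulated failure from the last item.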

Three changes to address NVIDIA RTX 5000 series (sm_120, cu128) running
stem separation on CPU instead of GPU:

- Bump cu128 torch from 2.7.1 to 2.8.0: 2.7.1 shipped incomplete sm_120
  kernels for Blackwell, causing verify_cuda_torch to fail. 2.8.0+cu128
  wheels are available and include full sm_120 support.

- Log verify_cuda_torch stderr to logs/setup.log: previously silenced
  with Stdio::null(), so there was no diagnostic path when Blackwell
  kernel verification failed.

- Show a visible error in setup.js when gpu_detected but cuda_verified
  is false: the previous status-line message was easy to miss. Now calls
  showError() so the user knows their GPU was found but CUDA setup failed
  and where to look for details.
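
The decision in the third change can be written out as a small table of cases. The sketch below is in Rust for consistency with main.rs, although the real check lives in setup.js. `SetupStatus`, `Notice`, and `notice_for` are hypothetical names, and the message strings are illustrative; only the `gpu_detected` and `cuda_verified` flags come from the commit message.

```rust
/// Hypothetical setup result; field names mirror the flags in the commit message.
#[derive(Debug, Clone, Copy)]
struct SetupStatus {
    gpu_detected: bool,
    cuda_verified: bool,
}

/// What the setup UI should surface (illustrative, not the setup.js API).
#[derive(Debug, PartialEq)]
enum Notice {
    Quiet,
    Info(&'static str),
    Error(&'static str),
}

fn notice_for(status: SetupStatus) -> Notice {
    match (status.gpu_detected, status.cuda_verified) {
        // GPU present and CUDA verified: nothing to report.
        (true, true) => Notice::Quiet,
        // The case this PR targets: a GPU was found but CUDA setup failed,
        // so the user gets a visible error instead of a status-line message.
        (true, false) => Notice::Error(
            "GPU found, but CUDA could not be verified. See logs/setup.log.",
        ),
        // No GPU: running on CPU is expected, so this is informational only.
        (false, _) => Notice::Info("No NVIDIA GPU detected; using CPU."),
    }
}
```

Keeping the no-GPU case informational means CPU-only users do not see the new error dialog.
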
@thcp thcp marked this pull request as ready for review June 30, 2026 07:57
@thcp thcp merged commit d61be8c into main Jun 30, 2026
8 checks passed
@thcp thcp deleted the fix/blackwell-cuda-cpu-fallback branch June 30, 2026 07:57

Successfully merging this pull request may close these issues.

NVIDIA build falls back to CPU on Blackwell GPUs (RTX 5060 Ti / sm_120)
