fix(desktop): Blackwell GPU (RTX 5000) falls back to CPU silently #240
Merged
Three changes to address NVIDIA RTX 5000 series (sm_120, cu128) running stem separation on CPU instead of GPU:
- Bump cu128 torch from 2.7.1 to 2.8.0: 2.7.1 shipped incomplete sm_120 kernels for Blackwell, causing verify_cuda_torch to fail. 2.8.0+cu128 wheels are available and include full sm_120 support.
- Log verify_cuda_torch stderr to logs/setup.log: previously silenced with Stdio::null(), so there was no diagnostic path when Blackwell kernel verification failed.
- Show a visible error in setup.js when gpu_detected is true but cuda_verified is false: the previous status-line message was easy to miss. Now calls showError() so the user knows their GPU was found but CUDA setup failed, and where to look for details.
Closes #239
Changes
- Bump cu128 torch 2.7.1 -> 2.8.0 (main.rs line 1098): 2.7.1 shipped with incomplete sm_120 kernels for Blackwell, causing verify_cuda_torch to fail and silently land on CPU. torch==2.8.0+cu128 and torchaudio==2.8.0+cu128 wheels are confirmed available on download.pytorch.org/whl/cu128.
- Log verify_cuda_torch stderr to logs/setup.log (main.rs): previously Stdio::null() meant there was no diagnostic path when Blackwell kernel verification failed. Now follows the same log pattern as install_cuda_torch.
- Visible error in setup.js when GPU found but CUDA unverified: the previous status-line message was easy to miss during setup. Now calls showError() with a message pointing users to logs/setup.log.
Test plan
- torch==2.8.0+cu128 wheel resolves on a fresh NVIDIA build install
- logs/setup.log captures stderr and setup.js shows the error dialog
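For reviewers who want a picture of the stderr-logging change, here is a minimal sketch of the general technique: handing the child process an append-mode log file instead of Stdio::null(). The helper name run_logging_stderr and its signature are hypothetical and only for illustration; the actual verify_cuda_torch code in main.rs is not shown in this PR description and may be structured differently.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus, Stdio};

/// Hypothetical helper (not the code from main.rs): runs `program` with
/// `args` and appends the child's stderr to `log_path` rather than
/// discarding it with `Stdio::null()`.
fn run_logging_stderr(program: &str, args: &[&str], log_path: &Path) -> io::Result<ExitStatus> {
    // Make sure the log directory (e.g. `logs/`) exists before opening the file.
    if let Some(parent) = log_path.parent() {
        fs::create_dir_all(parent)?;
    }

    // Append so that earlier setup output in the same log is preserved.
    let log = OpenOptions::new().create(true).append(true).open(log_path)?;

    Command::new(program)
        .args(args)
        .stdout(Stdio::null())
        // The child writes its stderr straight into the log file.
        .stderr(Stdio::from(log))
        .status()
}
```

A caller would pass the Python interpreter and the verification script as `program` and `args` (both assumptions here) and treat a non-success exit status as "CUDA unverified", with the reason now recoverable from logs/setup.log.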