fix: export CUDA devices in run_fast.sh #57

Open
Travor278 wants to merge 1 commit into Robbyant:main from Travor278:fix/run-fast-cuda-visible-devices


Travor278 commented May 28, 2026

Description

Fixes run_fast.sh so CUDA device selection is actually exported to the torchrun subprocess.

The default behavior remains the same: the script still uses devices 0,1,2,3,4,5,6,7 with 8 processes unless the caller overrides them. The script now also accepts CUDA_VISIBLE_DEVICES and NPROC_PER_NODE from the environment, adds an argument usage check, quotes user-provided paths, and keeps shell scripts checked out with LF line endings.
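
A minimal sketch of how environment overrides with defaults and a usage check could look in bash. This is not the actual diff: the function names `resolve_defaults` and `check_args` are hypothetical, and the required argument count is an assumption.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real run_fast.sh changes are not shown here.

# Keep the caller's values when they are set and non-empty; otherwise fall
# back to the 8-GPU defaults described above. Exporting makes the values
# visible to child processes such as torchrun.
resolve_defaults() {
    export CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:-0,1,2,3,4,5,6,7}"
    export NPROC_PER_NODE="${NPROC_PER_NODE:-8}"
}

# Usage check. Requiring at least one argument is an assumption; the real
# script may require more.
check_args() {
    if [ "$#" -lt 1 ]; then
        echo "usage: bash run_fast.sh <weights_dir> [args...]" >&2
        return 1
    fi
}

resolve_defaults
echo "devices=${CUDA_VISIBLE_DEVICES} nproc=${NPROC_PER_NODE}"
```

Quoting expansions such as `"$1"` wherever a user-provided path is used keeps a path containing spaces as a single argument.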

Related Issue

This was found while checking the provided fast inference runner.

Motivation and Context

The previous script used this form:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7; torchrun ...

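With the semicolon, the assignment is a separate command that sets only a shell variable. Unless CUDA_VISIBLE_DEVICES was already exported by the caller, the torchrun child process does not inherit it. Using a shared exported default makes the runner behave as intended and makes overrides explicit.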
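
The difference can be reproduced without torchrun. In this small sketch, `DEMO_DEVICES` is a hypothetical stand-in for CUDA_VISIBLE_DEVICES and a child `sh -c` process stands in for torchrun:

```shell
#!/usr/bin/env bash
# DEMO_DEVICES is a hypothetical stand-in for CUDA_VISIBLE_DEVICES;
# the child `sh -c` process stands in for torchrun.
unset DEMO_DEVICES

# 1. Semicolon: the assignment and the command are two separate statements.
#    The variable exists in this shell only and is not exported.
DEMO_DEVICES=0,1; semicolon_result=$(sh -c 'echo "${DEMO_DEVICES:-unset}"')
unset DEMO_DEVICES

# 2. Prefix assignment (no semicolon): placed in that one command's environment.
prefix_result=$(DEMO_DEVICES=0,1 sh -c 'echo "${DEMO_DEVICES:-unset}"')

# 3. export: every child process started afterwards inherits the value.
export DEMO_DEVICES=0,1
export_result=$(sh -c 'echo "${DEMO_DEVICES:-unset}"')

echo "semicolon: $semicolon_result"   # unset
echo "prefix:    $prefix_result"      # 0,1
echo "export:    $export_result"      # 0,1
```

Only the second and third forms reach the child process. The export form also covers every later command in the script, which matters when the script launches torchrun more than once.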

How Has This Been / Can This Be Tested?

Environment: WSL2, Ubuntu 22.04.

bash -n run_fast.sh

Stubbed torchrun to avoid launching model inference, then verified that both torchrun invocations made by the script receive the overridden environment and process count:

CUDA_VISIBLE_DEVICES=2,3 NPROC_PER_NODE=2 bash run_fast.sh /tmp/weights 9

Observed both generated torchrun invocations include:

CUDA_VISIBLE_DEVICES=2,3
--nproc_per_node=2
--ulysses_size 2
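
The PR does not show the stub that was used. One way to build such a stub, as a sketch under that assumption, is a fake `torchrun` executable placed first on PATH that prints the device list it inherited and the arguments it received:

```shell
#!/usr/bin/env bash
# Hypothetical stub setup; not the stub actually used for this PR.
stub_dir=$(mktemp -d)

# Fake torchrun: report the inherited device list, then one argument per line.
cat > "$stub_dir/torchrun" <<'EOF'
#!/usr/bin/env bash
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
printf '%s\n' "$@"
EOF
chmod +x "$stub_dir/torchrun"

# Run inside a command-substitution subshell so the PATH and device changes
# do not leak into the calling shell. The direct torchrun call stands in for
# the runner script here.
stub_output=$(
    export PATH="$stub_dir:$PATH"
    export CUDA_VISIBLE_DEVICES=2,3
    torchrun --nproc_per_node=2
)
echo "$stub_output"
rm -rf "$stub_dir"
```

Replacing the direct `torchrun` call with `bash run_fast.sh /tmp/weights 9` inside the subshell would exercise the real script against the stub instead.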

Also ran:

git diff --check origin/main...HEAD

Checklist

  • Preserves the default 8-GPU runner behavior.
  • Allows explicit CUDA_VISIBLE_DEVICES / NPROC_PER_NODE overrides.
  • Avoids launching full inference in validation.

Travor278 marked this pull request as ready for review May 28, 2026 23:35