fix: export CUDA devices in run_fast.sh #57
Open
Travor278 wants to merge 1 commit into
Description
Fixes `run_fast.sh` so CUDA device selection is actually exported to the `torchrun` subprocess.

The default behavior remains the same: the script still uses devices `0,1,2,3,4,5,6,7` and `8` processes unless the caller overrides them. The script now also accepts `CUDA_VISIBLE_DEVICES` and `NPROC_PER_NODE` from the environment, adds an argument usage check, quotes user-provided paths, and keeps shell scripts checked out with LF line endings.

Related Issue
This was found while checking the provided fast inference runner.
Motivation and Context
The previous script used this form:

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7; torchrun ...

With the semicolon, the assignment is a separate command that only sets a shell variable. Unless the caller had already exported that variable, it is not passed to the `torchrun` child process. Using a shared exported default makes the runner behave as intended and makes overrides explicit.

How Has This Been / Can This Be Tested?
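Independently of the steps reported in this PR, the semicolon behavior described under Motivation and Context can be reproduced with a small sketch. Everything here is an assumption made for illustration: the stub script stands in for the real `torchrun`, and the device list `0,1` is arbitrary.

```shell
#!/bin/sh
# Hypothetical reproduction, not the PR's test script.
set -eu

stub_dir="$(mktemp -d)"
trap 'rm -rf "$stub_dir"' EXIT

# Stub torchrun: an external program that reports what its environment contains.
cat > "$stub_dir/torchrun" <<'EOF'
#!/bin/sh
echo "child sees CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
EOF
chmod +x "$stub_dir/torchrun"
PATH="$stub_dir:$PATH"
export PATH

# Old form: the semicolon ends the assignment, leaving an unexported shell variable.
unset CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=0,1; semicolon_form="$(torchrun)"

# Prefix form: the assignment applies to the environment of this one command.
unset CUDA_VISIBLE_DEVICES
prefix_form="$(CUDA_VISIBLE_DEVICES=0,1 torchrun)"

# Exported form: every later child process inherits the value.
export CUDA_VISIBLE_DEVICES=0,1
export_form="$(torchrun)"

echo "semicolon: $semicolon_form"   # child sees CUDA_VISIBLE_DEVICES=<unset>
echo "prefix:    $prefix_form"      # child sees CUDA_VISIBLE_DEVICES=0,1
echo "export:    $export_form"      # child sees CUDA_VISIBLE_DEVICES=0,1
```

The sketch unsets the variable first because a variable the caller has already exported stays exported when reassigned, which would hide the problem.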
Environment: WSL2, Ubuntu 22.04.
Stubbed `torchrun` to avoid launching model inference and verified both cases receive the overridden environment and process count:

Observed both generated `torchrun` invocations include:

Also ran:
Checklist
`CUDA_VISIBLE_DEVICES`/`NPROC_PER_NODE` overrides.
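
For readers who want to see the override pattern in isolation, here is a minimal sketch of how environment defaults, a usage check, and path quoting might fit together. It is not the diff from this PR: the function name `run_fast_sketch`, the `infer.py` entry point, and the `--model` flag are hypothetical. `torchrun` is replaced by a shell function so the sketch runs without PyTorch.

```shell
#!/bin/sh
# Hypothetical sketch of the override pattern; not the actual run_fast.sh.
set -eu

# Stand-in for the real launcher: prints the device list, argument count, and arguments.
torchrun() {
    echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES argc=$# argv=$*"
}

run_fast_sketch() {
    # Usage check: require one positional argument (its name is an assumption).
    if [ "$#" -lt 1 ]; then
        echo "usage: run_fast_sketch <model_path>" >&2
        return 2
    fi
    model_path="$1"

    # Keep the caller's values if set; otherwise use the defaults named in the description.
    # The export matters for a real external torchrun; the stub above is a function
    # in the same shell and would see the variable either way.
    export CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:-0,1,2,3,4,5,6,7}"
    NPROC_PER_NODE="${NPROC_PER_NODE:-8}"

    # Quoting keeps a path containing spaces as a single argument.
    torchrun --nproc_per_node="$NPROC_PER_NODE" infer.py --model "$model_path"
}

# Default run: neither variable is set by the caller.
default_run="$(unset CUDA_VISIBLE_DEVICES NPROC_PER_NODE; run_fast_sketch "/tmp/my model")"

# Override run: the caller exports both variables before invoking the runner.
override_run="$(export CUDA_VISIBLE_DEVICES=0,1 NPROC_PER_NODE=2; run_fast_sketch "/tmp/my model")"

echo "$default_run"
echo "$override_run"
```

The `${VAR:-default}` expansion is what lets the same line serve both as the default and as the override point, and `argc=4` in both outputs shows that the quoted path with a space is passed as one argument.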