Use trtexec_safe on safety platforms when using remoteAutoTuning#1378
Use trtexec_safe on safety platforms when using remoteAutoTuning#1378dthienan-nv wants to merge 13 commits intoNVIDIA:mainfrom
Conversation
Signed-off-by: dmoodie <dmoodie@nvidia.com>
… the safe runtime Signed-off-by: dmoodie <dmoodie@nvidia.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces a single latency regex with separate safe and standard patterns; enforces ssh:// for --remoteAutoTuningConfig and stores SSH credentials/remote paths; SCPs engines and runs remote trtexec_safe with fallback to trtexec --safe; adds subprocess timeouts and returns inf on failures. Changes
Sequence DiagramsequenceDiagram
participant Local as Local System
participant CLI as Arg Parser
participant Bench as TrtExecBenchmark
participant Remote as Remote Host
participant Parser as Latency Parser
Local->>CLI: Parse args (--remoteAutoTuningConfig, --safe)
CLI->>Bench: Provide parsed config
Bench->>Bench: Persist SSH creds, host, port, remote paths
alt remote autotuning && safe
Bench->>Bench: Build TensorRT engine
Bench->>Remote: scp engine -> remote_exec_path (timeout)
Remote-->>Bench: scp result
Bench->>Remote: ssh run `LD_LIBRARY_PATH=... trtexec_safe --loadEngine=...` (timeout)
alt safe stdout parsed
Remote-->>Bench: stdout
Bench->>Parser: parse with safe_pattern
else trtexec_safe fail
Bench->>Remote: ssh run `trtexec --safe --loadEngine=...` (timeout fallback)
Remote-->>Bench: stdout
Bench->>Parser: parse with std_pattern
end
else local or non-safe
Bench->>Bench: run local trtexec (timeout)
Bench->>Parser: parse with std_pattern
end
Parser-->>Bench: latency or parse-failure -> return latency or inf
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Line 263: The code assumes required query params exist and directly indexes
self.remote_options["remote_exec_path"] (and later "remote_lib_path"), which can
raise a KeyError; fix by validating that both "remote_exec_path" and
"remote_lib_path" are present in self.remote_options before using them (e.g., in
the constructor or the method that parses remote options), and raise a clear
ValueError or return a helpful error message if missing; after validation,
continue to compute self.remote_bin_path =
os.path.dirname(str(self.remote_options["remote_exec_path"])) and the remote lib
path usage as before.
- Around line 355-378: The code in benchmark.py (scp_cmd, trtexec_safe_cmd,
subprocess.run calls) has multiple security and correctness issues: remove all
"# nosec" comments, validate self.remote_port (ensure not None and use the
correct ssh/scp port flags), never embed self.remote_password into command
arguments, and stop passing a single shell-interpreted string to SSH
(trtexec_safe_cmd) which risks command injection from self.remote_engine_path,
self.remote_options['remote_lib_path'], and self.remote_bin_path. Fix by using
key-based SSH authentication (or only use sshpass with explicit approver
consent), build scp_cmd and ssh target as "user@host" (no password), pass port
via the correct flag, and pass remote commands/LD_LIBRARY_PATH safely: either
set environment via subprocess.run(env=...) when running locally or construct a
safely-escaped remote command using shlex.quote for each component (or send a
prepared remote script) and invoke subprocess.run with a list of args (not a
single concatenated shell string); also check subprocess.returncode and capture
output/errors for logging. Ensure changes touch scp_cmd, trtexec_safe_cmd,
subprocess.run usages, and validations for
remote_port/remote_user/remote_engine_path.
- Around line 291-293: The info log in the Benchmark (or surrounding
class/method where self._base_cmd is built and logged) prints the full base
command which can leak credentials from arguments like --remoteAutoTuningConfig
(e.g., ssh://user:password@host); before calling self.logger.info, create a
redacted copy of self._base_cmd that masks sensitive values by detecting args
that contain remoteAutoTuningConfig or ssh:// patterns (or args starting with
"--remoteAutoTuningConfig") and replace the password portion (or entire URI)
with a placeholder such as "ssh://user:****@host" or "[REDACTED]"; then log the
redacted command instead of the original to avoid credential exposure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b8d2a6d0-2a7c-4a27-9841-5b576f1c8ec2
📒 Files selected for processing (1)
modelopt/onnx/quantization/autotune/benchmark.py
| if self.has_remote_config and self.is_safe: | ||
| # need to push the model to the device and use trtexec_safe to run | ||
| scp_cmd = [ | ||
| "scp", | ||
| f"-P{self.remote_port}", | ||
| self.engine_path, | ||
| f"{self.remote_user}@{self.remote_ip}:{self.remote_engine_path}", | ||
| ] | ||
| result = subprocess.run(scp_cmd) # nosec B603 | ||
| if result.returncode != 0: | ||
| self.logger.error("Failed to push engine to remote device") | ||
| return float("inf") | ||
| ld_path = ( | ||
| f"LD_LIBRARY_PATH={self.remote_options['remote_lib_path']}:$LD_LIBRARY_PATH" | ||
| ) | ||
| trt_path = f"{os.path.join(self.remote_bin_path, 'trtexec_safe')}" | ||
| trtexec_safe_cmd = [ | ||
| "ssh", | ||
| "-p", | ||
| f"{self.remote_port}", | ||
| f"{self.remote_user}:{self.remote_password}@{self.remote_ip}", | ||
| f"{ld_path} {trt_path} --loadEngine={self.remote_engine_path}", | ||
| ] | ||
| result = subprocess.run(trtexec_safe_cmd, capture_output=True, text=True) # nosec B603 |
There was a problem hiding this comment.
CRITICAL: Multiple security violations and incorrect SSH implementation.
Several issues violate the security coding guidelines:
-
# noseccomments are not allowed (lines 363, 378): Per SECURITY.md and coding guidelines,# nosecbypasses are forbidden. If the pattern is necessary, it requires@NVIDIA/modelopt-setup-codeownersapproval with explicit PR justification. -
Invalid SSH authentication syntax:
{self.remote_user}:{self.remote_password}@{self.remote_ip}(line 375) is not valid SSH syntax. SSH does not accept passwords on the command line. Use SSH keys,sshpass, or SSH_ASKPASS instead. -
Password exposure: Even if the syntax were valid, including the password in command arguments makes it visible to other users via
ps. -
Command injection risk: Line 376 passes a command string to the remote shell. If
self.remote_engine_path,self.remote_options['remote_lib_path'], orself.remote_bin_pathcontain shell metacharacters, they will be interpreted by the remote shell. -
Missing null check for
remote_port: If the URL doesn't specify a port,self.remote_portisNone, causing-PNone(line 359) andNone(line 374) which are invalid.
Proposed fix direction
if self.has_remote_config and self.is_safe:
# need to push the model to the device and use trtexec_safe to run
+ if self.remote_port is None:
+ self.logger.error("Remote port is required for SCP/SSH")
+ return float("inf")
+ if self.remote_password:
+ self.logger.warning(
+ "Password-based SSH auth is not recommended. Consider using SSH keys."
+ )
scp_cmd = [
- "scp",
- f"-P{self.remote_port}",
+ "sshpass", "-p", self.remote_password,
+ "scp", "-P", str(self.remote_port),
self.engine_path,
f"{self.remote_user}@{self.remote_ip}:{self.remote_engine_path}",
]
- result = subprocess.run(scp_cmd) # nosec B603
+ result = subprocess.run(scp_cmd, capture_output=True, text=True)
# ... similar changes for SSH commandConsider using SSH key-based authentication and passing command arguments separately to avoid shell interpretation on the remote side.
As per coding guidelines: "Any use of '# nosec' comments to bypass Bandit security checks is not allowed."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 355 - 378, The
code in benchmark.py (scp_cmd, trtexec_safe_cmd, subprocess.run calls) has
multiple security and correctness issues: remove all "# nosec" comments,
validate self.remote_port (ensure not None and use the correct ssh/scp port
flags), never embed self.remote_password into command arguments, and stop
passing a single shell-interpreted string to SSH (trtexec_safe_cmd) which risks
command injection from self.remote_engine_path,
self.remote_options['remote_lib_path'], and self.remote_bin_path. Fix by using
key-based SSH authentication (or only use sshpass with explicit approver
consent), build scp_cmd and ssh target as "user@host" (no password), pass port
via the correct flag, and pass remote commands/LD_LIBRARY_PATH safely: either
set environment via subprocess.run(env=...) when running locally or construct a
safely-escaped remote command using shlex.quote for each component (or send a
prepared remote script) and invoke subprocess.run with a list of args (not a
single concatenated shell string); also check subprocess.returncode and capture
output/errors for logging. Ensure changes touch scp_cmd, trtexec_safe_cmd,
subprocess.run usages, and validations for
remote_port/remote_user/remote_engine_path.
There was a problem hiding this comment.
Partially addressed.
There was a problem hiding this comment.
@dthienan-nv, thanks for the partial fix. Here's what's been addressed and what still needs attention:
✅ Fixed:
- Invalid SSH target syntax (
user:password@host→user@host) sshpassis now used for password-based authentication
❌ Still remaining:
1. # nosec comments (critical per guidelines):
The new subprocess calls still carry # nosec B603. Per SECURITY.md and the coding guidelines, these are not allowed. If the pattern is genuinely necessary, the PR must be reviewed and approved by @NVIDIA/modelopt-setup-codeowners with explicit justification in the PR description.
2. Command injection risk (line ~376 in current code):
The remote command is still a single interpolated string:
f"{ld_path} {trt_path} --loadEngine={self.remote_engine_path}",If any of remote_lib_path, remote_bin_path, or remote_engine_path contain shell metacharacters, they'll be interpreted by the remote shell. Use shlex.quote on each component:
import shlex
remote_cmd = (
f"LD_LIBRARY_PATH={shlex.quote(str(self.remote_options['remote_lib_path']))}:$LD_LIBRARY_PATH"
f" {shlex.quote(trt_path)}"
f" --loadEngine={shlex.quote(str(self.remote_engine_path))}"
)3. Missing remote_port null check:
urlparse returns None for .port when no port is specified in the URL. This would produce -PNone in the scp command and None as the SSH -p argument. Add a guard:
if self.remote_port is None:
self.remote_port = 22 # default SSH portor raise a ValueError if a port is required.
🧠 Learnings used
Learnt from: kevalmorabia97
Repo: NVIDIA/Model-Optimizer PR: 974
File: modelopt/torch/puzzletron/pruning/pruning_utils.py:195-201
Timestamp: 2026-03-06T14:26:21.439Z
Learning: In NVIDIA/Model-Optimizer, for PyTorch >= 2.6, torch.load() calls without an explicit weights_only argument are safe. Do not flag bare torch.load(...) as a security issue in files under the modelopt package (e.g., modelopt/torch/puzzletron/pruning/pruning_utils.py) as long as the PyTorch version constraint is maintained. If supporting PyTorch < 2.6, require an explicit weights_only argument to torch.load() to avoid potential issues.
There was a problem hiding this comment.
Partially addressed.
There was a problem hiding this comment.
@dthienan-nv, thanks for the continued updates! Here's an accurate status check against the current code:
✅ Now fixed:
shlex.quoteapplied toremote_lib_path,trt_path, andremote_engine_path— command injection risk resolved.remote_portnull check added in the constructor — defaults to22if not specified in the URL.sshpasswithssh_passlist applied correctly for bothscpandsshcommands.
❌ Still remaining:
# nosec comments (critical per guidelines): Three # nosec B603 comments remain in the run() method. Per SECURITY.md and the coding guidelines, these are not allowed without explicit review and approval by @NVIDIA/modelopt-setup-codeowners with justification in the PR description. Please either:
- Remove the
# noseccomments and seek@NVIDIA/modelopt-setup-codeownersapproval with a justification in the PR description explaining whysubprocess.runwith external commands is necessary and safe here, or - If the project has existing approved patterns for subprocess use, align with those.
scp_cmd subprocess call still uses subprocess.run(scp_cmd) # nosec B603 without capture_output=True, text=True, so SCP failures won't have their stderr captured for logging. Consider adding it for consistency with the other calls.
🧠 Learnings used
Learnt from: kevalmorabia97
Repo: NVIDIA/Model-Optimizer PR: 974
File: modelopt/torch/puzzletron/pruning/pruning_utils.py:195-201
Timestamp: 2026-03-06T14:26:21.439Z
Learning: In NVIDIA/Model-Optimizer, for PyTorch >= 2.6, torch.load() calls without an explicit weights_only argument are safe. Do not flag bare torch.load(...) as a security issue in files under the modelopt package (e.g., modelopt/torch/puzzletron/pruning/pruning_utils.py) as long as the PyTorch version constraint is maintained. If supporting PyTorch < 2.6, require an explicit weights_only argument to torch.load() to avoid potential issues.
…fig validation, default value for remote port Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 293-296: The remote-config cleanup currently only removes the
"--remoteAutoTuningConfig" token from trtexec_args but leaves the following URL
token and doesn't clear the self.has_remote_config flag, so update the cleanup
in the ImportError path to remove both the flag and its following token when
present and set self.has_remote_config = False; specifically, process the
trtexec_args list in the block that handles ImportError to filter out
"--remoteAutoTuningConfig" and the next element (if any) and then set
self.has_remote_config = False so subsequent checks of
self.is_safe/self.has_remote_config behave correctly.
- Around line 376-404: The SCP/SSH subprocess.run calls (scp_cmd and the
trtexec_safe_cmd invocations) lack timeouts and can hang; update each
subprocess.run that executes scp_cmd and trtexec_safe_cmd to pass a configurable
timeout (e.g., self.ssh_timeout or a new attribute like self.remote_timeout with
a sensible default) and handle subprocess.TimeoutExpired by killing the
operation, logging an error via self.logger.error, and returning float("inf")
(same failure path as other errors). Apply this to the three subprocess.run
sites shown (the initial scp_cmd run, the first trtexec_safe_cmd run, and the
fallback trtexec_safe_cmd run), using the timeout kwarg and wrapping calls in
try/except subprocess.TimeoutExpired to ensure deterministic autotune behavior.
- Around line 256-261: The code assigns parsed.username/hostname directly to
self.remote_user and self.remote_ip, which allows None to overwrite the intended
defaults and permits missing hostnames; update the assignment logic in the
constructor/initializer that sets self.remote_user, self.remote_password,
self.remote_ip, and self.remote_port so that parsed.username only replaces the
default (e.g., "root") when not None/empty, parsed.password is handled
similarly, and parsed.hostname is validated—if hostname is missing raise a
ValueError (or similar) with a clear message; keep the fallback for remote_port
to 22 when parsed.port is None.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 05684e65-2b3d-4d9b-aa78-493751fc56b9
📒 Files selected for processing (1)
modelopt/onnx/quantization/autotune/benchmark.py
Signed-off-by: dmoodie <dmoodie@nvidia.com>
There was a problem hiding this comment.
♻️ Duplicate comments (4)
modelopt/onnx/quantization/autotune/benchmark.py (4)
294-297:⚠️ Potential issue | 🟠 Major | ⚡ Quick winImportError cleanup leaves remote state inconsistent for split-arg form.
When args are passed as
--remoteAutoTuningConfig <url>, this only removes the flag token and leaves the URL behind.self.has_remote_configalso remainsTrue, so remote-safe flow can still be entered later.Suggested fix
- trtexec_args = [ - arg for arg in trtexec_args if "--remoteAutoTuningConfig" not in arg - ] + filtered_args: list[str] = [] + skip_next = False + for arg in trtexec_args: + if skip_next: + skip_next = False + continue + if arg == "--remoteAutoTuningConfig": + skip_next = True + continue + if arg.startswith("--remoteAutoTuningConfig="): + continue + filtered_args.append(arg) + trtexec_args = filtered_args + self.has_remote_config = False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 294 - 297, The removal of "--remoteAutoTuningConfig" only deletes the flag token but leaves the URL token and doesn't clear self.has_remote_config; update the cleanup for trtexec_args in the code around trtexec_args and self.is_safe so it removes both the flag and its following value when the flag appears in split form (e.g., iterate through trtexec_args, drop "--remoteAutoTuningConfig" and the next token), and ensure self.has_remote_config is set to False when no remote config token remains so the remote-safe flow won't be entered erroneously.
339-341:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
# nosecbypasses on subprocess calls violate repository security policy.These inline Bandit bypass comments are not allowed by the project policy and should be removed before merge (with explicit safety justification and required reviewer approval handled in PR process if needed).
As per coding guidelines: "Any use of '# nosec' comments to bypass Bandit security checks is not allowed."
Also applies to: 379-381, 395-400, 414-419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 339 - 341, Remove the "# nosec" comments on the subprocess.run calls (the one creating result = subprocess.run(..., timeout=self.remote_timeout_sec) and the other occurrences at the referenced ranges) and replace the bypass with secure handling: ensure each call passes a validated list of arguments (no shell=True), validate or sanitize the cmd input before use, handle subprocess.TimeoutExpired and CalledProcessError exceptions and log/propagate failures instead of silencing them, and ensure capture_output/text remain intentional; update the code around the subprocess.run calls (refer to the result = subprocess.run(...) invocation and the other similar subprocess.run sites) to implement these safe patterns.
257-262:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve defaults and validate SSH authority fields before building remote targets.
parsed.username/parsed.hostnamecan beNone; assigning directly here can produce invalid endpoints (e.g.,None@None) and hides the intended"root"default user fallback.Suggested fix
- self.remote_user = parsed.username - self.remote_password = parsed.password - self.remote_ip = parsed.hostname - self.remote_port = parsed.port - if self.remote_port is None: - self.remote_port = 22 + self.remote_user = parsed.username or self.remote_user + self.remote_password = parsed.password or "" + self.remote_ip = parsed.hostname + if not self.remote_ip: + raise ValueError( + "Missing hostname in --remoteAutoTuningConfig (expected ssh://user[:pass]@host[:port]?...)." + ) + self.remote_port = parsed.port or 22🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 257 - 262, Do not blindly overwrite SSH fields with possibly-None parsed values; instead validate and preserve defaults: only set self.remote_user = parsed.username if parsed.username is not None (otherwise keep existing default "root"), only set self.remote_password = parsed.password if parsed.password is not None, only set self.remote_ip = parsed.hostname if parsed.hostname is not None (and raise or handle missing hostname), and set self.remote_port = parsed.port if parsed.port is not None else keep default 22; ensure the code uses these validated attributes when constructing remote targets so you never get "None@None".
368-370:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid
sshpass -pbecause it exposes the password in process arguments.Passing secrets via command-line arguments can leak through process listings. Prefer SSH keys, or use
sshpass -ewithSSHPASSset in a transient env passed tosubprocess.run.As per coding guidelines: "Never hardcode or log credentials/tokens/passwords; ensure error/log messages do not expose secrets."
Does `sshpass -p <password>` expose credentials via process listings, and is `sshpass -e` (SSHPASS env var) the recommended safer alternative?🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 368 - 370, The current code appends "sshpass", "-p", and self.remote_password to ssh_pass (in the block building the SSH command), which exposes the password via process arguments; change this to use "sshpass", "-e" and remove the "-p" and self.remote_password from ssh_pass, then pass the password via a transient environment variable SSHPASS when invoking subprocess.run/call (add/merge an env dict with {"SSHPASS": self.remote_password} or use an existing env copy) so the secret is not visible in argv; also prefer using SSH keys if available and remove any logging of self.remote_password in functions building or executing the SSH command (e.g., the code that constructs ssh_pass and the caller that runs subprocess).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 294-297: The removal of "--remoteAutoTuningConfig" only deletes
the flag token but leaves the URL token and doesn't clear
self.has_remote_config; update the cleanup for trtexec_args in the code around
trtexec_args and self.is_safe so it removes both the flag and its following
value when the flag appears in split form (e.g., iterate through trtexec_args,
drop "--remoteAutoTuningConfig" and the next token), and ensure
self.has_remote_config is set to False when no remote config token remains so
the remote-safe flow won't be entered erroneously.
- Around line 339-341: Remove the "# nosec" comments on the subprocess.run calls
(the one creating result = subprocess.run(..., timeout=self.remote_timeout_sec)
and the other occurrences at the referenced ranges) and replace the bypass with
secure handling: ensure each call passes a validated list of arguments (no
shell=True), validate or sanitize the cmd input before use, handle
subprocess.TimeoutExpired and CalledProcessError exceptions and log/propagate
failures instead of silencing them, and ensure capture_output/text remain
intentional; update the code around the subprocess.run calls (refer to the
result = subprocess.run(...) invocation and the other similar subprocess.run
sites) to implement these safe patterns.
- Around line 257-262: Do not blindly overwrite SSH fields with possibly-None
parsed values; instead validate and preserve defaults: only set self.remote_user
= parsed.username if parsed.username is not None (otherwise keep existing
default "root"), only set self.remote_password = parsed.password if
parsed.password is not None, only set self.remote_ip = parsed.hostname if
parsed.hostname is not None (and raise or handle missing hostname), and set
self.remote_port = parsed.port if parsed.port is not None else keep default 22;
ensure the code uses these validated attributes when constructing remote targets
so you never get "None@None".
- Around line 368-370: The current code appends "sshpass", "-p", and
self.remote_password to ssh_pass (in the block building the SSH command), which
exposes the password via process arguments; change this to use "sshpass", "-e"
and remove the "-p" and self.remote_password from ssh_pass, then pass the
password via a transient environment variable SSHPASS when invoking
subprocess.run/call (add/merge an env dict with {"SSHPASS":
self.remote_password} or use an existing env copy) so the secret is not visible
in argv; also prefer using SSH keys if available and remove any logging of
self.remote_password in functions building or executing the SSH command (e.g.,
the code that constructs ssh_pass and the caller that runs subprocess).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cb68d482-2dee-449b-ba0a-2c4b8e306e3a
📒 Files selected for processing (1)
modelopt/onnx/quantization/autotune/benchmark.py
Signed-off-by: dmoodie <dmoodie@nvidia.com>
There was a problem hiding this comment.
♻️ Duplicate comments (3)
modelopt/onnx/quantization/autotune/benchmark.py (3)
297-299:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRedact
--remoteAutoTuningConfigbefore logging command templateAt Line 299, logging the full command can leak credentials from SSH URLs in remote autotuning config.
Proposed fix
- self.logger.debug(f"Base command template: {' '.join(self._base_cmd)}") + redacted_cmd: list[str] = [] + skip_next_remote_cfg_value = False + for arg in self._base_cmd: + if skip_next_remote_cfg_value: + skip_next_remote_cfg_value = False + redacted_cmd.append("[REDACTED_REMOTE_CONFIG]") + continue + if arg == "--remoteAutoTuningConfig": + redacted_cmd.append(arg) + skip_next_remote_cfg_value = True + continue + if arg.startswith("--remoteAutoTuningConfig="): + redacted_cmd.append("--remoteAutoTuningConfig=[REDACTED_REMOTE_CONFIG]") + continue + redacted_cmd.append(arg) + self.logger.debug(f"Base command template: {' '.join(redacted_cmd)}")Based on learnings: “Avoid logging sensitive inputs, paths, tokens, proprietary model details, or overly verbose operational information that could leak security-relevant data.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 297 - 299, The debug log prints the full command template via self._base_cmd (extended with trtexec_args), which can leak sensitive SSH URLs passed with the --remoteAutoTuningConfig flag; before calling self.logger.debug, create a sanitized copy of the command list (do not mutate self._base_cmd) and replace any argument that startswith "--remoteAutoTuningConfig=" or follows the "--remoteAutoTuningConfig" token with a redacted placeholder like "--remoteAutoTuningConfig=<REDACTED>" and then log ' '.join(sanitized_cmd) in the logger.debug call so the sensitive value is not exposed (keep use of self._base_cmd, trtexec_args, and self.logger.debug).
257-262:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate SSH authority fields before assigning remote target
On Line 257 and Line 259,
parsed.username/parsed.hostnamecan beNone. That can produce invalid targets and fail later with unclear errors.Proposed fix
- self.remote_user = parsed.username - self.remote_password = parsed.password - self.remote_ip = parsed.hostname - self.remote_port = parsed.port - if self.remote_port is None: - self.remote_port = 22 + self.remote_user = parsed.username or self.remote_user + self.remote_password = parsed.password or "" + self.remote_ip = parsed.hostname + if not self.remote_ip: + raise ValueError( + "Missing hostname in --remoteAutoTuningConfig " + "(expected ssh://user[:pass]@host[:port]?...)." + ) + self.remote_port = parsed.port or 22🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 257 - 262, Validate parsed.username and parsed.hostname before assigning to self.remote_user and self.remote_ip: if either is None, raise a clear ValueError (or custom exception) that includes the original target string and explains that SSH username and host are required; keep the existing fallback for parsed.port (set to 22 if None) and still assign parsed.password to self.remote_password (you may allow None for password but document it), and update the assignment block that sets self.remote_user, self.remote_password, self.remote_ip, self.remote_port to perform these checks and raise the informative error instead of silently producing invalid targets.
340-340:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRemove forbidden
# nosecsuppressions on subprocess calls
# nosecis still present on Line 340, Line 380, Line 399, and Line 418. This violates the repo’s Python security policy formodelopt/**/*.py.Proposed fix
- result = subprocess.run( - cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec - ) # nosec B603 + result = subprocess.run( + cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec + ) ... - result = subprocess.run( - scp_cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec - ) # nosec B603 + result = subprocess.run( + scp_cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec + ) ... - result = subprocess.run( - trtexec_safe_cmd, - capture_output=True, - text=True, - timeout=self.remote_timeout_sec, - ) # nosec B603 + result = subprocess.run( + trtexec_safe_cmd, + capture_output=True, + text=True, + timeout=self.remote_timeout_sec, + ) ... - result = subprocess.run( - trtexec_safe_cmd, - capture_output=True, - text=True, - timeout=self.remote_timeout_sec, - ) # nosec B603 + result = subprocess.run( + trtexec_safe_cmd, + capture_output=True, + text=True, + timeout=self.remote_timeout_sec, + )As per coding guidelines: “Any use of '# nosec' comments to bypass Bandit security checks is not allowed.”
Also applies to: 380-380, 399-399, 418-418
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` at line 340, Remove the forbidden "# nosec" suppressions on subprocess calls and fix the underlying security issue: locate the subprocess invocations (e.g., calls to subprocess.run / subprocess.Popen in benchmark.py) that currently have "# nosec B603", remove those comments, and make the calls secure by passing arguments as a list (not a single shell string), ensuring shell=False, sanitizing/validating any dynamic inputs before use (or using shlex.split on trusted strings), and properly handling exceptions/return codes; update the relevant call sites (the subprocess.run/subprocess.Popen usages) to follow these patterns so Bandit warnings are resolved without silencing them.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 297-299: The debug log prints the full command template via
self._base_cmd (extended with trtexec_args), which can leak sensitive SSH URLs
passed with the --remoteAutoTuningConfig flag; before calling self.logger.debug,
create a sanitized copy of the command list (do not mutate self._base_cmd) and
replace any argument that startswith "--remoteAutoTuningConfig=" or follows the
"--remoteAutoTuningConfig" token with a redacted placeholder like
"--remoteAutoTuningConfig=<REDACTED>" and then log ' '.join(sanitized_cmd) in
the logger.debug call so the sensitive value is not exposed (keep use of
self._base_cmd, trtexec_args, and self.logger.debug).
- Around line 257-262: Validate parsed.username and parsed.hostname before
assigning to self.remote_user and self.remote_ip: if either is None, raise a
clear ValueError (or custom exception) that includes the original target string
and explains that SSH username and host are required; keep the existing fallback
for parsed.port (set to 22 if None) and still assign parsed.password to
self.remote_password (you may allow None for password but document it), and
update the assignment block that sets self.remote_user, self.remote_password,
self.remote_ip, self.remote_port to perform these checks and raise the
informative error instead of silently producing invalid targets.
- Line 340: Remove the forbidden "# nosec" suppressions on subprocess calls and
fix the underlying security issue: locate the subprocess invocations (e.g.,
calls to subprocess.run / subprocess.Popen in benchmark.py) that currently have
"# nosec B603", remove those comments, and make the calls secure by passing
arguments as a list (not a single shell string), ensuring shell=False,
sanitizing/validating any dynamic inputs before use (or using shlex.split on
trusted strings), and properly handling exceptions/return codes; update the
relevant call sites (the subprocess.run/subprocess.Popen usages) to follow these
patterns so Bandit warnings are resolved without silencing them.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 25d0d07d-dd26-491d-b62d-0b95541d748f
📒 Files selected for processing (1)
modelopt/onnx/quantization/autotune/benchmark.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
modelopt/onnx/quantization/autotune/benchmark.py (2)
300-302:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPotential credential exposure in logs
Line 302 logs the full base command including
trtexec_args, which may contain--remoteAutoTuningConfig=ssh://user:password@host.... This logs credentials at debug level.Proposed fix: Redact credentials before logging
self._base_cmd.extend(trtexec_args) - self.logger.debug(f"Base command template: {' '.join(self._base_cmd)}") + # Redact credentials from log output + redacted_cmd = [ + re.sub(r"(ssh://[^:]*:)[^@]+(@)", r"\1****\2", arg) for arg in self._base_cmd + ] + self.logger.debug(f"Base command template: {' '.join(redacted_cmd)}")As per coding guidelines (SECURITY.md): "Don't log secrets/credentials/tokens/SSH details."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 300 - 302, The debug log prints the full base command including trtexec_args which may contain SSH credentials (e.g., --remoteAutoTuningConfig=ssh://user:password@host); update the logging in the code path after self._base_cmd.extend(trtexec_args) so you sanitize trtexec_args (or a copy of self._base_cmd) before calling self.logger.debug. Implement a small redaction step in benchmark.py that detects and masks credential patterns like ssh://user:password@host or key=value pairs containing passwords/tokens (e.g., replace the password portion with **** or remove the value) and then log the sanitized command using the same logger.debug call instead of the raw command.
341-343:⚠️ Potential issue | 🔴 CriticalRemove
# nosec B603comments—they violate security guidelines without required justificationThe code contains four
# nosec B603comments (lines 343, 383, 402, 421) that bypass Bandit security checks. Per SECURITY.md and coding guidelines,# noseccomments are forbidden unless accompanied by:
- An inline comment explaining why the pattern is necessary and why it is safe in this specific context
- A PR description requesting review from
@NVIDIA/modelopt-setup-codeownerswith explicit justificationCurrently, neither is present. The subprocess calls themselves are safe (list arguments, no
shell=True, proper use ofshlex.quoteon user-controlled values), so removing the# noseccomments should allow Bandit to pass without lowering security posture.Fix: Remove # nosec comments from all four subprocess.run calls
result = subprocess.run( cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec - ) # nosec B603 + ) ... result = subprocess.run( scp_cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec - ) # nosec B603 + ) ... result = subprocess.run( trtexec_safe_cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec, - ) # nosec B603 + ) ... result = subprocess.run( trtexec_safe_cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec, - ) # nosec B603 + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 341 - 343, Remove the forbidden "# nosec B603" annotations from the subprocess.run calls in this module (the four occurrences that call subprocess.run(..., timeout=self.remote_timeout_sec) and the other subprocess.run(...) invocations using list arguments and shlex.quote); leave the subprocess.run call signatures and argument lists unchanged. Do not add any other code or suppressions; if a justification is later required, include an inline safety comment plus the PR review request to `@NVIDIA/modelopt-setup-codeowners` per guidelines.
🧹 Nitpick comments (3)
modelopt/onnx/quantization/autotune/benchmark.py (3)
439-441: 💤 Low valueConsider adding specific timeout handling for better error messages
The generic
except Exceptionat line 439 catchessubprocess.TimeoutExpired, but the error message "Benchmark failed: {e}" doesn't clearly indicate a timeout occurred. A specific handler would improve debugging.Proposed fix
+ except subprocess.TimeoutExpired as e: + self.logger.error(f"Benchmark timed out after {self.remote_timeout_sec}s: {e}") + return float("inf") except Exception as e: self.logger.error(f"Benchmark failed: {e}") return float("inf")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 439 - 441, The generic except in the benchmarking routine (the block that currently does except Exception as e and calls self.logger.error(f"Benchmark failed: {e}") then returns float("inf")) should be split to handle subprocess.TimeoutExpired specifically: add an except subprocess.TimeoutExpired as e clause before the generic except to log a clear timeout message (e.g., self.logger.error(f"Benchmark timed out after {timeout}s: {e}")) and return float("inf"), then keep the broad except Exception as e for other failures; reference the existing self.logger.error usage and the function/method containing this try/except to locate where to add the new except.
256-265: 💤 Low valueValidate empty username in addition to None
urlparse("ssh://:password@host")returnsusername=""which passes theis Nonecheck but produces an invalid SSH target. Consider validating that the username is non-empty.Proposed fix
self.remote_user = parsed.username self.remote_password = parsed.password self.remote_ip = parsed.hostname self.remote_port = parsed.port - if self.remote_user is None: + if not self.remote_user: raise ValueError("Unable to parse remote user from --remoteAutoTuningConfig")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 256 - 265, After parsing the SSH URL into parsed, tighten validation for parsed fields by ensuring self.remote_user is non-empty (not None and not an empty string) and likewise ensure self.remote_ip is a non-empty value; replace the existing "if self.remote_user is None" and "if self.remote_ip is None" checks with truthy checks (e.g., if not self.remote_user / if not self.remote_ip) so that cases like urlparse("ssh://:password@host") are rejected, while leaving the remote_port defaulting behavior (set to 22 when None) unchanged.
286-286: 💤 Low valueRedundant
is_safeassignmentLine 286 sets
self.is_safe = Trueinside the conditional block, but line 299 unconditionally reassignsself.is_safe = "--safe" in trtexec_argsimmediately after the block. Line 286 is redundant.Proposed fix
self.trtexec_args.append("--safe") - self.is_safe = TrueAlso applies to: 299-299
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` at line 286, The assignment self.is_safe = True inside the conditional is redundant because self.is_safe is immediately overwritten later with self.is_safe = "--safe" in trtexec_args; in the class/method where trtexec args are prepared (look for the method that builds trtexec_args and sets self.is_safe in modelopt/onnx/quantization/autotune/benchmark.py), remove the earlier self.is_safe = True so the flag is set only once by the final assignment, leaving any conditional logic that affects trtexec_args intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 292-297: The warning message inside the except ImportError block
in benchmark.py (the "except ImportError as e" handler that currently calls
self.logger.warning(...)) is misleading because it says
"--remoteAutoTuningConfig" will be removed but the code immediately raises the
exception; update the log and error handling in that except block of the
benchmarking logic (reference: the except ImportError as e handler and
self.logger) so the message accurately reflects that remote autotuning is
unsupported and the operation will abort: either change the log to a clear
error/exception-level message (e.g., use self.logger.error or
self.logger.exception) that includes e (the exception details) and then
re-raise, or, if you intend to continue by removing the arg, implement the
actual removal logic for "--remoteAutoTuningConfig" before proceeding; do not
leave the old "Removing --remoteAutoTuningConfig" text while still raising.
---
Duplicate comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 300-302: The debug log prints the full base command including
trtexec_args which may contain SSH credentials (e.g.,
--remoteAutoTuningConfig=ssh://user:password@host); update the logging in the
code path after self._base_cmd.extend(trtexec_args) so you sanitize trtexec_args
(or a copy of self._base_cmd) before calling self.logger.debug. Implement a
small redaction step in benchmark.py that detects and masks credential patterns
like ssh://user:password@host or key=value pairs containing passwords/tokens
(e.g., replace the password portion with **** or remove the value) and then log
the sanitized command using the same logger.debug call instead of the raw
command.
- Around line 341-343: Remove the forbidden "# nosec B603" annotations from the
subprocess.run calls in this module (the four occurrences that call
subprocess.run(..., timeout=self.remote_timeout_sec) and the other
subprocess.run(...) invocations using list arguments and shlex.quote); leave the
subprocess.run call signatures and argument lists unchanged. Do not add any
other code or suppressions; if a justification is later required, include an
inline safety comment plus the PR review request to
`@NVIDIA/modelopt-setup-codeowners` per guidelines.
---
Nitpick comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 439-441: The generic except in the benchmarking routine (the block
that currently does except Exception as e and calls
self.logger.error(f"Benchmark failed: {e}") then returns float("inf")) should be
split to handle subprocess.TimeoutExpired specifically: add an except
subprocess.TimeoutExpired as e clause before the generic except to log a clear
timeout message (e.g., self.logger.error(f"Benchmark timed out after {timeout}s:
{e}")) and return float("inf"), then keep the broad except Exception as e for
other failures; reference the existing self.logger.error usage and the
function/method containing this try/except to locate where to add the new
except.
- Around line 256-265: After parsing the SSH URL into parsed, tighten validation
for parsed fields by ensuring self.remote_user is non-empty (not None and not an
empty string) and likewise ensure self.remote_ip is a non-empty value; replace
the existing "if self.remote_user is None" and "if self.remote_ip is None"
checks with truthy checks (e.g., if not self.remote_user / if not
self.remote_ip) so that cases like urlparse("ssh://:password@host") are
rejected, while leaving the remote_port defaulting behavior (set to 22 when
None) unchanged.
- Line 286: The assignment self.is_safe = True inside the conditional is
redundant because self.is_safe is immediately overwritten later with
self.is_safe = "--safe" in trtexec_args; in the class/method where trtexec args
are prepared (look for the method that builds trtexec_args and sets self.is_safe
in modelopt/onnx/quantization/autotune/benchmark.py), remove the earlier
self.is_safe = True so the flag is set only once by the final assignment,
leaving any conditional logic that affects trtexec_args intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 05a6afb2-73a0-4d73-a098-c67808a0e42a
📒 Files selected for processing (1)
modelopt/onnx/quantization/autotune/benchmark.py
…uld be the one to decide on killing the process. Improve documentation, improve log messages Signed-off-by: dmoodie <dmoodie@nvidia.com>
Previously there was a bug when using TrtExecBenchmark with the safety runtime and remoteAutoTuning that resulted in invalid latency measurements. This was due to using the local trtexec instead of the remote trtexec_safe.
This PR fixes this issue by using scp to push the resulting engine to the remote device and then remotely calling trtexec_safe to generate latency measurements.
Before:
After:
Summary by CodeRabbit