
Use trtexec_safe on safety platforms when using remoteAutoTuning #1378

Open
dthienan-nv wants to merge 13 commits into NVIDIA:main from dthienan-nv:dmoodie/bugfix/trtexec_safe

Conversation

@dthienan-nv
Contributor

@dthienan-nv dthienan-nv commented Apr 30, 2026

Previously, using TrtExecBenchmark with the safety runtime and remoteAutoTuning produced invalid latency measurements, because the local trtexec was invoked instead of the remote trtexec_safe.
This PR fixes the issue by using scp to push the built engine to the remote device and then invoking trtexec_safe there to collect latency measurements.
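The fixed flow (scp the engine over, then ssh-run trtexec_safe) can be sketched as a pure command builder. This is a sketch under stated assumptions: the function name, parameter names, and the shlex-quoting are illustrative, not necessarily the PR's exact code.

```python
import os
import shlex

def build_remote_benchmark_cmds(engine_path, user, host, port,
                                remote_engine_path, remote_bin_path, remote_lib_path):
    """Build the scp push and the remote trtexec_safe invocation as argv lists."""
    # Push the locally built engine to the remote safety platform.
    scp_cmd = [
        "scp", "-P", str(port),
        engine_path,
        f"{user}@{host}:{remote_engine_path}",
    ]
    # The remote side needs the safety runtime libs on LD_LIBRARY_PATH;
    # shlex.quote guards against shell metacharacters in the remote paths.
    remote_cmd = (
        f"LD_LIBRARY_PATH={shlex.quote(remote_lib_path)}:$LD_LIBRARY_PATH "
        f"{shlex.quote(os.path.join(remote_bin_path, 'trtexec_safe'))} "
        f"--loadEngine={shlex.quote(remote_engine_path)}"
    )
    ssh_cmd = ["ssh", "-p", str(port), f"{user}@{host}", remote_cmd]
    return scp_cmd, ssh_cmd
```

Each list would then be handed to subprocess.run with a timeout, with float("inf") returned if either step fails, as the summary below describes.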

Before:

[modelopt][onnx] - WARNING - Could not parse median latency from trtexec output
[modelopt][onnx] - WARNING - Benchmark failed: /tmp/tmp9s0yckb3/baseline.onnx
[modelopt][onnx] - INFO - Baseline latency: inf ms
[modelopt][onnx] - INFO - Baseline: inf ms

After:

[modelopt][onnx] - INFO - TrtExec benchmark (median): 3.16 ms
[modelopt][onnx] - INFO - Baseline latency: 3.165 ms
[modelopt][onnx] - INFO - Baseline: 3.16 ms

Summary by CodeRabbit

  • Improvements
    • Distinct latency measurement for safe vs standard benchmark modes for more accurate results.
    • Strict parsing and validation of remote autotuning configuration as URL-style input.
    • Automatic transfer of built artifacts and SSH-based remote benchmarking with a fallback execution path.
    • Timeouts for remote operations and clearer error handling that surfaces missing dependencies and yields infinite latency on failures.

Signed-off-by: dmoodie <dmoodie@nvidia.com>
… the safe runtime

Signed-off-by: dmoodie <dmoodie@nvidia.com>
@dthienan-nv dthienan-nv requested a review from a team as a code owner April 30, 2026 16:00
@dthienan-nv dthienan-nv requested a review from galagam April 30, 2026 16:00
@copy-pr-bot

copy-pr-bot Bot commented Apr 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Replaces a single latency regex with separate safe and standard patterns; enforces ssh:// for --remoteAutoTuningConfig and stores SSH credentials/remote paths; SCPs engines and runs remote trtexec_safe with fallback to trtexec --safe; adds subprocess timeouts and returns inf on failures.
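A minimal sketch of the strict ssh:// parsing described above, using only the standard library; the function name, the assumed default user, and the exact error messages are illustrative assumptions, not the PR's code:

```python
from urllib.parse import urlparse, parse_qs

def parse_remote_config(url):
    """Parse ssh://user[:password]@host[:port]?remote_exec_path=...&remote_lib_path=..."""
    parsed = urlparse(url)
    if parsed.scheme != "ssh":
        raise ValueError(f"remoteAutoTuningConfig must be an ssh:// URL, got {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("remoteAutoTuningConfig is missing a hostname")
    # parse_qs returns a list per key; keep the first value of each parameter.
    opts = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    for key in ("remote_exec_path", "remote_lib_path"):
        if key not in opts:
            raise ValueError(f"remoteAutoTuningConfig is missing query parameter {key!r}")
    return {
        "user": parsed.username or "root",  # assumed default user
        "password": parsed.password or "",
        "host": parsed.hostname,
        "port": parsed.port or 22,          # urlparse yields None when no port is given
        **opts,
    }
```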

Changes

Cohort / File(s) Summary
TensorRT Benchmark Safe Mode & Remote Autotuning
modelopt/onnx/quantization/autotune/benchmark.py
Introduces separate safe_pattern and std_pattern for latency parsing; parses --remoteAutoTuningConfig strictly as ssh:// and extracts/stores host, creds, remote_exec_path, remote_lib_path; when remote+safe, SCPs engine to remote, SSH-runs trtexec_safe and parses with safe_pattern (fallback: SSH trtexec --safe + std_pattern); adds timeouts to subprocess.run and treats failures (nonzero exit, SCP/SSH errors, missing parse) as float("inf"); ImportError during remote setup now re-raised after warning.

Sequence Diagram

sequenceDiagram
    participant Local as Local System
    participant CLI as Arg Parser
    participant Bench as TrtExecBenchmark
    participant Remote as Remote Host
    participant Parser as Latency Parser

    Local->>CLI: Parse args (--remoteAutoTuningConfig, --safe)
    CLI->>Bench: Provide parsed config
    Bench->>Bench: Persist SSH creds, host, port, remote paths

    alt remote autotuning && safe
        Bench->>Bench: Build TensorRT engine
        Bench->>Remote: scp engine -> remote_exec_path (timeout)
        Remote-->>Bench: scp result
        Bench->>Remote: ssh run `LD_LIBRARY_PATH=... trtexec_safe --loadEngine=...` (timeout)
        alt safe stdout parsed
            Remote-->>Bench: stdout
            Bench->>Parser: parse with safe_pattern
        else trtexec_safe fail
            Bench->>Remote: ssh run `trtexec --safe --loadEngine=...` (timeout fallback)
            Remote-->>Bench: stdout
            Bench->>Parser: parse with std_pattern
        end
    else local or non-safe
        Bench->>Bench: run local trtexec (timeout)
        Bench->>Parser: parse with std_pattern
    end

    Parser-->>Bench: latency or parse-failure -> return latency or inf

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title directly addresses the main change: using trtexec_safe on safety platforms during remote autotuning, which matches the core fix described in the PR objectives.
  • Linked Issues Check — ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check — ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns — ✅ Passed: No security anti-patterns detected: no unsafe torch.load/numpy.load, no trust_remote_code, no eval/exec on external input, no nosec bypass comments, no new risky dependencies.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Line 263: The code assumes required query params exist and directly indexes
self.remote_options["remote_exec_path"] (and later "remote_lib_path"), which can
raise a KeyError; fix by validating that both "remote_exec_path" and
"remote_lib_path" are present in self.remote_options before using them (e.g., in
the constructor or the method that parses remote options), and raise a clear
ValueError or return a helpful error message if missing; after validation,
continue to compute self.remote_bin_path =
os.path.dirname(str(self.remote_options["remote_exec_path"])) and the remote lib
path usage as before.
- Around line 355-378: The code in benchmark.py (scp_cmd, trtexec_safe_cmd,
subprocess.run calls) has multiple security and correctness issues: remove all
"# nosec" comments, validate self.remote_port (ensure not None and use the
correct ssh/scp port flags), never embed self.remote_password into command
arguments, and stop passing a single shell-interpreted string to SSH
(trtexec_safe_cmd) which risks command injection from self.remote_engine_path,
self.remote_options['remote_lib_path'], and self.remote_bin_path. Fix by using
key-based SSH authentication (or only use sshpass with explicit approver
consent), build scp_cmd and ssh target as "user@host" (no password), pass port
via the correct flag, and pass remote commands/LD_LIBRARY_PATH safely: either
set environment via subprocess.run(env=...) when running locally or construct a
safely-escaped remote command using shlex.quote for each component (or send a
prepared remote script) and invoke subprocess.run with a list of args (not a
single concatenated shell string); also check subprocess.returncode and capture
output/errors for logging. Ensure changes touch scp_cmd, trtexec_safe_cmd,
subprocess.run usages, and validations for
remote_port/remote_user/remote_engine_path.
- Around line 291-293: The info log in the Benchmark (or surrounding
class/method where self._base_cmd is built and logged) prints the full base
command which can leak credentials from arguments like --remoteAutoTuningConfig
(e.g., ssh://user:password@host); before calling self.logger.info, create a
redacted copy of self._base_cmd that masks sensitive values by detecting args
that contain remoteAutoTuningConfig or ssh:// patterns (or args starting with
"--remoteAutoTuningConfig") and replace the password portion (or entire URI)
with a placeholder such as "ssh://user:****@host" or "[REDACTED]"; then log the
redacted command instead of the original to avoid credential exposure.
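One way to implement the redaction suggested in the last comment is a regex pass over each argument before logging; the helper name and the [REDACTED] placeholder are illustrative, not the PR's code:

```python
import re

def redact_cmd(cmd):
    """Return a copy of an argv list with any ssh:// userinfo masked before logging."""
    # Masks everything between "ssh://" and "@" (user and password alike).
    return [re.sub(r"ssh://[^@\s]+@", "ssh://[REDACTED]@", arg) for arg in cmd]
```

A caller would then log " ".join(redact_cmd(base_cmd)) instead of the raw command.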

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b8d2a6d0-2a7c-4a27-9841-5b576f1c8ec2

📥 Commits

Reviewing files that changed from the base of the PR and between bb08094 and 19a4961.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/autotune/benchmark.py

Comment on lines +355 to +378
if self.has_remote_config and self.is_safe:
    # need to push the model to the device and use trtexec_safe to run
    scp_cmd = [
        "scp",
        f"-P{self.remote_port}",
        self.engine_path,
        f"{self.remote_user}@{self.remote_ip}:{self.remote_engine_path}",
    ]
    result = subprocess.run(scp_cmd)  # nosec B603
    if result.returncode != 0:
        self.logger.error("Failed to push engine to remote device")
        return float("inf")
    ld_path = (
        f"LD_LIBRARY_PATH={self.remote_options['remote_lib_path']}:$LD_LIBRARY_PATH"
    )
    trt_path = f"{os.path.join(self.remote_bin_path, 'trtexec_safe')}"
    trtexec_safe_cmd = [
        "ssh",
        "-p",
        f"{self.remote_port}",
        f"{self.remote_user}:{self.remote_password}@{self.remote_ip}",
        f"{ld_path} {trt_path} --loadEngine={self.remote_engine_path}",
    ]
    result = subprocess.run(trtexec_safe_cmd, capture_output=True, text=True)  # nosec B603
Contributor

@coderabbitai coderabbitai Bot Apr 30, 2026


⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

CRITICAL: Multiple security violations and incorrect SSH implementation.

Several issues violate the security coding guidelines:

  1. # nosec comments are not allowed (lines 363, 378): Per SECURITY.md and coding guidelines, # nosec bypasses are forbidden. If the pattern is necessary, it requires @NVIDIA/modelopt-setup-codeowners approval with explicit PR justification.

  2. Invalid SSH authentication syntax: {self.remote_user}:{self.remote_password}@{self.remote_ip} (line 375) is not valid SSH syntax. SSH does not accept passwords on the command line. Use SSH keys, sshpass, or SSH_ASKPASS instead.

  3. Password exposure: Even if the syntax were valid, including the password in command arguments makes it visible to other users via ps.

  4. Command injection risk: Line 376 passes a command string to the remote shell. If self.remote_engine_path, self.remote_options['remote_lib_path'], or self.remote_bin_path contain shell metacharacters, they will be interpreted by the remote shell.

  5. Missing null check for remote_port: If the URL doesn't specify a port, self.remote_port is None, causing -PNone (line 359) and None (line 374) which are invalid.

Proposed fix direction
         if self.has_remote_config and self.is_safe:
             # need to push the model to the device and use trtexec_safe to run
+            if self.remote_port is None:
+                self.logger.error("Remote port is required for SCP/SSH")
+                return float("inf")
+            if self.remote_password:
+                self.logger.warning(
+                    "Password-based SSH auth is not recommended. Consider using SSH keys."
+                )
             scp_cmd = [
-                "scp",
-                f"-P{self.remote_port}",
+                "sshpass", "-p", self.remote_password,
+                "scp", "-P", str(self.remote_port),
                 self.engine_path,
                 f"{self.remote_user}@{self.remote_ip}:{self.remote_engine_path}",
             ]
-            result = subprocess.run(scp_cmd)  # nosec B603
+            result = subprocess.run(scp_cmd, capture_output=True, text=True)
             # ... similar changes for SSH command

Consider using SSH key-based authentication and passing command arguments separately to avoid shell interpretation on the remote side.

As per coding guidelines: "Any use of '# nosec' comments to bypass Bandit security checks is not allowed."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 355 - 378, The
code in benchmark.py (scp_cmd, trtexec_safe_cmd, subprocess.run calls) has
multiple security and correctness issues: remove all "# nosec" comments,
validate self.remote_port (ensure not None and use the correct ssh/scp port
flags), never embed self.remote_password into command arguments, and stop
passing a single shell-interpreted string to SSH (trtexec_safe_cmd) which risks
command injection from self.remote_engine_path,
self.remote_options['remote_lib_path'], and self.remote_bin_path. Fix by using
key-based SSH authentication (or only use sshpass with explicit approver
consent), build scp_cmd and ssh target as "user@host" (no password), pass port
via the correct flag, and pass remote commands/LD_LIBRARY_PATH safely: either
set environment via subprocess.run(env=...) when running locally or construct a
safely-escaped remote command using shlex.quote for each component (or send a
prepared remote script) and invoke subprocess.run with a list of args (not a
single concatenated shell string); also check subprocess.returncode and capture
output/errors for logging. Ensure changes touch scp_cmd, trtexec_safe_cmd,
subprocess.run usages, and validations for
remote_port/remote_user/remote_engine_path.

Contributor Author


Partially addressed.

Contributor


@dthienan-nv, thanks for the partial fix. Here's what's been addressed and what still needs attention:

Fixed:

  • Invalid SSH target syntax (user:password@host → user@host)
  • sshpass is now used for password-based authentication

Still remaining:

1. # nosec comments (critical per guidelines):
The new subprocess calls still carry # nosec B603. Per SECURITY.md and the coding guidelines, these are not allowed. If the pattern is genuinely necessary, the PR must be reviewed and approved by @NVIDIA/modelopt-setup-codeowners with explicit justification in the PR description.

2. Command injection risk (line ~376 in current code):
The remote command is still a single interpolated string:

f"{ld_path} {trt_path} --loadEngine={self.remote_engine_path}",

If any of remote_lib_path, remote_bin_path, or remote_engine_path contain shell metacharacters, they'll be interpreted by the remote shell. Use shlex.quote on each component:

import shlex
remote_cmd = (
    f"LD_LIBRARY_PATH={shlex.quote(str(self.remote_options['remote_lib_path']))}:$LD_LIBRARY_PATH"
    f" {shlex.quote(trt_path)}"
    f" --loadEngine={shlex.quote(str(self.remote_engine_path))}"
)

3. Missing remote_port null check:
urlparse returns None for .port when no port is specified in the URL. This would produce -PNone in the scp command and None as the SSH -p argument. Add a guard:

if self.remote_port is None:
    self.remote_port = 22  # default SSH port

or raise a ValueError if a port is required.
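The None-port behavior is easy to confirm directly with the standard library:

```python
from urllib.parse import urlparse

# With no explicit port in the URL, urlparse reports .port as None, not 22.
parsed = urlparse("ssh://root@10.0.0.5")
print(parsed.port)  # None

# Falling back explicitly avoids producing "-PNone" in the scp command.
port = parsed.port if parsed.port is not None else 22
print(port)  # 22
```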


🧠 Learnings used
Learnt from: kevalmorabia97
Repo: NVIDIA/Model-Optimizer PR: 974
File: modelopt/torch/puzzletron/pruning/pruning_utils.py:195-201
Timestamp: 2026-03-06T14:26:21.439Z
Learning: In NVIDIA/Model-Optimizer, for PyTorch >= 2.6, torch.load() calls without an explicit weights_only argument are safe. Do not flag bare torch.load(...) as a security issue in files under the modelopt package (e.g., modelopt/torch/puzzletron/pruning/pruning_utils.py) as long as the PyTorch version constraint is maintained. If supporting PyTorch < 2.6, require an explicit weights_only argument to torch.load() to avoid potential issues.

Contributor Author


Partially addressed.

Contributor


@dthienan-nv, thanks for the continued updates! Here's an accurate status check against the current code:

Now fixed:

  • shlex.quote applied to remote_lib_path, trt_path, and remote_engine_path — command injection risk resolved.
  • remote_port null check added in the constructor — defaults to 22 if not specified in the URL.
  • sshpass with ssh_pass list applied correctly for both scp and ssh commands.

Still remaining:

# nosec comments (critical per guidelines): Three # nosec B603 comments remain in the run() method. Per SECURITY.md and the coding guidelines, these are not allowed without explicit review and approval by @NVIDIA/modelopt-setup-codeowners with justification in the PR description. Please either:

  1. Remove the # nosec comments and seek @NVIDIA/modelopt-setup-codeowners approval with a justification in the PR description explaining why subprocess.run with external commands is necessary and safe here, or
  2. If the project has existing approved patterns for subprocess use, align with those.

⚠️ Minor: The scp_cmd subprocess call still uses subprocess.run(scp_cmd) # nosec B603 without capture_output=True, text=True, so SCP failures won't have their stderr captured for logging. Consider adding it for consistency with the other calls.


🧠 Learnings used
Learnt from: kevalmorabia97
Repo: NVIDIA/Model-Optimizer PR: 974
File: modelopt/torch/puzzletron/pruning/pruning_utils.py:195-201
Timestamp: 2026-03-06T14:26:21.439Z
Learning: In NVIDIA/Model-Optimizer, for PyTorch >= 2.6, torch.load() calls without an explicit weights_only argument are safe. Do not flag bare torch.load(...) as a security issue in files under the modelopt package (e.g., modelopt/torch/puzzletron/pruning/pruning_utils.py) as long as the PyTorch version constraint is maintained. If supporting PyTorch < 2.6, require an explicit weights_only argument to torch.load() to avoid potential issues.

@dthienan-nv dthienan-nv marked this pull request as draft April 30, 2026 16:49
…fig validation, default value for remote port

Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
@dthienan-nv dthienan-nv marked this pull request as ready for review April 30, 2026 18:31
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 293-296: The remote-config cleanup currently only removes the
"--remoteAutoTuningConfig" token from trtexec_args but leaves the following URL
token and doesn't clear the self.has_remote_config flag, so update the cleanup
in the ImportError path to remove both the flag and its following token when
present and set self.has_remote_config = False; specifically, process the
trtexec_args list in the block that handles ImportError to filter out
"--remoteAutoTuningConfig" and the next element (if any) and then set
self.has_remote_config = False so subsequent checks of
self.is_safe/self.has_remote_config behave correctly.
- Around line 376-404: The SCP/SSH subprocess.run calls (scp_cmd and the
trtexec_safe_cmd invocations) lack timeouts and can hang; update each
subprocess.run that executes scp_cmd and trtexec_safe_cmd to pass a configurable
timeout (e.g., self.ssh_timeout or a new attribute like self.remote_timeout with
a sensible default) and handle subprocess.TimeoutExpired by killing the
operation, logging an error via self.logger.error, and returning float("inf")
(same failure path as other errors). Apply this to the three subprocess.run
sites shown (the initial scp_cmd run, the first trtexec_safe_cmd run, and the
fallback trtexec_safe_cmd run), using the timeout kwarg and wrapping calls in
try/except subprocess.TimeoutExpired to ensure deterministic autotune behavior.
- Around line 256-261: The code assigns parsed.username/hostname directly to
self.remote_user and self.remote_ip, which allows None to overwrite the intended
defaults and permits missing hostnames; update the assignment logic in the
constructor/initializer that sets self.remote_user, self.remote_password,
self.remote_ip, and self.remote_port so that parsed.username only replaces the
default (e.g., "root") when not None/empty, parsed.password is handled
similarly, and parsed.hostname is validated—if hostname is missing raise a
ValueError (or similar) with a clear message; keep the fallback for remote_port
to 22 when parsed.port is None.

ℹ️ Review info

Run ID: 05684e65-2b3d-4d9b-aa78-493751fc56b9

📥 Commits

Reviewing files that changed from the base of the PR and between 19a4961 and be947cf.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/autotune/benchmark.py

Signed-off-by: dmoodie <dmoodie@nvidia.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
modelopt/onnx/quantization/autotune/benchmark.py (4)

294-297: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

ImportError cleanup leaves remote state inconsistent for split-arg form.

When args are passed as --remoteAutoTuningConfig <url>, this only removes the flag token and leaves the URL behind. self.has_remote_config also remains True, so remote-safe flow can still be entered later.

Suggested fix
-                trtexec_args = [
-                    arg for arg in trtexec_args if "--remoteAutoTuningConfig" not in arg
-                ]
+                filtered_args: list[str] = []
+                skip_next = False
+                for arg in trtexec_args:
+                    if skip_next:
+                        skip_next = False
+                        continue
+                    if arg == "--remoteAutoTuningConfig":
+                        skip_next = True
+                        continue
+                    if arg.startswith("--remoteAutoTuningConfig="):
+                        continue
+                    filtered_args.append(arg)
+                trtexec_args = filtered_args
+                self.has_remote_config = False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 294 - 297, The
removal of "--remoteAutoTuningConfig" only deletes the flag token but leaves the
URL token and doesn't clear self.has_remote_config; update the cleanup for
trtexec_args in the code around trtexec_args and self.is_safe so it removes both
the flag and its following value when the flag appears in split form (e.g.,
iterate through trtexec_args, drop "--remoteAutoTuningConfig" and the next
token), and ensure self.has_remote_config is set to False when no remote config
token remains so the remote-safe flow won't be entered erroneously.

339-341: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

# nosec bypasses on subprocess calls violate repository security policy.

These inline Bandit bypass comments are not allowed by the project policy and should be removed before merge (with explicit safety justification and required reviewer approval handled in PR process if needed).

As per coding guidelines: "Any use of '# nosec' comments to bypass Bandit security checks is not allowed."

Also applies to: 379-381, 395-400, 414-419

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 339 - 341,
Remove the "# nosec" comments on the subprocess.run calls (the one creating
result = subprocess.run(..., timeout=self.remote_timeout_sec) and the other
occurrences at the referenced ranges) and replace the bypass with secure
handling: ensure each call passes a validated list of arguments (no shell=True),
validate or sanitize the cmd input before use, handle subprocess.TimeoutExpired
and CalledProcessError exceptions and log/propagate failures instead of
silencing them, and ensure capture_output/text remain intentional; update the
code around the subprocess.run calls (refer to the result = subprocess.run(...)
invocation and the other similar subprocess.run sites) to implement these safe
patterns.
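A sketch of the timeout-plus-error handling these comments call for; the wrapper name and the None-on-failure convention are assumptions made for illustration (the actual benchmark method returns float("inf") on failure):

```python
import subprocess

def run_with_timeout(cmd, timeout_sec, logger):
    """Run an argv list with a timeout; return the result, or None on timeout/failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_sec)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        logger.error(f"Command timed out after {timeout_sec}s: {cmd[0]}")
        return None
    if result.returncode != 0:
        logger.error(f"Command failed ({result.returncode}): {result.stderr.strip()}")
        return None
    return result
```

A caller would translate a None return into float("inf"), keeping autotuning deterministic when the remote device hangs.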

257-262: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve defaults and validate SSH authority fields before building remote targets.

parsed.username/parsed.hostname can be None; assigning directly here can produce invalid endpoints (e.g., None@None) and hides the intended "root" default user fallback.

Suggested fix
-            self.remote_user = parsed.username
-            self.remote_password = parsed.password
-            self.remote_ip = parsed.hostname
-            self.remote_port = parsed.port
-            if self.remote_port is None:
-                self.remote_port = 22
+            self.remote_user = parsed.username or self.remote_user
+            self.remote_password = parsed.password or ""
+            self.remote_ip = parsed.hostname
+            if not self.remote_ip:
+                raise ValueError(
+                    "Missing hostname in --remoteAutoTuningConfig (expected ssh://user[:pass]@host[:port]?...)."
+                )
+            self.remote_port = parsed.port or 22
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 257 - 262, Do
not blindly overwrite SSH fields with possibly-None parsed values; instead
validate and preserve defaults: only set self.remote_user = parsed.username if
parsed.username is not None (otherwise keep existing default "root"), only set
self.remote_password = parsed.password if parsed.password is not None, only set
self.remote_ip = parsed.hostname if parsed.hostname is not None (and raise or
handle missing hostname), and set self.remote_port = parsed.port if parsed.port
is not None else keep default 22; ensure the code uses these validated
attributes when constructing remote targets so you never get "None@None".

368-370: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid sshpass -p because it exposes the password in process arguments.

Passing secrets via command-line arguments can leak through process listings. Prefer SSH keys, or use sshpass -e with SSHPASS set in a transient env passed to subprocess.run.

As per coding guidelines: "Never hardcode or log credentials/tokens/passwords; ensure error/log messages do not expose secrets."
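A sketch of the sshpass -e pattern this comment recommends, where the password travels in a child-only environment variable rather than in argv; the helper name is illustrative:

```python
import os

def sshpass_env_invocation(ssh_cmd, password):
    """Wrap an ssh/scp argv with sshpass -e; the secret goes in SSHPASS, not argv."""
    env = os.environ.copy()
    env["SSHPASS"] = password  # visible only to the child process, not in `ps` output
    return ["sshpass", "-e", *ssh_cmd], env
```

The caller passes both values to subprocess.run(argv, env=env, ...); SSH key-based authentication remains the preferred option when available.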

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 368 - 370, The
current code appends "sshpass", "-p", and self.remote_password to ssh_pass (in
the block building the SSH command), which exposes the password via process
arguments; change this to use "sshpass", "-e" and remove the "-p" and
self.remote_password from ssh_pass, then pass the password via a transient
environment variable SSHPASS when invoking subprocess.run/call (add/merge an env
dict with {"SSHPASS": self.remote_password} or use an existing env copy) so the
secret is not visible in argv; also prefer using SSH keys if available and
remove any logging of self.remote_password in functions building or executing
the SSH command (e.g., the code that constructs ssh_pass and the caller that
runs subprocess).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 294-297: The removal of "--remoteAutoTuningConfig" only deletes
the flag token but leaves the URL token and doesn't clear
self.has_remote_config; update the cleanup for trtexec_args in the code around
trtexec_args and self.is_safe so it removes both the flag and its following
value when the flag appears in split form (e.g., iterate through trtexec_args,
drop "--remoteAutoTuningConfig" and the next token), and ensure
self.has_remote_config is set to False when no remote config token remains so
the remote-safe flow won't be entered erroneously.
- Around line 339-341: Remove the "# nosec" comments on the subprocess.run calls
(the one creating result = subprocess.run(..., timeout=self.remote_timeout_sec)
and the other occurrences at the referenced ranges) and replace the bypass with
secure handling: ensure each call passes a validated list of arguments (no
shell=True), validate or sanitize the cmd input before use, handle
subprocess.TimeoutExpired and CalledProcessError exceptions and log/propagate
failures instead of silencing them, and ensure capture_output/text remain
intentional; update the code around the subprocess.run calls (refer to the
result = subprocess.run(...) invocation and the other similar subprocess.run
sites) to implement these safe patterns.
- Around line 257-262: Do not blindly overwrite SSH fields with possibly-None
parsed values; instead validate and preserve defaults: only set self.remote_user
= parsed.username if parsed.username is not None (otherwise keep existing
default "root"), only set self.remote_password = parsed.password if
parsed.password is not None, only set self.remote_ip = parsed.hostname if
parsed.hostname is not None (and raise or handle missing hostname), and set
self.remote_port = parsed.port if parsed.port is not None else keep default 22;
ensure the code uses these validated attributes when constructing remote targets
so you never get "None@None".
- Around line 368-370: The current code appends "sshpass", "-p", and
self.remote_password to ssh_pass (in the block building the SSH command), which
exposes the password via process arguments; change this to use "sshpass", "-e"
and remove the "-p" and self.remote_password from ssh_pass, then pass the
password via a transient environment variable SSHPASS when invoking
subprocess.run/call (add/merge an env dict with {"SSHPASS":
self.remote_password} or use an existing env copy) so the secret is not visible
in argv; also prefer using SSH keys if available and remove any logging of
self.remote_password in functions building or executing the SSH command (e.g.,
the code that constructs ssh_pass and the caller that runs subprocess).
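A minimal sketch of the `sshpass -e` pattern the comment asks for, assuming a password-based fallback is still needed at all; the helper name `build_ssh_cmd` and its signature are hypothetical, but the mechanism — passing the secret via the `SSHPASS` environment variable rather than in argv — matches how `sshpass -e` is documented to work:

```python
import os


def build_ssh_cmd(remote_user, remote_ip, remote_port, remote_cmd, password=None):
    """Build an ssh invocation as an argument list.

    With a password, use `sshpass -e`, which reads the secret from the
    SSHPASS environment variable, so it never appears in the process
    argument list (where `ps` could expose it). SSH keys remain preferable
    when available.
    """
    cmd = ["ssh", "-p", str(remote_port), f"{remote_user}@{remote_ip}", remote_cmd]
    env = os.environ.copy()
    if password is not None:
        cmd = ["sshpass", "-e", *cmd]
        env["SSHPASS"] = password  # transient: visible only to the child process
    return cmd, env
```

The returned pair would be consumed as `subprocess.run(cmd, env=env, capture_output=True, text=True)`; nothing about the password should ever be logged.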

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cb68d482-2dee-449b-ba0a-2c4b8e306e3a

📥 Commits

Reviewing files that changed from the base of the PR and between be947cf and 28e4e8e.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/autotune/benchmark.py

Signed-off-by: dmoodie <dmoodie@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (3)
modelopt/onnx/quantization/autotune/benchmark.py (3)

297-299: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact --remoteAutoTuningConfig before logging command template

At Line 299, logging the full command can leak credentials from SSH URLs in remote autotuning config.

Proposed fix
-        self.logger.debug(f"Base command template: {' '.join(self._base_cmd)}")
+        redacted_cmd: list[str] = []
+        skip_next_remote_cfg_value = False
+        for arg in self._base_cmd:
+            if skip_next_remote_cfg_value:
+                skip_next_remote_cfg_value = False
+                redacted_cmd.append("[REDACTED_REMOTE_CONFIG]")
+                continue
+            if arg == "--remoteAutoTuningConfig":
+                redacted_cmd.append(arg)
+                skip_next_remote_cfg_value = True
+                continue
+            if arg.startswith("--remoteAutoTuningConfig="):
+                redacted_cmd.append("--remoteAutoTuningConfig=[REDACTED_REMOTE_CONFIG]")
+                continue
+            redacted_cmd.append(arg)
+        self.logger.debug(f"Base command template: {' '.join(redacted_cmd)}")

Based on learnings: “Avoid logging sensitive inputs, paths, tokens, proprietary model details, or overly verbose operational information that could leak security-relevant data.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 297 - 299, The
debug log prints the full command template via self._base_cmd (extended with
trtexec_args), which can leak sensitive SSH URLs passed with the
--remoteAutoTuningConfig flag; before calling self.logger.debug, create a
sanitized copy of the command list (do not mutate self._base_cmd) and replace
any argument that startswith "--remoteAutoTuningConfig=" or follows the
"--remoteAutoTuningConfig" token with a redacted placeholder like
"--remoteAutoTuningConfig=<REDACTED>" and then log ' '.join(sanitized_cmd) in
the logger.debug call so the sensitive value is not exposed (keep use of
self._base_cmd, trtexec_args, and self.logger.debug).

257-262: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate SSH authority fields before assigning remote target

On Line 257 and Line 259, parsed.username/parsed.hostname can be None. That can produce invalid targets and fail later with unclear errors.

Proposed fix
-            self.remote_user = parsed.username
-            self.remote_password = parsed.password
-            self.remote_ip = parsed.hostname
-            self.remote_port = parsed.port
-            if self.remote_port is None:
-                self.remote_port = 22
+            self.remote_user = parsed.username or self.remote_user
+            self.remote_password = parsed.password or ""
+            self.remote_ip = parsed.hostname
+            if not self.remote_ip:
+                raise ValueError(
+                    "Missing hostname in --remoteAutoTuningConfig "
+                    "(expected ssh://user[:pass]@host[:port]?...)."
+                )
+            self.remote_port = parsed.port or 22
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 257 - 262,
Validate parsed.username and parsed.hostname before assigning to
self.remote_user and self.remote_ip: if either is None, raise a clear ValueError
(or custom exception) that includes the original target string and explains that
SSH username and host are required; keep the existing fallback for parsed.port
(set to 22 if None) and still assign parsed.password to self.remote_password
(you may allow None for password but document it), and update the assignment
block that sets self.remote_user, self.remote_password, self.remote_ip,
self.remote_port to perform these checks and raise the informative error instead
of silently producing invalid targets.
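The validation described in this prompt can be sketched with `urllib.parse`; the helper name and its defaults (`root`, port 22) are illustrative, taken from the comment rather than the repository's code:

```python
from urllib.parse import urlparse


def parse_remote_target(url, default_user="root", default_port=22):
    """Parse ssh://user[:pass]@host[:port] into validated SSH fields.

    Raises ValueError (including the offending input) when the hostname is
    missing, and falls back to defaults for username and port instead of
    overwriting them with None, so a target like 'None@None' can never be
    constructed downstream.
    """
    parsed = urlparse(url)
    if parsed.scheme != "ssh":
        raise ValueError(f"expected an ssh:// URL in --remoteAutoTuningConfig, got {url!r}")
    host = parsed.hostname
    if not host:
        raise ValueError(f"missing hostname in --remoteAutoTuningConfig URL {url!r}")
    user = parsed.username if parsed.username else default_user
    port = parsed.port if parsed.port is not None else default_port
    return user, parsed.password, host, port
```

A password of `None` is allowed here (key-based auth); callers should document that explicitly.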

340-340: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Remove forbidden # nosec suppressions on subprocess calls

# nosec is still present on Line 340, Line 380, Line 399, and Line 418. This violates the repo’s Python security policy for modelopt/**/*.py.

Proposed fix
-            result = subprocess.run(
-                cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec
-            )  # nosec B603
+            result = subprocess.run(
+                cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec
+            )
...
-                result = subprocess.run(
-                    scp_cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec
-                )  # nosec B603
+                result = subprocess.run(
+                    scp_cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec
+                )
...
-                result = subprocess.run(
-                    trtexec_safe_cmd,
-                    capture_output=True,
-                    text=True,
-                    timeout=self.remote_timeout_sec,
-                )  # nosec B603
+                result = subprocess.run(
+                    trtexec_safe_cmd,
+                    capture_output=True,
+                    text=True,
+                    timeout=self.remote_timeout_sec,
+                )
...
-                    result = subprocess.run(
-                        trtexec_safe_cmd,
-                        capture_output=True,
-                        text=True,
-                        timeout=self.remote_timeout_sec,
-                    )  # nosec B603
+                    result = subprocess.run(
+                        trtexec_safe_cmd,
+                        capture_output=True,
+                        text=True,
+                        timeout=self.remote_timeout_sec,
+                    )

As per coding guidelines: “Any use of '# nosec' comments to bypass Bandit security checks is not allowed.”

Also applies to: 380-380, 399-399, 418-418

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` at line 340, Remove the
forbidden "# nosec" suppressions on subprocess calls and fix the underlying
security issue: locate the subprocess invocations (e.g., calls to subprocess.run
/ subprocess.Popen in benchmark.py) that currently have "# nosec B603", remove
those comments, and make the calls secure by passing arguments as a list (not a
single shell string), ensuring shell=False, sanitizing/validating any dynamic
inputs before use (or using shlex.split on trusted strings), and properly
handling exceptions/return codes; update the relevant call sites (the
subprocess.run/subprocess.Popen usages) to follow these patterns so Bandit
warnings are resolved without silencing them.
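A hedged sketch of the safe pattern this prompt asks for — list arguments, no `shell=True`, and explicit timeout/failure handling instead of suppression comments. The wrapper name `run_remote` and its error strategy are illustrative, not the repository's code:

```python
import subprocess


def run_remote(cmd, timeout_sec):
    """Run a command given as a list of strings (never a shell string).

    Validates the argument type up front, surfaces timeouts and non-zero
    exit codes as RuntimeError instead of silencing them, and returns
    captured stdout on success.
    """
    if not isinstance(cmd, list) or not all(isinstance(t, str) for t in cmd):
        raise ValueError("cmd must be a list of strings")
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_sec, check=True
        )
    except subprocess.TimeoutExpired as e:
        raise RuntimeError(f"command timed out after {timeout_sec}s") from e
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"command failed (rc={e.returncode}): {e.stderr}") from e
    return result.stdout
```

Because the command is a validated list executed without a shell, Bandit's B603 finding should clear without any `# nosec` annotation.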

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 25d0d07d-dd26-491d-b62d-0b95541d748f

📥 Commits

Reviewing files that changed from the base of the PR and between 28e4e8e and 3660092.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/autotune/benchmark.py

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
modelopt/onnx/quantization/autotune/benchmark.py (2)

300-302: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Potential credential exposure in logs

Line 302 logs the full base command including trtexec_args, which may contain --remoteAutoTuningConfig=ssh://user:password@host.... This logs credentials at debug level.

Proposed fix: Redact credentials before logging
         self._base_cmd.extend(trtexec_args)

-        self.logger.debug(f"Base command template: {' '.join(self._base_cmd)}")
+        # Redact credentials from log output
+        redacted_cmd = [
+            re.sub(r"(ssh://[^:]*:)[^@]+(@)", r"\1****\2", arg) for arg in self._base_cmd
+        ]
+        self.logger.debug(f"Base command template: {' '.join(redacted_cmd)}")

As per coding guidelines (SECURITY.md): "Don't log secrets/credentials/tokens/SSH details."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 300 - 302, The
debug log prints the full base command including trtexec_args which may contain
SSH credentials (e.g., --remoteAutoTuningConfig=ssh://user:password@host);
update the logging in the code path after self._base_cmd.extend(trtexec_args) so
you sanitize trtexec_args (or a copy of self._base_cmd) before calling
self.logger.debug. Implement a small redaction step in benchmark.py that detects
and masks credential patterns like ssh://user:password@host or key=value pairs
containing passwords/tokens (e.g., replace the password portion with **** or
remove the value) and then log the sanitized command using the same logger.debug
call instead of the raw command.

341-343: ⚠️ Potential issue | 🔴 Critical

Remove # nosec B603 comments—they violate security guidelines without required justification

The code contains four # nosec B603 comments (lines 343, 383, 402, 421) that bypass Bandit security checks. Per SECURITY.md and coding guidelines, # nosec comments are forbidden unless accompanied by:

  1. An inline comment explaining why the pattern is necessary and why it is safe in this specific context
  2. A PR description requesting review from @NVIDIA/modelopt-setup-codeowners with explicit justification

Currently, neither is present. The subprocess calls themselves are safe (list arguments, no shell=True, proper use of shlex.quote on user-controlled values), so removing the # nosec comments should allow Bandit to pass without lowering security posture.

Fix: Remove # nosec comments from all four subprocess.run calls
             result = subprocess.run(
                 cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec
-            )  # nosec B603
+            )
...
                 result = subprocess.run(
                     scp_cmd, capture_output=True, text=True, timeout=self.remote_timeout_sec
-                )  # nosec B603
+                )
...
                 result = subprocess.run(
                     trtexec_safe_cmd,
                     capture_output=True,
                     text=True,
                     timeout=self.remote_timeout_sec,
-                )  # nosec B603
+                )
...
                     result = subprocess.run(
                         trtexec_safe_cmd,
                         capture_output=True,
                         text=True,
                         timeout=self.remote_timeout_sec,
-                    )  # nosec B603
+                    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 341 - 343,
Remove the forbidden "# nosec B603" annotations from the subprocess.run calls in
this module (the four occurrences that call subprocess.run(...,
timeout=self.remote_timeout_sec) and the other subprocess.run(...) invocations
using list arguments and shlex.quote); leave the subprocess.run call signatures
and argument lists unchanged. Do not add any other code or suppressions; if a
justification is later required, include an inline safety comment plus the PR
review request to `@NVIDIA/modelopt-setup-codeowners` per guidelines.
🧹 Nitpick comments (3)
modelopt/onnx/quantization/autotune/benchmark.py (3)

439-441: 💤 Low value

Consider adding specific timeout handling for better error messages

The generic except Exception at line 439 catches subprocess.TimeoutExpired, but the error message "Benchmark failed: {e}" doesn't clearly indicate a timeout occurred. A specific handler would improve debugging.

Proposed fix
+        except subprocess.TimeoutExpired as e:
+            self.logger.error(f"Benchmark timed out after {self.remote_timeout_sec}s: {e}")
+            return float("inf")
         except Exception as e:
             self.logger.error(f"Benchmark failed: {e}")
             return float("inf")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 439 - 441, The
generic except in the benchmarking routine (the block that currently does except
Exception as e and calls self.logger.error(f"Benchmark failed: {e}") then
returns float("inf")) should be split to handle subprocess.TimeoutExpired
specifically: add an except subprocess.TimeoutExpired as e clause before the
generic except to log a clear timeout message (e.g.,
self.logger.error(f"Benchmark timed out after {timeout}s: {e}")) and return
float("inf"), then keep the broad except Exception as e for other failures;
reference the existing self.logger.error usage and the function/method
containing this try/except to locate where to add the new except.

256-265: 💤 Low value

Validate empty username in addition to None

urlparse("ssh://:password@host") returns username="" which passes the is None check but produces an invalid SSH target. Consider validating that the username is non-empty.

Proposed fix
             self.remote_user = parsed.username
             self.remote_password = parsed.password
             self.remote_ip = parsed.hostname
             self.remote_port = parsed.port
-            if self.remote_user is None:
+            if not self.remote_user:
                 raise ValueError("Unable to parse remote user from --remoteAutoTuningConfig")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 256 - 265,
After parsing the SSH URL into parsed, tighten validation for parsed fields by
ensuring self.remote_user is non-empty (not None and not an empty string) and
likewise ensure self.remote_ip is a non-empty value; replace the existing "if
self.remote_user is None" and "if self.remote_ip is None" checks with truthy
checks (e.g., if not self.remote_user / if not self.remote_ip) so that cases
like urlparse("ssh://:password@host") are rejected, while leaving the
remote_port defaulting behavior (set to 22 when None) unchanged.

286-286: 💤 Low value

Redundant is_safe assignment

Line 286 sets self.is_safe = True inside the conditional block, but line 299 unconditionally reassigns self.is_safe = "--safe" in trtexec_args immediately after the block. Line 286 is redundant.

Proposed fix
                     self.trtexec_args.append("--safe")
-                    self.is_safe = True

Also applies to: 299-299

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` at line 286, The assignment
self.is_safe = True inside the conditional is redundant because self.is_safe is
immediately overwritten later with self.is_safe = "--safe" in trtexec_args; in
the class/method where trtexec args are prepared (look for the method that
builds trtexec_args and sets self.is_safe in
modelopt/onnx/quantization/autotune/benchmark.py), remove the earlier
self.is_safe = True so the flag is set only once by the final assignment,
leaving any conditional logic that affects trtexec_args intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/benchmark.py`:
- Around line 292-297: The warning message inside the except ImportError block
in benchmark.py (the "except ImportError as e" handler that currently calls
self.logger.warning(...)) is misleading because it says
"--remoteAutoTuningConfig" will be removed but the code immediately raises the
exception; update the log and error handling in that except block of the
benchmarking logic (reference: the except ImportError as e handler and
self.logger) so the message accurately reflects that remote autotuning is
unsupported and the operation will abort: either change the log to a clear
error/exception-level message (e.g., use self.logger.error or
self.logger.exception) that includes e (the exception details) and then
re-raise, or, if you intend to continue by removing the arg, implement the
actual removal logic for "--remoteAutoTuningConfig" before proceeding; do not
leave the old "Removing --remoteAutoTuningConfig" text while still raising.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 05a6afb2-73a0-4d73-a098-c67808a0e42a

📥 Commits

Reviewing files that changed from the base of the PR and between 3660092 and 2f75b5d.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/autotune/benchmark.py

Comment thread modelopt/onnx/quantization/autotune/benchmark.py
…uld be the one to decide on killing the process. Improve documentation, improve log messages

Signed-off-by: dmoodie <dmoodie@nvidia.com>
