
[6106576] Restore llm_export_utils as deprecated shim for edgellm 0.6.1 compat#1356

Open
ajrasane wants to merge 1 commit into `main` from `ajrasane/nvbug_6106576`

Conversation


@ajrasane ajrasane commented Apr 27, 2026

What does this PR do?

Type of change: Bug fix (backward-compat restoration)

#1210 (b3feebf, "Replace in-repo LLM ONNX export with TensorRT-Edge-LLM") removed modelopt/onnx/llm_export_utils/ from 0.44.0rc1 and pointed users at TensorRT-Edge-LLM as the migration target. The PR description openly flagged the change as not backward compatible.

The catch: TensorRT-Edge-LLM 0.6.1 itself imports the deleted symbol:

```python
# tensorrt_edgellm/onnx_export/onnx_utils.py:24
from modelopt.onnx.llm_export_utils.surgeon_utils import fold_fp8_qdq_to_dq
```

Because that import is at module load time, every tensorrt-edgellm-* CLI — including tensorrt-edgellm-quantize-llm --help — fails immediately with ModuleNotFoundError. QA reports a 5/5 (100%) failure rate on tests/test_onnx_ptq/test_onnx_ptq_edge_llm.py, reproducible without a GPU. The "unused" framing in the original removal commit (d89138a6) only held inside this repo; the public API surface had an external consumer.
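
For reference, a minimal repro sketch of the failure mode, assuming modelopt 0.44.0rc1 and tensorrt-edgellm 0.6.1 are installed side by side (the module path is taken from the snippet above):

```python
# Hypothetical repro: tensorrt_edgellm.onnx_export.onnx_utils does the modelopt import
# at module load time, so merely importing it (as every tensorrt-edgellm-* CLI does)
# raises on modelopt 0.44.0rc1.
import importlib

try:
    importlib.import_module("tensorrt_edgellm.onnx_export.onnx_utils")
except ModuleNotFoundError as err:
    print(f"edgellm 0.6.1 is broken against modelopt 0.44.0rc1: {err}")
```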

What this PR does

  • Restores the four original submodules under modelopt/onnx/llm_export_utils/ verbatim from d89138a6^:
    • __init__.py
    • surgeon_utils.py (contains the missing fold_fp8_qdq_to_dq, plus clear_inputs, clear_outputs, extract_layer_id, no_none_elements)
    • export_utils.py (ModelLoader, WrapperModelForCausalLM, RopeType, llm_to_onnx, torch_to_onnx)
    • quantization_utils.py (get_quant_config, quantize)
  • __init__.py emits a DeprecationWarning on import directing users to modelopt.onnx.export, modelopt.onnx.graph_surgery, or TensorRT-Edge-LLM (a sketch of the warning is shown below).
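
A minimal sketch of that import-time warning (the message text matches the output captured under Testing; the actual restored file contents come from the repo history, not from this sketch):

```python
# Illustrative sketch of the shim's import-time behavior; the module itself is
# restored verbatim from d89138a6^, with the deprecation warning added on top.
import warnings

warnings.warn(
    "modelopt.onnx.llm_export_utils is deprecated and will be removed in a future "
    "release. Use modelopt.onnx.export and modelopt.onnx.graph_surgery, or migrate "
    "to TensorRT-Edge-LLM (https://github.com/NVIDIA/TensorRT-Edge-LLM).",
    DeprecationWarning,
)
```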

The new modelopt.onnx.export and modelopt.onnx.graph_surgery packages do not expose fold_fp8_qdq_to_dq (verified by grep), so a pure import-redirect shim wouldn't have worked — the function itself has to come back.

Why a shim instead of fixing edgellm

We should still ship a tensorrt-edgellm 0.6.2 that inlines the helper and drops the import, but every existing 0.6.1 install is broken in the meantime: the version pin in its wheel (nvidia-modelopt[torch,onnx]==0.39.0) doesn't help, because pip doesn't downgrade modelopt when 0.44.0rc1 is already installed. The shim unblocks those installs on the modelopt side immediately.

Usage

No new usage surface. External consumers continue to import the same paths as before; they get a DeprecationWarning pointing them at the successor APIs.
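
For illustration, downstream code that still uses the old path keeps working unchanged and can temporarily silence the warning while it migrates (hypothetical consumer-side snippet, not part of this PR):

```python
import warnings

# Existing import paths keep working; the shim only adds a DeprecationWarning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    from modelopt.onnx.llm_export_utils.surgeon_utils import fold_fp8_qdq_to_dq
```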

Testing

Verified the failing import succeeds and the warning fires:

```bash
$ python -W default::DeprecationWarning -c \
"from modelopt.onnx.llm_export_utils.surgeon_utils import fold_fp8_qdq_to_dq; print('OK')"
:1: DeprecationWarning: modelopt.onnx.llm_export_utils is deprecated and will be removed in a future release. Use modelopt.onnx.export and modelopt.onnx.graph_surgery, or migrate to TensorRT-Edge-LLM (https://github.com/NVIDIA/TensorRT-Edge-LLM).
OK
```

All four restored modules (`__init__`, `surgeon_utils`, `export_utils`, `quantization_utils`) import cleanly. All pre-commit hooks pass (ruff, mypy, license headers, bandit, RST formatting).
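
If we later want CI coverage for this, a minimal pytest-style sketch along these lines could back the manual check above (not included in this PR; the test name and structure are assumptions):

```python
import importlib
import warnings


def test_llm_export_utils_shim_emits_deprecation_warning():
    import modelopt.onnx.llm_export_utils as shim

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        importlib.reload(shim)  # re-run the module body so the import-time warning fires again

    assert any(issubclass(w.category, DeprecationWarning) for w in caught)
    # The symbol edgellm 0.6.1 imports at load time must be available again.
    from modelopt.onnx.llm_export_utils.surgeon_utils import fold_fp8_qdq_to_dq  # noqa: F401
```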

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
  • Make sure you read and follow the Security Best Practices.
  • Is this change backward compatible?: ✅ — restores a previously public API; new code should still migrate.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A (restoration of code from this repo's own history).
  • Did you write any new necessary tests?: ❌ — restoration of removed code; the failing edgellm test path is the integration test.
  • Did you update Changelog?: ❌ — intentionally omitted; this restores a previously public API as a deprecated shim, and the prior removal wasn't itself called out in the changelog.

Additional Information

  • Original removal: Replace in-repo LLM ONNX export with TensorRT-Edge-LLM #1210 (`b3feebfe`) and prep commit `d89138a6` "Remove unused llm_export_utils package".
  • Followup: track edgellm-side fix (drop the modelopt import / inline `fold_fp8_qdq_to_dq`) so the shim can be removed in a future major.
  • Suggested cherry-pick to `release/0.44.0` so 0.44.0 GA ships without the regression.

@ajrasane ajrasane requested a review from a team as a code owner April 27, 2026 17:24

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

Introduces a new modelopt.onnx.llm_export_utils package with utilities for LLM ONNX export pipelines. Includes a deprecation shim directing users to newer export paths, along with modules for HuggingFace model conversion to ONNX, quantization configuration and calibration, and FP8 quantization graph optimization.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Package Initialization**<br>`modelopt/onnx/llm_export_utils/__init__.py` | Deprecation notice emitted on import, instructing users to migrate to modelopt.onnx.export, modelopt.onnx.graph_surgery, or TensorRT-Edge-LLM. |
| **ONNX Export Utilities**<br>`modelopt/onnx/llm_export_utils/export_utils.py` | Model loading from HuggingFace with rope type detection, wrapper class standardizing model I/O around KV cache handling, and ONNX export function leveraging torch.onnx.export with opset 19. |
| **Quantization Utilities**<br>`modelopt/onnx/llm_export_utils/quantization_utils.py` | Quantization configuration builder for fp8, nvfp4, and int4_awq precisions with lm_head-specific precision options, calibration loop implementation, and end-to-end quantization orchestration with dataloader construction. |
| **Graph Surgery Utilities**<br>`modelopt/onnx/llm_export_utils/surgeon_utils.py` | ONNX graph helper functions for node/tensor I/O clearing, layer index parsing, and FP8 quantization folding that constant-folds weight quantizers into FP8-packed constants and rewires dequantize operations. |
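
For readers unfamiliar with that last folding step, here is a very rough sketch of the idea behind fold_fp8_qdq_to_dq using onnx-graphsurgeon; the op names, helper name, and the omission of the actual FP8 clamping/packing are simplifications for illustration, not the shipped code:

```python
import onnx
import onnx_graphsurgeon as gs


def fold_weight_qdq_sketch(model: onnx.ModelProto) -> onnx.ModelProto:
    """Illustrative only: fold constant-weight Quantize nodes into their paired Dequantize."""
    graph = gs.import_onnx(model)
    for q in [n for n in graph.nodes if n.op == "TRT_FP8QuantizeLinear"]:
        weight, scale = q.inputs[0], q.inputs[1]
        if not isinstance(weight, gs.Constant):
            continue  # only constant weights can be quantized offline; activations stay dynamic
        dq = q.o()  # the paired dequantize consumer (the real code validates it is a DQ node)
        # Quantize the weight offline; the real implementation also clamps and packs to FP8 storage.
        folded = gs.Constant(weight.name + "_fp8", values=weight.values / scale.values)
        dq.inputs[0] = folded  # DQ now reads the pre-quantized constant directly
        q.outputs.clear()  # detach the Quantize node so cleanup() removes it
    graph.cleanup()
    return gs.export_onnx(graph)
```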

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 6 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns detected: trust_remote_code defaults to False, no torch.load/eval/exec calls, no hardcoded unsafe defaults. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: restoring a deprecated package compatibility shim for TensorRT-Edge-LLM. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


…mpat

PR #1210 (b3feebf) removed the modelopt.onnx.llm_export_utils package in
0.44.0rc1, pointing users at TensorRT-Edge-LLM as the migration target.
However, TensorRT-Edge-LLM 0.6.1 itself imports
modelopt.onnx.llm_export_utils.surgeon_utils.fold_fp8_qdq_to_dq at module
load time, so every tensorrt-edgellm-* CLI fails immediately with
ModuleNotFoundError - even tensorrt-edgellm-quantize-llm --help. The
"unused" framing in the original removal commit only held inside this
repo; the public API surface had an external consumer.

Restore the four original submodules verbatim under
modelopt/onnx/llm_export_utils/ and emit a DeprecationWarning on package
import directing users to modelopt.onnx.export, modelopt.onnx.graph_surgery,
or TensorRT-Edge-LLM. The new exporter / graph-surgery packages do not
expose fold_fp8_qdq_to_dq, so a pure import-redirect would not have worked
- the function itself has to come back.

Fixes nvbug 6106576 / OMNIML-3995.

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane force-pushed the ajrasane/nvbug_6106576 branch from 2fcdab3 to e17c3c9 on April 27, 2026 17:29
@ajrasane ajrasane added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 27, 2026
@ajrasane ajrasane changed the title fix(onnx): restore llm_export_utils as deprecated shim for edgellm 0.6.1 compat [6106576] Restore llm_export_utils as deprecated shim for edgellm 0.6.1 compat Apr 27, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (6)
modelopt/onnx/llm_export_utils/surgeon_utils.py (3)

79-79: Consider using logger instead of print() for consistency.

Other modules in the codebase (e.g., modelopt/onnx/export/fp8_exporter.py) use logger from modelopt.onnx.logging_config for output. Using print() directly bypasses log level control and formatting.

♻️ Suggested change

```diff
+from modelopt.onnx.logging_config import logger
+
 ...

-    print("Replacing all (fp32 weights + fp8 QDQ) with (fp8 weights + DQ)...")
+    logger.info("Replacing all (fp32 weights + fp8 QDQ) with (fp8 weights + DQ)...")
 ...
-    print(f"fp8 qdq replaced with only dq completed in {end_time - start_time}s.")
+    logger.info(f"fp8 qdq replaced with only dq completed in {end_time - start_time}s.")
```

Also applies to: 118-118

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/surgeon_utils.py` at line 79, Replace the
direct print() calls in surgeon_utils.py (the "Replacing all (fp32 weights + fp8
QDQ) with (fp8 weights + DQ)..." message at the print on line ~79 and the
similar call at ~118) with the project logger: import logger from
modelopt.onnx.logging_config and emit the message via logger.info (or
logger.debug if more appropriate) so output respects log levels and formatting;
update any surrounding code to remove print and use logger consistently.

95-97: Assert used for runtime validation may be stripped in optimized mode.

Using assert for validating runtime conditions can be problematic since assertions are removed when Python runs with optimization flags (-O). Consider using a proper exception for this QDQ pair validation.

♻️ Suggested fix

```diff
-            assert dq_op.op == "TRT_FP8DequantizeLinear", (
-                f"QDQ does not occur in pairs. You reached {dq_op.op}"
-            )
+            if dq_op.op != "TRT_FP8DequantizeLinear":
+                raise ValueError(f"QDQ does not occur in pairs. You reached {dq_op.op}")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/surgeon_utils.py` around lines 95 - 97,
Replace the runtime assert used for QDQ pair validation in surgeon_utils.py with
an explicit exception so the check isn't removed under Python -O; specifically,
where the code asserts dq_op.op == "TRT_FP8DequantizeLinear" (in the QDQ pairing
logic inside the function that iterates/validates dq_op), raise a ValueError or
RuntimeError with the same descriptive message instead of using assert,
preserving the original message text and context.

53-56: Consider using a more specific exception type.

Using a bare Exception is less precise than a more specific exception like ValueError. This also makes it harder for callers to catch only this specific error.

♻️ Suggested improvement

```diff
     match = re.search(r"layers\.(\d+)", name)
     if match:
         return int(match.group(1))
-    raise Exception(f"{name} does not contain layer info!")
+    raise ValueError(f"{name} does not contain layer info!")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/surgeon_utils.py` around lines 53 - 56,
Replace the generic raise Exception in the layer-extraction logic with a more
specific exception type (e.g., raise ValueError) so callers can catch this
specific failure; update the code around the re.search(r"layers\.(\d+)", name)
check (the block that returns int(match.group(1)) on success) to raise
ValueError(f"{name} does not contain layer info!") (or similar) instead of
Exception.
modelopt/onnx/llm_export_utils/export_utils.py (2)

143-145: Consider using logger instead of print() for consistency.

Other modules in the ONNX export codebase use logger from modelopt.onnx.logging_config. Using print() bypasses log level control.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/export_utils.py` around lines 143 - 145,
Replace the print() call in export_utils.py with the module logger: import the
logger from modelopt.onnx.logging_config if not already imported, then call
logger.info(...) instead of print(...) to emit the message about Native ONNX
export completion; use the existing variables start_time, end_time, and
output_dir in the log message (e.g., include the elapsed time end_time -
start_time and the output_dir) so log level control and formatting are
respected.

72-75: Bare except Exception silently swallows errors.

The broad exception handler catches all exceptions when accessing model.model, which could mask legitimate errors (e.g., AttributeError from a different cause). Consider catching only AttributeError explicitly.

♻️ Suggested fix

```diff
         try:
             self.model = model.model
-        except Exception:
+        except AttributeError:
             self.model = model
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/export_utils.py` around lines 72 - 75, The
current broad try/except around "self.model = model.model" in export_utils.py
swallows all exceptions; change it to catch only AttributeError so real errors
aren't masked: replace the bare "except Exception" with "except AttributeError"
(or use getattr with a default) to fall back to "self.model = model" only when
the model attribute is genuinely missing, keeping the fallback behavior intact.
modelopt/onnx/llm_export_utils/quantization_utils.py (1)

120-130: Assert statements used for user input validation may be stripped in optimized mode.

Using assert to validate the precision and lm_head_precision arguments can cause silent failures when Python runs with -O flag. Consider using explicit if/raise ValueError for argument validation.

♻️ Suggested fix

```diff
-    assert precision in [
-        "fp8",
-        "int4_awq",
-        "nvfp4",
-    ], (
-        f"Only fp8(W8A8), int4_awq(W4A16), nvfp4(W4A4) is supported. You passed an unsupported precision: {precision}."
-    )
-
-    assert lm_head_precision in ["fp16"], (
-        f"Only fp16(unquantized) is supported for lm_head. You passed an unsupported precision: {lm_head_precision}."
-    )
+    if precision not in ["fp8", "int4_awq", "nvfp4"]:
+        raise ValueError(
+            f"Only fp8(W8A8), int4_awq(W4A16), nvfp4(W4A4) is supported. You passed an unsupported precision: {precision}."
+        )
+
+    if lm_head_precision not in ["fp16"]:
+        raise ValueError(
+            f"Only fp16(unquantized) is supported for lm_head. You passed an unsupported precision: {lm_head_precision}."
+        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/quantization_utils.py` around lines 120 - 130,
Replace the assert-based validation for precision and lm_head_precision with
explicit checks that raise ValueError so validation is not skipped under Python
-O; specifically, where the code currently uses "assert precision in [...] ,
(f'...{precision}')" and "assert lm_head_precision in ['fp16'],
(f'...{lm_head_precision}')" change them to conditional if statements that raise
ValueError with the same descriptive messages referencing the variables
precision and lm_head_precision so callers get deterministic errors in all
runtime modes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/llm_export_utils/export_utils.py`:
- Line 94: The function llm_to_onnx uses mutable default arguments
extra_inputs={} and extra_dyn_axes={} which can cause shared-state bugs; change
the signature to use None for those defaults and inside llm_to_onnx initialize
them with empty dicts (e.g., if extra_inputs is None: extra_inputs = {} and
similarly for extra_dyn_axes) so each call gets a fresh mapping while preserving
current behavior.

In `@modelopt/onnx/llm_export_utils/quantization_utils.py`:
- Around line 132-133: Remove the prohibited "# nosec B105" comment and avoid a
raw string literal comparison by replacing the literal "<unk>" with a
module-level constant or the tokenizer's own unk_token attribute; specifically,
introduce a constant (e.g., UNK_TOKEN = "<unk>") at the top of the module or use
tokenizer.unk_token if available, then change the check in quantization_utils.py
from if tokenizer.pad_token != "<unk>": to if tokenizer.pad_token != UNK_TOKEN:
(or if tokenizer.pad_token != tokenizer.unk_token:), and keep the assignment
tokenizer.pad_token = tokenizer.eos_token unchanged so Bandit no longer flags a
hardcoded string.

---

Nitpick comments:
In `@modelopt/onnx/llm_export_utils/export_utils.py`:
- Around line 143-145: Replace the print() call in export_utils.py with the
module logger: import the logger from modelopt.onnx.logging_config if not
already imported, then call logger.info(...) instead of print(...) to emit the
message about Native ONNX export completion; use the existing variables
start_time, end_time, and output_dir in the log message (e.g., include the
elapsed time end_time - start_time and the output_dir) so log level control and
formatting are respected.
- Around line 72-75: The current broad try/except around "self.model =
model.model" in export_utils.py swallows all exceptions; change it to catch only
AttributeError so real errors aren't masked: replace the bare "except Exception"
with "except AttributeError" (or use getattr with a default) to fall back to
"self.model = model" only when the model attribute is genuinely missing, keeping
the fallback behavior intact.

In `@modelopt/onnx/llm_export_utils/quantization_utils.py`:
- Around line 120-130: Replace the assert-based validation for precision and
lm_head_precision with explicit checks that raise ValueError so validation is
not skipped under Python -O; specifically, where the code currently uses "assert
precision in [...] , (f'...{precision}')" and "assert lm_head_precision in
['fp16'], (f'...{lm_head_precision}')" change them to conditional if statements
that raise ValueError with the same descriptive messages referencing the
variables precision and lm_head_precision so callers get deterministic errors in
all runtime modes.

In `@modelopt/onnx/llm_export_utils/surgeon_utils.py`:
- Line 79: Replace the direct print() calls in surgeon_utils.py (the "Replacing
all (fp32 weights + fp8 QDQ) with (fp8 weights + DQ)..." message at the print on
line ~79 and the similar call at ~118) with the project logger: import logger
from modelopt.onnx.logging_config and emit the message via logger.info (or
logger.debug if more appropriate) so output respects log levels and formatting;
update any surrounding code to remove print and use logger consistently.
- Around line 95-97: Replace the runtime assert used for QDQ pair validation in
surgeon_utils.py with an explicit exception so the check isn't removed under
Python -O; specifically, where the code asserts dq_op.op ==
"TRT_FP8DequantizeLinear" (in the QDQ pairing logic inside the function that
iterates/validates dq_op), raise a ValueError or RuntimeError with the same
descriptive message instead of using assert, preserving the original message
text and context.
- Around line 53-56: Replace the generic raise Exception in the layer-extraction
logic with a more specific exception type (e.g., raise ValueError) so callers
can catch this specific failure; update the code around the
re.search(r"layers\.(\d+)", name) check (the block that returns
int(match.group(1)) on success) to raise ValueError(f"{name} does not contain
layer info!") (or similar) instead of Exception.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b74e375a-d03d-4e6c-bb6a-d671b4cf1562

📥 Commits

Reviewing files that changed from the base of the PR and between 5b41ba4 and d885663.

📒 Files selected for processing (5)
  • CHANGELOG.rst
  • modelopt/onnx/llm_export_utils/__init__.py
  • modelopt/onnx/llm_export_utils/export_utils.py
  • modelopt/onnx/llm_export_utils/quantization_utils.py
  • modelopt/onnx/llm_export_utils/surgeon_utils.py

```python
return logits, past_key_values


def llm_to_onnx(model, output_dir, extra_inputs={}, extra_dyn_axes={}):
```

⚠️ Potential issue | 🟡 Minor

Mutable default argument is a Python antipattern.

Using {} as a default value for extra_inputs and extra_dyn_axes can lead to subtle bugs since the same dict object is shared across all calls. Use None and initialize inside the function.

🐛 Suggested fix

```diff
-def llm_to_onnx(model, output_dir, extra_inputs={}, extra_dyn_axes={}):
+def llm_to_onnx(model, output_dir, extra_inputs=None, extra_dyn_axes=None):
     """Export the WrapperModelForCausalLM to ONNX with fixed I/O names and shape definitions and save to `output_dir`.

     Parameters:
         model: torch.Module
         output_dir: str, the output_dir of the original ONNX.
         extra_inputs: dict, append additional inputs after kv_cache. Usually for VL models
         extra_dyn_axes: dict. Usually for VL models
     """
+    if extra_inputs is None:
+        extra_inputs = {}
+    if extra_dyn_axes is None:
+        extra_dyn_axes = {}
     start_time = time.time()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/export_utils.py` at line 94, The function
llm_to_onnx uses mutable default arguments extra_inputs={} and extra_dyn_axes={}
which can cause shared-state bugs; change the signature to use None for those
defaults and inside llm_to_onnx initialize them with empty dicts (e.g., if
extra_inputs is None: extra_inputs = {} and similarly for extra_dyn_axes) so
each call gets a fresh mapping while preserving current behavior.

Comment thread modelopt/onnx/llm_export_utils/quantization_utils.py
@github-actions

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1356/

Built to branch gh-pages at 2026-04-27 17:33 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
modelopt/onnx/llm_export_utils/quantization_utils.py (1)

132-135: ⚠️ Potential issue | 🟠 Major

Fix the pad-token guard and remove the banned # nosec.

The condition is inverted: it currently rewrites every non-<unk> pad token to eos_token, which can silently change tokenizer behavior during calibration. Please only fall back when pad_token is unset or matches the tokenizer’s unk token, and drop the suppression.

Proposed fix

```diff
-    if tokenizer.pad_token != "<unk>":  # nosec B105
-        tokenizer.pad_token = tokenizer.eos_token
-    if tokenizer.pad_token is None:
+    if tokenizer.pad_token in (None, tokenizer.unk_token):
         tokenizer.pad_token = tokenizer.eos_token
```

As per coding guidelines: "Bandit security checks are enforced via pre-commit hooks. # nosec comments are not allowed as bypasses for security checks."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/quantization_utils.py` around lines 132 - 135,
The current guard in quantization_utils.py incorrectly overwrites any
non-"<unk>" pad token and includes a banned "# nosec" suppression; change the
logic to only set tokenizer.pad_token to tokenizer.eos_token when the pad token
is unset or equals the tokenizer's unk token (i.e., if tokenizer.pad_token is
None or tokenizer.pad_token == tokenizer.unk_token), and remove the "# nosec"
comment; locate the check around the references to tokenizer.pad_token,
tokenizer.eos_token and tokenizer.unk_token and replace the inverted condition
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@modelopt/onnx/llm_export_utils/quantization_utils.py`:
- Around line 132-135: The current guard in quantization_utils.py incorrectly
overwrites any non-"<unk>" pad token and includes a banned "# nosec"
suppression; change the logic to only set tokenizer.pad_token to
tokenizer.eos_token when the pad token is unset or equals the tokenizer's unk
token (i.e., if tokenizer.pad_token is None or tokenizer.pad_token ==
tokenizer.unk_token), and remove the "# nosec" comment; locate the check around
the references to tokenizer.pad_token, tokenizer.eos_token and
tokenizer.unk_token and replace the inverted condition accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 39299ea2-7451-4197-83e2-c2f207765bc4

📥 Commits

Reviewing files that changed from the base of the PR and between d885663 and e17c3c9.

📒 Files selected for processing (4)
  • modelopt/onnx/llm_export_utils/__init__.py
  • modelopt/onnx/llm_export_utils/export_utils.py
  • modelopt/onnx/llm_export_utils/quantization_utils.py
  • modelopt/onnx/llm_export_utils/surgeon_utils.py
✅ Files skipped from review due to trivial changes (2)
  • modelopt/onnx/llm_export_utils/__init__.py
  • modelopt/onnx/llm_export_utils/export_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/llm_export_utils/surgeon_utils.py

@ajrasane ajrasane self-assigned this Apr 27, 2026
@codecov

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 0% with 183 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.76%. Comparing base (1ec931c) to head (e17c3c9).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| modelopt/onnx/llm_export_utils/export_utils.py | 0.00% | 77 Missing ⚠️ |
| modelopt/onnx/llm_export_utils/surgeon_utils.py | 0.00% | 53 Missing ⚠️ |
| modelopt/onnx/llm_export_utils/quantization_utils.py | 0.00% | 51 Missing ⚠️ |
| modelopt/onnx/llm_export_utils/__init__.py | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #1356      +/-   ##
==========================================
- Coverage   75.72%   74.76%   -0.96%     
==========================================
  Files         471      475       +4     
  Lines       50375    50558     +183     
==========================================
- Hits        38146    37801     -345     
- Misses      12229    12757     +528     
```
| Flag | Coverage Δ |
| --- | --- |
| examples | 40.14% <0.00%> (-0.49%) ⬇️ |
| gpu | 58.19% <0.00%> (-0.71%) ⬇️ |
| unit | 52.55% <0.00%> (-0.21%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

