[6106576] Restore llm_export_utils as deprecated shim for edgellm 0.6.1 compat#1356
Conversation
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes. 🚥 Pre-merge checks: ✅ 6 passed.
Restore llm_export_utils as deprecated shim for edgellm 0.6.1 compat

PR #1210 (b3feebf) removed the modelopt.onnx.llm_export_utils package in 0.44.0rc1, pointing users at TensorRT-Edge-LLM as the migration target. However, TensorRT-Edge-LLM 0.6.1 itself imports modelopt.onnx.llm_export_utils.surgeon_utils.fold_fp8_qdq_to_dq at module load time, so every tensorrt-edgellm-* CLI fails immediately with ModuleNotFoundError - even tensorrt-edgellm-quantize-llm --help. The "unused" framing in the original removal commit only held inside this repo; the public API surface had an external consumer.

Restore the four original submodules verbatim under modelopt/onnx/llm_export_utils/ and emit a DeprecationWarning on package import directing users to modelopt.onnx.export, modelopt.onnx.graph_surgery, or TensorRT-Edge-LLM. The new exporter / graph-surgery packages do not expose fold_fp8_qdq_to_dq, so a pure import-redirect would not have worked - the function itself has to come back.

Fixes nvbug 6106576 / OMNIML-3995.

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Force-pushed from 2fcdab3 to e17c3c9 (Compare)
Actionable comments posted: 2
🧹 Nitpick comments (6)
modelopt/onnx/llm_export_utils/surgeon_utils.py (3)
79-79: Consider using `logger` instead of `print()` for consistency.

Other modules in the codebase (e.g., `modelopt/onnx/export/fp8_exporter.py`) use `logger` from `modelopt.onnx.logging_config` for output. Using `print()` directly bypasses log level control and formatting.

♻️ Suggested change
```diff
+from modelopt.onnx.logging_config import logger
+
 ...
-print("Replacing all (fp32 weights + fp8 QDQ) with (fp8 weights + DQ)...")
+logger.info("Replacing all (fp32 weights + fp8 QDQ) with (fp8 weights + DQ)...")
 ...
-print(f"fp8 qdq replaced with only dq completed in {end_time - start_time}s.")
+logger.info(f"fp8 qdq replaced with only dq completed in {end_time - start_time}s.")
```

Also applies to: 118-118
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/surgeon_utils.py` at line 79, Replace the direct print() calls in surgeon_utils.py (the "Replacing all (fp32 weights + fp8 QDQ) with (fp8 weights + DQ)..." message at the print on line ~79 and the similar call at ~118) with the project logger: import logger from modelopt.onnx.logging_config and emit the message via logger.info (or logger.debug if more appropriate) so output respects log levels and formatting; update any surrounding code to remove print and use logger consistently.
95-97: Assert used for runtime validation may be stripped in optimized mode.

Using `assert` for validating runtime conditions can be problematic since assertions are removed when Python runs with optimization flags (`-O`). Consider using a proper exception for this QDQ pair validation.

♻️ Suggested fix
```diff
-    assert dq_op.op == "TRT_FP8DequantizeLinear", (
-        f"QDQ does not occur in pairs. You reached {dq_op.op}"
-    )
+    if dq_op.op != "TRT_FP8DequantizeLinear":
+        raise ValueError(f"QDQ does not occur in pairs. You reached {dq_op.op}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/surgeon_utils.py` around lines 95 - 97, Replace the runtime assert used for QDQ pair validation in surgeon_utils.py with an explicit exception so the check isn't removed under Python -O; specifically, where the code asserts dq_op.op == "TRT_FP8DequantizeLinear" (in the QDQ pairing logic inside the function that iterates/validates dq_op), raise a ValueError or RuntimeError with the same descriptive message instead of using assert, preserving the original message text and context.
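For context on the `-O` concern raised in this comment (and the similar one in quantization_utils.py below), a quick self-contained check that optimized mode compiles asserts out entirely (hypothetical demo script, not from this codebase):

```python
import subprocess
import sys

# assert statements are compiled out when CPython runs with -O
for flags in ([], ["-O"]):
    result = subprocess.run(
        [sys.executable, *flags, "-c", "assert False, 'checked'"],
        capture_output=True,
        text=True,
    )
    print(flags or "default", "-> exit code", result.returncode)
# default -> exit code 1 (AssertionError raised)
# ['-O']  -> exit code 0 (the assert never runs)
```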
53-56: Consider using a more specific exception type.

Using a bare `Exception` is less precise than a more specific exception like `ValueError`. This also makes it harder for callers to catch only this specific error.

♻️ Suggested improvement

```diff
     match = re.search(r"layers\.(\d+)", name)
     if match:
         return int(match.group(1))
-    raise Exception(f"{name} does not contain layer info!")
+    raise ValueError(f"{name} does not contain layer info!")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/surgeon_utils.py` around lines 53 - 56, Replace the generic raise Exception in the layer-extraction logic with a more specific exception type (e.g., raise ValueError) so callers can catch this specific failure; update the code around the re.search(r"layers\.(\d+)", name) check (the block that returns int(match.group(1)) on success) to raise ValueError(f"{name} does not contain layer info!") (or similar) instead of Exception.

modelopt/onnx/llm_export_utils/export_utils.py (2)
143-145: Consider using `logger` instead of `print()` for consistency.

Other modules in the ONNX export codebase use `logger` from `modelopt.onnx.logging_config`. Using `print()` bypasses log level control.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/export_utils.py` around lines 143 - 145, Replace the print() call in export_utils.py with the module logger: import the logger from modelopt.onnx.logging_config if not already imported, then call logger.info(...) instead of print(...) to emit the message about Native ONNX export completion; use the existing variables start_time, end_time, and output_dir in the log message (e.g., include the elapsed time end_time - start_time and the output_dir) so log level control and formatting are respected.
72-75: Bare `except Exception` silently swallows errors.

The broad exception handler catches all exceptions when accessing `model.model`, which could mask legitimate errors (e.g., `AttributeError` from a different cause). Consider catching only `AttributeError` explicitly.

♻️ Suggested fix
```diff
     try:
         self.model = model.model
-    except Exception:
+    except AttributeError:
         self.model = model
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/export_utils.py` around lines 72 - 75, The current broad try/except around "self.model = model.model" in export_utils.py swallows all exceptions; change it to catch only AttributeError so real errors aren't masked: replace the bare "except Exception" with "except AttributeError" (or use getattr with a default) to fall back to "self.model = model" only when the model attribute is genuinely missing, keeping the fallback behavior intact.
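The `getattr` alternative the prompt mentions is a one-line equivalent of the narrowed try/except; a minimal sketch (hypothetical class, mirroring the fallback behavior described above):

```python
class ModelLoaderSketch:
    """Hypothetical stand-in for the loader pattern under review."""

    def __init__(self, model):
        # Equivalent to try/except AttributeError: fall back to the
        # model itself when it has no .model attribute.
        self.model = getattr(model, "model", model)
```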
modelopt/onnx/llm_export_utils/quantization_utils.py (1)

120-130: Assert statements used for user input validation may be stripped in optimized mode.

Using `assert` to validate the `precision` and `lm_head_precision` arguments can cause silent failures when Python runs with the `-O` flag. Consider using explicit `if`/`raise ValueError` for argument validation.

♻️ Suggested fix
```diff
-    assert precision in [
-        "fp8",
-        "int4_awq",
-        "nvfp4",
-    ], (
-        f"Only fp8(W8A8), int4_awq(W4A16), nvfp4(W4A4) is supported. You passed an unsupported precision: {precision}."
-    )
-
-    assert lm_head_precision in ["fp16"], (
-        f"Only fp16(unquantized) is supported for lm_head. You passed an unsupported precision: {lm_head_precision}."
-    )
+    if precision not in ["fp8", "int4_awq", "nvfp4"]:
+        raise ValueError(
+            f"Only fp8(W8A8), int4_awq(W4A16), nvfp4(W4A4) is supported. You passed an unsupported precision: {precision}."
+        )
+
+    if lm_head_precision not in ["fp16"]:
+        raise ValueError(
+            f"Only fp16(unquantized) is supported for lm_head. You passed an unsupported precision: {lm_head_precision}."
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/quantization_utils.py` around lines 120 - 130, Replace the assert-based validation for precision and lm_head_precision with explicit checks that raise ValueError so validation is not skipped under Python -O; specifically, where the code currently uses "assert precision in [...] , (f'...{precision}')" and "assert lm_head_precision in ['fp16'], (f'...{lm_head_precision}')" change them to conditional if statements that raise ValueError with the same descriptive messages referencing the variables precision and lm_head_precision so callers get deterministic errors in all runtime modes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b74e375a-d03d-4e6c-bb6a-d671b4cf1562
📒 Files selected for processing (5)
- CHANGELOG.rst
- modelopt/onnx/llm_export_utils/__init__.py
- modelopt/onnx/llm_export_utils/export_utils.py
- modelopt/onnx/llm_export_utils/quantization_utils.py
- modelopt/onnx/llm_export_utils/surgeon_utils.py
```python
        return logits, past_key_values


def llm_to_onnx(model, output_dir, extra_inputs={}, extra_dyn_axes={}):
```
Mutable default argument is a Python antipattern.
Using `{}` as a default value for `extra_inputs` and `extra_dyn_axes` can lead to subtle bugs since the same dict object is shared across all calls. Use `None` and initialize inside the function.
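The shared-state pitfall is easy to reproduce with a hypothetical helper (not from this codebase):

```python
def append_item(item, bucket=[]):  # one list object is created once and reused
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] -- state leaked from the first call
```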
🐛 Suggested fix
```diff
-def llm_to_onnx(model, output_dir, extra_inputs={}, extra_dyn_axes={}):
+def llm_to_onnx(model, output_dir, extra_inputs=None, extra_dyn_axes=None):
     """Export the WrapperModelForCausalLM to ONNX with fixed I/O names and shape definitions and save to `output_dir`.

     Parameters:
         model: torch.Module
         output_dir: str, the output_dir of the original ONNX.
         extra_inputs: dict, append additional inputs after kv_cache. Usually for VL models
         extra_dyn_axes: dict. Usually for VL models
     """
+    if extra_inputs is None:
+        extra_inputs = {}
+    if extra_dyn_axes is None:
+        extra_dyn_axes = {}
     start_time = time.time()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/onnx/llm_export_utils/export_utils.py` at line 94, The function
llm_to_onnx uses mutable default arguments extra_inputs={} and extra_dyn_axes={}
which can cause shared-state bugs; change the signature to use None for those
defaults and inside llm_to_onnx initialize them with empty dicts (e.g., if
extra_inputs is None: extra_inputs = {} and similarly for extra_dyn_axes) so
each call gets a fresh mapping while preserving current behavior.
♻️ Duplicate comments (1)
modelopt/onnx/llm_export_utils/quantization_utils.py (1)
132-135: ⚠️ Potential issue | 🟠 Major

Fix the pad-token guard and remove the banned `# nosec`.

The condition is inverted: it currently rewrites every non-`<unk>` pad token to `eos_token`, which can silently change tokenizer behavior during calibration. Please only fall back when `pad_token` is unset or matches the tokenizer's unk token, and drop the suppression.

Proposed fix
```diff
-    if tokenizer.pad_token != "<unk>":  # nosec B105
-        tokenizer.pad_token = tokenizer.eos_token
-    if tokenizer.pad_token is None:
+    if tokenizer.pad_token in (None, tokenizer.unk_token):
         tokenizer.pad_token = tokenizer.eos_token
```

As per coding guidelines: "Bandit security checks are enforced via pre-commit hooks. `# nosec` comments are not allowed as bypasses for security checks."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/quantization_utils.py` around lines 132 - 135, The current guard in quantization_utils.py incorrectly overwrites any non-"<unk>" pad token and includes a banned "# nosec" suppression; change the logic to only set tokenizer.pad_token to tokenizer.eos_token when the pad token is unset or equals the tokenizer's unk token (i.e., if tokenizer.pad_token is None or tokenizer.pad_token == tokenizer.unk_token), and remove the "# nosec" comment; locate the check around the references to tokenizer.pad_token, tokenizer.eos_token and tokenizer.unk_token and replace the inverted condition accordingly.
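The claimed inversion can be confirmed with plain string checks (hypothetical pad-token values; assumes the tokenizer's unk token is "<unk>"):

```python
for pad in ("<unk>", None, "[PAD]"):
    old_guard = pad != "<unk>"           # current code: overwrites None and "[PAD]"
    new_guard = pad in (None, "<unk>")   # proposed: overwrites only unset/unk pads
    print(f"pad={pad!r}: old overwrites={old_guard}, new overwrites={new_guard}")
```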
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 39299ea2-7451-4197-83e2-c2f207765bc4
📒 Files selected for processing (4)
- modelopt/onnx/llm_export_utils/__init__.py
- modelopt/onnx/llm_export_utils/export_utils.py
- modelopt/onnx/llm_export_utils/quantization_utils.py
- modelopt/onnx/llm_export_utils/surgeon_utils.py
✅ Files skipped from review due to trivial changes (2)
- modelopt/onnx/llm_export_utils/__init__.py
- modelopt/onnx/llm_export_utils/export_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/onnx/llm_export_utils/surgeon_utils.py
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1356      +/-   ##
==========================================
- Coverage   75.72%   74.76%   -0.96%
==========================================
  Files         471      475       +4
  Lines       50375    50558     +183
==========================================
- Hits        38146    37801     -345
- Misses      12229    12757     +528
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix (backward-compat restoration)
#1210 (b3feebf, "Replace in-repo LLM ONNX export with TensorRT-Edge-LLM") removed `modelopt/onnx/llm_export_utils/` from 0.44.0rc1 and pointed users at TensorRT-Edge-LLM as the migration target. The PR description openly flagged the change as not backward compatible.

The catch: TensorRT-Edge-LLM 0.6.1 itself imports the deleted symbol:
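Reconstructed from the commit message above (the exact edgellm source location is not shown in this PR):

```python
# Executed at module import time inside tensorrt-edgellm 0.6.1:
from modelopt.onnx.llm_export_utils.surgeon_utils import fold_fp8_qdq_to_dq
```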
Because that import is at module load time, every `tensorrt-edgellm-*` CLI — including `tensorrt-edgellm-quantize-llm --help` — fails immediately with `ModuleNotFoundError`. QA reports a 5/5 (100%) failure rate on `tests/test_onnx_ptq/test_onnx_ptq_edge_llm.py`, reproducible without a GPU. The "unused" framing in the original removal commit (d89138a6) only held inside this repo; the public API surface had an external consumer.

What this PR does
- Restores the four original submodules under `modelopt/onnx/llm_export_utils/` verbatim from `d89138a6^`:
  - `__init__.py`
  - `surgeon_utils.py` (contains the missing `fold_fp8_qdq_to_dq`, plus `clear_inputs`, `clear_outputs`, `extract_layer_id`, `no_none_elements`)
  - `export_utils.py` (`ModelLoader`, `WrapperModelForCausalLM`, `RopeType`, `llm_to_onnx`, `torch_to_onnx`)
  - `quantization_utils.py` (`get_quant_config`, `quantize`)
- `__init__.py` emits a `DeprecationWarning` on import directing users to `modelopt.onnx.export`, `modelopt.onnx.graph_surgery`, or TensorRT-Edge-LLM; see the sketch after this list.
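A minimal sketch of such a shim (the warning text is taken from the Testing section below; the actual restored file may differ in detail):

```python
# modelopt/onnx/llm_export_utils/__init__.py (sketch)
import warnings

warnings.warn(
    "modelopt.onnx.llm_export_utils is deprecated and will be removed in a future "
    "release. Use modelopt.onnx.export and modelopt.onnx.graph_surgery, or migrate "
    "to TensorRT-Edge-LLM (https://github.com/NVIDIA/TensorRT-Edge-LLM).",
    DeprecationWarning,
    stacklevel=2,
)
```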
modelopt.onnx.exportandmodelopt.onnx.graph_surgerypackages do not exposefold_fp8_qdq_to_dq(verified bygrep), so a pure import-redirect shim wouldn't have worked — the function itself has to come back.Why a shim instead of fixing edgellm
Why a shim instead of fixing edgellm

We should still ship `tensorrt-edgellm` 0.6.2 that inlines the helper and drops the import, but every existing 0.6.1 install (and the version pin in its wheel — `nvidia-modelopt[torch,onnx]==0.39.0`) is broken in the meantime, and pip doesn't downgrade modelopt when 0.44.0rc1 is already installed. The shim unblocks them on the modelopt side immediately.

Usage
No new usage surface. External consumers continue to import the same paths as before; they get a `DeprecationWarning` pointing them at the successor APIs.

Testing
Verified the failing import succeeds and the warning fires:
```bash
$ python -W default::DeprecationWarning -c \
"from modelopt.onnx.llm_export_utils.surgeon_utils import fold_fp8_qdq_to_dq; print('OK')"
<string>:1: DeprecationWarning: modelopt.onnx.llm_export_utils is deprecated and will be removed in a future release. Use modelopt.onnx.export and modelopt.onnx.graph_surgery, or migrate to TensorRT-Edge-LLM (https://github.com/NVIDIA/TensorRT-Edge-LLM).
OK
```
All four restored modules (`__init__`, `surgeon_utils`, `export_utils`, `quantization_utils`) import cleanly. All pre-commit hooks pass (ruff, mypy, license headers, bandit, RST formatting).
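Downstream code that must keep importing the shim can suppress the warning narrowly until it migrates (standard `warnings` API; the placement in the consumer's code is hypothetical):

```python
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    from modelopt.onnx.llm_export_utils.surgeon_utils import fold_fp8_qdq_to_dq
```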
Before your PR is "Ready for review"
Additional Information