
[NVBug 6102977] Disable use_cache during export dummy forward for custom configs#1402

Merged
meenchen merged 1 commit into main from weimingc/nvbug_6102977_v2 on May 6, 2026

Conversation

meenchen (Contributor) commented May 6, 2026

The export step's collect_shared_input_modules() runs a dummy forward pass to trace shared input modules for resmoothing/requantization. This forward call hits the same AttributeError as PR #1324 on configs that don't define use_cache (e.g., stepfun-ai/Step-3.5-Flash's Step3p5Config):

AttributeError: 'Step3p5Config' object has no attribute 'use_cache'

This PR wraps the dummy forward with the _disable_use_cache context manager, which sets and restores config.use_cache around the call, mirroring the calibration fix from #1324. This covers all call sites, since both unified_export_hf and plugins/vllm_fakequant_hf funnel through collect_shared_input_modules.
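As a rough sketch of the shape of the fix (the probe body below is paraphrased for illustration and the hook bookkeeping is hypothetical; only the _disable_use_cache wrapping reflects the actual change):

```python
import torch

from modelopt.torch.utils.dataset_utils import _disable_use_cache


def _collect_shared_input_modules_sketch(model, dummy_input):
    """Paraphrased probe: record which modules receive the same input tensor."""
    seen, handles = {}, []

    def hook(module, args, _output):
        for t in args:
            if isinstance(t, torch.Tensor):
                seen.setdefault(id(t), []).append(module)

    for m in model.modules():
        handles.append(m.register_forward_hook(hook))
    try:
        # The actual fix: disable config.use_cache around the dummy forward so
        # configs that never define the attribute (e.g., Step3p5Config) don't
        # raise AttributeError when the forward pass reads it.
        with torch.no_grad(), _disable_use_cache(model):
            model(dummy_input)
    finally:
        for h in handles:
            h.remove()
    return seen
```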

What does this PR do? Fixes the AttributeError above by disabling use_cache around the export dummy forward.

Type of change: Bug fix

Usage

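There is no user-facing API change. An illustrative end-to-end path that exercises the fixed code, using the public ModelOpt entry points as I understand them (the model choice and calibration loop are placeholders; the original failure reproduced with stepfun-ai/Step-3.5-Flash):

```python
from transformers import AutoModelForCausalLM

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

# Placeholder model; the bug surfaced on configs like Step3p5Config that
# never define `use_cache`.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")


def forward_loop(m):
    pass  # placeholder: run calibration batches through m here


model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)

# Export runs the dummy forward in collect_shared_input_modules; with this
# fix it no longer assumes config.use_cache exists on the model config.
export_hf_checkpoint(model, export_dir="exported_model")
```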

Testing
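No test files are touched in this PR (only unified_export_hf.py changes). A minimal check of the context-manager behavior the fix relies on could look like this hypothetical snippet, assuming the set-to-False and restore-or-delete-on-exit semantics described in the review comment below:

```python
from types import SimpleNamespace

from modelopt.torch.utils.dataset_utils import _disable_use_cache

# Mimic a model whose config, like Step3p5Config, never defines `use_cache`.
model = SimpleNamespace(config=SimpleNamespace())

with _disable_use_cache(model):
    # Inside the context the attribute exists and is False, so the export
    # dummy forward can read config.use_cache without raising.
    assert model.config.use_cache is False

# On exit the attribute is removed again, since the config never defined it.
assert not hasattr(model.config, "use_cache")
```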

Before your PR is "Ready for review"

  • Make sure you read and follow the Contributor guidelines and that your commits are signed (git commit -s -S).
  • Make sure you read and follow the Security Best Practices (e.g., avoid hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • Fixed export on model configs that lack use_cache by disabling the cache around the dummy forward used for input-module collection.

@meenchen meenchen requested a review from a team as a code owner May 6, 2026 21:58
@meenchen meenchen requested a review from jingyu-ml May 6, 2026 21:58
coderabbitai (bot) commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 614b1991-bc52-417f-9687-8a50c74c8c32

📥 Commits

Reviewing files that changed from the base of the PR and between 1c2085f and 52d7a1e.

📒 Files selected for processing (1)
  • modelopt/torch/export/unified_export_hf.py

📝 Walkthrough

A single import is added for a cache-disabling utility, and the dummy forward probe in collect_shared_input_modules is wrapped with an additional context manager to disable the model config's use_cache during module input collection.

Changes

Cache Disabling During Input Module Collection Probe (modelopt/torch/export/unified_export_hf.py)
  • Utility import: _disable_use_cache is imported from modelopt.torch.utils.dataset_utils.
  • Probe context wrapping: the dummy forward probe is nested within the _disable_use_cache(model) context manager to prevent cache use during input-module collection.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 6 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change: disabling use_cache during the export dummy forward for custom configs, which directly addresses the AttributeError fix described in the PR objectives.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns: ✅ Passed. No security anti-patterns found. The PR adds safe context-manager usage with no torch.load, numpy.load, trust_remote_code, eval, exec, nosec, or new dependencies.


github-actions (bot) commented May 6, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-06 23:13 UTC

codecov (bot) commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.82%. Comparing base (097293b) to head (52d7a1e).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1402      +/-   ##
==========================================
+ Coverage   76.73%   76.82%   +0.09%     
==========================================
  Files         476      476              
  Lines       51306    51307       +1     
==========================================
+ Hits        39368    39419      +51     
+ Misses      11938    11888      -50     
Flag Coverage Δ
examples 41.81% <100.00%> (+2.63%) ⬆️
gpu 59.84% <100.00%> (-0.58%) ⬇️
regression 15.20% <100.00%> (+0.07%) ⬆️
unit 52.54% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@meenchen meenchen requested a review from cjluo-nv May 6, 2026 22:40
cjluo-nv (Collaborator) left a comment

Bot review — DM the bot to share feedback.

Tiny, well-targeted bug fix (+5 -1). Reuses the existing _disable_use_cache context manager from dataset_utils to wrap the dummy forward pass in collect_shared_input_modules, mirroring the calibration fix from PR #1324. The context manager correctly handles configs that lack use_cache (sets it to False, restores/deletes on exit). No new code to speak of — just adding one more context manager to an existing with block.
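For reference, a context manager with the semantics described above (set use_cache to False, then restore it, or delete it if the config never had it) could be written roughly like this; this is a sketch inferred from the review, not the actual dataset_utils implementation:

```python
from contextlib import contextmanager


@contextmanager
def _disable_use_cache(model):
    """Sketch: force config.use_cache to False for the duration of the
    context; on exit, restore the original value or delete the attribute
    if the config (e.g., Step3p5Config) never defined it."""
    had_attr = hasattr(model.config, "use_cache")
    original = getattr(model.config, "use_cache", None)
    model.config.use_cache = False
    try:
        yield
    finally:
        if had_attr:
            model.config.use_cache = original
        else:
            del model.config.use_cache
```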

@meenchen meenchen merged commit 579fc6c into main May 6, 2026
49 checks passed
@meenchen meenchen deleted the weimingc/nvbug_6102977_v2 branch May 6, 2026 23:12
@meenchen meenchen self-assigned this May 6, 2026
@meenchen meenchen added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label May 6, 2026


Labels

cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc
