
[NVBug 6102977] Disable use_cache during export dummy forward for custom configs#1402

Merged
meenchen merged 1 commit into main from weimingc/nvbug_6102977_v2 on May 6, 2026

Conversation

meenchen (Contributor) commented May 6, 2026

The export step's collect_shared_input_modules() runs a dummy forward pass to trace shared input modules for resmoothing/requantization. This forward call hits the same AttributeError as PR #1324 on configs that don't define use_cache (e.g., stepfun-ai/Step-3.5-Flash's Step3p5Config):

AttributeError: 'Step3p5Config' object has no attribute 'use_cache'

This PR wraps the dummy forward with the _disable_use_cache context manager, which sets and restores config.use_cache around the call, mirroring the calibration fix from #1324. This covers all call sites, since both unified_export_hf and plugins/vllm_fakequant_hf funnel through collect_shared_input_modules.
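As a rough sketch of the shape of the fix (the probe body below is paraphrased for illustration and the hook bookkeeping is hypothetical; only the _disable_use_cache wrapping reflects the actual change):

```python
import torch

from modelopt.torch.utils.dataset_utils import _disable_use_cache


def _collect_shared_input_modules_sketch(model, dummy_input):
    """Paraphrased probe: record which modules receive the same input tensor."""
    seen, handles = {}, []

    def hook(module, args, _output):
        for t in args:
            if isinstance(t, torch.Tensor):
                seen.setdefault(id(t), []).append(module)

    for m in model.modules():
        handles.append(m.register_forward_hook(hook))
    try:
        # The actual fix: disable config.use_cache around the dummy forward so
        # configs that never define the attribute (e.g., Step3p5Config) don't
        # raise AttributeError when the forward pass reads it.
        with torch.no_grad(), _disable_use_cache(model):
            model(dummy_input)
    finally:
        for h in handles:
            h.remove()
    return seen
```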

What does this PR do? Fixes the AttributeError above by disabling use_cache around the export dummy forward.

Type of change: Bug fix

Usage

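There is no user-facing API change. An illustrative end-to-end path that exercises the fixed code, using the public ModelOpt entry points as I understand them (the model choice and calibration loop are placeholders; the original failure reproduced with stepfun-ai/Step-3.5-Flash):

```python
from transformers import AutoModelForCausalLM

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

# Placeholder model; the bug surfaced on configs like Step3p5Config that
# never define `use_cache`.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")


def forward_loop(m):
    pass  # placeholder: run calibration batches through m here


model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)

# Export runs the dummy forward in collect_shared_input_modules; with this
# fix it no longer assumes config.use_cache exists on the model config.
export_hf_checkpoint(model, export_dir="exported_model")
```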

Testing
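No test files are touched in this PR (only unified_export_hf.py changes). A minimal check of the context-manager behavior the fix relies on could look like this hypothetical snippet, assuming the set-to-False and restore-or-delete-on-exit semantics described in the review comment below:

```python
from types import SimpleNamespace

from modelopt.torch.utils.dataset_utils import _disable_use_cache

# Mimic a model whose config, like Step3p5Config, never defines `use_cache`.
model = SimpleNamespace(config=SimpleNamespace())

with _disable_use_cache(model):
    # Inside the context the attribute exists and is False, so the export
    # dummy forward can read config.use_cache without raising.
    assert model.config.use_cache is False

# On exit the attribute is removed again, since the config never defined it.
assert not hasattr(model.config, "use_cache")
```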

Before your PR is "Ready for review"

  • Make sure you read and follow the Contributor guidelines and that your commits are signed (git commit -s -S).
  • Make sure you read and follow the Security Best Practices (e.g., avoid hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • Fixed export on model configs that lack use_cache by disabling the cache around the dummy forward used for input-module collection.

@meenchen meenchen requested a review from a team as a code owner May 6, 2026 21:58
@meenchen meenchen requested a review from jingyu-ml May 6, 2026 21:58
coderabbitai (bot) commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 614b1991-bc52-417f-9687-8a50c74c8c32

📥 Commits

Reviewing files that changed from the base of the PR and between 1c2085f and 52d7a1e.

📒 Files selected for processing (1)
  • modelopt/torch/export/unified_export_hf.py

📝 Walkthrough

A single import is added for a cache-disabling utility, and the dummy forward probe in collect_shared_input_modules is wrapped with an additional context manager to disable the model config's use_cache during module input collection.

Changes

Cache Disabling During Input Module Collection Probe (modelopt/torch/export/unified_export_hf.py)
  • Utility import: _disable_use_cache is imported from modelopt.torch.utils.dataset_utils.
  • Probe context wrapping: the dummy forward probe is nested within the _disable_use_cache(model) context manager to prevent cache use during input-module collection.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 6 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change: disabling use_cache during the export dummy forward for custom configs, which directly addresses the AttributeError fix described in the PR objectives.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns: ✅ Passed. No security anti-patterns found. The PR adds safe context-manager usage with no torch.load, numpy.load, trust_remote_code, eval, exec, nosec, or new dependencies.


github-actions (bot) commented May 6, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-06 23:13 UTC

codecov (bot) commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.82%. Comparing base (097293b) to head (52d7a1e).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1402      +/-   ##
==========================================
+ Coverage   76.73%   76.82%   +0.09%     
==========================================
  Files         476      476              
  Lines       51306    51307       +1     
==========================================
+ Hits        39368    39419      +51     
+ Misses      11938    11888      -50     
Flag Coverage Δ
examples 41.81% <100.00%> (+2.63%) ⬆️
gpu 59.84% <100.00%> (-0.58%) ⬇️
regression 15.20% <100.00%> (+0.07%) ⬆️
unit 52.54% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@meenchen meenchen requested a review from cjluo-nv May 6, 2026 22:40
cjluo-nv (Collaborator) left a comment

Bot review — DM the bot to share feedback.

Tiny, well-targeted bug fix (+5 -1). Reuses the existing _disable_use_cache context manager from dataset_utils to wrap the dummy forward pass in collect_shared_input_modules, mirroring the calibration fix from PR #1324. The context manager correctly handles configs that lack use_cache (sets it to False, restores/deletes on exit). No new code to speak of — just adding one more context manager to an existing with block.
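For reference, a context manager with the semantics described above (set use_cache to False, then restore it, or delete it if the config never had it) could be written roughly like this; this is a sketch inferred from the review, not the actual dataset_utils implementation:

```python
from contextlib import contextmanager


@contextmanager
def _disable_use_cache(model):
    """Sketch: force config.use_cache to False for the duration of the
    context; on exit, restore the original value or delete the attribute
    if the config (e.g., Step3p5Config) never defined it."""
    had_attr = hasattr(model.config, "use_cache")
    original = getattr(model.config, "use_cache", None)
    model.config.use_cache = False
    try:
        yield
    finally:
        if had_attr:
            model.config.use_cache = original
        else:
            del model.config.use_cache
```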

@meenchen meenchen merged commit 579fc6c into main May 6, 2026
49 checks passed
@meenchen meenchen deleted the weimingc/nvbug_6102977_v2 branch May 6, 2026 23:12
@meenchen meenchen self-assigned this May 6, 2026
@meenchen meenchen added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label May 6, 2026


Labels

cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc
