
Fix sparsity-only export emitting empty hf_quant_config.json#1375

Open
kaix-nv wants to merge 1 commit into main from kaix/sa_export_fix

Conversation

Contributor

@kaix-nv kaix-nv commented Apr 29, 2026

What does this PR do?

Type of change: Bug fix

Fixes sparsity-only export writing hf_quant_config.json with a null quant_algo.
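The fix can be sketched as a simple guard before the file is written. This is an illustrative reconstruction, not the actual modelopt code: the helper name `maybe_write_hf_quant_config` and the config layout are assumptions based on the fields discussed in this PR.

```python
import json
from pathlib import Path

def maybe_write_hf_quant_config(export_dir, hf_quant_config: dict) -> bool:
    """Write hf_quant_config.json only for genuinely quantized checkpoints.

    Illustrative sketch of the guard this PR introduces; the function name
    and signature are assumptions, not the modelopt API.
    """
    quant = hf_quant_config.get("quantization", {})
    is_quantized = (
        quant.get("quant_algo") is not None
        or quant.get("kv_cache_quant_algo") is not None
    )
    if not is_quantized:
        # Sparsity-only export: skip the file entirely instead of emitting
        # {"quantization": {"quant_algo": null, ...}}.
        return False
    out = Path(export_dir) / "hf_quant_config.json"
    out.write_text(json.dumps(hf_quant_config, indent=2))
    return True
```

With a guard like this, a sparsity-only checkpoint produces no hf_quant_config.json at all, while a quantized checkpoint still gets the file as before.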

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Bug Fixes

    • Improved quantization metadata handling in model export to correctly identify quantized checkpoints based on algorithm presence.
  • Style

    • Reorganized imports across example files for consistency.

Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv requested review from a team as code owners April 29, 2026 22:28
@kaix-nv kaix-nv requested a review from sugunav14 April 29, 2026 22:28
Contributor

coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough


Multiple import statements are reordered across example files. Additionally, quantization export logic is modified to conditionally treat checkpoints as quantized only when metadata includes quant_algo or kv_cache_quant_algo fields, suppressing hf_quant_config.json generation when neither field is present.

Changes

Cohort / File(s) Summary
vLLM Examples Import Reordering
examples/vllm_serve/fakequant_worker.py, examples/vllm_serve/vllm_ptq_utils.py, examples/vllm_serve/vllm_reload_utils.py, examples/vllm_serve/vllm_serve_fakequant.py
Import statements reordered within each file with no functional logic changes.
Quantization Export Logic
modelopt/torch/export/unified_export_hf.py
Conditional quantization metadata handling: checkpoint treated as quantized only when quant_algo or kv_cache_quant_algo present in metadata; hf_quant_config.json generation and quantization_config in config.json suppressed when neither field is set.
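For context, a sparsity-only export previously emitted a file with both algorithm fields null, along these lines (an illustrative reconstruction of the shape, not the exact file contents):

```json
{
  "quantization": {
    "quant_algo": null,
    "kv_cache_quant_algo": null
  }
}
```

After this change, the file is only written when at least one of the two fields is set.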

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the main bug fix: preventing empty hf_quant_config.json from being emitted during sparsity-only export, which aligns with the primary change in unified_export_hf.py.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed Pull request security analysis: torch.load uses weights_only=True, trust_remote_code is configurable (defaults to False), no eval/exec on external input, no nosec comments, no restrictive license dependencies.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1375/

Built to branch gh-pages at 2026-04-29 22:31 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
modelopt/torch/export/unified_export_hf.py (1)

1192-1210: ⚡ Quick win

Please add a regression test for both export branches.

Add one test for sparsity-only export (assert no hf_quant_config.json / no quantization_config) and one for true quantized export (assert both are present).

As per coding guidelines: “Testing: ... add/adjust tests if the PR touches export output correctness.”
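The suggested pair of tests could be sketched like this (pytest-style, using the test names proposed above; the local `write_quant_config` helper stands in for the real export path in unified_export_hf.py and is an assumption, not its API, and "FP8" is just an example algorithm value):

```python
import json
from pathlib import Path

def write_quant_config(export_dir: Path, hf_quant_config: dict) -> None:
    """Stand-in for the export path under test (assumed shape, not the real API)."""
    quant = hf_quant_config.get("quantization", {})
    if quant.get("quant_algo") is None and quant.get("kv_cache_quant_algo") is None:
        return  # sparsity-only: emit nothing
    (export_dir / "hf_quant_config.json").write_text(json.dumps(hf_quant_config))

def test_export_sparsity_only_no_quant_config(tmp_path):
    write_quant_config(
        tmp_path,
        {"quantization": {"quant_algo": None, "kv_cache_quant_algo": None}},
    )
    assert not (tmp_path / "hf_quant_config.json").exists()

def test_export_true_quantized_writes_quant_config(tmp_path):
    # "FP8" is a placeholder algorithm value for illustration.
    write_quant_config(tmp_path, {"quantization": {"quant_algo": "FP8"}})
    cfg = json.loads((tmp_path / "hf_quant_config.json").read_text())
    assert cfg["quantization"]["quant_algo"] == "FP8"
```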

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/export/unified_export_hf.py` around lines 1192 - 1210, Add two
regression tests covering both branches in unified_export_hf.py: one for
sparsity-only export and one for true quantized export. For sparsity-only, call
the export function (the code path computing is_quantized_export using
quantization_details from hf_quant_config) with an hf_quant_config where
"quantization": {"quant_algo": None, "kv_cache_quant_algo": None} and assert
that no hf_quant_config.json file is written in export_dir and that the exported
metadata/returned hf_quant_config does not include a quantization_config. For
the quantized case, provide an hf_quant_config where "quantization":
{"quant_algo": "<some_algo>"} and assert hf_quant_config.json exists in
export_dir and that the export invoked convert_hf_quant_config_format (and the
exported metadata contains the expected quantization_config). Name tests to
reflect branches (e.g., test_export_sparsity_only_no_quant_config and
test_export_true_quantized_writes_quant_config) and clean up temporary
export_dir after assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0fbcaa6b-538a-4d83-b54f-b53ae8af9034

📥 Commits

Reviewing files that changed from the base of the PR and between c07ac21 and e0901f7.

📒 Files selected for processing (5)
  • examples/vllm_serve/fakequant_worker.py
  • examples/vllm_serve/vllm_ptq_utils.py
  • examples/vllm_serve/vllm_reload_utils.py
  • examples/vllm_serve/vllm_serve_fakequant.py
  • modelopt/torch/export/unified_export_hf.py

@codecov

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 76.97%. Comparing base (077e29a) to head (e0901f7).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/export/unified_export_hf.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1375      +/-   ##
==========================================
+ Coverage   76.48%   76.97%   +0.48%     
==========================================
  Files         471      471              
  Lines       50487    50490       +3     
==========================================
+ Hits        38617    38863     +246     
+ Misses      11870    11627     -243     
Flag Coverage Δ
examples 41.58% <75.00%> (+10.83%) ⬆️
gpu 59.72% <75.00%> (-0.44%) ⬇️
regression 14.91% <0.00%> (+0.19%) ⬆️
unit 52.78% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 30, 2026
Collaborator

@shengliangxu shengliangxu left a comment


LGTM. By the way, do you know which codebase/system still reads hf_quant_config.json?


Labels

cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc


3 participants