
Fix sparsity-only export emitting empty hf_quant_config.json#1375

Open
kaix-nv wants to merge 1 commit into main from kaix/sa_export_fix

Conversation

Contributor

@kaix-nv kaix-nv commented Apr 29, 2026

What does this PR do?

Type of change: Bug fix

Fixes sparsity-only export writing hf_quant_config.json with a null quant_algo.
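The fix can be sketched as a simple guard before the file is written. This is an illustrative reconstruction, not the actual modelopt code: the helper name `maybe_write_hf_quant_config` and the config layout are assumptions based on the fields discussed in this PR.

```python
import json
from pathlib import Path

def maybe_write_hf_quant_config(export_dir, hf_quant_config: dict) -> bool:
    """Write hf_quant_config.json only for genuinely quantized checkpoints.

    Illustrative sketch of the guard this PR introduces; the function name
    and signature are assumptions, not the modelopt API.
    """
    quant = hf_quant_config.get("quantization", {})
    is_quantized = (
        quant.get("quant_algo") is not None
        or quant.get("kv_cache_quant_algo") is not None
    )
    if not is_quantized:
        # Sparsity-only export: skip the file entirely instead of emitting
        # {"quantization": {"quant_algo": null, ...}}.
        return False
    out = Path(export_dir) / "hf_quant_config.json"
    out.write_text(json.dumps(hf_quant_config, indent=2))
    return True
```

With a guard like this, a sparsity-only checkpoint produces no hf_quant_config.json at all, while a quantized checkpoint still gets the file as before.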

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Bug Fixes

    • Improved quantization metadata handling in model export to correctly identify quantized checkpoints based on algorithm presence.
  • Style

    • Reorganized imports across example files for consistency.

Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv requested review from a team as code owners April 29, 2026 22:28
@kaix-nv kaix-nv requested a review from sugunav14 April 29, 2026 22:28
Contributor

coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough


Multiple import statements are reordered across example files. Additionally, quantization export logic is modified to conditionally treat checkpoints as quantized only when metadata includes quant_algo or kv_cache_quant_algo fields, suppressing hf_quant_config.json generation when neither field is present.

Changes

Cohort / File(s) Summary
vLLM Examples Import Reordering
examples/vllm_serve/fakequant_worker.py, examples/vllm_serve/vllm_ptq_utils.py, examples/vllm_serve/vllm_reload_utils.py, examples/vllm_serve/vllm_serve_fakequant.py
Import statements reordered within each file with no functional logic changes.
Quantization Export Logic
modelopt/torch/export/unified_export_hf.py
Conditional quantization metadata handling: checkpoint treated as quantized only when quant_algo or kv_cache_quant_algo present in metadata; hf_quant_config.json generation and quantization_config in config.json suppressed when neither field is set.
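For context, a sparsity-only export previously emitted a file with both algorithm fields null, along these lines (an illustrative reconstruction of the shape, not the exact file contents):

```json
{
  "quantization": {
    "quant_algo": null,
    "kv_cache_quant_algo": null
  }
}
```

After this change, the file is only written when at least one of the two fields is set.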

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the main bug fix: preventing empty hf_quant_config.json from being emitted during sparsity-only export, which aligns with the primary change in unified_export_hf.py.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed Pull request security analysis: torch.load uses weights_only=True, trust_remote_code is configurable (defaults to False), no eval/exec on external input, no nosec comments, no restrictive license dependencies.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1375/

Built to branch gh-pages at 2026-04-29 22:31 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
modelopt/torch/export/unified_export_hf.py (1)

1192-1210: ⚡ Quick win

Please add a regression test for both export branches.

Add one test for sparsity-only export (assert no hf_quant_config.json / no quantization_config) and one for true quantized export (assert both are present).

As per coding guidelines: “Testing: ... add/adjust tests if the PR touches export output correctness.”
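The suggested pair of tests could be sketched like this (pytest-style, using the test names proposed above; the local `write_quant_config` helper stands in for the real export path in unified_export_hf.py and is an assumption, not its API, and "FP8" is just an example algorithm value):

```python
import json
from pathlib import Path

def write_quant_config(export_dir: Path, hf_quant_config: dict) -> None:
    """Stand-in for the export path under test (assumed shape, not the real API)."""
    quant = hf_quant_config.get("quantization", {})
    if quant.get("quant_algo") is None and quant.get("kv_cache_quant_algo") is None:
        return  # sparsity-only: emit nothing
    (export_dir / "hf_quant_config.json").write_text(json.dumps(hf_quant_config))

def test_export_sparsity_only_no_quant_config(tmp_path):
    write_quant_config(
        tmp_path,
        {"quantization": {"quant_algo": None, "kv_cache_quant_algo": None}},
    )
    assert not (tmp_path / "hf_quant_config.json").exists()

def test_export_true_quantized_writes_quant_config(tmp_path):
    # "FP8" is a placeholder algorithm value for illustration.
    write_quant_config(tmp_path, {"quantization": {"quant_algo": "FP8"}})
    cfg = json.loads((tmp_path / "hf_quant_config.json").read_text())
    assert cfg["quantization"]["quant_algo"] == "FP8"
```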

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/export/unified_export_hf.py` around lines 1192 - 1210, Add two
regression tests covering both branches in unified_export_hf.py: one for
sparsity-only export and one for true quantized export. For sparsity-only, call
the export function (the code path computing is_quantized_export using
quantization_details from hf_quant_config) with an hf_quant_config where
"quantization": {"quant_algo": None, "kv_cache_quant_algo": None} and assert
that no hf_quant_config.json file is written in export_dir and that the exported
metadata/returned hf_quant_config does not include a quantization_config. For
the quantized case, provide an hf_quant_config where "quantization":
{"quant_algo": "<some_algo>"} and assert hf_quant_config.json exists in
export_dir and that the export invoked convert_hf_quant_config_format (and the
exported metadata contains the expected quantization_config). Name tests to
reflect branches (e.g., test_export_sparsity_only_no_quant_config and
test_export_true_quantized_writes_quant_config) and clean up temporary
export_dir after assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0fbcaa6b-538a-4d83-b54f-b53ae8af9034

📥 Commits

Reviewing files that changed from the base of the PR and between c07ac21 and e0901f7.

📒 Files selected for processing (5)
  • examples/vllm_serve/fakequant_worker.py
  • examples/vllm_serve/vllm_ptq_utils.py
  • examples/vllm_serve/vllm_reload_utils.py
  • examples/vllm_serve/vllm_serve_fakequant.py
  • modelopt/torch/export/unified_export_hf.py

@codecov

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 76.97%. Comparing base (077e29a) to head (e0901f7).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/export/unified_export_hf.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1375      +/-   ##
==========================================
+ Coverage   76.48%   76.97%   +0.48%     
==========================================
  Files         471      471              
  Lines       50487    50490       +3     
==========================================
+ Hits        38617    38863     +246     
+ Misses      11870    11627     -243     
Flag Coverage Δ
examples 41.58% <75.00%> (+10.83%) ⬆️
gpu 59.72% <75.00%> (-0.44%) ⬇️
regression 14.91% <0.00%> (+0.19%) ⬆️
unit 52.78% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 30, 2026
Collaborator

@shengliangxu shengliangxu left a comment


LGTM. By the way, do you know which codebase/system still reads hf_quant_config.json?


Labels

cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc


3 participants