Fix sparsity-only export emitting empty hf_quant_config.json #1375
Conversation
Signed-off-by: Kai Xu <kaix@nvidia.com>
📝 Walkthrough
Multiple import statements are reordered across example files. Additionally, quantization export logic is modified to treat a checkpoint as quantized only when its metadata includes a non-null quantization algorithm.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 6 passed
🧹 Nitpick comments (1)
modelopt/torch/export/unified_export_hf.py (1)
Lines 1192-1210: ⚡ Quick win: please add a regression test for both export branches. Add one test for sparsity-only export (assert no `hf_quant_config.json` and no `quantization_config`) and one for true quantized export (assert both are present). As per coding guidelines: "Testing: ... add/adjust tests if the PR touches export output correctness."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/export/unified_export_hf.py` around lines 1192-1210, add two regression tests covering both branches: one for sparsity-only export and one for true quantized export. For sparsity-only, call the export function (the code path computing `is_quantized_export` using `quantization_details` from `hf_quant_config`) with an `hf_quant_config` where `"quantization": {"quant_algo": None, "kv_cache_quant_algo": None}`, and assert that no `hf_quant_config.json` file is written in `export_dir` and that the exported metadata/returned `hf_quant_config` does not include a `quantization_config`. For the quantized case, provide an `hf_quant_config` where `"quantization": {"quant_algo": "<some_algo>"}`, and assert that `hf_quant_config.json` exists in `export_dir` and that the export invoked `convert_hf_quant_config_format` (and the exported metadata contains the expected `quantization_config`). Name the tests to reflect the branches (e.g., `test_export_sparsity_only_no_quant_config` and `test_export_true_quantized_writes_quant_config`) and clean up the temporary `export_dir` after assertions.
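A minimal sketch of the two tests the prompt describes, written against a stand-in writer so the example stays self-contained; the real export entry point in modelopt is assumed (not verified) to behave like this stand-in.

```python
# Sketch of the requested regression tests. write_quant_config is a stand-in
# for the real export path in unified_export_hf.py, which these tests assume
# behaves equivalently with respect to file output.
import json
import tempfile
from pathlib import Path


def write_quant_config(hf_quant_config: dict, export_dir: str) -> None:
    # Stand-in for the code path under test: skip the file when neither
    # quant_algo nor kv_cache_quant_algo is set.
    quant = hf_quant_config.get("quantization", {})
    if quant.get("quant_algo") is None and quant.get("kv_cache_quant_algo") is None:
        return
    (Path(export_dir) / "hf_quant_config.json").write_text(json.dumps(hf_quant_config))


def test_export_sparsity_only_no_quant_config():
    with tempfile.TemporaryDirectory() as export_dir:
        cfg = {"quantization": {"quant_algo": None, "kv_cache_quant_algo": None}}
        write_quant_config(cfg, export_dir)
        # Sparsity-only branch: no config file must be written.
        assert not (Path(export_dir) / "hf_quant_config.json").exists()


def test_export_true_quantized_writes_quant_config():
    with tempfile.TemporaryDirectory() as export_dir:
        cfg = {"quantization": {"quant_algo": "FP8"}}
        write_quant_config(cfg, export_dir)
        # Quantized branch: the config file must exist.
        assert (Path(export_dir) / "hf_quant_config.json").exists()
```

In a real test, `write_quant_config` would be replaced by a call into the actual export function, with the `convert_hf_quant_config_format` invocation checked via a mock.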
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0fbcaa6b-538a-4d83-b54f-b53ae8af9034
📒 Files selected for processing (5)
- examples/vllm_serve/fakequant_worker.py
- examples/vllm_serve/vllm_ptq_utils.py
- examples/vllm_serve/vllm_reload_utils.py
- examples/vllm_serve/vllm_serve_fakequant.py
- modelopt/torch/export/unified_export_hf.py
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@           Coverage Diff           @@
##             main    #1375   +/-   ##
=======================================
+ Coverage   76.48%   76.97%   +0.48%
=======================================
  Files         471      471
  Lines       50487    50490       +3
=======================================
+ Hits        38617    38863     +246
+ Misses      11870    11627     -243
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
shengliangxu left a comment:
LGTM. By the way, do you know which codebase/system is still reading `hf_quant_config.json`?
What does this PR do?
Type of change: Bug fix
Fix sparsity-only export writing `hf_quant_config.json` with null `quant_algo`.
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: ✅ / ❌ / N/A
Additional Information
Summary by CodeRabbit
Bug Fixes
Style