[minor] fixes for layerwise calib + MSE #1344
base: main
```diff
@@ -1657,34 +1657,36 @@ def layerwise_calibrate(
     input_getter = LayerActivationCollector(model)
     input_getter._patch_all_layers(decoder_layers=transformer_layers)
 
-    resumed_inputs = ckpt.setup_resume(transformer_layers) if ckpt and start_layer > 0 else None
+    # When all layers are already done (start_layer == num_layers), skip input setup:
+    resumed_inputs = (
+        ckpt.setup_resume(transformer_layers) if ckpt and 0 < start_layer < num_layers else None
+    )
 
     try:
         # Bootstrap: get first layer's inputs (or use resumed inputs).
-        layer_inputs = input_getter.get_first_layer_inputs(
-            start_layer, resumed_inputs, forward_loop
-        )
+        # Skip entirely when all layers are already calibrated (start_layer == num_layers).
+        if start_layer < num_layers:
+            layer_inputs = input_getter.get_first_layer_inputs(
+                start_layer, resumed_inputs, forward_loop
+            )
+        else:
+            layer_inputs = None
```
Comment on lines +1660 to +1673

Contributor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "Inspect detect_resume_point completion behavior:"
rg -n -C4 'def detect_resume_point|last \+ 1 >= total|return None' modelopt/torch/quantization/utils/layerwise_calib.py
echo
echo "Inspect _CheckpointState.from_folder start_layer assignment:"
rg -n -C6 'def from_folder|detect_resume_point|start =|return cls\(.*start_layer' modelopt/torch/quantization/utils/layerwise_calib.py
```

Repository: NVIDIA/Model-Optimizer. Length of output: 5411

🏁 Script executed:

```bash
sed -n '1640,1680p' modelopt/torch/quantization/model_calib.py | cat -n
```

Repository: NVIDIA/Model-Optimizer. Length of output: 2211

🏁 Script executed:

```bash
# Double-check: is there any other code path that could set start_layer to num_layers?
rg -n 'start_layer.*=.*num_layers|start_layer.*=.*total' modelopt/torch/quantization/
```

Repository: NVIDIA/Model-Optimizer. Length of output: 340

The code at lines 1651 and 1659 is designed to skip bootstrap when all layers are already calibrated (`start_layer == num_layers`). The checkpoint helper should either set …
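To make the invariant under discussion concrete, here is a minimal sketch of a resume helper that can legitimately report `start_layer == num_layers`. The name `detect_resume_point` comes from the scripts above; its body and the `layer_{idx}.pt` file layout are assumptions for illustration, not the repository's implementation.

```python
from pathlib import Path


def detect_resume_point(folder: str, num_layers: int) -> int:
    """Hypothetical sketch: return the first layer index that still needs
    calibration, or num_layers when every per-layer checkpoint exists."""
    for idx in range(num_layers):
        if not (Path(folder) / f"layer_{idx}.pt").exists():
            return idx
    return num_layers  # fully calibrated: nothing left to do


num_layers = 32
start_layer = detect_resume_point("ckpt_dir", num_layers)
if start_layer < num_layers:
    pass  # bootstrap first-layer inputs, then resume the per-layer loop
else:
    pass  # all done: running the bootstrap forward pass would be wasted work
```

Under this contract, both the `resumed_inputs` guard (`0 < start_layer < num_layers`) and the bootstrap skip in the diff above appear necessary; neither alone covers the fully-calibrated case.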
```diff
         for layer_idx in range(start_layer, num_layers):
             layer = transformer_layers[layer_idx]
 
             def _layer_forward_loop(m, _inputs=layer_inputs):
                 for args, kwargs_input in _inputs:
-                    # Reset past_key_values to prevent the KV cache from
-                    # accumulating across multiple forward replays (e.g.
-                    # max_calibrate then Hessian collection in GPTQ).
-                    # The layer doesn't need stale KV data — each replay
-                    # should start with a fresh cache.
-                    if (
-                        "past_key_values" in kwargs_input
-                        and kwargs_input["past_key_values"] is not None
-                    ):
+                    # Always clear past_key_values for each replay so layers
+                    # that behave differently in decode vs prefill mode (e.g.
+                    # NemotronH SSM/Mamba) always run in prefill mode where
+                    # hidden_states has the full sequence length.
+                    if "past_key_values" in kwargs_input:
                         kwargs_input = dict(kwargs_input)
                         cache = kwargs_input["past_key_values"]
-                        if hasattr(cache, "reset"):
+                        if cache is not None and hasattr(cache, "reset"):
                             cache.reset()
                         else:
                             kwargs_input["past_key_values"] = None
                     m(*args, **kwargs_input)
 
             with persistent_materialization(layer):
```
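The behavioral change in this hunk is easiest to see in isolation. The sketch below is illustrative only: `ToyCache` and `toy_layer` are made-up stand-ins for a real KV cache and an SSM/Mamba-style layer that branches on cache state, and the loop mimics the replay pattern rather than reproducing `_layer_forward_loop`.

```python
class ToyCache:
    """Made-up stand-in for a KV cache; real caches may expose reset()."""

    def __init__(self) -> None:
        self.seen_tokens = 0

    def reset(self) -> None:
        self.seen_tokens = 0


def toy_layer(tokens: list[int], past_key_values: ToyCache | None = None) -> list[int]:
    # With a non-empty cache the layer runs in decode mode and only looks
    # at the newest token instead of the full sequence.
    if past_key_values is not None and past_key_values.seen_tokens > 0:
        tokens = tokens[-1:]
    if past_key_values is not None:
        past_key_values.seen_tokens += len(tokens)
    return tokens


cache = ToyCache()
calib_batches = [[1, 2, 3, 4], [5, 6, 7, 8]]
for sample in calib_batches:
    cache.reset()  # the fix: clear the cache before every replay
    out = toy_layer(sample, past_key_values=cache)
    assert len(out) == len(sample)  # prefill mode: full sequence length
```

Without the `cache.reset()` call, the second batch would run in decode mode and the layer would be calibrated on a one-token sequence, which matches the failure mode the new comment describes for NemotronH-style blocks.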
**Avoid silent checkpoint-hash collisions from dropped algorithm fields.**

The current hash input excludes all non-scalar values, so different algorithm configs can collapse to the same suffix and accidentally reuse incompatible layerwise checkpoints.
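One common remedy, sketched here as an assumption rather than the reviewer's actual proposal, is to hash a canonical serialization of the full config so nested (non-scalar) fields still influence the checkpoint suffix. `config_hash` and the example configs are illustrative, not the repository's API.

```python
import hashlib
import json


def config_hash(algorithm_cfg: dict) -> str:
    """Hypothetical sketch: derive a checkpoint suffix from the whole
    config, nested values included, instead of only scalar fields."""
    # sort_keys gives a canonical form; default=repr keeps the dump from
    # failing on non-JSON values (at the cost of repr stability).
    canonical = json.dumps(algorithm_cfg, sort_keys=True, default=repr)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


a = {"method": "mse", "mse": {"iterations": 100}}
b = {"method": "mse", "mse": {"iterations": 200}}
# With scalar-only hashing these could collide; here they differ.
assert config_hash(a) != config_hash(b)
```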