Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion examples/llm_ptq/example_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -862,7 +862,13 @@ def resolve_checkpoint_dir(quant_cfg: dict, model_path: str) -> dict:
else:
name = Path(name).name

config_hash = hashlib.sha256(json.dumps(quant_cfg, default=str).encode()).hexdigest()[:8]
# Hash only the algorithm dict (scalar fields, fully JSON-serializable and deterministic).
alg_for_hash = {
k: v
for k, v in sorted(algorithm.items())
if k != "layerwise_checkpoint_dir" and isinstance(v, (str, int, float, bool, type(None)))
}
config_hash = hashlib.sha256(json.dumps(alg_for_hash, sort_keys=True).encode()).hexdigest()[:8]
Comment on lines +865 to +871
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid silent checkpoint-hash collisions from dropped algorithm fields.

The current hash input excludes all non-scalar values, so different algorithm configs can collapse to the same suffix and accidentally reuse incompatible layerwise checkpoints.

💡 Proposed fix
-    # Hash only the algorithm dict (scalar fields, fully JSON-serializable and deterministic).
-    alg_for_hash = {
-        k: v
-        for k, v in sorted(algorithm.items())
-        if k != "layerwise_checkpoint_dir" and isinstance(v, (str, int, float, bool, type(None)))
-    }
-    config_hash = hashlib.sha256(json.dumps(alg_for_hash, sort_keys=True).encode()).hexdigest()[:8]
+    def _stable_jsonable(obj):
+        if isinstance(obj, dict):
+            return {k: _stable_jsonable(v) for k, v in sorted(obj.items())}
+        if isinstance(obj, (list, tuple)):
+            return [_stable_jsonable(v) for v in obj]
+        if isinstance(obj, (str, int, float, bool, type(None))):
+            return obj
+        raise TypeError(
+            f"Unsupported algorithm field type for checkpoint hash: {type(obj).__name__}"
+        )
+
+    alg_for_hash = _stable_jsonable(
+        {k: v for k, v in algorithm.items() if k != "layerwise_checkpoint_dir"}
+    )
+    config_hash = hashlib.sha256(
+        json.dumps(alg_for_hash, sort_keys=True, separators=(",", ":")).encode()
+    ).hexdigest()[:8]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/example_utils.py` around lines 865 - 871, The hash currently
drops all non-scalar fields from algorithm, risking collisions; instead, build
alg_for_hash from algorithm (still skipping layerwise_checkpoint_dir) but for
non-JSON-scalars include a deterministic placeholder that captures their
identity — e.g., replace each non-scalar value with a small stable descriptor
like its type name plus a short hash of repr(value) (or a deterministic
serialization), then compute config_hash from json.dumps(alg_for_hash,
sort_keys=True). This preserves uniqueness for nested/complex fields while
keeping the hash deterministic and still excluding layerwise_checkpoint_dir.


quant_cfg = copy.deepcopy(quant_cfg)
quant_cfg["algorithm"]["layerwise_checkpoint_dir"] = os.path.join(
Expand Down
34 changes: 18 additions & 16 deletions modelopt/torch/quantization/model_calib.py
Original file line number Diff line number Diff line change
Expand Up @@ -1657,34 +1657,36 @@ def layerwise_calibrate(
input_getter = LayerActivationCollector(model)
input_getter._patch_all_layers(decoder_layers=transformer_layers)

resumed_inputs = ckpt.setup_resume(transformer_layers) if ckpt and start_layer > 0 else None
# When all layers are already done (start_layer == num_layers), skip input setup:
resumed_inputs = (
ckpt.setup_resume(transformer_layers) if ckpt and 0 < start_layer < num_layers else None
)

try:
# Bootstrap: get first layer's inputs (or use resumed inputs).
layer_inputs = input_getter.get_first_layer_inputs(
start_layer, resumed_inputs, forward_loop
)
# Skip entirely when all layers are already calibrated (start_layer == num_layers).
if start_layer < num_layers:
layer_inputs = input_getter.get_first_layer_inputs(
start_layer, resumed_inputs, forward_loop
)
else:
layer_inputs = None
Comment on lines +1660 to +1673
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Inspect detect_resume_point completion behavior:"
rg -n -C4 'def detect_resume_point|last \+ 1 >= total|return None' modelopt/torch/quantization/utils/layerwise_calib.py

echo
echo "Inspect _CheckpointState.from_folder start_layer assignment:"
rg -n -C6 'def from_folder|detect_resume_point|start =|return cls\(.*start_layer' modelopt/torch/quantization/utils/layerwise_calib.py

Repository: NVIDIA/Model-Optimizer

Length of output: 5411


🏁 Script executed:

sed -n '1640,1680p' modelopt/torch/quantization/model_calib.py | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 2211


🏁 Script executed:

# Double-check: is there any other code path that could set start_layer to num_layers?
rg -n 'start_layer.*=.*num_layers|start_layer.*=.*total' modelopt/torch/quantization/

Repository: NVIDIA/Model-Optimizer

Length of output: 340


The start_layer == num_layers fast-path is unreachable due to checkpoint helper behavior.

The code at lines 1651 and 1659 is designed to skip bootstrap when all layers are already calibrated (start_layer == num_layers). However, detect_resume_point() returns None when calibration is complete, and _CheckpointState.from_folder() then sets start_layer = 0 (line 570 of layerwise_calib.py). This means resuming from a complete checkpoint will always have start_layer == 0, causing the bootstrap to execute unnecessarily.

The checkpoint helper should either set start_layer = num_layers when complete, or the code should check for completion via a dedicated flag instead of relying on start_layer == num_layers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 1651 - 1664, The
current fast-path relying on start_layer == num_layers is unreachable because
_CheckpointState.from_folder() resets start_layer to 0 when
detect_resume_point() returns None; fix this by having the checkpoint helper
indicate completion instead of hiding it: update _CheckpointState.from_folder
(and detect_resume_point) to set start_layer = num_layers (or expose a completed
flag like ckpt.completed) when calibration is already complete, and then use
that value/flag here (the start_layer check in model_calib.py that decides
whether to call ckpt.setup_resume and input_getter.get_first_layer_inputs).
Reference symbols to change: detect_resume_point, _CheckpointState.from_folder,
_CheckpointState (add completed or set start_layer=num_layers),
ckpt.setup_resume, start_layer, and input_getter.get_first_layer_inputs.


for layer_idx in range(start_layer, num_layers):
layer = transformer_layers[layer_idx]

def _layer_forward_loop(m, _inputs=layer_inputs):
for args, kwargs_input in _inputs:
# Reset past_key_values to prevent the KV cache from
# accumulating across multiple forward replays (e.g.
# max_calibrate then Hessian collection in GPTQ).
# The layer doesn't need stale KV data — each replay
# should start with a fresh cache.
if (
"past_key_values" in kwargs_input
and kwargs_input["past_key_values"] is not None
):
# Always clear past_key_values for each replay so layers
# that behave differently in decode vs prefill mode (e.g.
# NemotronH SSM/Mamba) always run in prefill mode where
# hidden_states has the full sequence length.
if "past_key_values" in kwargs_input:
kwargs_input = dict(kwargs_input)
cache = kwargs_input["past_key_values"]
if hasattr(cache, "reset"):
if cache is not None and hasattr(cache, "reset"):
cache.reset()
else:
kwargs_input["past_key_values"] = None
kwargs_input["past_key_values"] = None
m(*args, **kwargs_input)

with persistent_materialization(layer):
Expand Down
Loading