feat(quantization): add calibration cache to quantize_static #28221
Rishi-Dave wants to merge 3 commits into microsoft:main
Conversation
Introduce an optional calibration_cache_path parameter on quantize_static so users can save the computed TensorsData after calibration and reload it on subsequent runs. This avoids repeating the expensive calibration inference pass when only post-calibration knobs (e.g. nodes_to_exclude, quant types) change between runs. The cache is a human-readable JSON file whose schema mirrors the encoder used by write_calibration_table: TensorData / TensorsData round-trip through new from_dict classmethods and module-level save_tensors_data / load_tensors_data helpers in calibrate.py. calibration_data_reader is now optional; at least one of it or an existing cache file must be provided. Fixes microsoft#21908
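The intended call pattern can be sketched with a runnable stub; `quantize_static` itself is not invoked here, so `fake_quantize_static` is a stand-in that mirrors only the cache hit/miss control flow described above (names and behavior are assumptions drawn from this description, not the PR's actual code):

```python
from pathlib import Path
import json
import tempfile

calibration_runs = 0  # counts how often the "expensive" calibration pass executes

def fake_quantize_static(cache_path, calibration_data_reader=None):
    """Stub mirroring the PR's cache logic: load on hit, calibrate + save on miss."""
    global calibration_runs
    cache_path = Path(cache_path)
    if cache_path.exists():
        return json.loads(cache_path.read_text())  # cache hit: reader not required
    if calibration_data_reader is None:
        raise ValueError("provide calibration_data_reader or an existing cache")
    calibration_runs += 1  # stands in for the full calibration inference loop
    ranges = {"conv1_out": [-1.0, 1.0]}  # toy per-tensor range
    cache_path.write_text(json.dumps(ranges))
    return ranges

with tempfile.TemporaryDirectory() as d:
    cache = Path(d) / "calibration.json"
    first = fake_quantize_static(cache, calibration_data_reader=object())  # miss
    second = fake_quantize_static(cache)  # hit: calibration skipped, reader omitted
```

On the second call the reader can be omitted entirely, which is the "only one of reader or cache is required" contract.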
Pull request overview
Adds a JSON-backed calibration cache for Python static quantization to allow reusing TensorsData across runs and skipping repeated calibration inference when inputs are unchanged.
Changes:
- Add JSON serialization/deserialization utilities for `TensorData`/`TensorsData` (save/load calibration caches).
- Extend `quantize_static()` with an optional `calibration_cache_path` and make `calibration_data_reader` optional when an existing cache is provided.
- Add test coverage for cache roundtrips and `quantize_static` cache hit/miss behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/python/tools/quantization/calibrate.py | Adds from_dict support plus JSON cache encoder and save_tensors_data / load_tensors_data. |
| onnxruntime/python/tools/quantization/quantize.py | Adds calibration_cache_path to quantize_static and reuses cached TensorsData when present. |
| onnxruntime/python/tools/quantization/`__init__.py` | Exports cache-related helpers and data structures via onnxruntime.quantization. |
| onnxruntime/test/python/quantization/test_calibration.py | Adds TestCalibrationCache covering serialization and end-to-end cache usage. |
- `load_tensors_data`: reject non-file paths up front with a `ValueError` instead of letting `Path.open` raise `IsADirectoryError` or similar.
- `quantize_static`: when `extra_options['SmoothQuant']=True`, require a non-None `calibration_data_reader`, since the cache stores per-tensor ranges only and cannot drive the SmoothQuant transform.
- `quantize_static`: treat the cache path as a hit only when it is a regular file; raise `ValueError` if it exists but is e.g. a directory, so callers get a clear message instead of a low-level `IOError`.
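The path and reader guards suggested above could look roughly like this; `resolve_cache` is a hypothetical helper written for illustration, not code from the PR:

```python
from pathlib import Path
import tempfile

def resolve_cache(cache_path, calibration_data_reader=None, smooth_quant=False):
    """Hypothetical guard logic per the review suggestions.

    Returns True on a cache hit (path is a regular file), False on a miss.
    """
    if smooth_quant and calibration_data_reader is None:
        # The cache stores per-tensor ranges only; SmoothQuant needs live inference.
        raise ValueError("SmoothQuant requires a calibration_data_reader")
    p = Path(cache_path)
    if p.exists() and not p.is_file():
        # Fail early with a clear message instead of a low-level IsADirectoryError.
        raise ValueError(f"calibration cache path exists but is not a regular file: {p}")
    return p.is_file()

with tempfile.TemporaryDirectory() as d:
    missing = resolve_cache(Path(d) / "absent.json")  # no file yet: cache miss
    try:
        resolve_cache(d)  # a directory is not a valid cache file
        dir_rejected = False
    except ValueError:
        dir_rejected = True
```

Validating the path shape before opening it keeps the error surface at the API boundary rather than deep inside the I/O layer.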
tianleiwu
left a comment
Review Summary
The calibration cache feature is well-motivated and the implementation is clean overall. The atomic write pattern, method mismatch validation, and SmoothQuant guard are good design choices.
However, there is a correctness bug in the deserialization path that will cause ValueError at runtime when caching Entropy or Distribution calibration results. The root cause is that numpy scalars (produced by ndarray.min()/ndarray.max() in the Entropy calibrator at line 1154) are encoded as plain JSON numbers, but from_dict() doesn't convert them back to numpy-typed values before passing to TensorData.__init__(), which requires a .dtype attribute on float fields.
- `TensorData.from_dict`: wrap plain int/float values for `_floats` keys as `np.array(value, dtype=np.float32)` so the cache round-trip works for Entropy/Distribution calibration, where `hist_edges.min()`/`.max()` are serialized as numpy scalars and deserialize as plain Python floats.
- `save_tensors_data`: wrap `json.dump` + `os.replace` in a `try/except BaseException` that unlinks the `.tmp` file on failure, so partial serialization (or a KeyboardInterrupt mid-write) does not leave stray `.tmp` files behind.
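The float-coercion fix can be sketched in isolation; the key set and helper name below are assumptions made for illustration, not the PR's actual `from_dict` code:

```python
import numpy as np

FLOAT_KEYS = {"lowest", "highest"}  # assumed names of the _floats fields

def coerce_float_fields(d):
    """Sketch of the from_dict fix.

    JSON round-trips numpy scalars as plain Python floats, which lack the
    .dtype attribute TensorData.__init__ expects; wrap them back into numpy
    values before reconstruction.
    """
    out = dict(d)
    for key in FLOAT_KEYS & out.keys():
        value = out[key]
        if isinstance(value, (int, float)):
            out[key] = np.array(value, dtype=np.float32)  # restores .dtype
    return out

# Simulate what the Entropy calibrator produces and what JSON gives back:
edges = np.array([0.0, 1.0, 2.0], dtype=np.float32)
raw = {"lowest": float(edges.min()), "highest": float(edges.max())}
fixed = coerce_float_fields(raw)
```

After coercion, `fixed["lowest"]` carries a `float32` dtype again, so downstream code that reads `.dtype` no longer raises.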
tianleiwu
left a comment
Review Summary (Round 2)
Both concerns from the previous review round are addressed:
- numpy scalar roundtrip bug (thread #4): `from_dict()` now coerces plain Python floats back to numpy arrays for float fields. Fixed.
- Orphaned `.tmp` file (thread #5): `save_tensors_data()` now wraps the write in `try/except BaseException` and removes the temp file on failure. Fixed.
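The atomic-write-with-cleanup pattern described here can be sketched as a minimal standalone helper (this is an illustration of the pattern, not the PR's `save_tensors_data` itself):

```python
import json
import os
import tempfile

def save_json_atomic(obj, path, encoder=None):
    """Write JSON atomically: serialize to a sibling .tmp file, then
    os.replace() into place. On any failure, including KeyboardInterrupt
    mid-write, remove the .tmp file so no stray temp files are left behind."""
    tmp = f"{path}.tmp"
    try:
        with open(tmp, "w") as f:
            json.dump(obj, f, cls=encoder)
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "cache.json")
    save_json_atomic({"a": 1}, target)
    with open(target) as f:
        saved_ok = json.load(f) == {"a": 1}
    try:
        save_json_atomic({"bad": object()}, target)  # not JSON-serializable
    except TypeError:
        pass
    tmp_left_behind = os.path.exists(target + ".tmp")
```

Because the replace happens only after a complete, successful dump, a failed write can never corrupt an existing cache file.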
The implementation is clean, backward-compatible, and well-tested. One remaining minor test coverage suggestion below.
Verdict: APPROVE
```python
        self.assertEqual(lo.shape, ())
        self.assertEqual(hi.shape, ())

    def test_save_load_tensors_data_entropy_roundtrip(self):
```
Nitpick — test uses 0-d ndarray, not actual numpy scalars
This test uses np.array(-0.5, dtype=np.float32) (a 0-d ndarray) for lowest/highest, but the real Entropy calibrator produces numpy scalars via hist_edges.min()/hist_edges.max(). As a result, this test doesn't directly exercise the from_dict fix that coerces plain float → np.float32.
Consider adding a case with actual numpy scalars:
```python
def test_save_load_tensors_data_numpy_scalar_roundtrip(self):
    edges = np.array([0.0, 1.0, 2.0], dtype=np.float32)
    td = TensorsData(
        CalibrationMethod.Entropy,
        {"t": TensorData(lowest=edges.min(), highest=edges.max(), hist=edges[:-1], hist_edges=edges)},
    )
    cache_path = Path(self._tmp_dir.name) / "scalar_roundtrip.json"
    save_tensors_data(td, cache_path)
    loaded = load_tensors_data(cache_path)
    np.testing.assert_almost_equal(loaded["t"].range_value[0], 0.0)
    np.testing.assert_almost_equal(loaded["t"].range_value[1], 2.0)
```
Summary
- Adds a `calibration_cache_path` parameter to `quantize_static()` so users can save and reload the calibration result (`TensorsData`) across runs.
- Skips the expensive calibration inference pass when only post-calibration knobs change between runs, e.g. `nodes_to_exclude`, `activation_type`, or `weight_type`.
- The cache schema mirrors the encoder used by `write_calibration_table`: no new serialization surface area.

Motivation
Fixes #21908. Users commonly re-run `quantize_static` multiple times on the same model and calibration dataset while varying the set of excluded nodes or the quant types, to trade off accuracy vs. speed. Today, every call repeats the full calibration inference loop even though the calibration result is identical, which is costly on large calibration datasets. There was no supported way to persist the computed tensor ranges: `write_calibration_table` writes a lossy table (drops histogram data) and has no paired reader. This PR closes that gap.

Changes
- `python/tools/quantization/calibrate.py`: adds `TensorData.from_dict` and `TensorsData.from_dict` classmethods (the inverse of the existing `to_dict`), plus `_CalibrationCacheEncoder(json.JSONEncoder)`, `save_tensors_data(tensors, path)`, and `load_tensors_data(path)`. The encoder handles `TensorData`/`TensorsData`/`np.ndarray`/`CalibrationMethod`/numpy scalars. Writes are atomic (tmp file + `os.replace`) and parent directories are auto-created.
- `python/tools/quantization/quantize.py`: `quantize_static` gains `calibration_cache_path: str | Path | None = None`. If the path exists, calibration is skipped and ranges are loaded from the cache; if the path is new, calibration runs and the result is saved. Raises `ValueError` if the cached `calibration_method` does not match the caller's `calibrate_method`. `calibration_data_reader` becomes optional; at least one of it or an existing cache must be provided, else `ValueError`.
- `python/tools/quantization/__init__.py`: exports `TensorData`, `TensorsData`, `save_tensors_data`, `load_tensors_data`.
- `test/python/quantization/test_calibration.py`: adds `TestCalibrationCache` covering MinMax roundtrip, Entropy roundtrip (with histogram), missing-path error, parent-dir auto-creation, numpy scalar `bins` handling, method-mismatch guard, end-to-end `quantize_static` cache hit/miss, and `ValueError` when neither reader nor cache is provided.
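An encoder of the kind described above could look roughly like this; it is a sketch, and the field names (`__ndarray__`, `dtype`) are assumptions, since the PR's actual `_CalibrationCacheEncoder` schema is not reproduced here:

```python
import json
import numpy as np

class CacheEncoder(json.JSONEncoder):
    """Sketch of a calibration-cache encoder: lowers numpy types to
    JSON-native values, tagging ndarrays so they can be reconstructed."""

    def default(self, o):
        if isinstance(o, np.ndarray):
            return {"__ndarray__": o.tolist(), "dtype": str(o.dtype)}
        if isinstance(o, np.generic):  # numpy scalar, e.g. np.float32
            return o.item()
        return super().default(o)

blob = json.dumps(
    {"lowest": np.float32(-0.5), "hist_edges": np.arange(3, dtype=np.float32)},
    cls=CacheEncoder,
)
decoded = json.loads(blob)
```

Note that numpy scalars come back as plain Python floats on decode, which is exactly why the `from_dict` side needs the coercion discussed in the review threads.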
- `python -m pytest onnxruntime/test/python/quantization/test_calibration.py::TestCalibrationCache -v`
- `python -m pytest onnxruntime/test/python/quantization/test_calibration.py::TestCalibrateMinMaxCalibrator -v` (regression)
- `lintrunner -a` on changed files: clean.

Backward Compatibility
`calibration_data_reader` changes from required-positional to optional-keyword. Existing call sites, whether positional or keyword, continue to work unchanged. The new behavior is only engaged when `calibration_cache_path` is provided.