feat(quantization): add ActivationRestrictedAsymmetric option #28237
Rishi-Dave wants to merge 3 commits into microsoft:main
…t8 zero-point snapping
When extra_options={"ActivationRestrictedAsymmetric": True} is passed to
quantize_static (or a QDQ config), uint8 activation zero-points are snapped
to 0 when rmin >= 0 (e.g. post-ReLU tensors) or 128 when rmin < 0. Scale
is recomputed so the dequantized range still covers [rmin, rmax] without
clipping.
- quant_utils: add snap_zero_point_to_uint8() helper (~28 LOC)
- base_quantizer: parse ActivationRestrictedAsymmetric extra-option flag
- onnx_quantizer: apply snap after compute_scale_zp in calc_quant_params
(uint8, non-symmetric activations only)
- qdq_quantizer: same snap in QDQ calc_quant_params path
- quantize: document new option in all four extra_options docstrings
- test_symmetric_flag: add TestRestrictedAsymmetricFlag (3 test methods)
Refs microsoft#21398
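The snapping rule described in this commit message can be written out as a short worked derivation. This is a sketch that assumes the standard dequantization formula x = s(q - z) with q in [0, 255]; the symbols s (scale) and z (zero-point) are notation introduced here, not names from the patch.

```math
\begin{aligned}
r_{\min} \ge 0:&\quad z = 0,\quad s = \frac{r_{\max}}{255},\quad [\,s(0 - z),\ s(255 - z)\,] = [\,0,\ r_{\max}\,] \\
r_{\min} < 0:&\quad z = 128,\quad s = \max\!\left(\frac{-r_{\min}}{128},\ \frac{r_{\max}}{127}\right),\quad [\,-128\,s,\ 127\,s\,] \supseteq [\,r_{\min},\ r_{\max}\,]
\end{aligned}
```

For example, with rmin = -1 and rmax = 3 the scale is max(1/128, 3/127) = 3/127, so the representable range is roughly [-3.024, 3]. The calibrated range is covered, but part of the negative half of the grid goes unused, which is the price of fixing the zero-point at 128.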
tianleiwu left a comment
Thanks for the focused change. The new option is consistently wired through the QOperator and QDQ paths, and the basic snap behavior is covered. I found one correctness issue that should be fixed before merge: the restricted asymmetric path recomputes scale after the existing quant-param helper and drops the MinimumRealRange guarantee. I also left a smaller test-discovery note.
    return [zero_point, scale]


def snap_zero_point_to_uint8(rmin, rmax):
This helper needs to preserve the existing MinimumRealRange behavior. Both call sites first compute params with compute_scale_zp(..., self.min_real_range), but then replace the result with snap_zero_point_to_uint8(rmin, rmax). For a narrow activation range such as [0, 1e-6] with MinimumRealRange=0.0001, this returns a scale based on 1e-6 / 255 instead of the documented minimum range, which can regress EP configs that rely on that option. Please pass min_real_range into this helper, or pre-adjust rmax the same way compute_scale_zp does before calculating the restricted asymmetric scale.
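The size of the regression can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the floor is applied by widening rmax to at least rmin + min_real_range, as the comment above suggests; `scale_without_floor` and `scale_with_floor` are hypothetical names used only for illustration and are not functions in quant_utils.

```python
QMAX = 255  # uint8 upper bound when reduce_range is off


def scale_without_floor(rmax):
    # Scale as the restricted-asymmetric path computes it for rmin >= 0,
    # with no minimum range applied.
    return rmax / QMAX


def scale_with_floor(rmin, rmax, min_real_range):
    # Assumption: the floor widens rmax so that rmax - rmin >= min_real_range
    # before the scale is derived.
    rmax = max(rmax, rmin + min_real_range)
    return rmax / QMAX


rmin, rmax, min_real_range = 0.0, 1e-6, 1e-4
naive = scale_without_floor(rmax)
floored = scale_with_floor(rmin, rmax, min_real_range)
print(f"scale without floor: {naive:.3e}")
print(f"scale with floor:    {floored:.3e}")
```

In this example the two scales differ by about two orders of magnitude, which is the gap an EP relying on MinimumRealRange would see.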
    unittest.main()


class TestRestrictedAsymmetricFlag(unittest.TestCase):
Please move this class above the if __name__ == "__main__": unittest.main() block, or move the main guard back to the end of the file. Import-based discovery can still find the class, but direct execution with python test_symmetric_flag.py starts unittest.main() before this class is defined, so these three new tests are skipped in that mode.
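The ordering effect can be reproduced without touching the real test file. The sketch below builds a throwaway module in two steps and asks the default unittest loader what it can see at each point; the module name and the two test classes are hypothetical stand-ins, and the first count corresponds to what unittest.main() would load if it were called at that position.

```python
import types
import unittest

# Source that appears before the main guard in the hypothetical file.
SOURCE_BEFORE_GUARD = '''
import unittest

class TestExisting(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)
'''

# Source that appears after the main guard.
SOURCE_AFTER_GUARD = '''
class TestAddedLater(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)
'''


def count_tests(module):
    # Same module-level collection that unittest.main() performs.
    return unittest.defaultTestLoader.loadTestsFromModule(module).countTestCases()


module = types.ModuleType("fake_test_module")
exec(SOURCE_BEFORE_GUARD, module.__dict__)
seen_at_guard = count_tests(module)  # snapshot at the guard's position
exec(SOURCE_AFTER_GUARD, module.__dict__)
seen_at_end = count_tests(module)  # snapshot once the whole file has run
print(seen_at_guard, seen_at_end)
```

Because unittest.main() exits the process after running what it collected, a class defined below the guard never gets a chance to be loaded in direct-execution mode.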
Pull request overview
Adds a new Python quantization extra_options mode (ActivationRestrictedAsymmetric) to support uint8 activation zero-points restricted to {0, 128}, as required by some accelerators.
Changes:
- Add snap_zero_point_to_uint8(rmin, rmax) helper to recompute (zp, scale) with zp snapped to 0 or 128.
- Parse/propagate the new ActivationRestrictedAsymmetric option and apply it in both QOperator and QDQ quantization activation parameter calculation.
- Document the option in quantize.py and add unit tests covering the expected snapping behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| onnxruntime/python/tools/quantization/quant_utils.py | Adds snapping helper for restricted-asymmetric uint8 activations. |
| onnxruntime/python/tools/quantization/base_quantizer.py | Parses new ActivationRestrictedAsymmetric extra option. |
| onnxruntime/python/tools/quantization/onnx_quantizer.py | Applies snapping in QOperator activation quant-param calculation. |
| onnxruntime/python/tools/quantization/qdq_quantizer.py | Applies snapping in QDQ activation quant-param calculation. |
| onnxruntime/python/tools/quantization/quantize.py | Documents the new extra-option in public docstrings. |
| onnxruntime/test/python/quantization/test_symmetric_flag.py | Adds tests validating zp snapping behavior for positive/signed activation ranges. |
    qmin, qmax = get_qmin_qmax_for_qType(quant_type, reduce_range=reduce_range, symmetric=symmetric)
    zero, scale = compute_scale_zp(rmin, rmax, qmin, qmax, symmetric, self.min_real_range)
    if self.is_activation_restricted_asymmetric and quant_type == onnx.TensorProto.UINT8 and not symmetric:
        zero, scale = snap_zero_point_to_uint8(rmin, rmax)
if __name__ == "__main__":
    unittest.main()


class TestRestrictedAsymmetricFlag(unittest.TestCase):
    def snap_zero_point_to_uint8(rmin, rmax):
        """Snap a uint8 activation zero-point to 0 (when rmin >= 0) or 128 (when rmin < 0).

        Used by the ActivationRestrictedAsymmetric quantization option. Recomputes scale so the
        dequantized range still covers [rmin, rmax] without clipping.

        :parameter rmin: calibrated minimum activation value (numpy scalar)
        :parameter rmax: calibrated maximum activation value (numpy scalar)
        :return: (zero_point, scale) with zero_point dtype uint8 and scale dtype float32
        """
        rmin = float(numpy.squeeze(rmin))
        rmax = float(numpy.squeeze(rmax))
        if rmax <= rmin:
            # Degenerate range – return neutral values
            return numpy.array(0, dtype=numpy.uint8), numpy.array(1.0, dtype=numpy.float32)
        if rmin >= 0.0:
            zero_point = numpy.array(0, dtype=numpy.uint8)
            scale = numpy.array(rmax / 255.0, dtype=numpy.float32)
        else:
            zero_point = numpy.array(128, dtype=numpy.uint8)
            # Choose scale that covers both negative and positive halves without clipping
            scale_neg = -rmin / 128.0  # scale needed to represent rmin at q=0
            scale_pos = rmax / 127.0  # scale needed to represent rmax at q=255
            scale = numpy.array(max(scale_neg, scale_pos), dtype=numpy.float32)
…c snap
Address review feedback on PR microsoft#28237:
- snap_zero_point_to_uint8 now accepts qmin/qmax and min_real_range, so the helper preserves the MinimumRealRange floor (matching compute_scale_zp behavior) and handles reduce_range=True correctly. The midpoint and scale formulas are derived from qmin/qmax instead of hardcoded UINT8 constants.
- Both call sites in onnx_quantizer.py and qdq_quantizer.py now pass qmin, qmax, and self.min_real_range into the helper.
- Move the unittest.main() guard to the end of test_symmetric_flag.py so TestRestrictedAsymmetricFlag is discovered when the file is run directly with python test_symmetric_flag.py.
…int_to_uint8
snap_zero_point_to_uint8 hardcoded uint8-asymmetric bounds (0/255/128/127) and returned scale=1.0 on degenerate ranges, which discarded any reduce_range or MinimumRealRange settings already applied by the caller.
- Parameterize the helper on qmin, qmax, min_real_range. Default arg values reproduce the prior 0/255 math exactly.
- Compute the snap pivot as mid = (qmin + qmax + 1) // 2 instead of hardcoding 128, so reduce_range (qmax=127) yields a valid in-range zp.
- In the degenerate (rmax <= rmin) branch, derive scale from max(|rmin|, |rmax|) instead of returning 1.0; honor the min_real_range floor when provided.
- Forward qmin, qmax, and self.min_real_range from both call sites in onnx_quantizer.py and qdq_quantizer.py to keep ActivationRestrictedAsymmetric consistent with compute_scale_zp.
- Add tests for the reduce_range and min_real_range paths.
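The parameterization described in these commit messages could look roughly like the sketch below. It is not the code from the PR: `snap_zero_point_sketch` is a hypothetical name, it uses plain Python floats instead of numpy scalars, and both the way the min_real_range floor is applied (widening rmax) and the fallback to a scale of 1.0 for an all-zero range are assumptions made for illustration.

```python
def snap_zero_point_sketch(rmin, rmax, qmin=0, qmax=255, min_real_range=None):
    """Return (zero_point, scale) with the zero-point snapped to qmin or the midpoint."""
    rmin, rmax = float(rmin), float(rmax)
    if min_real_range is not None:
        # Assumed floor: widen rmax so the range is at least min_real_range.
        rmax = max(rmax, rmin + float(min_real_range))

    # Pivot taken from the commit message: 128 for [0, 255], 64 for [0, 127].
    mid = (qmin + qmax + 1) // 2

    if rmin >= 0.0:
        zero_point = qmin
        scale = rmax / (qmax - qmin)
    else:
        zero_point = mid
        # Pick the larger scale so neither half of the range clips.
        scale = max(-rmin / (mid - qmin), rmax / (qmax - mid))

    if scale <= 0.0:
        # Assumption: an all-zero range falls back to a neutral scale.
        scale = 1.0
    return zero_point, scale


print(snap_zero_point_sketch(0.0, 2.55))                       # post-ReLU style range
print(snap_zero_point_sketch(-1.0, 1.0))                       # signed range
print(snap_zero_point_sketch(-1.0, 1.0, qmin=0, qmax=127))     # reduce_range bounds
print(snap_zero_point_sketch(0.0, 1e-6, min_real_range=1e-4))  # narrow range with floor
```

With the default bounds the formulas reduce to the original rmax / 255 and max(-rmin / 128, rmax / 127) expressions, which is the backward-compatibility property the commit message calls out.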
Thanks for the careful review. Pushed d184733 addressing the points:
Description
Adds a new `ActivationRestrictedAsymmetric` extra-option to the Python quantization tools. When enabled, uint8 activation zero-points are snapped to either 0 (when `rmin >= 0`, e.g. post-ReLU/Sigmoid tensors) or 128 (when `rmin < 0`). The scale is recomputed so the dequantized range still covers `[rmin, rmax]` without clipping.

This restricted asymmetric mode is required by some hardware accelerators that only support these two zero-point values for uint8 quantization, without requiring the full restriction to symmetric (zero-point = 128 for all tensors).

Motivation and Context
Fixes #21398.
Existing options cover only fully symmetric (`ActivationSymmetric` → zero-point fixed at 128) or unrestricted asymmetric. There was no mode that picks the closer of {0, 128} per tensor based on its observed range.

Changes
- `quant_utils.py`: new `snap_zero_point_to_uint8(rmin, rmax)` helper.
- `base_quantizer.py`: parse new `ActivationRestrictedAsymmetric` extra-option.
- `onnx_quantizer.py` and `qdq_quantizer.py`: apply snap after `compute_scale_zp` in the activation path. Guarded on `quant_type == UINT8 and not symmetric`. Weight and int8 paths are untouched.
- `quantize.py`: document the new option in the four `extra_options` docstrings.
- `test_symmetric_flag.py`: new `TestRestrictedAsymmetricFlag` covering three cases (positive range → zp=0, signed range → zp=128, and option-disabled regression).

Testing
```
python -m pytest onnxruntime/test/python/quantization/test_symmetric_flag.py -v
```
All 7 tests pass (4 existing + 3 new). `lintrunner` is clean.