
[WIP][Feature Request] Support ONNX Q/DQ Autotuning with Subgraph Mode#1015

Open
Hale423 wants to merge 5 commits into NVIDIA:main from Hale423:dev-wahao-autotune-subgraph-profile

Conversation


@Hale423 Hale423 commented Mar 10, 2026

Pull Request: Expand ONNX Q/DQ Autotuning with Subgraph Mode

Branch: dev-wahao-autotune-subgraph-profile → main
Type: Feature


Summary

This PR extends the existing ONNX Q/DQ Autotune framework with a subgraph workflow (--workflow subgraph) that uses TensorRT fusion boundaries for faster, fusion-aware QDQ placement optimization.

It also integrates @willg-nv's TorchRegionBuilder (from Draft PR #963) for PyTorch-style hierarchical region discovery, and adds TensorRT Python API benchmarking utilities, documentation, and examples.

Relation to existing PRs:

  • Incorporates @willg-nv's TorchRegionBuilder from Draft PR #963 for region discovery.

What's New

| Area | Description |
| --- | --- |
| Subgraph workflow | --workflow subgraph: fusion-aware grouping from TensorRT graph.json; per-subgraph QDQ scheme profiling; optional per-layer timing with fallback to total latency. |
| Fusion grouping | fusion_grouping.py: parses TRT graph.json, builds fusion groups, infers shapes. Auto-generates graph.json via a trtexec FP16 build if not provided. |
| Subgraph extraction | subgraph_extractor.py: extracts standalone ONNX subgraphs per fusion group for isolated benchmarking. |
| Torch region builder | torch_region_builder.py: PyTorch-style hierarchical region discovery using node name conventions (from #963). |
| TensorRT utils | tensorrt_utils.py: TRT Python API benchmark with timing cache, plugin support, and configurable warmup/timing runs. |
| Incremental validation | Per-group full-model validation: applies QDQ groups one by one, keeping only those that improve latency. Saves optimized_raw.onnx + optimized_final.onnx. |
| Cache / resume | autotune_cache.json for Phase 2 (subgraph profiling) and Phase 3 (incremental validation). |
| trtexec compatibility | Profiling-flag retry: on "Unknown option", strips --exportProfile/--profilingVerbosity and retries with total latency. |
| CLI | --workflow {region,subgraph}, --graph-json, --incremental-validation / --no-incremental-validation. |
| Example | examples/qdq_placement/: README (Quick Start, region vs. subgraph, best practices) and set_batch_size.py. |
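The incremental-validation step above is a greedy accept/reject loop: each candidate QDQ group is applied to the full model and kept only if end-to-end latency improves. A minimal sketch of that idea (the `benchmark` and `apply_group` callables are hypothetical stand-ins, not the actual modelopt API):

```python
def incremental_validation(model, groups, benchmark, apply_group):
    """Greedy loop: apply each candidate QDQ group to the full model and
    keep it only if measured latency improves. `benchmark` and
    `apply_group` are hypothetical stand-ins for the real calls."""
    best_latency = benchmark(model)
    accepted = []
    for group in groups:
        candidate = apply_group(model, group)  # model with this group's Q/DQ inserted
        latency = benchmark(candidate)
        if latency < best_latency:             # keep only strict improvements
            model, best_latency = candidate, latency
            accepted.append(group)
    return model, accepted, best_latency
```

Because each group is validated against the current best model rather than the original, a group that only helps in combination with an earlier-accepted group can still be kept.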

Key Files

| Path | Role |
| --- | --- |
| modelopt/onnx/quantization/autotune/__main__.py | CLI: --workflow, --graph-json, --incremental-validation, --use-trtexec, --trtexec-args, etc. |
| modelopt/onnx/quantization/autotune/subgraph_workflow.py | Subgraph pipeline: Phase 1 (fusion grouping), Phase 2 (subgraph profiling), Phase 3 (full-model + incremental validation), cache I/O. |
| modelopt/onnx/quantization/autotune/fusion_grouping.py | Parses graph.json, creates fusion groups; generate_graph_json() runs a trtexec FP16 build when no graph is provided. |
| modelopt/onnx/quantization/autotune/subgraph_extractor.py | Extracts subgraph ONNX from the full model given group inputs/outputs and shapes. |
| modelopt/onnx/quantization/autotune/tensorrt_utils.py | TRT Python API benchmark runner with timing cache, plugin support, and dynamic-shape handling. |
| modelopt/onnx/quantization/autotune/torch_region_builder.py | PyTorch-style hierarchical region discovery for region mode. |
| modelopt/onnx/quantization/autotune/benchmark.py | trtexec benchmark runner: optional export_profile_path, profiling-flag dedup, and "Unknown option" retry. |
| modelopt/onnx/quantization/autotune/workflows.py | Dispatcher and benchmark_onnx_model(); passes through export_profile_path when using trtexec. |
| modelopt/onnx/quantization/autotune/qdq_utils.py | Quantized-tensor discovery helpers. |
| examples/qdq_placement/README.md | User-facing example: prerequisites, Quick Start (region + subgraph), output layout, subgraph best practices. |
| examples/qdq_placement/set_batch_size.py | ResNet50 fixed-batch script for reproducible benchmarking. |
| tests/unit/onnx/quantization/autotune/test_config.py | Config class unit tests. |
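The "Unknown option" retry in benchmark.py can be thought of as: run trtexec once with the profiling flags, and if the build fails with an unknown-option error, strip those flags and rerun, falling back to total latency only. A rough sketch of that fallback pattern (the `run` callable stands in for the real subprocess invocation; only the flag names come from the description above):

```python
PROFILING_FLAGS = ("--exportProfile", "--profilingVerbosity")

def run_with_profiling_fallback(args, run):
    """Try the full trtexec command; on 'Unknown option', retry with the
    profiling flags stripped so only total latency is measured.
    `run(args)` is a hypothetical wrapper returning (returncode, stderr)."""
    code, err = run(args)
    if code != 0 and "Unknown option" in err:
        # Handles the --flag=value form; a real implementation would also
        # drop a separate value argument for the --flag value form.
        stripped = [a for a in args if a.split("=")[0] not in PROFILING_FLAGS]
        code, err = run(stripped)
    return code, err
```

This keeps per-layer profiling on trtexec builds that support it, while degrading gracefully on older builds.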

How to Test

Region mode (default):

cd examples/qdq_placement
curl -L -o resnet50_Opset17.onnx https://github.com/onnx/models/raw/main/Computer_Vision/resnet50_Opset17_torch_hub/resnet50_Opset17.onnx
python3 set_batch_size.py resnet50_Opset17.onnx --batch-size 128 --output resnet50.bs128.onnx
python3 -m modelopt.onnx.quantization.autotune --model resnet50.bs128.onnx --output ./resnet50_results --quant-type int8 --schemes-per-region 20
# Expect: ./resnet50_results/optimized_final.onnx and logs under ./resnet50_results/logs/

Subgraph mode with trtexec (FP8, optional graph.json):

python3 -m modelopt.onnx.quantization.autotune \
  --model resnet50.bs128.onnx \
  --output ./resnet50_subgraph \
  --workflow subgraph \
  --quant-type fp8 \
  --use-trtexec \
  --warmup-runs 5 \
  --timing-runs 20 \
  --incremental-validation \
  --trtexec-args "--stronglyTyped" \
  --schemes-per-region 30
# If --graph-json is omitted, first run will trigger trtexec to generate graph.json under output dir.
# Expect: optimized_raw.onnx, optimized_final.onnx, autotune_cache.json, logs/, subgraphs/

Resume: kill the subgraph run midway, then re-run the same command; it should resume from autotune_cache.json.
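The resume behavior depends on the cache file being flushed as profiling progresses, so a killed run can skip already-completed groups on restart. A minimal sketch of that pattern (the cache schema and `profile_one` callable here are illustrative, not the actual autotune_cache.json layout):

```python
import json
import os

def profile_with_resume(groups, profile_one, cache_path="autotune_cache.json"):
    """Re-runnable profiling loop: results are written to disk after each
    group, so an interrupted run resumes by skipping groups already in the
    cache. `profile_one` is a hypothetical per-group benchmark; the cache
    schema is illustrative, not the real autotune_cache.json format."""
    results = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            results = json.load(f)
    for name in groups:
        if name in results:  # already profiled before the interruption
            continue
        results[name] = profile_one(name)
        with open(cache_path, "w") as f:  # flush after every group
            json.dump(results, f)
    return results
```

Writing the cache after every group (rather than once at the end) is what makes the mid-run kill-and-rerun test above cheap to pass.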


Checklist

  • Rebased onto latest main with DCO sign-off
  • All CodeRabbit review comments addressed (license headers, mutable defaults, bug fixes)
  • CI / unit tests pass
  • Region mode end-to-end verified
  • Subgraph mode end-to-end with --use-trtexec verified
  • Interrupted subgraph run resumes after re-run
  • examples/qdq_placement/README.md matches behavior

Documentation

  • Example: examples/qdq_placement/README.md — Quick Start, subgraph best practices, output layout, optional graph generation.
  • Guides / API: docs/source/guides/9_qdq_placement.rst and docs/source/reference/2_qdq_placement.rst align with the CLI and behavior above.

Notes

Summary by CodeRabbit

  • New Features

    • Automated Q/DQ placement optimizer: region and new subgraph workflows, fusion-aware grouping, per-subgraph heuristic schemes, incremental validation, resume/crash recovery, and calibration support.
    • Expanded CLI: workflow selection, calibration options, incremental validation, and profiling/export options.
  • Utilities

    • Batch-size fixer, subgraph extractor, fusion/region inspection, and TensorRT benchmarking (CLI & Python) with timing-cache and profiling fallbacks.
  • Documentation

    • Comprehensive guides, reference docs, examples, and quick-starts for autotuning and deployment.
  • Tests

    • Extensive unit tests for workflows, utilities, and tooling.
