Parallelize the CommutationAnalysis pass by mtreinish · Pull Request #16014 · Qiskit/qiskit

mtreinish · 2026-04-13T17:08:32Z

This builds on top of #15988, which removed the internal caching from
the commutation checker. This commit parallelizes the commutation
analysis pass so that the per-qubit analysis is done in multiple threads
and the results are aggregated at the end. The speed-up this enables is
fairly modest because more of the time is spent in the serial portion of
the pass. But even so, it's a simple code change that does speed up the
pass and, by extension, CommutativeCancellation (since the Rust function
that is parallelized is called inside that pass too).
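
As a rough illustration of the "analyze each qubit in its own thread, then aggregate" shape described above, here is a minimal sketch using only the standard library. It is not the code from this PR: `commutes`, `analyze_wire`, and `analyze_parallel` are hypothetical names, the commutation rule is a toy one, and each wire is modelled as a plain list of gate names.

```rust
use std::thread;

/// Toy commutation rule (an assumption for illustration only): two gates
/// commute if they are identical or both diagonal in the Z basis.
fn commutes(a: &str, b: &str) -> bool {
    const DIAGONAL: [&str; 5] = ["z", "s", "t", "rz", "cz"];
    a == b || (DIAGONAL.contains(&a) && DIAGONAL.contains(&b))
}

/// Serial per-wire analysis: split the gates on one qubit into consecutive
/// groups whose members all commute with each other.
fn analyze_wire(wire: &[&str]) -> Vec<Vec<String>> {
    let mut sets: Vec<Vec<String>> = Vec::new();
    for &gate in wire {
        // Does the new gate commute with everything in the current group?
        let extend = sets
            .last()
            .map_or(false, |current| current.iter().all(|g| commutes(g, gate)));
        if extend {
            sets.last_mut().unwrap().push(gate.to_string());
        } else {
            sets.push(vec![gate.to_string()]);
        }
    }
    sets
}

/// Parallel driver: wires are split into contiguous chunks, each chunk is
/// analyzed on its own scoped thread, and the results are concatenated in
/// chunk order, so the output is indexed by qubit just like the serial path.
fn analyze_parallel(wires: &[Vec<&str>], num_threads: usize) -> Vec<Vec<Vec<String>>> {
    let n = num_threads.max(1);
    let chunk_size = ((wires.len() + n - 1) / n).max(1);
    thread::scope(|scope| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = wires
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|w| analyze_wire(w)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `analyze_parallel(&wires, 4)` should give the same result as mapping `analyze_wire` over the wires serially. Only the per-wire work runs concurrently; spawning and joining remain serial overhead, which matters most on small inputs.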

In general the pass could probably use a rearchitecting, as I think a lot
of the issues stem from how it collects data, which seems overly
specific to how the pass worked from Python. However, since there is a
CommutativeOptimization pass that is designed to replace the
CommutationAnalysis/CommutativeCancellation passes, spending too much
time on this is probably not worth it.

~~This PR is based on top of #15999 and will need to be rebased after that PR merges. In the meantime you can view just the contents of this PR by looking at the HEAD commit: a52d87f~~ This has been rebased now.

AI/LLM disclosure

I didn't use LLM tooling, or only used it privately.
I used the following tool to help write this PR description:
I used the following tool to generate or modify code:

qiskit-bot · 2026-04-13T17:08:36Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core

mtreinish · 2026-04-13T18:10:17Z

I ran a quick asv benchmark and the results were better than I remembered. That being said, the time_bvlike regression is caused by the overhead of the parallel path on small circuits, since that circuit is just 200 cx gates that all cancel out when you commute them through the 2 single-qubit gates in the circuit.

Benchmarks that have improved:

| Change   | Before [0d63a936]    | After [a52d87fb]    |   Ratio | Benchmark (Parameter)                                                            |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------------------------|
| -        | 9.93±0s              | 8.79±0.01s          |    0.89 | utility_scale.UtilityScaleBenchmarks.time_hwb12('cx')                            |
| -        | 823±4μs              | 683±8μs             |    0.83 | passes.CommutativeAnalysisPassBenchmarks.time_commutative_cancellation(5, 1024)  |
| -        | 1.95±0.01ms          | 1.25±0.01ms         |    0.64 | passes.CommutativeAnalysisPassBenchmarks.time_commutative_cancellation(14, 1024) |
| -        | 2.76±0.01ms          | 1.65±0.01ms         |    0.6  | passes.CommutativeAnalysisPassBenchmarks.time_commutative_cancellation(20, 1024) |
| -        | 17.1±0.07ms          | 6.37±0.2ms          |    0.37 | passes.PassBenchmarks.time_commutation_analysis(5, 1024)                         |
| -        | 77.7±0.1ms           | 20.6±0.3ms          |    0.27 | passes.PassBenchmarks.time_commutation_analysis(20, 1024)                        |
| -        | 52.9±0.2ms           | 13.2±0.2ms          |    0.25 | passes.PassBenchmarks.time_commutation_analysis(14, 1024)                        |

Benchmarks that have stayed the same:

| Change   | Before [0d63a936]    | After [a52d87fb]    | Ratio   | Benchmark (Parameter)                                                                                           |
|----------|----------------------|---------------------|---------|-----------------------------------------------------------------------------------------------------------------|
|          | 0                    | 0                   | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx')                                                   |
|          | 0                    | 0                   | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz')                                                   |
|          | 0                    | 0                   | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr')                                                  |
|          | 3.97±0.01ms          | 4.22±0.09ms         | 1.06    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)                 |
|          | 4.14±0.01ms          | 4.37±0.07ms         | 1.05    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)                 |
|          | 5.44±0.02ms          | 5.66±0.08ms         | 1.04    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)                                   |
|          | 40.5±0.2ms           | 41.8±0.5ms          | 1.03    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(400, 500)                                           |
|          | 5.23±0.01ms          | 5.39±0.01ms         | 1.03    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)                                   |
|          | 26.6±0.1ms           | 27.0±0.3ms          | 1.02    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(100, 500)                                           |
|          | 28.1±0.4ms           | 28.6±0.7ms          | 1.02    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                                                         |
|          | 175±1ms              | 176±0.6ms           | 1.01    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['u', 'cx', 'id'])                    |
|          | 38.4±0.1ms           | 38.9±0.09ms         | 1.01    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])  |
|          | 36.1±0.03ms          | 36.4±0.1ms          | 1.01    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['u', 'cx', 'id'])                     |
|          | 3.22±0.01μs          | 3.26±0.04μs         | 1.01    | quantum_info.PauliBench.time_commutes(100)                                                                      |
|          | 3.35±0.01μs          | 3.37±0.04μs         | 1.01    | quantum_info.PauliBench.time_commutes(300)                                                                      |
|          | 3.38±0.01μs          | 3.41±0.03μs         | 1.01    | quantum_info.PauliBench.time_commutes(500)                                                                      |
|          | 47.8±0.3μs           | 48.3±0.5μs          | 1.01    | quantum_info.PauliListBench.time_commutes(200, 500)                                                             |
|          | 84.0±0.8μs           | 84.6±0.8μs          | 1.01    | quantum_info.PauliListBench.time_commutes(400, 500)                                                             |
|          | 155±6μs              | 156±6μs             | 1.01    | quantum_info.PauliListBench.time_commutes_with_all(100, 500)                                                    |
|          | 3.62±0.01ms          | 3.66±0.03ms         | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)                                   |
|          | 5.63±0.02ms          | 5.68±0.02ms         | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                                   |
|          | 14.4±0.1ms           | 14.6±0.1ms          | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)                 |
|          | 18.9±0.09ms          | 19.1±0.1ms          | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)                 |
|          | 5.87±0.02ms          | 5.92±0.1ms          | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                                        |
|          | 11.6±0.1ms           | 11.8±0.3ms          | 1.01    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                                                         |
|          | 3.03±0.03s           | 3.07±0.07s          | 1.01    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cx')                                                      |
|          | 126±0.6ms            | 125±0.8ms           | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
|          | 109±0.5ms            | 109±0.3ms           | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['u', 'cx', 'id'])                    |
|          | 187±1ms              | 186±1ms             | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
|          | 199±0.8ms            | 199±0.7ms           | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
|          | 42.9±0.2ms           | 43.1±0.1ms          | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])         |
|          | 3.35±0.01μs          | 3.36±0.02μs         | 1.00    | quantum_info.PauliBench.time_commutes(200)                                                                      |
|          | 3.40±0.02μs          | 3.41±0.02μs         | 1.00    | quantum_info.PauliBench.time_commutes(400)                                                                      |
|          | 30.6±0.1μs           | 30.5±0.2μs          | 1.00    | quantum_info.PauliListBench.time_commutes(100, 500)                                                             |
|          | 45.9±0.3ms           | 45.8±0.3ms          | 1.00    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(500, 500)                                           |
|          | 43.9±0.2μs           | 44.0±0.06μs         | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(100, 500)                                             |
|          | 81.3±0.3μs           | 81.1±0.4μs          | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(200, 500)                                             |
|          | 121±0.5μs            | 122±0.9μs           | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(300, 500)                                             |
|          | 161±1μs              | 161±0.6μs           | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(400, 500)                                             |
|          | 202±0.7μs            | 203±0.4μs           | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(500, 500)                                             |
|          | 34.3±0.2ms           | 34.4±0.3ms          | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)                                         |
|          | 28.8±0.1ms           | 28.6±0.3ms          | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)                                         |
|          | 16.3±0.1ms           | 16.2±0.3ms          | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)                                        |
|          | 1429                 | 1429                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)                     |
|          | 1323                 | 1323                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)                     |
|          | 1251                 | 1251                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)                     |
|          | 1331                 | 1331                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)                     |
|          | 2705                 | 2705                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)                            |
|          | 2005                 | 2005                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)                            |
|          | 7                    | 7                   | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)                            |
|          | 7                    | 7                   | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)                            |
|          | 11117                | 11117               | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0)          |
|          | 5015                 | 5015                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1)          |
|          | 16                   | 16                  | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2)          |
|          | 16                   | 16                  | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3)          |
|          | 1035                 | 1035                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)                                 |
|          | 767                  | 767                 | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)                                 |
|          | 577                  | 577                 | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)                                 |
|          | 641                  | 641                 | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)                                 |
|          | 13.6±0.2ms           | 13.5±0.2ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                                                        |
|          | 339±0.5ms            | 337±0.8ms           | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                                                              |
|          | 337±0.6ms            | 337±0.6ms           | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                                                             |
|          | 665                  | 665                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')                                                   |
|          | 665                  | 665                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')                                                   |
|          | 665                  | 665                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr')                                                  |
|          | 1423                 | 1423                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('cx')                                               |
|          | 1423                 | 1423                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('cz')                                               |
|          | 1423                 | 1423                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('ecr')                                              |
|          | 300                  | 300                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')                                                  |
|          | 300                  | 300                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')                                                  |
|          | 300                  | 300                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')                                                 |
|          | 382321               | 382321              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('cx')                                                    |
|          | 383978               | 383978              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('cz')                                                    |
|          | 383192               | 383192              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('ecr')                                                   |
|          | 1617                 | 1617                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                                                     |
|          | 1622                 | 1622                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                                                     |
|          | 1622                 | 1622                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                                                    |
|          | 1801                 | 1801                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                                                      |
|          | 1815                 | 1815                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                                                      |
|          | 1815                 | 1815                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                                                     |
|          | 2628                 | 2628                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                                                       |
|          | 2628                 | 2628                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                                                       |
|          | 2628                 | 2628                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                                                      |
|          | 366                  | 366                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')                                        |
|          | 366                  | 366                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')                                        |
|          | 366                  | 366                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')                                       |
|          | 116±0.5ms            | 116±0.5ms           | 0.99    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
|          | 67.6±0.6μs           | 66.8±0.6μs          | 0.99    | quantum_info.PauliListBench.time_commutes(300, 500)                                                             |
|          | 105±1μs              | 104±0.4μs           | 0.99    | quantum_info.PauliListBench.time_commutes(500, 500)                                                             |
|          | 205±9μs              | 203±10μs            | 0.99    | quantum_info.PauliListBench.time_commutes_with_all(200, 500)                                                    |
|          | 277±20μs             | 275±8μs             | 0.99    | quantum_info.PauliListBench.time_commutes_with_all(400, 500)                                                    |
|          | 32.6±0.2ms           | 32.3±0.4ms          | 0.99    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(200, 500)                                           |
|          | 37.5±0.2ms           | 37.2±0.09ms         | 0.99    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(300, 500)                                           |
|          | 28.6±0.2ms           | 28.4±0.1ms          | 0.99    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)                            |
|          | 8.61±0.08ms          | 8.53±0.06ms         | 0.99    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                                        |
|          | 31.8±0.4ms           | 31.6±0.3ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                                                          |
|          | 28.5±0.5ms           | 28.3±0.3ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                                                          |
|          | 13.3±0.1ms           | 13.1±0.09ms         | 0.99    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                                                         |
|          | 3.76±0.01ms          | 3.72±0.04ms         | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')                                                 |
|          | 3.75±0.01ms          | 3.70±0.04ms         | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                                 |
|          | 13.1±0.03ms          | 13.0±0.2ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')                                    |
|          | 13.1±0.04ms          | 13.0±0.09ms         | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')                                   |
|          | 12.7±0.3ms           | 12.4±0.2ms          | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)                            |
|          | 105±0.6ms            | 103±0.07ms          | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)                            |
|          | 123±0.09ms           | 121±0.5ms           | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)                            |
|          | 18.1±0.1ms           | 17.8±0.2ms          | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                                        |
|          | 219±0.9ms            | 216±1ms             | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cx')                                                     |
|          | 219±0.5ms            | 216±1ms             | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cz')                                                     |
|          | 219±0.5ms            | 215±2ms             | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('ecr')                                                    |
|          | 3.75±0.02ms          | 3.69±0.04ms         | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')                                                |
|          | 40.9±0.1ms           | 40.1±0.4ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                                                  |
|          | 40.8±0.1ms           | 40.1±0.5ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                                                  |
|          | 40.9±0.1ms           | 40.1±0.4ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')                                                 |
|          | 13.1±0.02ms          | 12.9±0.2ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                                    |
|          | 130±0.4ms            | 128±0.3ms           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                                                           |
|          | 317±0.2ms            | 309±0.5ms           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                                                              |
|          | 44.0±0.1ms           | 43.0±0.2ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')                                               |
|          | 43.4±0.3ms           | 42.5±0.2ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')                                              |
|          | 3.07±0.03s           | 2.99±0.01s          | 0.97    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('ecr')                                                     |
|          | 137±0.3ms            | 134±0.4ms           | 0.97    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                                                            |
|          | 320±9μs              | 306±7μs             | 0.96    | quantum_info.PauliListBench.time_commutes_with_all(500, 500)                                                    |
|          | 109±0.4ms            | 104±0.3ms           | 0.96    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                                                            |
|          | 39.3±0.07ms          | 37.8±0.2ms          | 0.96    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')                                               |
|          | 3.19±0.1s            | 3.02±0.04s          | 0.95    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cz')                                                      |
|          | 12.9±0s              | 12.2±0.01s          | 0.95    | utility_scale.UtilityScaleBenchmarks.time_hwb12('ecr')                                                          |
|          | 244±10μs             | 231±4μs             | 0.94    | quantum_info.PauliListBench.time_commutes_with_all(300, 500)                                                    |
|          | 211±0.5ms            | 198±0.7ms           | 0.94    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                                            |
|          | 14.0±0.01s           | 13.0±0.02s          | 0.93    | utility_scale.UtilityScaleBenchmarks.time_hwb12('cz')                                                           |
|          | 207±0.7ms            | 193±0.8ms           | 0.93    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                                             |
|          | 175±0.2ms            | 159±0.5ms           | 0.91    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                                                             |

Benchmarks that have got worse:

| Change   | Before [0d63a936]    | After [a52d87fb]    |   Ratio | Benchmark (Parameter)                                   |
|----------|----------------------|---------------------|---------|---------------------------------------------------------|
| +        | 3.06±0.01ms          | 3.55±0.09ms         |    1.16 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')  |
| +        | 3.06±0.01ms          | 3.52±0.07ms         |    1.15 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')  |
| +        | 3.06±0.03ms          | 3.51±0.08ms         |    1.15 | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr') |

Building on top of Qiskit#15988, which removed the internal caching from the commutation checker, this commit parallelizes the commutation analysis pass so that the per-qubit analysis is done in multiple threads and aggregated at the end. The speed-up this enables is fairly modest because more of the time is spent in the serial portion of the pass. But even so, it's a simple code change that does speed up the pass and, by extension, CommutativeCancellation.

In general the pass could probably use a rearchitecting, as I think a lot of the issues stem from how it collects data, which seems overly specific to how the pass worked from Python. However, since there is a CommutativeOptimization pass that is designed to replace the CommutationAnalysis/CommutativeCancellation passes, spending too much time on this is probably not worth it.

The other change made to facilitate this is removing the scratch map from the CommutationChecker. The scratch map required mutable access to the checker in order to check whether two gates commute, but in a parallel context we won't be able to get mutable access. This scratch space wasn't a huge speedup, as it just saved an allocation when checking PPR commutations.
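
To make the mutable-access point concrete, here is a small sketch under stated assumptions. `Checker` and `paulis_commute` are hypothetical stand-ins, not Qiskit's `CommutationChecker` API. A Pauli product is modelled as a list of `(qubit, pauli)` pairs containing only non-identity terms, and two such products commute when they differ on an even number of shared qubits. The qubit-to-Pauli map that a reusable scratch buffer would have held is allocated per call, which lets the method take `&self` and be shared across threads.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for a commutation checker; not Qiskit's actual type.
struct Checker;

impl Checker {
    /// Takes `&self`, so a single checker can be borrowed by many threads.
    /// With a scratch map stored in the struct this would need `&mut self`.
    fn paulis_commute(&self, a: &[(u32, char)], b: &[(u32, char)]) -> bool {
        // Per-call allocation replacing the reusable scratch space.
        let scratch: HashMap<u32, char> = a.iter().copied().collect();
        // Count the shared qubits where the two products hold different
        // (non-identity) Paulis; those are the anticommuting positions.
        let anticommuting = b
            .iter()
            .filter(|(qubit, pauli)| scratch.get(qubit).map_or(false, |other| other != pauli))
            .count();
        anticommuting % 2 == 0
    }
}

type PauliProduct = Vec<(u32, char)>;

/// Check many pairs concurrently while sharing one `&Checker`.
/// One thread per pair is wasteful in practice; it just keeps the sketch short.
fn check_pairs_in_parallel(checker: &Checker, pairs: &[(PauliProduct, PauliProduct)]) -> Vec<bool> {
    thread::scope(|scope| {
        let handles: Vec<_> = pairs
            .iter()
            .map(|(a, b)| scope.spawn(move || checker.paulis_commute(a, b)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

If `paulis_commute` took `&mut self`, the closure passed to `scope.spawn` could not borrow the same checker from several threads at once without extra synchronization such as a mutex.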

This commit moves away from using an IndexMap for the CommutationSet and NodeIndices types and replaces both with an outer Vec. They were both keyed on qubits, which form a contiguous range 0..N of u32 indices. A Vec is more natural for this, at the cost of allocating a Vec large enough to store all the qubits even if there isn't an entry for each qubit. This has two advantages. First, for the parallel path in this PR, it results in a deterministic iteration order when we build the output dict for the Python pass. Second, it should be faster for both the serial and parallel paths.
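
The trade-off described above can be sketched as follows. This is an illustrative assumption rather than the PR's data structures: `node_indices_per_qubit` is a hypothetical helper, and `ops[i]` is assumed to list the qubits touched by operation `i`. The outer Vec has one slot per qubit, so a lookup is a plain index and iteration always runs in qubit order, at the cost of empty slots for qubits with no operations.

```rust
/// For each qubit, list the indices of the operations acting on it.
/// `ops[i]` holds the qubits touched by operation `i` (hypothetical input shape).
fn node_indices_per_qubit(num_qubits: u32, ops: &[Vec<u32>]) -> Vec<Vec<usize>> {
    // One slot per qubit, allocated up front even if some stay empty.
    let mut per_qubit: Vec<Vec<usize>> = vec![Vec::new(); num_qubits as usize];
    for (node, qubits) in ops.iter().enumerate() {
        for &q in qubits {
            // The qubit index is the position in the outer Vec, so no hashing.
            per_qubit[q as usize].push(node);
        }
    }
    per_qubit
}
```

Iterating over the result with `iter().enumerate()` visits qubits as 0, 1, 2, and so on, regardless of the order in which the slots were filled.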

mtreinish · 2026-05-06T16:54:54Z

Now that #15999 has merged I've rebased this on main and it should be ready now.

coveralls · 2026-05-06T17:29:58Z

Coverage Report for CI Build 25449046976

Coverage increased (+0.008%) to 87.629%

Details

Coverage increased (+0.008%) from the base build.
Patch coverage: 7 uncovered changes across 2 files (97 of 104 lines covered, 93.27%).
5 coverage regressions across 1 file.

Uncovered Changes

| File | Changed | Covered | % |
|------|---------|---------|---|
| crates/transpiler/src/passes/commutation_analysis.rs | 83 | 77 | 92.77% |
| crates/transpiler/src/commutation_checker.rs | 17 | 16 | 94.12% |

Coverage Regressions

5 previously-covered lines in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
|------|-----------------------|----------|
| crates/qasm2/src/lex.rs | 5 | 92.03% |

Coverage Stats


Relevant Lines:	122084
Covered Lines:	106981
Line Coverage:	87.63%
Coverage Strength:	958300.94 hits per line

💛 - Coveralls

mtreinish added this to the 2.5.0 milestone Apr 13, 2026

mtreinish requested a review from a team as a code owner April 13, 2026 17:08

mtreinish added the on hold, performance, Rust, and mod: transpiler labels Apr 13, 2026

alexanderivrii reviewed Apr 14, 2026

Comment thread crates/transpiler/src/passes/commutation_analysis.rs Outdated

ShellyGarion added this to Qiskit 2.5 Apr 15, 2026

github-project-automation Bot moved this to Ready in Qiskit 2.5 Apr 15, 2026

mtreinish assigned alexanderivrii May 2, 2026

mtreinish added 3 commits May 6, 2026 12:03

Fix rebase issues and run cargo fmt

63c57e1

mtreinish force-pushed the parallel-analysis-commute!! branch from a52d87f to 63c57e1 May 6, 2026 16:53

mtreinish removed the on hold label May 6, 2026

mtreinish requested a review from alexanderivrii May 7, 2026 01:59

mtreinish wants to merge 3 commits into Qiskit:main from mtreinish:parallel-analysis-commute!!

5 participants
