
Parallelize the CommutationAnalysis pass #16014

Open
mtreinish wants to merge 3 commits into Qiskit:main from mtreinish:parallel-analysis-commute!!


@mtreinish (Member) commented Apr 13, 2026

Building on top of #15988, which removed the internal caching from the
commutation checker, this commit parallelizes the commutation analysis
pass so that the per-qubit analysis runs in multiple threads and the
results are aggregated at the end. The speed-up this enables is fairly
modest because more of the time is spent in the serial portion of the
pass. Even so, it's a simple code change that does speed up the pass
and, by extension, CommutativeCancellation (since the Rust function that
is parallelized is also called inside that pass).

In general, the pass could probably be rearchitected, as I think a lot
of the issues stem from how it collects data, which seems overly
specific to how the pass worked from Python. However, since there is a
CommutativeOptimization pass that is designed to replace the
CommutationAnalysis/CommutativeCancellation passes, spending too much
time on this is probably not worth it.

This PR is based on top of #15999 and will need to be rebased after that PR merges. In the meantime, you can view just the contents of this PR by looking at the HEAD commit: a52d87f. Update: this has now been rebased.

AI/LLM disclosure

  • I didn't use LLM tooling, or only used it privately.
  • I used the following tool to help write this PR description:
  • I used the following tool to generate or modify code:

@mtreinish added this to the 2.5.0 milestone Apr 13, 2026
@mtreinish requested a review from a team as a code owner April 13, 2026 17:08
@mtreinish added the "on hold", "performance", "Rust", and "mod: transpiler" labels Apr 13, 2026
@qiskit-bot (Collaborator)

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@mtreinish (Member Author)

I ran a quick asv benchmark and the results were better than I remembered. That being said, the time_bvlike regression is caused by the overhead of the parallel path on small circuits, since that circuit is just 200 cx gates that all cancel out when you commute through the 2 single-qubit gates in the circuit.

Benchmarks that have improved:

| Change   | Before [0d63a936]    | After [a52d87fb]    |   Ratio | Benchmark (Parameter)                                                            |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------------------------|
| -        | 9.93±0s              | 8.79±0.01s          |    0.89 | utility_scale.UtilityScaleBenchmarks.time_hwb12('cx')                            |
| -        | 823±4μs              | 683±8μs             |    0.83 | passes.CommutativeAnalysisPassBenchmarks.time_commutative_cancellation(5, 1024)  |
| -        | 1.95±0.01ms          | 1.25±0.01ms         |    0.64 | passes.CommutativeAnalysisPassBenchmarks.time_commutative_cancellation(14, 1024) |
| -        | 2.76±0.01ms          | 1.65±0.01ms         |    0.6  | passes.CommutativeAnalysisPassBenchmarks.time_commutative_cancellation(20, 1024) |
| -        | 17.1±0.07ms          | 6.37±0.2ms          |    0.37 | passes.PassBenchmarks.time_commutation_analysis(5, 1024)                         |
| -        | 77.7±0.1ms           | 20.6±0.3ms          |    0.27 | passes.PassBenchmarks.time_commutation_analysis(20, 1024)                        |
| -        | 52.9±0.2ms           | 13.2±0.2ms          |    0.25 | passes.PassBenchmarks.time_commutation_analysis(14, 1024)                        |

Benchmarks that have stayed the same:

| Change   | Before [0d63a936]    | After [a52d87fb]    | Ratio   | Benchmark (Parameter)                                                                                           |
|----------|----------------------|---------------------|---------|-----------------------------------------------------------------------------------------------------------------|
|          | 0                    | 0                   | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx')                                                   |
|          | 0                    | 0                   | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz')                                                   |
|          | 0                    | 0                   | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr')                                                  |
|          | 3.97±0.01ms          | 4.22±0.09ms         | 1.06    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)                 |
|          | 4.14±0.01ms          | 4.37±0.07ms         | 1.05    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)                 |
|          | 5.44±0.02ms          | 5.66±0.08ms         | 1.04    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)                                   |
|          | 40.5±0.2ms           | 41.8±0.5ms          | 1.03    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(400, 500)                                           |
|          | 5.23±0.01ms          | 5.39±0.01ms         | 1.03    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)                                   |
|          | 26.6±0.1ms           | 27.0±0.3ms          | 1.02    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(100, 500)                                           |
|          | 28.1±0.4ms           | 28.6±0.7ms          | 1.02    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                                                         |
|          | 175±1ms              | 176±0.6ms           | 1.01    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['u', 'cx', 'id'])                    |
|          | 38.4±0.1ms           | 38.9±0.09ms         | 1.01    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])  |
|          | 36.1±0.03ms          | 36.4±0.1ms          | 1.01    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['u', 'cx', 'id'])                     |
|          | 3.22±0.01μs          | 3.26±0.04μs         | 1.01    | quantum_info.PauliBench.time_commutes(100)                                                                      |
|          | 3.35±0.01μs          | 3.37±0.04μs         | 1.01    | quantum_info.PauliBench.time_commutes(300)                                                                      |
|          | 3.38±0.01μs          | 3.41±0.03μs         | 1.01    | quantum_info.PauliBench.time_commutes(500)                                                                      |
|          | 47.8±0.3μs           | 48.3±0.5μs          | 1.01    | quantum_info.PauliListBench.time_commutes(200, 500)                                                             |
|          | 84.0±0.8μs           | 84.6±0.8μs          | 1.01    | quantum_info.PauliListBench.time_commutes(400, 500)                                                             |
|          | 155±6μs              | 156±6μs             | 1.01    | quantum_info.PauliListBench.time_commutes_with_all(100, 500)                                                    |
|          | 3.62±0.01ms          | 3.66±0.03ms         | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)                                   |
|          | 5.63±0.02ms          | 5.68±0.02ms         | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                                   |
|          | 14.4±0.1ms           | 14.6±0.1ms          | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)                 |
|          | 18.9±0.09ms          | 19.1±0.1ms          | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)                 |
|          | 5.87±0.02ms          | 5.92±0.1ms          | 1.01    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                                        |
|          | 11.6±0.1ms           | 11.8±0.3ms          | 1.01    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                                                         |
|          | 3.03±0.03s           | 3.07±0.07s          | 1.01    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cx')                                                      |
|          | 126±0.6ms            | 125±0.8ms           | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
|          | 109±0.5ms            | 109±0.3ms           | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['u', 'cx', 'id'])                    |
|          | 187±1ms              | 186±1ms             | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
|          | 199±0.8ms            | 199±0.7ms           | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
|          | 42.9±0.2ms           | 43.1±0.1ms          | 1.00    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])         |
|          | 3.35±0.01μs          | 3.36±0.02μs         | 1.00    | quantum_info.PauliBench.time_commutes(200)                                                                      |
|          | 3.40±0.02μs          | 3.41±0.02μs         | 1.00    | quantum_info.PauliBench.time_commutes(400)                                                                      |
|          | 30.6±0.1μs           | 30.5±0.2μs          | 1.00    | quantum_info.PauliListBench.time_commutes(100, 500)                                                             |
|          | 45.9±0.3ms           | 45.8±0.3ms          | 1.00    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(500, 500)                                           |
|          | 43.9±0.2μs           | 44.0±0.06μs         | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(100, 500)                                             |
|          | 81.3±0.3μs           | 81.1±0.4μs          | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(200, 500)                                             |
|          | 121±0.5μs            | 122±0.9μs           | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(300, 500)                                             |
|          | 161±1μs              | 161±0.6μs           | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(400, 500)                                             |
|          | 202±0.7μs            | 203±0.4μs           | 1.00    | quantum_info.PauliListQargsBench.time_commutes_with_qargs(500, 500)                                             |
|          | 34.3±0.2ms           | 34.4±0.3ms          | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)                                         |
|          | 28.8±0.1ms           | 28.6±0.3ms          | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)                                         |
|          | 16.3±0.1ms           | 16.2±0.3ms          | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)                                        |
|          | 1429                 | 1429                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)                     |
|          | 1323                 | 1323                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)                     |
|          | 1251                 | 1251                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)                     |
|          | 1331                 | 1331                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)                     |
|          | 2705                 | 2705                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)                            |
|          | 2005                 | 2005                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)                            |
|          | 7                    | 7                   | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)                            |
|          | 7                    | 7                   | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)                            |
|          | 11117                | 11117               | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0)          |
|          | 5015                 | 5015                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1)          |
|          | 16                   | 16                  | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2)          |
|          | 16                   | 16                  | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3)          |
|          | 1035                 | 1035                | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)                                 |
|          | 767                  | 767                 | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)                                 |
|          | 577                  | 577                 | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)                                 |
|          | 641                  | 641                 | 1.00    | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)                                 |
|          | 13.6±0.2ms           | 13.5±0.2ms          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                                                        |
|          | 339±0.5ms            | 337±0.8ms           | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                                                              |
|          | 337±0.6ms            | 337±0.6ms           | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                                                             |
|          | 665                  | 665                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')                                                   |
|          | 665                  | 665                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')                                                   |
|          | 665                  | 665                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr')                                                  |
|          | 1423                 | 1423                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('cx')                                               |
|          | 1423                 | 1423                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('cz')                                               |
|          | 1423                 | 1423                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('ecr')                                              |
|          | 300                  | 300                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')                                                  |
|          | 300                  | 300                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')                                                  |
|          | 300                  | 300                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')                                                 |
|          | 382321               | 382321              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('cx')                                                    |
|          | 383978               | 383978              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('cz')                                                    |
|          | 383192               | 383192              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('ecr')                                                   |
|          | 1617                 | 1617                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                                                     |
|          | 1622                 | 1622                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                                                     |
|          | 1622                 | 1622                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                                                    |
|          | 1801                 | 1801                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                                                      |
|          | 1815                 | 1815                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                                                      |
|          | 1815                 | 1815                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                                                     |
|          | 2628                 | 2628                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                                                       |
|          | 2628                 | 2628                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                                                       |
|          | 2628                 | 2628                | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                                                      |
|          | 366                  | 366                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')                                        |
|          | 366                  | 366                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')                                        |
|          | 366                  | 366                 | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')                                       |
|          | 116±0.5ms            | 116±0.5ms           | 0.99    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
|          | 67.6±0.6μs           | 66.8±0.6μs          | 0.99    | quantum_info.PauliListBench.time_commutes(300, 500)                                                             |
|          | 105±1μs              | 104±0.4μs           | 0.99    | quantum_info.PauliListBench.time_commutes(500, 500)                                                             |
|          | 205±9μs              | 203±10μs            | 0.99    | quantum_info.PauliListBench.time_commutes_with_all(200, 500)                                                    |
|          | 277±20μs             | 275±8μs             | 0.99    | quantum_info.PauliListBench.time_commutes_with_all(400, 500)                                                    |
|          | 32.6±0.2ms           | 32.3±0.4ms          | 0.99    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(200, 500)                                           |
|          | 37.5±0.2ms           | 37.2±0.09ms         | 0.99    | quantum_info.PauliListBench.time_group_qubit_wise_commuting(300, 500)                                           |
|          | 28.6±0.2ms           | 28.4±0.1ms          | 0.99    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)                            |
|          | 8.61±0.08ms          | 8.53±0.06ms         | 0.99    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                                        |
|          | 31.8±0.4ms           | 31.6±0.3ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                                                          |
|          | 28.5±0.5ms           | 28.3±0.3ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                                                          |
|          | 13.3±0.1ms           | 13.1±0.09ms         | 0.99    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                                                         |
|          | 3.76±0.01ms          | 3.72±0.04ms         | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')                                                 |
|          | 3.75±0.01ms          | 3.70±0.04ms         | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                                 |
|          | 13.1±0.03ms          | 13.0±0.2ms          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')                                    |
|          | 13.1±0.04ms          | 13.0±0.09ms         | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')                                   |
|          | 12.7±0.3ms           | 12.4±0.2ms          | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)                            |
|          | 105±0.6ms            | 103±0.07ms          | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)                            |
|          | 123±0.09ms           | 121±0.5ms           | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)                            |
|          | 18.1±0.1ms           | 17.8±0.2ms          | 0.98    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                                        |
|          | 219±0.9ms            | 216±1ms             | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cx')                                                     |
|          | 219±0.5ms            | 216±1ms             | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cz')                                                     |
|          | 219±0.5ms            | 215±2ms             | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('ecr')                                                    |
|          | 3.75±0.02ms          | 3.69±0.04ms         | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')                                                |
|          | 40.9±0.1ms           | 40.1±0.4ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                                                  |
|          | 40.8±0.1ms           | 40.1±0.5ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                                                  |
|          | 40.9±0.1ms           | 40.1±0.4ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')                                                 |
|          | 13.1±0.02ms          | 12.9±0.2ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                                    |
|          | 130±0.4ms            | 128±0.3ms           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                                                           |
|          | 317±0.2ms            | 309±0.5ms           | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                                                              |
|          | 44.0±0.1ms           | 43.0±0.2ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')                                               |
|          | 43.4±0.3ms           | 42.5±0.2ms          | 0.98    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')                                              |
|          | 3.07±0.03s           | 2.99±0.01s          | 0.97    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('ecr')                                                     |
|          | 137±0.3ms            | 134±0.4ms           | 0.97    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                                                            |
|          | 320±9μs              | 306±7μs             | 0.96    | quantum_info.PauliListBench.time_commutes_with_all(500, 500)                                                    |
|          | 109±0.4ms            | 104±0.3ms           | 0.96    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                                                            |
|          | 39.3±0.07ms          | 37.8±0.2ms          | 0.96    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')                                               |
|          | 3.19±0.1s            | 3.02±0.04s          | 0.95    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cz')                                                      |
|          | 12.9±0s              | 12.2±0.01s          | 0.95    | utility_scale.UtilityScaleBenchmarks.time_hwb12('ecr')                                                          |
|          | 244±10μs             | 231±4μs             | 0.94    | quantum_info.PauliListBench.time_commutes_with_all(300, 500)                                                    |
|          | 211±0.5ms            | 198±0.7ms           | 0.94    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                                            |
|          | 14.0±0.01s           | 13.0±0.02s          | 0.93    | utility_scale.UtilityScaleBenchmarks.time_hwb12('cz')                                                           |
|          | 207±0.7ms            | 193±0.8ms           | 0.93    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                                             |
|          | 175±0.2ms            | 159±0.5ms           | 0.91    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                                                             |

Benchmarks that have got worse:

| Change   | Before [0d63a936]    | After [a52d87fb]    |   Ratio | Benchmark (Parameter)                                   |
|----------|----------------------|---------------------|---------|---------------------------------------------------------|
| +        | 3.06±0.01ms          | 3.55±0.09ms         |    1.16 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')  |
| +        | 3.06±0.01ms          | 3.52±0.07ms         |    1.15 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')  |
| +        | 3.06±0.03ms          | 3.51±0.08ms         |    1.15 | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr') |
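
The time_bvlike regression comes from parallel-path overhead on a small circuit. One common mitigation, not taken from this PR, is to fall back to the serial path below a size cutoff. A minimal sketch, where `PARALLEL_THRESHOLD`, `use_parallel`, `per_qubit_work`, and `analyze` are all hypothetical and the cutoff value is arbitrary rather than tuned:

```rust
use std::thread;

/// Hypothetical cutoff (total gates across all wires); an arbitrary value
/// chosen for illustration, not one tuned against the asv benchmarks.
const PARALLEL_THRESHOLD: usize = 10_000;

/// Stand-in for analysing one qubit's wire (here just a checksum).
fn per_qubit_work(wire: &[u32]) -> u64 {
    wire.iter().map(|&g| u64::from(g)).sum()
}

/// Only go parallel when there is more than one wire and enough total
/// work to amortise the cost of spawning and joining threads.
fn use_parallel(wires: &[Vec<u32>]) -> bool {
    let total: usize = wires.iter().map(Vec::len).sum();
    wires.len() > 1 && total >= PARALLEL_THRESHOLD
}

fn analyze(wires: &[Vec<u32>]) -> Vec<u64> {
    if !use_parallel(wires) {
        // Small circuit: stay on the calling thread.
        return wires.iter().map(|w| per_qubit_work(w)).collect();
    }
    thread::scope(|s| {
        let handles: Vec<_> = wires
            .iter()
            .map(|w| s.spawn(move || per_qubit_work(w)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let small: Vec<Vec<u32>> = vec![vec![1, 2, 3], vec![4, 5]];
    let large: Vec<Vec<u32>> = (0..8).map(|q| vec![q; 2_000]).collect();
    println!("small goes parallel: {}", use_parallel(&small));
    println!("large goes parallel: {}", use_parallel(&large));
    // Either path must produce the same per-qubit results.
    let serial: Vec<u64> = large.iter().map(|w| per_qubit_work(w)).collect();
    assert_eq!(analyze(&large), serial);
    assert_eq!(analyze(&small), vec![6, 9]);
}
```

Where to put the cutoff (or whether to key it on gate count, qubit count, or something else) would have to be decided from benchmarks like the ones above.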

mtreinish added 3 commits May 6, 2026 12:03
Building on top of Qiskit#15988, which removed the internal caching from the
commutation checker, this commit parallelizes the commutation analysis
pass so that the per-qubit analysis runs in multiple threads and the
results are aggregated at the end. The speed-up this enables is fairly
modest because more of the time is spent in the serial portion of the
pass. Even so, it's a simple code change that does speed up the pass
and, by extension, CommutativeCancellation.

In general, the pass could probably be rearchitected, as I think a lot
of the issues stem from how it collects data, which seems overly
specific to how the pass worked from Python. However, since there is a
CommutativeOptimization pass that is designed to replace the
CommutationAnalysis/CommutativeCancellation passes, spending too much
time on this is probably not worth it.

The other change made to facilitate this is removing the scratch map
from the CommutationChecker. Checking whether two gates commute required
mutable access to that map, but in a parallel context we won't be able
to get mutable access. The scratch space wasn't a huge speedup anyway, as
it only saved an allocation when checking PPR commutations.
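
To illustrate the constraint described above: a method that reuses a buffer stored on the struct needs `&mut self`, so one instance cannot be called from several threads at once, whereas a method that allocates its temporary per call only needs `&self` and the checker can be shared by reference. Both types below are hypothetical (neither is Qiskit's `CommutationChecker`). Assuming PPR refers to Pauli product rotations, the sketch uses the standard rule that two Pauli strings commute iff they differ on an even number of positions where both are non-identity; rotations generated by commuting Paulis commute as well.

```rust
use std::thread;

/// Old style (hypothetical): reuses a scratch buffer, so `commute` needs `&mut self`.
struct ScratchChecker {
    scratch: Vec<bool>,
}

impl ScratchChecker {
    fn commute(&mut self, a: &str, b: &str) -> bool {
        self.scratch.clear();
        self.scratch
            .extend(a.bytes().zip(b.bytes()).map(|(p, q)| anticommutes(p, q)));
        self.scratch.iter().filter(|&&x| x).count() % 2 == 0
    }
}

/// New style (hypothetical): allocates per call, so `commute` only needs
/// `&self` and a single instance can be shared across threads.
struct Checker;

impl Checker {
    fn commute(&self, a: &str, b: &str) -> bool {
        // This per-call Vec is the allocation the scratch buffer used to save.
        let anti: Vec<bool> = a
            .bytes()
            .zip(b.bytes())
            .map(|(p, q)| anticommutes(p, q))
            .collect();
        anti.iter().filter(|&&x| x).count() % 2 == 0
    }
}

/// Single-qubit Paulis anticommute iff both are non-identity and they differ.
fn anticommutes(p: u8, q: u8) -> bool {
    p != b'I' && q != b'I' && p != q
}

/// Run every check on its own scoped thread, all sharing one `&Checker`.
fn parallel_checks(checker: &Checker, pairs: &[(&str, &str)]) -> Vec<bool> {
    thread::scope(|s| {
        let handles: Vec<_> = pairs
            .iter()
            .map(|&(a, b)| s.spawn(move || checker.commute(a, b)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("check panicked"))
            .collect()
    })
}

fn main() {
    // Serial use of the old style is fine even though it needs `&mut`.
    let mut old = ScratchChecker { scratch: Vec::new() };
    assert!(old.commute("XX", "ZZ"));
    let pairs = [("XX", "ZZ"), ("XI", "ZI"), ("XYZ", "ZYX")];
    let results = parallel_checks(&Checker, &pairs);
    println!("{results:?}");
}
```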

This commit moves away from using an IndexMap for the CommutationSet and
NodeIndices types and replaces both with an outer Vec. They were both
keyed on qubits, which form a contiguous range of 0..N_u32. A Vec is more
natural for this, at the cost of allocating a Vec large enough to store
all the qubits even if there isn't an entry for each qubit.

This has two advantages. First, for the parallel path in this PR, it
results in a deterministic iteration order when we build the output dict
for the Python pass. Second, it should be even faster for both the serial
and parallel paths.
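
A small sketch of why a qubit-indexed Vec yields a deterministic order even when worker threads finish in arbitrary order. It uses only the standard library (`std::sync::mpsc` models out-of-order completion) rather than the PR's actual types; `CommutationSets` and `collect_by_qubit` are hypothetical names and the per-qubit result is a placeholder. By contrast, a map that preserves insertion order (such as IndexMap) filled in completion order would iterate in an order that depends on thread scheduling, unless it were sorted afterwards.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical per-qubit result: commutation sets found on that wire.
type CommutationSets = Vec<Vec<usize>>;

/// Analyse the `active` qubits in parallel and store each result in the
/// slot for its qubit index. Slots for qubits with no work stay empty,
/// which is the memory cost of a Vec versus a map.
fn collect_by_qubit(num_qubits: usize, active: &[usize]) -> Vec<CommutationSets> {
    let (tx, rx) = mpsc::channel::<(usize, CommutationSets)>();
    thread::scope(|s| {
        for &q in active {
            let tx = tx.clone();
            s.spawn(move || {
                // Placeholder for the real per-qubit analysis.
                let sets: CommutationSets = vec![vec![q, q + 1]];
                tx.send((q, sets)).expect("receiver alive");
            });
        }
    });
    // Drop the original sender so the receiver iterator terminates.
    drop(tx);

    // Messages arrive in completion order, which can vary between runs,
    // but indexing by qubit makes the final layout identical every time.
    let mut by_qubit: Vec<CommutationSets> = vec![Vec::new(); num_qubits];
    for (q, sets) in rx {
        by_qubit[q] = sets;
    }
    by_qubit
}

fn main() {
    let by_qubit = collect_by_qubit(4, &[3, 0, 2]);
    // Iteration always visits qubits 0, 1, 2, 3 in that order.
    for (q, sets) in by_qubit.iter().enumerate() {
        println!("qubit {q}: {sets:?}");
    }
}
```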
@mtreinish force-pushed the parallel-analysis-commute!! branch from a52d87f to 63c57e1 May 6, 2026 16:53
@mtreinish removed the "on hold" label May 6, 2026
@mtreinish (Member Author)

Now that #15999 has merged, I've rebased this on main and it should be ready now.

@coveralls

Coverage Report for CI Build 25449046976

Coverage increased (+0.008%) to 87.629%

Details

  • Coverage increased (+0.008%) from the base build.
  • Patch coverage: 7 uncovered changes across 2 files (97 of 104 lines covered, 93.27%).
  • 5 coverage regressions across 1 file.

Uncovered Changes

| File | Changed | Covered | % |
|------|---------|---------|---|
| crates/transpiler/src/passes/commutation_analysis.rs | 83 | 77 | 92.77% |
| crates/transpiler/src/commutation_checker.rs | 17 | 16 | 94.12% |

Coverage Regressions

5 previously-covered lines in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
|------|-----------------------|----------|
| crates/qasm2/src/lex.rs | 5 | 92.03% |

Coverage Stats

Relevant Lines: 122084
Covered Lines: 106981
Line Coverage: 87.63%
Coverage Strength: 958300.94 hits per line

💛 - Coveralls

@mtreinish requested a review from alexanderivrii May 7, 2026 01:59

Labels

  • mod: transpiler (Issues and PRs related to Transpiler)
  • performance
  • Rust (This PR or issue is related to Rust code in the repository)

Projects

Status: Ready


5 participants