Parallelize the CommutationAnalysis pass#16014
Open
mtreinish wants to merge 3 commits intoQiskit:mainfrom
Open
Parallelize the CommutationAnalysis pass#16014mtreinish wants to merge 3 commits intoQiskit:mainfrom
mtreinish wants to merge 3 commits intoQiskit:mainfrom
Conversation
Collaborator
|
One or more of the following people are relevant to this code:
|
Member
Author
|
I ran a quick asv benchmark and the results were better than I remembered. That being said the |
Building on top of Qiskit#15988 which removed the internal caching from the commutation checker. This commit parallelizes the commutation analysis pass so that the analysis per qubit is done in multiple threads and aggregated together in the end. The speed-up this enables is fairly modest because we're spending more of the time in the serial portion of the pass. But even so it's a simple code change that does speed the pass and by extension CommutativeCancellation. In general the pass could probably use a rearchitecting as I think a lot of the issues stem with how it's collecting data which seems overly specific to how the pass worked from Python. However, since there is a CommutativeOptimization pass that is designed to replace the CommutationAnalysis/CommutativeCancellation passes spending too much time on this is probably not worth it. The other change made to facilitate this is removing the scratch map from the CommutationChecker. Specifically this required mutable access to check if two gates commute but in a parallel context we won't be able to get mutable access. This scratch space isn't a huge speedup as it just saved an allocation when checking PPR commutations.
This commit moves away from using an IndexMap for the CommutationSet and NodeIndices types and replaces both with an outer Vec. They were both keyed on qubits which is a contiguous range of 0..N_u32. A vec is more natural for this at the cost of allocating and creating a vec large enough to store all the qubits even if there isn't an entry for each qubit. This will have two advantages the first for the parallel path in this PR this will result in a deterministic iteration order when we build the output dict for the python pass. The second is it should be even faster both for the serial and parallel.
a52d87f to
63c57e1
Compare
Member
Author
|
Now that #15999 has merged I've rebased this on main and it should be ready now. |
Coverage Report for CI Build 25449046976Coverage increased (+0.008%) to 87.629%Details
Uncovered Changes
Coverage Regressions5 previously-covered lines in 1 file lost coverage.
Coverage Stats
💛 - Coveralls |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Building on top of #15988 which removed the internal caching from the
commutation checker. This commit parallelizes the commutation analysis
pass so that the analysis per qubit is done in multiple threads and
aggregated together in the end. The speed-up this enables is fairly
modest because we're spending more of the time in the serial portion of
the pass. But even so it's a simple code change that does speed the pass
and by extension CommutativeCancellation (since the rust function that
is parallelized is called inside that pass too).
In general the pass could probably use a rearchitecting as I think a lot
of the issues stem with how it's collecting data which seems overly
specific to how the pass worked from Python. However, since there is a
CommutativeOptimization pass that is designed to replace the
CommutationAnalysis/CommutativeCancellation passes spending too much
time on this is probably not worth it.
This PR is based on top of #15999 and will need to be rebased after that PR merges. In the meantime you can view just the contents of this PR by looking at the HEAD commit: a52d87fThis has been rebased now.AI/LLM disclosure