Improve implementation of MPO-MPO multiplication

The current MPO * MPO implementation multiplies the operators site by site and then truncates all bonds at once with changebonds. The problem with this approach is that it pushes all of the norm into a single tensor at the end. Alternatively, one could perform a sequential QR decomposition in a left-to-right pass while multiplying the MPOs, which keeps the result in canonical form, and then truncate the bonds on the right-to-left pass.
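
To make the proposal concrete, here is a minimal NumPy sketch of the two-pass scheme. It is only an illustration, not the package's code: the function names (`random_mpo`, `mpo_to_dense`, `multiply_mpos`), the index order `(left, out, in, right)`, and the `max_bond` / `tol` parameters are all assumptions made for this example. For simplicity it forms the full product bond on each site before the QR step, which a real implementation would probably want to avoid.

```python
import numpy as np


def random_mpo(n, d, bond, seed=0):
    """Hypothetical helper: random MPO with tensors indexed (left, out, in, right)."""
    rng = np.random.default_rng(seed)
    dims = [1] + [bond] * (n - 1) + [1]
    return [rng.normal(size=(dims[i], d, d, dims[i + 1])) for i in range(n)]


def mpo_to_dense(mpo):
    """Contract an MPO into a dense matrix (only for checking small cases)."""
    out = np.ones((1, 1, 1))  # (row, col, bond)
    for w in mpo:
        out = np.einsum("abl,lstr->asbtr", out, w)
        a, s, b, t, r = out.shape
        out = out.reshape(a * s, b * t, r)
    return out[:, :, 0]


def multiply_mpos(A, B, max_bond=None, tol=1e-12):
    """Sketch of A * B: QR on the left-to-right pass, SVD truncation on the way back."""
    n = len(A)
    tensors = []
    carry = np.ones((1, 1))  # R factor pushed to the next site
    # Left-to-right pass: multiply site tensors and left-orthogonalize with QR.
    for i, (a, b) in enumerate(zip(A, B)):
        t = np.einsum("xsuy,zutw->xzstyw", a, b)
        la, lb, d1, d2, ra, rb = t.shape
        t = t.reshape(la * lb, d1, d2, ra * rb)
        t = np.einsum("kl,lstr->kstr", carry, t)
        if i < n - 1:
            k = t.shape[0]
            q, carry = np.linalg.qr(t.reshape(k * d1 * d2, ra * rb))
            t = q.reshape(k, d1, d2, q.shape[1])
        tensors.append(t)
    # Right-to-left pass: SVD each site, truncate, push U * S to the left neighbour.
    for i in range(n - 1, 0, -1):
        l, d1, d2, r = tensors[i].shape
        u, s, vh = np.linalg.svd(tensors[i].reshape(l, d1 * d2 * r), full_matrices=False)
        keep = max(int(np.sum(s > tol * s[0])), 1)
        if max_bond is not None:
            keep = min(keep, max_bond)
        tensors[i] = vh[:keep].reshape(keep, d1, d2, r)
        tensors[i - 1] = np.einsum("lstr,rk->lstk", tensors[i - 1], u[:, :keep] * s[:keep])
    return tensors


if __name__ == "__main__":
    A = random_mpo(4, 2, 3, seed=1)
    B = random_mpo(4, 2, 3, seed=2)
    C = multiply_mpos(A, B)
    exact = mpo_to_dense(A) @ mpo_to_dense(B)
    print("matches dense product:", np.allclose(mpo_to_dense(C), exact))
    print("bond dimensions:", [t.shape[3] for t in C[:-1]])
```

Because every tensor to the left of the current site is an isometry after the QR pass, each SVD on the way back acts on the orthogonality center, so the discarded singular values directly measure the local truncation error. In this sketch the remaining norm ends up on the first site after the right-to-left pass.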