Done (SR)
eemumu_AV/cudagraph
Use 2 streams to hide the memory copy of the output back to the host.
We move the other stuff first, then generate a more complex example and see whether 2 streams are enough.
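As a rough illustration of the two-stream idea, here is a minimal sketch and not the eemumu_AV/cudagraph code. The kernel `fillChunk`, the helper `runPipeline`, and the chunk sizes are all hypothetical names and values chosen for the example. Work is split into chunks that alternate between 2 streams, so the device-to-host copy of one chunk can overlap with the kernel of the next. It assumes pinned host memory, which `cudaMemcpyAsync` needs in order to overlap with kernel execution.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (NOT the real matrix element calculation):
// writes a value that the host can recompute exactly for checking.
__global__ void fillChunk(float* out, int n, int chunk)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 1000.0f * chunk + i;
}

// Hypothetical helper: processes nChunks chunks of n floats on 2 streams.
// Returns the number of wrong output values (0 on success), or -1 on a CUDA error.
int runPipeline(int nChunks, int n)
{
  const int nStreams = 2;
  const size_t bytes = n * sizeof(float);
  cudaStream_t streams[nStreams];
  float* dOut[nStreams] = { nullptr, nullptr };
  float* hOut = nullptr;

  // Pinned (page-locked) host buffer so that the async copies can overlap with kernels
  if (cudaMallocHost((void**)&hOut, nChunks * bytes) != cudaSuccess) return -1;
  for (int s = 0; s < nStreams; ++s)
  {
    if (cudaStreamCreate(&streams[s]) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&dOut[s], bytes) != cudaSuccess) return -1;
  }

  for (int c = 0; c < nChunks; ++c)
  {
    const int s = c % nStreams; // alternate between the two streams
    // Kernel and copy are ordered within stream s; the copy of chunk c can
    // overlap with the kernel of chunk c+1, which runs on the other stream.
    fillChunk<<<(n + 255) / 256, 256, 0, streams[s]>>>(dOut[s], n, c);
    cudaMemcpyAsync(hOut + (size_t)c * n, dOut[s], bytes,
                    cudaMemcpyDeviceToHost, streams[s]);
    // Reusing dOut[s] for chunk c+2 is safe: the next kernel in stream s
    // only starts after this copy has completed.
  }

  // Wait for both streams, then check the results on the host
  int bad = (cudaDeviceSynchronize() == cudaSuccess) ? 0 : -1;
  if (bad == 0)
    for (int c = 0; c < nChunks; ++c)
      for (int i = 0; i < n; ++i)
        if (hOut[(size_t)c * n + i] != 1000.0f * c + i) ++bad;

  for (int s = 0; s < nStreams; ++s)
  {
    cudaFree(dOut[s]);
    cudaStreamDestroy(streams[s]);
  }
  cudaFreeHost(hOut);
  return bad;
}

int main()
{
  const int bad = runPipeline(8, 1 << 16);
  printf("mismatches: %d\n", bad);
  assert(bad == 0);
  return 0;
}
```

Whether 2 streams are enough depends on the ratio of kernel time to copy time. A timeline from a profiler such as Nsight Systems would show whether the copies are actually hidden for a more complex example.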