This is just a placeholder to discuss the idea of implementing smaller kernels.
A lot of pointers already exist related to this:
One of the main prerequisites for using smaller kernels is that each ixx/oxx function and each ffv function must be able to take pointers to large buffers covering many events and do the indexing itself. This is discussed in #175 (comment), for instance. At present, only the ixx/oxx functions are able to find an event in the input array; their output, and all inputs and outputs of the ffv functions, refer in CUDA to a single event. This is the first thing that must change to allow smaller kernels.
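To make the idea concrete, here is a minimal host-side C++ sketch of functions that receive many-event buffers and do their own indexing. Everything in it is an assumption made for illustration: the names `bufIndex`, `ixxSketch`, `ffvSketch` and `runSketch`, the component-major memory layout, the toy wavefunction size and the arithmetic are all hypothetical and are not the actual ixx/oxx or ffv code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (an assumption, not the project's real one):
// component-major, so element [icomp][ievt] lives at icomp * nevt + ievt.
inline std::size_t bufIndex(std::size_t icomp, std::size_t ievt, std::size_t nevt)
{
  return icomp * nevt + ievt;
}

constexpr std::size_t np4 = 4; // momentum components per event (E, px, py, pz)
constexpr std::size_t nw = 2;  // toy wavefunction size, chosen for illustration

// Stand-in for an ixx/oxx-like function: it locates its event in the
// many-event input buffer AND in the many-event output buffer.
void ixxSketch(const double* allMomenta, double* allWavefunctions,
               std::size_t ievt, std::size_t nevt)
{
  assert(ievt < nevt);
  const double e = allMomenta[bufIndex(0, ievt, nevt)];
  const double pz = allMomenta[bufIndex(3, ievt, nevt)];
  // Toy arithmetic only, not the real wavefunction computation.
  allWavefunctions[bufIndex(0, ievt, nevt)] = e + pz;
  allWavefunctions[bufIndex(1, ievt, nevt)] = e - pz;
}

// Stand-in for an ffv-like function: inputs and outputs are also
// many-event buffers, indexed inside the function.
void ffvSketch(const double* allWavefunctions, double* allAmplitudes,
               std::size_t ievt, std::size_t nevt)
{
  assert(ievt < nevt);
  allAmplitudes[ievt] = allWavefunctions[bufIndex(0, ievt, nevt)] *
                        allWavefunctions[bufIndex(1, ievt, nevt)];
}

// Each loop below plays the role of one small kernel. In CUDA the loop
// would become a kernel launch, with ievt derived from the thread index.
std::vector<double> runSketch(const std::vector<double>& allMomenta, std::size_t nevt)
{
  assert(allMomenta.size() == np4 * nevt);
  std::vector<double> allWavefunctions(nw * nevt);
  std::vector<double> allAmplitudes(nevt);
  for (std::size_t ievt = 0; ievt < nevt; ++ievt)
    ixxSketch(allMomenta.data(), allWavefunctions.data(), ievt, nevt);
  for (std::size_t ievt = 0; ievt < nevt; ++ievt)
    ffvSketch(allWavefunctions.data(), allAmplitudes.data(), ievt, nevt);
  return allAmplitudes;
}
```

Because the intermediate wavefunctions live in a many-event buffer rather than in per-event local storage, the two loops can be separated, which is what would allow them to become separate kernels. The trade-off, under these assumptions, is extra memory traffic for the intermediate buffer.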