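The NCC compiler and/or the VE have major performance problems with lambdas: in the timings below, the lambda version is roughly 15 to 20 times slower than the plain OpenMP loop on Aurora, while on x86 the two are about equal.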
Please consider this code: https://gist.github.com/raver119/f9aca08f8c3895b840a96d36b37e6bd5
On x86 with gcc 7.4 the output is:
Time omp: [215 us]; Time lambda: [222 us]
On Aurora the output is:
export OMP_NUM_THREADS=8
...
Time omp: [61 us]; Time lambda: [904 us]
export OMP_NUM_THREADS=1
...
Time omp: [202 us]; Time lambda: [4131 us]
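
For readers who cannot open the gist, here is a rough sketch of what this kind of comparison might look like. It is not the gist's code: the kernel, the function names (`scale_direct`, `scale_lambda`, `time_us`) and the use of `std::function` to pass the lambda are all assumptions made for illustration, so it may not trigger the same slowdown.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical kernel, not taken from the gist: y[i] = x[i] * 2 + 1,
// with the arithmetic written directly in the OpenMP loop body.
void scale_direct(const float* x, float* y, std::size_t n) {
    const long long len = static_cast<long long>(n);
    #pragma omp parallel for
    for (long long i = 0; i < len; ++i) {
        y[i] = x[i] * 2.0f + 1.0f;
    }
}

// Same loop, but the per-element operation is passed in as a callable.
// Assumption: the lambda goes through std::function; the gist may pass it
// differently (for example as a template parameter).
void scale_lambda(const float* x, float* y, std::size_t n,
                  const std::function<float(float)>& op) {
    const long long len = static_cast<long long>(n);
    #pragma omp parallel for
    for (long long i = 0; i < len; ++i) {
        y[i] = op(x[i]);
    }
}

// Runs fn `iterations` times and returns the average wall time in microseconds.
template <typename F>
long long time_us(F&& fn, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto total =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return total.count() / iterations;
}
```

To compare the two, call `time_us` once with a closure around `scale_direct` and once with a closure around `scale_lambda` (passing something like `[](float v) { return v * 2.0f + 1.0f; }`), then print both averages. Without OpenMP enabled at compile time the pragmas are ignored and both loops run serially.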