Hello,
I am trying to run a CUDA application that multiplies two 1024 × 1024 matrices, but the process is killed because it exhausts my RAM. I looked into memory-leak problems in the simulator and found many leak points, as shown below:
==30102== LEAK SUMMARY:
==30102==    definitely lost: 18,642 bytes in 2,187 blocks
==30102==    indirectly lost: 59,510 bytes in 1,188 blocks
==30102==      possibly lost: 288 bytes in 1 blocks
==30102==    still reachable: 345,513,712 bytes in 3,556,543 blocks (I think these are the uninitialized objects)
==30102==                       of which reachable via heuristic:
==30102==                         newarray: 11,416 bytes in 47 blocks
==30102==         suppressed: 0 bytes in 0 blocks
Furthermore, Valgrind stops reporting possible leaks after it reaches a threshold (it prints "More than 10000000 total errors detected. I'm not reporting any more."). I also tried one of the tested configurations (for the Turing architecture) to see whether the problem is related to the GPU architecture I configured, but I got similar leak results.
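For anyone who cannot open the attachments, an invocation along these lines reproduces the report without the error cutoff (the flags here are illustrative; the exact ones I used are in the output files, and `./matmul` stands in for my binary):

```shell
#!/bin/sh
# Hypothetical Valgrind re-run of the failing binary.
# --error-limit=no disables the cutoff that prints
#   "More than 10000000 total errors detected. I'm not reporting any more."
# --show-leak-kinds=all also lists the "still reachable" category.
BIN=./matmul   # placeholder for the actual executable
if command -v valgrind >/dev/null 2>&1 && [ -x "$BIN" ]; then
    valgrind --leak-check=full --show-leak-kinds=all --error-limit=no "$BIN"
else
    echo "skipping: valgrind or $BIN not available"
fi
```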
I am attaching both configurations and the leak outputs obtained via Valgrind. The Valgrind flags I used are included in the output files.
The definite leaks are related to the destructor called from __libc_csu_init, and there are also leak points linked to cuda_runtime_api.cc. I hope you can fix this.
turing_config.zip
ampere_config.zip
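In case it helps with reproduction, my application follows the standard pattern sketched below (the kernel and variable names are placeholders, not my exact code). Every host and device allocation is freed explicitly, so the memory growth should come from the simulator rather than the application:

```cuda
// Hypothetical reproducer: 1024 x 1024 matrix multiplication with
// matched cudaMalloc/cudaFree and malloc/free pairs.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 1024

__global__ void matMul(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += A[row * n + k] * B[k * n + col];
        C[row * n + col] = acc;
    }
}

int main() {
    size_t bytes = (size_t)N * N * sizeof(float);
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < N * N; ++i) { hA[i] = 1.0f; hB[i] = 1.0f; }

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matMul<<<grid, block>>>(dA, dB, dC, N);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // For all-ones inputs, every element of C should be 1024.
    printf("C[0][0] = %f\n", hC[0]);

    // All allocations are released explicitly.
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```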