Run Nsight on the code base #334
Replies: 14 comments
Suggested runs (@AdelekeBankole):
To help with the initial scaling study I made some modifications to the It should allow us to create basic plots of 'problem size' vs 'runtime'. I am currently trying to run one on CSD3, on CPU with a single core, for a range of
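A minimal sketch of how such a 'problem size' vs 'runtime' table could be collected is given here, in Python purely for illustration. The `workload` function is a hypothetical stand-in and not the actual model run; `time_run` and `scaling_study` are likewise names invented for this example.

```python
import time


def workload(n):
    # Hypothetical stand-in for one model run at problem size n
    # (NOT the real simulation, just something whose cost grows with n).
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total


def time_run(func, n, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of func(n)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(n)
        best = min(best, time.perf_counter() - start)
    return best


def scaling_study(func, sizes, repeats=3):
    """Return a list of (problem_size, runtime_seconds) pairs."""
    return [(n, time_run(func, n, repeats)) for n in sizes]


if __name__ == "__main__":
    for n, t in scaling_study(workload, [10_000, 20_000, 40_000, 80_000]):
        print(f"{n:>8d}  {t:.6f} s")
```

Taking the best of a few repeats is one way to reduce noise from other jobs on a shared node; the resulting pairs can be passed to any plotting tool.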
The question we want to answer is: where are the bottlenecks, i.e. where does the code spend its time?
Another question to answer is: how much time is spent transferring data between GPU and CPU?
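Both questions can be answered from a per-operation timing summary. The sketch below assumes the profiler output has already been reduced to `(name, kind, total_ns)` records; that record layout, the `kind` labels and the numbers are made up for illustration and are not measurements from these runs.

```python
def rank_by_time(records):
    """Sort (name, kind, total_ns) records by total time, longest first."""
    return sorted(records, key=lambda r: r[2], reverse=True)


def transfer_share(records):
    """Fraction of the summed time spent in records whose kind is 'memcpy'."""
    total = sum(r[2] for r in records)
    if total == 0:
        return 0.0
    moved = sum(r[2] for r in records if r[1] == "memcpy")
    return moved / total


# Illustrative records only (hypothetical names and times, in nanoseconds).
records = [
    ("kernel_a", "kernel", 700),
    ("kernel_b", "kernel", 200),
    ("host_to_device", "memcpy", 60),
    ("device_to_host", "memcpy", 40),
]

if __name__ == "__main__":
    for name, kind, total_ns in rank_by_time(records):
        print(f"{name:<16s} {kind:<8s} {total_ns:>6d} ns")
    print(f"transfer share: {transfer_share(records):.1%}")
```

Note that summing times like this ignores any overlap between transfers and kernels, so the share is only a rough indicator.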
It turns out that we have been running Nsight on CSD3 slightly incorrectly. The workflow that worked at the beginning of August recently stopped working, and the cause was the CUDA 13 release on the 5th of August! My understanding of the problem and our solution is the following (but I cannot guarantee that it is 100% correct): By default, The fix to the problem is to make
Apparently in this configuration
The raw Nsyst results are now on GoogleDrive in Please note that the names of the kernels are long (i.e. the file does not open well in LibreCalc due to too many characters in the cell!). A short summary is:
Not much. In the most involved case of the run with IO, the cumulative time for memory transfers is:
Since you're using the non-hydrostatic model, it's here:
Note from the meeting today: @johnryantaylor suggested switching off part of the physics to see if/how it influences the performance of the
I hope I did not misinterpret anything 🤞
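The idea of switching parts off one at a time and comparing runtimes can be sketched as below. This is only an illustration in Python: the component names and their toy workloads are hypothetical placeholders, not the model's actual physics or biogeochemistry terms.

```python
import time


def run_step(components, enabled):
    """Run one step, calling only the components whose names are in `enabled`."""
    for name, func in components.items():
        if name in enabled:
            func()


def time_config(components, enabled, steps=50):
    """Wall-clock seconds for `steps` steps with the given components enabled."""
    start = time.perf_counter()
    for _ in range(steps):
        run_step(components, enabled)
    return time.perf_counter() - start


def ablation(components, steps=50):
    """Time the full configuration, then each configuration with one component off."""
    all_on = set(components)
    results = {"all": time_config(components, all_on, steps)}
    for name in components:
        results[f"without {name}"] = time_config(components, all_on - {name}, steps)
    return results


# Hypothetical components with toy workloads of different cost.
components = {
    "component_a": lambda: sum(range(2000)),
    "component_b": lambda: sum(range(500)),
}

if __name__ == "__main__":
    for label, seconds in ablation(components).items():
        print(f"{label:<22s} {seconds:.6f} s")
```

A large drop in runtime when one component is disabled points to that component as the place to look first.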
Also, the next step in investigating the kernels is going to be to run them under To learn the tool we will probably start with the
The light integration doesn't occur in the compute_Gc kernel but during the update_state step.
Thanks @jagoosw. Mikolaj found that the GPU utilization is pretty low during the compute_Gc kernel. I thought that it might be useful to progressively simplify the LOBSTER model to see if certain functions aren't being distributed very evenly across the GPU. Can you think of good places to look?
Maybe we can close this issue, since we now have a workflow for running Nsight on OceanBioME on CSD3. The kernels that we should target for optimisation now have separate issues.
Perhaps you could write up how to do it and convert this into a discussion?
I have written up the instructions for running the Nsight profilers on CSD3: csd3_nsight_v2.tar.gz
Profile the code base, using the example runs from #259, with Nsight.
Tasks
512x512x256 (256x256x64) grid