
Conversation

@NathanGraddon

Integration of the Part 4: Advanced Topics in cuPyNumeric (profiling & debugging) .rst file.

NathanGraddon and others added 24 commits November 26, 2025 11:41
Updated references and formatting in profiling_debugging.rst for clarity and consistency.
Corrected numbering and formatting in profiling debugging documentation.
Updated profiler output images and descriptions for both inefficient and efficient CPU, utility, I/O, system, channel, GPU, and framebuffer results.
Updated formatting for section titles and removed example text.
@copy-pr-bot
Contributor

copy-pr-bot bot commented Nov 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@manopapad
Contributor

@ipdemes @shriram-jagan @Jacobfaib could you please take a look over this tutorial that Nathan (Sunita Chandrasekaran's student from the University of Delaware) wrote for us? Also @lightsighter FYI


**3.) After a run completes, in the directory where you ran the command you’ll see:**

- A folder: ``legate_prof/``, a self-contained HTML report
Contributor

I don't think this folder is generated with the recent legate/legate_prof

Author

Yes, you're right, I checked my latest runs. That section is now updated. Thank you.

…script.

Removed unnecessary line breaks and adjusted formatting for clarity.
Updated usage examples to include the --provenance flag for diagnostic commands.
mag1cp1n pushed a commit to mag1cp1n/cupynumeric that referenced this pull request Dec 17, 2025
@dongb
Contributor

dongb commented Jan 20, 2026

@manopapad, what's the status of the review?

Contributor

@shriram-jagan left a comment

This looks really good, please address some of the comments I left.

see:

- Setting up your environment and running `cuPyNumeric <https://docs.nvidia.com/cupynumeric/latest/user/tutorial.html>`_
- Extending cuPyNumeric with `Legate Task <https://docs.nvidia.com/cupynumeric/25.10/user/task.html>`_
Contributor

Author

Great point, fixed!

multi-node clusters. Previous sections covered how to get code running; here
the focus shifts to making workloads production-ready. At scale, success is
not just about adding GPUs or nodes; it requires ensuring that applications
remain efficient, stable, and resilient under load. That means finding
Contributor

not sure what "stable, and resilient under load" means in this context. I'd probably leave out this sentence that defines what success is at scale and instead continue to the next sentence which is more specific and relatable.

Author

I agree, I can see how that would be ambiguous; it's been removed.


* - **What you'll gain:** By combining profiling tools with solid
OOM-handling strategies, you can significantly improve the
efficiency, scalability, and reliability of cuPyNumeric
Contributor

by reliability, do you mean that the library doesn't fail on different processor variants or architectures? (you don't have to update the doc with the definition, maybe just tell me what it is in a comment)

Author

@NathanGraddon Jan 30, 2026

Yeah, it's definitely a broad statement; what I meant by "reliability" here is execution stability. Applying profiling and OOM-handling practices makes cuPyNumeric runs less likely to fail (OOM/job crashes) and reduces stalling/underutilization from memory pressure and tiny tasks, especially at scale, which would in turn make it more reliable.

That said, it doesn't exactly mean the program is reliable in the sense that it would always work across different architectures; that would depend on context such as the specific runtime and hardware environment, which is outside the scope of the profiler and OOM sections.

Please feel free to let me know if you think this part should be altered for clarity or removed; I'd be more than happy to change it!

Contributor

> Applying profiling and OOM-handling practices makes cuPyNumeric runs less likely to fail (OOM/job crashes) and reduces stalling/underutilization from memory pressure and tiny tasks, especially at scale, which would in turn make it more reliable.

I like this part and I understand now what you are trying to convey. Profiling gives you an understanding of how your application is performing at scale. In particular, it helps you understand different metrics -- memory pressure and tiny tasks, like you mentioned, are a couple of them. Can you rephrase the original sentence and make it more specific instead of saying "reliability"? Somehow I don't feel comfortable using the word reliability in the context of profiling when we have an asynchronous runtime underneath.

Author

Yes, I now have this: "What you'll gain: By combining profiling with practical OOM-handling strategies, you can improve efficiency and scaling by identifying memory pressure and over-granular execution, while reducing OOM crashes and runtime stalls across CPUs, GPUs, and multi-node systems."
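
A minimal sketch of the kind of memory-pressure mitigation referred to above (hypothetical code, not taken from the tutorial; the array size, chunk count, and the "process" helper are illustrative). cuPyNumeric is used here as a drop-in NumPy replacement:

import cupynumeric as np

n = 100_000_000
x = np.ones(n)

def process(block):
    # A chain of elementwise ops; each op may allocate a block-sized temporary.
    return np.sqrt(np.exp(block) + 1.0)

# Memory-hungry: temporaries the size of the full array are alive at once.
# result = float(process(x).sum())

# Lower peak memory: a few large chunks bound the size of each temporary,
# without splintering the work into thousands of tiny tasks.
chunk = n // 8
result = 0.0
for start in range(0, n, chunk):
    result += float(process(x[start:start + chunk]).sum())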


**For more detail, see the official references:**

- `Usage — NVIDIA legate <https://docs.nvidia.com/legate/24.11/usage.html>`_
Contributor

https://docs.nvidia.com/legate/latest/manual/usage/index.html

here and elsewhere in this page, please link to "latest" instead of linking to a specific version

Author

Done, all sections should be fixed!


# Multi-GPU/Multi-Node: multiple ranks (pass them all, e.g. N0, N1, N2, etc.)
legate_prof view /path/to/legate_*.prof

Contributor

Should we leave a link to the Legate profiler Stanford page?

Author

I think that could be a good idea. Is this what you were referring to? https://legion.stanford.edu/profiling/index.html

Author

Hey Bo or Manolis, what do you think? If yes, I can go ahead and add it in.

@manopapad @dongb

in many tiny tasks; runtime overhead dominates useful computation.
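
A hedged illustration of that pattern (hypothetical snippet, not the tutorial's example code): slicing the work into many small pieces issues one runtime task per slice, while the single vectorized call gives the runtime one coarse-grained task to schedule.

import cupynumeric as np

x = np.ones(10_000_000)

# Inefficient: every small slice becomes its own runtime task, so task launch
# and scheduling overhead dominates the actual arithmetic.
total = 0.0
for start in range(0, x.shape[0], 1000):
    total += float(x[start:start + 1000].sum())

# Efficient: a single reduction over the whole array keeps the work
# coarse-grained and lets the runtime distribute it across processors.
total = float(x.sum())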

Profiler Output and Interpretation - Inefficient CPU Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Contributor

I guess it might be easy to interpret if you tell the user how the data is presented in the profiler -- the profiler's x-axis is time and the y-axis is utilization, and each panel in the profile is some kind of resource, so you essentially see resource utilization in each panel (e.g., mem, processor utilization (cpu/gpu/omp), etc).

Author

Good catch, I added a general interpretation under each profiling section, for inefficient (2x) and efficient (2x). Under the first image in each part:

"Interpretation: The profiler is presented as a timeline. The x-axis is time, the y-axis is organized by resource/utilization lanes. each horizontal lane represents a particular resource stream (CPU workers, GPU Device/Host, runtime/Utility threads, memory pools like Framebuffer/Zerocopy, and copy/Channel). Colored boxes show work on that resource; the box width is how long it ran, gaps indicate idle/waiting, and dense “barcode” slivers usually mean many tiny tasks (high overhead), while long solid blocks indicate fewer, larger tasks (better utilization)."

Contributor

"each horizontal lane" -> "Each horizontal lane". Looks good.

Author

Fixed, thanks!

production-ready code. Profiling turns performance tuning from guesswork into
an intentional, data-driven process that elevates code quality from functional
to excellent.

Contributor

Can you also mention how you can "trace back" dependencies by, say, looking at a task in a panel (GPU utilization) and finding its task ID, and then searching for that ID and looking at other panels (say, utility to see when it got mapped, or channel to see if there was any data movement, etc.)? This is how we can interactively find what operations were associated with a task.

Note that the profiler allows you to search by other keys as well, not just ID.

Author

Absolutely, it's been added in the wrap-up section.

Updated links to the latest NVIDIA Legate documentation.
Added detailed interpretation of profiler timelines for CPU and GPU resources.
Added details about the traceable view feature in profiling, explaining how to use task identifiers to connect performance symptoms to runtime activities.
@shriram-jagan
Contributor

@NathanGraddon, left a few nits. Looks good on my end.

Enhanced the explanation of benefits from profiling tools and OOM-handling strategies, emphasizing memory pressure identification and execution granularity.