Skip to content

patch_ib Memory Copy Reduction#1531

Draft
danieljvickers wants to merge 2 commits into
MFlowCode:masterfrom
danieljvickers:1453-patch-mem-cpy-reduction
Draft

patch_ib Memory Copy Reduction#1531
danieljvickers wants to merge 2 commits into
MFlowCode:masterfrom
danieljvickers:1453-patch-mem-cpy-reduction

Conversation

@danieljvickers
Copy link
Copy Markdown
Member

Description

The patch_ib array was being updated on the CPU. This had the downstream effect of more host-device memory copies that strictly required. This PR modified the code to use the GPU copy of patch_ib everywhere and only copy out during file IO. This should remove memory copies and enable parallelism that was not present before.

Fixes #1453

Type of change

  • Refactor

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

Claude Code Review

Head SHA: 0fd4443

Files changed:

  • 4
  • src/simulation/m_data_output.fpp
  • src/simulation/m_ib_patches.fpp
  • src/simulation/m_ibm.fpp
  • src/simulation/m_time_steppers.fpp

Findings

[Correctness] Missing GPU_UPDATE(host) before s_update_mib in m_time_steppers.fpp

The diff converts a CPU loop into a GPU_PARALLEL_LOOP that updates patch_ib(i)%vel, patch_ib(i)%angular_vel, patch_ib(i)%angles, patch_ib(i)%x/y/z_centroid, and patch_ib(i)%moment (via s_compute_moment_of_inertia) entirely on the device. The old code ran a CPU loop then pushed with GPU_UPDATE(device). The new code removes that push—correctly—because the authoritative copy is now on the device. However, it removes the subsequent GPU_UPDATE(device='[patch_ib]') without adding a corresponding GPU_UPDATE(host='[patch_ib(1:num_ibs)]') before call s_update_mib(num_ibs).

s_update_mib is a CPU subroutine and reads the host copy of patch_ib, which is stale after the GPU parallel loop. All the Runge–Kutta position and velocity updates computed on the device (centroids, velocities, angles, moment of inertia) will be invisible to s_update_mib, leading to incorrect IB geometry propagation.

Fix: Add $:GPU_UPDATE(host='[patch_ib(1:num_ibs)]') between $:END_GPU_PARALLEL_LOOP() and call s_update_mib(num_ibs).


[Correctness] Missing GPU_UPDATE(host) before s_apply_ib_patches in m_ibm.fpp s_update_immersed_boundaries

The diff wraps the do i = 1, num_ibs rotation-matrix loop in a GPU_PARALLEL_LOOP and removes the previously trailing GPU_UPDATE(device='[patch_ib]'). Under the old scheme the loop ran on CPU (host authoritative), then pushed to device. Under the new scheme the loop runs on GPU (device authoritative), but no GPU_UPDATE(host) is emitted before the immediately following s_apply_ib_patches call and the surrounding MPI broadcast code referenced by the "APPLY-IB-PATCHES" nvtx range.

s_apply_ib_patches and MPI routines run on CPU and read patch_ib from host. The rotation matrices computed by s_update_ib_rotation_matrix (now a GPU_ROUTINE(seq) executing entirely on device) will be stale on the host, causing incorrect IB geometry at each time-step update.

Fix: Add $:GPU_UPDATE(host='[patch_ib(1:num_ibs)]') after $:END_GPU_PARALLEL_LOOP() and before the apply/MPI section.


[Correctness] Missing GPU_UPDATE(host) after initialization loop in m_ibm.fpp s_initialize_ibm_module

The diff moves GPU_UPDATE(device='[patch_ib(1:num_ibs)]') to before the do i = 1, num_ibs loop and wraps that loop in a GPU_PARALLEL_LOOP. s_compute_moment_of_inertia (now GPU_ROUTINE(seq)) writes patch_ib(ib_idx)%moment from the GPU, and s_update_ib_rotation_matrix (also GPU_ROUTINE(seq)) writes rotation-matrix fields. Both writes happen only on the device.

The immediately following block visible in the diff is:

! allocate some arrays for MPI communication, if required by this simulation
#ifdef MFC_MPI

MPI routines are CPU operations that read the host copy of patch_ib, which holds pre-initialization stale values. Moment-of-inertia and rotation data will be wrong for all subsequent MPI-broadcast steps during startup.

Fix: Add $:GPU_UPDATE(host='[patch_ib(1:num_ibs)]') after $:END_GPU_PARALLEL_LOOP() and before the #ifdef MFC_MPI block.

@danieljvickers danieljvickers marked this pull request as draft June 2, 2026 22:10
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.63%. Comparing base (2d0d2fd) to head (0fd4443).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1531   +/-   ##
=======================================
  Coverage   60.63%   60.63%           
=======================================
  Files          73       73           
  Lines       20219    20219           
  Branches     2937     2937           
=======================================
  Hits        12259    12259           
  Misses       5972     5972           
  Partials     1988     1988           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Patch Mem Cpy Reduction

1 participant