Skip to content

Use device_unmap in destructors#2391

Open
timofeymukha wants to merge 10 commits intoExtremeFLOW:developfrom
timofeymukha:bug/use_unmap
Open

Use device_unmap in destructors#2391
timofeymukha wants to merge 10 commits intoExtremeFLOW:developfrom
timofeymukha:bug/use_unmap

Conversation

@timofeymukha
Copy link
Collaborator

Propagates the use of device_unmap through the codebase.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Propagates use of device_unmap in destructors/free routines across multiple Fortran modules to unify host/device memory teardown and reduce manual device_free/device_deassociate handling.

Changes:

  • Replaced many device_free (and some device_deassociate) calls with device_unmap in destructors.
  • Moved unmapping logic to occur alongside host deallocation (allocated(...)-guarded blocks).
  • Simplified module imports by dropping now-unused c_associated in several files.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/wall_models/wall_model.f90 Unmaps device mappings for index arrays during base destructor cleanup.
src/time_schemes/time_scheme_controller.f90 Uses device_unmap for mapped coefficient arrays in destructor.
src/time_schemes/runge_kutta_scheme.f90 Unmaps RK coefficient arrays before deallocation.
src/source_terms/gradient_jump_penalty.f90 Converts extensive device cleanup from device_free to device_unmap tied to host allocations.
src/simulation_components/probes.F90 Switches probe output buffers to device_unmap per-field mapping during teardown.
src/sem/space.f90 Unmaps many SEM space arrays before deallocation, removing separate device-free section.
src/sem/local_interpolation.f90 Unmaps interpolation weights before deallocation.
src/sem/interpolation.f90 Unmaps interpolation matrices before deallocation, removing standalone device-free block.
src/sem/dofmap.f90 Unmaps coordinate arrays before deallocation, removing standalone device-free block.
src/sem/coef.f90 Unmaps many coefficient/device-mapped arrays before deallocation, removing large standalone device-free block.
src/math/vector.f90 Uses device_unmap to replace device_deassociate + device_free in vector destructor.
src/math/schwarz.f90 Unmaps mapped work buffers; keeps device_free for device-only allocations.
src/math/matrix.f90 Uses device_unmap to replace device_deassociate + device_free in matrix destructor.
src/math/fdm.f90 Unmaps mapped arrays before deallocation, removing standalone device-free block.
src/global_interpolation/global_interpolation.f90 Unmaps mapped owner arrays and replaces manual unmap/free sequence with device_unmap.
src/fluid/bcknd/advection/adv_oifs.f90 Unmaps mapped arrays before deallocation, removing standalone device-free block.
src/fluid/bcknd/advection/adv_no_dealias.f90 Unmaps mapped temp buffer before deallocation.
src/fluid/bcknd/advection/adv_dealias.f90 Unmaps mapped scratch arrays before deallocation, removing standalone device-free block.
src/filter/elementwise_filter.f90 Unmaps mapped filter matrices before deallocation, removing standalone device-free block.

@njansson
Copy link
Collaborator

Wouldn't it be a bit more safe if we still do the test with c_associated, to avoid that we try to unmap and free a device pointer? (or if we want to avoid these if, we could add the check in unmap, not sure what is best but somehow it's nice not to hide too many important details)

@timfelle
Copy link
Collaborator

timfelle commented Mar 18, 2026

Wouldn't it be a bit more safe if we still do the test with c_associated, to avoid that we try to unmap and free a device pointer? (or if we want to avoid these if, we could add the check in unmap, not sure what is best but somehow it's nice not to hide too many important details)

From what I can see, all the updates here mirror the initializers. So I think we should not add the extra c_associated guards. These types are written such that if devices are enabled we map the arrays. By using unmap, you will be throwing an error if someone messed with the internals and deallocated or nullified something. I might even suggest we move the unmap outside the allocated check. Of course in that case you need to unmap, then do the allocation check.

@njansson
Copy link
Collaborator

Wouldn't it be a bit more safe if we still do the test with c_associated, to avoid that we try to unmap and free a device pointer? (or if we want to avoid these if, we could add the check in unmap, not sure what is best but somehow it's nice not to hide too many important details)

From what I can see, all the updates here mirror the initializers. So I think we should not add the extra c_associated guards. These types are written such that if devices are enabled we map the arrays. By using unmap, you will be throwing an error if someone messed with the internals and deallocated or nullified something. I might even suggest we move the unmap outside the allocated check. Of course in that case you need to unmap, then do the allocation check.

Fully agree that we do mirror init and in principle it will be fine. But without checking the content of the pointer, we don't know if something bad has happened during the run. E.g. some consumer has deallocated an array etc (such an error will be a pain to find (now we just silently ignores it, which is probably also not the best thing))

@timfelle
Copy link
Collaborator

timfelle commented Mar 18, 2026

Wouldn't it be a bit more safe if we still do the test with c_associated, to avoid that we try to unmap and free a device pointer? (or if we want to avoid these if, we could add the check in unmap, not sure what is best but somehow it's nice not to hide too many important details)

From what I can see, all the updates here mirror the initializers. So I think we should not add the extra c_associated guards. These types are written such that if devices are enabled we map the arrays. By using unmap, you will be throwing an error if someone messed with the internals and deallocated or nullified something. I might even suggest we move the unmap outside the allocated check. Of course in that case you need to unmap, then do the allocation check.

Fully agree that we do mirror init and in principle it will be fine. But without checking the content of the pointer, we don't know if something bad has happened during the run. E.g. some consumer has deallocated an array etc (such an error will be a pain to find (now we just silently ignores it, which is probably also not the best thing))

Which is why I argue that we move it outside the if (allocated block.
The unmap does nothing, if the host array is not mapped, and the device pointer is not associated. Otherwise it will throw an error if something is wrong with the mapping, no longer associated to the expected pointer etc.

@timofeymukha
Copy link
Collaborator Author

timofeymukha commented Mar 18, 2026

My master plan to settle everything in yesterday's small PR has clearly failed :-).

It's true that if someone deallocates the host array randomly, the device pointer will leak. But passing an unallocated array to a dummy argument which is not itself allocatable is illegal, or? I guess we have to have an else if clause where we check the c_associate on x_d and free that if associated. I would not put that stuff in unmap, it should just reverse a valid map.

So in the end we will have about the same verbosity level as for just adding a dissassociate directly and not using unmap, but unmap gives us extra validity checks.

To be honest, in 90% of these types, one should just use vector_t and matrix_t instead of storing x and x_d. Then, it will just be a matter of calling vector_t%free, and we can be very verbose and careful in that destructor.

@njansson
Copy link
Collaborator

njansson commented Mar 18, 2026

My master plan to settle everything in yesterday's small PR has clearly failed :-).

It's true that if someone deallocates the host array randomly, the device pointer will leak. But passing an unallocated array to a dummy argument which is not itself allocatable is illegal, or? I guess we have to have an else if clause where we check the c_associate on x_d and free that if associated. I would not put that stuff in unmap, it should just reverse a valid map.

So in the end we will have about the same verbosity level as for just adding a dissassociate directly and not using unmap, but unmap gives us extra validity checks.

To be honest, in 90% of these types, one should just use vector_t and matrix_t instead of storing x and x_d. Then, it will just be a matter of calling vector_t%free, and we can be very verbose and careful in that destructor.

My master plan to settle everything in yesterday's small PR has clearly failed :-).

It's true that if someone deallocates the host array randomly, the device pointer will leak. But passing an unallocated array to a dummy argument which is not itself allocatable is illegal, or?

True, my understanding of the standard is that passing an unallocated to a not (marked) allocatable dummy argument should be illegal (but a language lawyer might be needed decide on that)

I guess we have to have an else if clause where we check the c_associate on x_d and free that if associated. I would not put that stuff in unmap, it should just reverse a valid map.

Agree, check and then call unmap instead of a foolproof unmap, like a free in C, it would accept anything ;)

So in the end we will have about the same verbosity level as for just adding a dissassociate directly and not using unmap, but unmap gives us extra validity checks.

To be honest, in 90% of these types, one should just use vector_t and matrix_t instead of storing x and x_d. Then, it will just be a matter of calling vector_t%free, and we can be very verbose and careful in that destructor.

In an ideal world yes... but with the current state of compilers, maybe not. These structures prevents optimisation (sigh), unless do concurrent or we sprinkle pragmas everywhere (yeah I know it's shouldn't be needed but it is sadly)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants