Investigate performance slowdown when adding density-mapped diagnostics #47

Andy pointed out this slowdown during last week's [TWG meeting](https://forum.access-hive.org.au/t/cosima-twg-meeting-minutes-2026/5906/12). As Dougie noted in this [comment](https://github.com/ACCESS-NRI/access-om3-configs/pull/622#issuecomment-3031962668), adding even a single density-mapped diagnostic increases the runtime by ~20% (or more), but adding further density-mapped diagnostics does not increase it any further.

With help from Angus and Ed, I have now reproduced the same issue in a standalone benchmark using the following steps (with these aggressive compiler flags for vectorisation):

```sh
git clone --recursive https://github.com/noaa-gfdl/mom6-examples.git
cd mom6-examples/ocean_only
module load openmpi/5.0.8 intel-llvm-compiler/2025.2.0 netcdf/4.9.2 python3-as-python
# Shared optimisation/vectorisation flags, passed to the compiler and the linker
OPTFLAGS="-O2 -g3 -fno-omit-frame-pointer -xcascadelake -qopt-zmm-usage=high -vec-threshold0"
FCFLAGS="$OPTFLAGS" FFLAGS="$OPTFLAGS" LDFLAGS="$OPTFLAGS" make -j

cd ../../

git clone --recursive https://github.com/marshallward/mom6-benchmark
cd mom6-benchmark/benchmark_ALE
# Link the MOM6 executable built in mom6-examples into the benchmark directory
ln -s ../../mom6-examples/ocean_only/build/MOM6 ./MOM6
ulimit -s unlimited  # avoids stack-overflow segfaults (see the runtime notes below)
mpiexec -n 1 ./MOM6
```

Adding the density-mapped diagnostic required the following patch in the `benchmark_ALE` directory (generated with `git diff --patch MOM_input diag_table`):

```patch
diff --git a/benchmark_ALE/MOM_input b/benchmark_ALE/MOM_input
index b52b28e..39fbc55 100644
--- a/benchmark_ALE/MOM_input
+++ b/benchmark_ALE/MOM_input
@@ -184,6 +184,10 @@ INITIAL_T_RANGE = -9.0          !   [degC] default = 0.0
                                 ! Initial temperature range (bottom - surface)
 
 ! === module MOM_diag_mediator ===
+NUM_DIAG_COORDS = 1
+DIAG_COORDS = "rho2 RHO2 RHO" !"z Z ZSTAR" !
+DIAG_COORD_DEF_RHO2 = "RFNC1:76,999.5,1020.,1034.1,3.1,1041.,0.002" ! default = "WOA09"
+REGRIDDING_ANSWER_DATE = 99991231 ! default = 20181231
 
 ! === module MOM_MEKE ===
 USE_MEKE = True                 !   [Boolean] default = False
diff --git a/benchmark_ALE/diag_table b/benchmark_ALE/diag_table
index 42b4a98..02ceeb8 100644
--- a/benchmark_ALE/diag_table
+++ b/benchmark_ALE/diag_table
@@ -47,6 +47,10 @@ benchmark_ALE
  "ocean_model",   "zos",        "zos",          "ocean_month", "all", "mean", "none",2
  "ocean_model",   "Rd1",        "Rd1",          "ocean_month", "all", "mean", "none",2
 
+# monthly 3d fields on rho2
+"access-om3.mom6.3d.umo+rho2.1mon.mean.%4yr", 1, "months", 1, "days", "time", 1, "years"
+"ocean_model_rho2", "umo", "umo", "access-om3.mom6.3d.umo+rho2.1mon.mean.%4yr", "all", "average", "none", 2
+
 # 3d annual
  "ocean_model_z", "agessc",     "agessc",       "ocean_annual_z", "all", "mean", "none",2

```
I will use this issue to report my findings, first from `benchmark_ALE` and then from the full OM3 25 km IAF configuration.
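To put numbers on the slowdown (e.g. the ~20% figure above) when comparing runs with and without the patch, a minimal timing sketch might look like the following. It assumes GNU `date` (for the `%N` nanosecond field, as on Linux) and `awk`; `wall_seconds` and `pct_slowdown` are hypothetical helper names, not part of MOM6 or the benchmark repository.

```sh
# Hypothetical helpers for comparing benchmark wall-clock times.

# wall_seconds LOG CMD [ARGS...]
# Runs CMD with its output redirected to LOG and prints the elapsed
# wall-clock time in seconds. If CMD fails, returns CMD's exit status.
# Assumes GNU date, which supports the %N (nanoseconds) format.
wall_seconds() {
  local log=$1
  shift
  local start end
  start=$(date +%s.%N)
  "$@" >"$log" 2>&1 || return
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
}

# pct_slowdown BASELINE NEW
# Prints the percentage change of NEW relative to BASELINE.
pct_slowdown() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * (n - b) / b }'
}

# Example usage in benchmark_ALE (once before and once after applying the patch):
#   base=$(wall_seconds base.log mpiexec -n 1 ./MOM6)
#   rho2=$(wall_seconds rho2.log mpiexec -n 1 ./MOM6)
#   pct_slowdown "$base" "$rho2"
```

Timing the whole executable includes start-up and I/O, so this only gives a coarse end-to-end figure; MOM6's own clock output is the better source for a per-component breakdown.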

#### Misc runtime notes:
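- I needed to run `ulimit -s unlimited` on the command line before running MOM6; otherwise it segfaulted from exceeding the stack size. This surprised me because I thought MOM6 used dynamic memory, but MOM6 routines also declare large grid-sized automatic local arrays, and the Intel compilers place automatic arrays and array temporaries on the stack by default. A flag like `-mcmodel=huge` would not help here: `-mcmodel` sets the code/static-data model (the x86-64 values are `small`, `medium` and `large`) and does not affect the stack. A flag such as Intel's `-heap-arrays`, which moves automatic arrays and temporaries to the heap, is the more likely candidate.
- `make clean` in mom6-examples does not clean the FMS build; you also need to run `make clean.fms` so that new compiler/linker flags are applied consistently across the entire build.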
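The two notes above could be folded into a small pre-build/pre-run helper. This is only a sketch: `raise_stack_limit` and `clean_all` are hypothetical names, and raising the soft limit to the hard limit (rather than hard-coding `unlimited`) is an assumption made so that it also works where the hard limit is finite.

```sh
# Hypothetical helpers for the runtime notes above.

# Raise the soft stack limit as far as the hard limit allows. Where the
# hard limit is "unlimited", this is equivalent to `ulimit -s unlimited`.
# Call it in the current shell (not in a subshell) so the new limit is
# inherited by the MOM6 launch that follows.
raise_stack_limit() {
  ulimit -S -s "$(ulimit -H -s)"
  echo "stack size limit: $(ulimit -S -s)"
}

# Clean both the MOM6 and the FMS builds (run from mom6-examples/ocean_only)
# so that changed compiler/linker flags apply to everything on the next make.
clean_all() {
  make clean && make clean.fms
}

# Example:
#   raise_stack_limit
#   mpiexec -n 1 ./MOM6
```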

#### Adding link-time optimisation with `-flto` requires these changes
- Add `-flto` to all the compiler flags
- Add the Fortran compiler flags to `LDFLAGS`, and then add `-flto -fuse-ld=lld`
- Edit the FMS Makefile to change `AR` from the hardcoded `ar` to `llvm-ar`, matching the loaded oneAPI compiler
- Fix the missing path for `libnetcdff.so` by setting `LD_LIBRARY_PATH=$NC_ROOT/lib/Intel:$LD_LIBRARY_PATH`, and confirm with `ldd` that MOM6 resolves the correct netCDF Fortran library
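The environment side of these LTO changes might be captured as below. This is a sketch under assumptions: it reuses the flag set from the build step above, `setup_lto_env` and `netcdff_resolves_to` are hypothetical helper names, `NC_ROOT` is assumed to be set by the netcdf module, and the `AR` change still has to be made in the FMS Makefile by hand.

```sh
# Hypothetical helpers for the LTO build; the AR -> llvm-ar edit in the
# FMS Makefile is NOT handled here.

# Export LTO-enabled flags (the link step reuses the Fortran flags plus lld)
# and prepend the Intel netCDF library directory to LD_LIBRARY_PATH.
# Aborts the shell if NC_ROOT is unset.
setup_lto_env() {
  : "${NC_ROOT:?NC_ROOT is not set; load the netcdf module first}"
  OPTFLAGS="-O2 -g3 -fno-omit-frame-pointer -xcascadelake -qopt-zmm-usage=high -vec-threshold0 -flto"
  export FCFLAGS="$OPTFLAGS"
  export FFLAGS="$OPTFLAGS"
  export LDFLAGS="$OPTFLAGS -fuse-ld=lld"
  # Only add the ':' separator if LD_LIBRARY_PATH was already non-empty.
  export LD_LIBRARY_PATH="$NC_ROOT/lib/Intel${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Read `ldd` output on stdin, print the resolved libnetcdff path, and
# succeed only if it lives under the directory given as $1.
netcdff_resolves_to() {
  awk -v dir="$1" '
    /libnetcdff\.so/ {
      found = 1
      if (index($3, dir "/") == 1) { print $3; ok = 1 }
      else { print "unexpected: " $0 }
    }
    END { exit !(found && ok) }'
}

# Example (from mom6-examples/ocean_only):
#   setup_lto_env
#   make clean && make clean.fms && make -j
#   ldd ./build/MOM6 | netcdff_resolves_to "$NC_ROOT/lib/Intel"
```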



