Error in CICE4 when compiling with error-checking flags #19

@manodeep

While investigating the divide-by-zero error in mom5 with oneAPI, I have uncovered a different error in CICE4 -- ice_IOUnitsGet: No free units, which originates from this line of CICE code. I get the same error with oneAPI 2025 - so presumably this is an underlying code issue.
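For readers unfamiliar with the message: "No free units" is the kind of error a first-free-slot allocator reports when its bookkeeping table shows every slot as taken. The sketch below illustrates that pattern only. It is not the CICE code, and the class name, method names and unit range are all hypothetical.

```python
class NoFreeUnits(RuntimeError):
    """Raised when every unit in the table is marked as in use."""


class UnitTable:
    """Hypothetical first-free-slot allocator for I/O unit numbers."""

    def __init__(self, min_unit, max_unit):
        self.min_unit = min_unit
        self.max_unit = max_unit
        # Every unit starts out explicitly marked as free.
        self.in_use = {u: False for u in range(min_unit, max_unit + 1)}

    def get_unit(self):
        # Hand out the lowest-numbered unit that is not in use.
        for unit in range(self.min_unit, self.max_unit + 1):
            if not self.in_use[unit]:
                self.in_use[unit] = True
                return unit
        # No slot reported itself as free.
        raise NoFreeUnits("No free units")

    def release_unit(self, unit):
        # Return a unit to the pool so it can be handed out again.
        self.in_use[unit] = False


if __name__ == "__main__":
    table = UnitTable(11, 13)  # illustrative range, not CICE's
    print([table.get_unit() for _ in range(3)])
    try:
        table.get_unit()
    except NoFreeUnits as err:
        print("allocator failed:", err)
```

In a scheme like this, the error appears either when the pool is genuinely exhausted or when the in-use table does not hold the values the allocator expects. Which of these applies to CICE4 here is not established.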

The exe is here. It was built from the latest release of ESM1.6 (access-esm1p6/dev_2025.04.000) with the Intel classic compiler 2021.10.0, using these Fortran flags for mom5, cice4 and um7: 'fflags="-fprotect-parens -assume nan_compares -assume ieee_compares -fpe0 -traceback -check all -init=snan -init=array -init=huge"'.

The config uses 12 CICE CPUs, and I get 12 instances of the same error in the PBS log, followed by a SIGTERM and a traceback from UM7 routines. I expect the UM7 traceback is a red herring: when CICE terminates, SIGTERM is sent to the other running processes, and that causes the traceback. The run directory is here: /home/593/ms2335/perf-opt-classic-esm1.6/sapphirerapids/access-esm1.6-PI-sapphirerapids-416-cores-classic-deployed-exe-to-test-sanitizer-and-div-by-zero-esm1p6-pr74

ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
ice_IOUnitsGet: No free units
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
libpthread-2.28.s  000014E70A542D10  Unknown               Unknown  Unknown
um_hg3.exe         000000000041061D  Unknown               Unknown  Unknown
um_hg3.exe         000000000100757C  read_multi_               996  read_multi.f90
um_hg3.exe         0000000000E4D348  um_readdump_             1605  um_readdump.f90
um_hg3.exe         0000000000C7A040  initdump_                6686  initdump.f90
um_hg3.exe         00000000006245D8  initial_                 6388  initial.f90
um_hg3.exe         00000000004434A1  Unknown               Unknown  Unknown
um_hg3.exe         0000000000417376  um_shell_                3930  um_shell.f90
um_hg3.exe         00000000004104C8  MAIN__                     40  flumeMain.f90
um_hg3.exe         000000000041040D  Unknown               Unknown  Unknown
libc-2.28.so       000014E709F907E5  __libc_start_main     Unknown  Unknown
um_hg3.exe         000000000041032E  Unknown               Unknown  Unknown
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
libpthread-2.28.s  0000151689D39D10  Unknown               Unknown  Unknown
libucp.so.0.0.0    0000151685AD77D8  ucp_worker_progre     Unknown  Unknown
libmpi.so.40.30.5  000015168A374BFF  mca_pml_ucx_recv      Unknown  Unknown
...
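A log like the one above can be triaged by counting the CICE error lines separately from the `forrtl` SIGTERM lines, which helps confirm that the error count matches the number of CICE ranks. This is a minimal sketch: the function name and the counter keys are hypothetical, and the log text is assumed to have already been read into a string.

```python
import re
from collections import Counter

# Exact message printed by each failing CICE rank (taken from the log above).
CICE_ERROR = "ice_IOUnitsGet: No free units"
# Intel runtime message emitted by processes that received SIGTERM.
SIGTERM_RE = re.compile(r"forrtl: error \(78\): process killed \(SIGTERM\)")


def summarize_log(text):
    """Count CICE 'No free units' errors and forrtl SIGTERM reports in a log."""
    counts = Counter()
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == CICE_ERROR:
            counts["cice_no_free_units"] += 1
        elif SIGTERM_RE.search(stripped):
            counts["forrtl_sigterm"] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join(
        [CICE_ERROR] * 12
        + ["forrtl: error (78): process killed (SIGTERM)"]
    )
    print(dict(summarize_log(sample)))
```

For a 12-CPU CICE configuration, a `cice_no_free_units` count of 12 would indicate that every CICE rank hit the error before the other processes were terminated.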

The released version itself (i.e., without any changes to the compiler or compiler flags) runs fine.

Pinging @anton-seaice @chrisb13
