ccdtools RACMO dataset load causes UCX/MPI segfault (signal 11/7) on Gadi #57

@justinh2002

Description

Loading the RACMO dataset with ccdtools via catalog.load_dataset('racmo2.4p1_monthly_11km_1979-2023', version='v1') causes a fatal UCX/MPI crash on Gadi, killing the Python process with exit code 139 (SIGSEGV) or 135 (SIGBUS).

[gadi-login-07:...:0] UCX WARN ucs_recursive_spinlock_destroy() failed: busy
[gadi-login-07:...:2] Caught signal 7 (Bus error: Sent by the kernel)
[gadi-login-07:...:1] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
Bus error
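
For anyone triaging similar crashes: the exit codes above follow the usual shell convention of 128 plus the signal number. The sketch below decodes such a code into a signal name. The helper name signal_from_exit_code is hypothetical and not part of ccdtools; it assumes a POSIX system such as the Linux nodes on Gadi.

```python
import signal


def signal_from_exit_code(code: int) -> str:
    """Return the name of the signal encoded in a shell exit status.

    Shells report a process killed by signal N as exit status 128 + N.
    This helper is illustrative only and is not part of ccdtools.
    """
    if code <= 128:
        raise ValueError(f"exit code {code} does not encode a signal")
    # signal.Signals raises ValueError for numbers that are not valid signals.
    return signal.Signals(code - 128).name
```

On Linux, where SIGSEGV is 11 and SIGBUS is 7, this maps 139 and 135 to the two signals shown in the log.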

Steps to reproduce

import ccdtools as dp
catalog = dp.catalog.DataCatalog()
racmo = catalog.load_dataset('racmo2.4p1_monthly_11km_1979-2023', version='v1')

Other datasets (measures_bedmachine_antarctica, measures_insar_based_antarctica_ice_velocity_map, antarctic_geothermal_heat_flow_model_aq1) load without issue via the CCD. The crash appears specific to the RACMO dataset.
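
Because the crash kills the interpreter, it may help to run the reproduction in a child process so the calling session survives and the terminating signal can be recorded. The following is a minimal sketch under that assumption: run_isolated, describe, and REPRO are hypothetical names introduced here, and only the three lines inside REPRO come from this report. Python's faulthandler is enabled in the child, although UCX appears to install its own signal handlers, so a Python traceback may or may not be printed.

```python
import signal
import subprocess
import sys

# The reproduction from this report, run verbatim in a child interpreter.
REPRO = (
    "import ccdtools as dp\n"
    "catalog = dp.catalog.DataCatalog()\n"
    "catalog.load_dataset('racmo2.4p1_monthly_11km_1979-2023', version='v1')\n"
)


def run_isolated(snippet: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run a code snippet in a fresh interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Summarise how the child ended; a negative return code means a signal."""
    if result.returncode < 0:
        return f"killed by {signal.Signals(-result.returncode).name}"
    return f"exited with code {result.returncode}"
```

Calling describe(run_isolated(REPRO)) on a Gadi node would report whether the load ended normally or was killed by a signal, and result.stderr would hold the UCX messages. The same helper can be pointed at a snippet that loads one of the datasets that work, to confirm the difference.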

Metadata

Labels

datapool: Default label for ACCESS Cryosphere Data Pool Issues
