Improve memory-heavy dataset loading #58

@lawrenceabird

Description

Several datasets are memory-heavy because of the volume of data being loaded. For example, loading REMA at < 1 km resolution works with 128 GB of memory, but fails on a standard "medium" ARE session.

There are likely a few ways to improve this, and two features that should be implemented anyway might help:

  • Pass selected "variables" to the load_dataset command. For example, load_dataset('rema', version = 'v2', resolution = '100m', variables = ['dem', 'count']) would load only files with 'dem' or 'count' in their names. This should be extended to all datasets that combine multiple individual files (e.g. RACMO) to reduce overhead.
  • Pass a region/domain extent to the load_dataset command. For example, load_dataset('rema', version = 'v2', resolution = '100m', region = [-1000, 1000, -1000, 1000]), where region = [xmin, xmax, ymin, ymax], would load only data within that extent.
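
A rough sketch of how the two filters could work before any data is read, so that unwanted files never reach memory. This is only an illustration, not the actual load_dataset implementation: the helper names (select_files, tile_overlaps_region), the example file names, and the idea that each file has known bounds are all assumptions. Variables are matched by substring, as described above, so 'dem' would also match a name such as 'demo'.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Sequence


def select_files(paths: Iterable[str],
                 variables: Optional[Sequence[str]] = None) -> List[str]:
    """Keep only paths whose file name contains one of the requested variables.

    Hypothetical helper: with variables=None every path is kept.
    """
    paths = list(paths)
    if variables is None:
        return paths
    return [p for p in paths if any(v in Path(p).name for v in variables)]


def tile_overlaps_region(tile_bounds: Sequence[float],
                         region: Sequence[float]) -> bool:
    """Return True if a tile's extent intersects the requested region.

    Both arguments use the [xmin, xmax, ymin, ymax] order from the issue.
    """
    txmin, txmax, tymin, tymax = tile_bounds
    xmin, xmax, ymin, ymax = region
    return txmin < xmax and txmax > xmin and tymin < ymax and tymax > ymin


# Hypothetical file names, for illustration only.
files = [
    "rema/v2/100m/rema_v2_100m_dem.tif",
    "rema/v2/100m/rema_v2_100m_count.tif",
    "rema/v2/100m/rema_v2_100m_mad.tif",
]
selected = select_files(files, variables=["dem", "count"])
print(selected)

region = [-1000, 1000, -1000, 1000]
print(tile_overlaps_region([0, 500, 0, 500], region))      # inside the region
print(tile_overlaps_region([2000, 3000, 0, 500], region))  # outside the region
```

Filtering on file names and bounds first keeps the check cheap; only the files that pass both tests would then be opened and, if needed, cropped to the region.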

Labels: datapool (Default label for ACCESS Cryosphere Data Pool Issues)