
Fix steady memory growth (~200 MB/granule) with geometry="swath"#69

Merged
eeholmes merged 2 commits into main from copilot/fix-memory-accumulation-level-2
Mar 9, 2026

Conversation

Contributor

Copilot AI commented Mar 9, 2026

geometry="swath" uses xr.open_datatree() + xr.merge() to open each granule. Unlike xr.open_dataset(), the DataTree was never explicitly closed, so HDF5 file handles and their backing memory accumulated across granules until Python's cyclic GC eventually ran.
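The failure mode can be reproduced in miniature. The sketch below uses toy `Handle`/`TreeNode` classes (illustrative stand-ins, not xarray types) to show that an object which closes its file handle only at finalization, when caught in a parent↔child reference cycle, stays open until the cyclic collector runs:

```python
import gc

class Handle:
    """Toy stand-in for an open HDF5 file handle."""
    open_handles = 0

    def __init__(self):
        Handle.open_handles += 1

    def close(self):
        Handle.open_handles -= 1

class TreeNode:
    """Toy DataTree-like node that closes its handle only on finalization."""
    def __init__(self):
        self.handle = Handle()
        self.parent = None
        self.children = []

    def __del__(self):
        self.handle.close()

gc.disable()  # mimic the long gap before an automatic cyclic GC run
for _ in range(5):  # "open" five granules without an explicit close()
    root, child = TreeNode(), TreeNode()
    child.parent, root.children = root, [child]  # parent<->child cycle
    del root, child  # reference counting alone cannot reclaim the cycle

assert Handle.open_handles == 10  # nodes from all 5 granules still open
gc.collect()  # the cyclic collector finalizes the nodes...
gc.enable()
assert Handle.open_handles == 0  # ...and only now do the handles close
```

Calling `dt.close()` explicitly, as the fix does, removes this dependence on when the collector happens to run.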

Changes

  • Primary fix — _open_as_flat_dataset (a @contextmanager): the DataTree is now explicitly closed in a finally block after each granule is processed, matching the cleanup behaviour of xr.open_dataset():

    dt = _open_datatree(file_obj, kwargs)
    try:
        ds = _merge_datatree(dt)
        yield ds
    finally:
        dt.close()  # releases all HDF5 file handles immediately
  • gc.collect() after each batch: DataTree nodes contain parent→child reference cycles, so CPython's reference counting never frees them on its own; they linger until the cyclic garbage collector happens to run. An explicit gc.collect() after del opened_batch ensures they're reclaimed before the next batch opens.

  • Batch xoak index build (_extract_xoak_batch): The old code rebuilt the NDPointIndex k-d tree once per query point per granule. The new _extract_xoak_batch builds it once per granule and queries all points in a single sel() call, reducing both memory churn and CPU cost.
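The build-once/query-batch idea behind _extract_xoak_batch can be sketched with a brute-force NumPy stand-in for the k-d tree (`build_index` and `query_all` are illustrative names here, not the actual xoak API):

```python
import numpy as np

def build_index(lats, lons):
    """Build the spatial 'index' once per granule. Here it is just the
    stacked coordinates; xoak would build a k-d tree at this step."""
    return np.column_stack([lats, lons])

def query_all(index, query_lats, query_lons):
    """Answer every query point in one vectorized call, instead of
    rebuilding the index once per point."""
    q = np.column_stack([query_lats, query_lons])
    # (n_query, n_grid) pairwise squared distances
    d2 = ((q[:, None, :] - index[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)  # index of the nearest cell per query point

# One granule's "swath" coordinates, indexed once:
idx = build_index(np.array([0.0, 1.0, 2.0]), np.array([10.0, 11.0, 12.0]))
# All query points answered in a single call:
nearest = query_all(idx, np.array([0.1, 1.9]), np.array([10.2, 11.8]))
# nearest -> array([0, 2])
```

Answering the whole batch in one call amortizes the index build across all query points, which is the same trade the new single-sel() xoak path makes with a real k-d tree.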

Original prompt

This section details the original issue you should resolve

<issue_title>memory accumulation with level 2 (swath)</issue_title>
<issue_description>Task: Diagnose and (hopefully) solve a climbing memory issue with geometry="swath", level 2 data

Background:

In a previous issue and PR, you solved a memory accumulation issue where memory kept growing linearly:
#59

With geometry="grid", it seems fine: memory starts at ~450 MB and doesn't grow beyond ~650 MB with this example dataset.

plan = pc.plan(
    df_points[0:100],
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_AVW",
        "granule_name": "*.DAY.*.4km.*",
    }
)
# Both spatial_method options work fine for gridded data. Memory doesn't climb.
res = pc.matchup(plan[0:100], geometry="grid", variables = ["avw"], batch_size=1)
res = pc.matchup(plan[0:100], geometry="grid", variables = ["avw"], batch_size=1, spatial_method="xoak")

With geometry="swath", the memory steadily climbs to 3 GB. The swath files are not that much bigger; the real concern is that the memory keeps climbing.

But with level 2 data, geometry="swath", the memory steadily climbs, ca. 200 MB per granule.

import point_collocation as pc
plan = pc.plan(
    df_points[0:100],  # -82.7375, 27.3835
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L2_AOP",
    },
    time_buffer="12h"
)
# this uses spatial_method="xoak"
res = pc.matchup(plan[0:100], geometry="swath", variables = ["Rrs"], batch_size=1)

An example swath netcdf is in examples/

</issue_description>

Comments on the Issue (you are @copilot in this section)



…, gc.collect

Co-authored-by: eeholmes <2545978+eeholmes@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Diagnose and resolve memory accumulation with level 2 swath" to "Fix steady memory growth with geometry="swath" (L2/swath data)" Mar 9, 2026
Copilot AI changed the title from "Fix steady memory growth with geometry="swath" (L2/swath data)" to "Fix steady memory growth (~200 MB/granule) with geometry="swath"" Mar 9, 2026
@eeholmes eeholmes marked this pull request as ready for review March 9, 2026 23:57
@eeholmes eeholmes merged commit 1ee0795 into main Mar 9, 2026
1 check passed
@eeholmes eeholmes deleted the copilot/fix-memory-accumulation-level-2 branch March 9, 2026 23:57


Development

Successfully merging this pull request may close these issues.

memory accumulation with level 2 (swath)
