Load and build indices lazily

### Is your feature request related to a problem?

Yes. Currently, when you load a datatree or dataset, all the coordinates are loaded eagerly and their indices are built. I believe the loading is also synchronous, so you end up paying a performance penalty for things you might not even want.

At first, I thought I could use `create_default_indices=False` to fix this, because indices are created automatically when needed. But setting that flag changes the semantics of index loading for data trees. I originally wrote this up as a [bug](https://github.com/pydata/xarray/issues/11321), but it appears that this is [deliberate behaviour](https://github.com/pydata/xarray/pull/9555).

### Describe the solution you'd like

I would like an option that has the same semantics as create_default_indices=True, but defers creation until necessary. Perhaps `create_default_indices="lazy"`.

A very simple example use case would be loading a datatree just so you can inspect the metadata (e.g. shapes). Another would be loading a large datatree but only actually using a subtree of it.

### Describe alternatives you've considered

1) Just live with the perf hit and wasted bandwidth.
2) Use create_default_indices=False, and be very careful with issues this can cause.

### Additional context

I've prototyped a solution for this already. 

My solution involves creating LazyDefaultIndex, which inherits from Index. This object loads the actual value, a PandasIndex, via `create_default_index_implicit` once the index is required for operations. On datatree creation, this object is filled in wherever a PandasIndex would normally go. Because this is an actual object, it has similar resolution 

Similarly, I define LazyIndexingAdapter, derived from PandasIndexingAdapter.

Both these lazy objects hold the actual index in a mutable box, so that the actual index is only loaded once, and re-used, regardless of copy operations. Indices are immutable, so this is safe.
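To make the mutable box idea concrete, here is a minimal stdlib-only sketch. It is not the prototype code and does not use xarray's real classes: `LazyBox`, `LazyIndexSketch`, and `load_coordinate` are hypothetical names, and a plain tuple stands in for the PandasIndex. It also ignores thread safety.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyBox(Generic[T]):
    """Hypothetical mutable box: runs `loader` at most once and caches the result."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader: Optional[Callable[[], T]] = loader
        self._value: Optional[T] = None
        self.loaded = False

    def get(self) -> T:
        if not self.loaded:
            assert self._loader is not None
            self._value = self._loader()
            self.loaded = True
            self._loader = None  # the loader is no longer needed once cached
        return self._value  # type: ignore[return-value]


class LazyIndexSketch:
    """Stand-in for a lazy index; it does not implement xarray's Index API."""

    def __init__(self, box: LazyBox, dim: str) -> None:
        self._box = box
        self.dim = dim

    def copy(self) -> "LazyIndexSketch":
        # The copy shares the box, so a load through either object serves both.
        return LazyIndexSketch(self._box, self.dim)

    def materialize(self):
        return self._box.get()


load_calls = []


def load_coordinate():
    # Assumption: in a real implementation this would read the coordinate
    # and build the index; here it only records that it was called.
    load_calls.append("x")
    return (10, 20, 30)


original = LazyIndexSketch(LazyBox(load_coordinate), dim="x")
duplicate = original.copy()
loads_before = len(load_calls)  # nothing has been loaded yet

first = duplicate.materialize()
second = original.materialize()  # served from the shared box
```

Sharing the box is only safe because the materialized value is never mutated, which matches the immutability assumption above.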

Some methods don't always materialize the index:
* from_variables / create_variables / copy (returns another lazy object sharing same data)
* rename (can return a lazy object, but doesn't share)
* dim / __repr__ (doesn't use index directly)
* equals (if it can determine they share the same mutable box)
* isel / roll (depending on arguments)
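The list above could look roughly like the following sketch for `copy`, `rename`, `equals`, and `__repr__`. This is an illustration under assumptions, not the prototype: `Box` and `LazyIndexSketch` are hypothetical names, tuples stand in for the real index, and the method signatures are simplified compared with xarray's `Index`.

```python
from typing import Callable, Tuple

Values = Tuple[int, ...]


class Box:
    """Hypothetical mutable box that runs its loader at most once."""

    def __init__(self, loader: Callable[[], Values]) -> None:
        self._loader = loader
        self._value: Values = ()
        self.loaded = False

    def get(self) -> Values:
        if not self.loaded:
            self._value = self._loader()
            self.loaded = True
        return self._value


class LazyIndexSketch:
    def __init__(self, box: Box, dim: str) -> None:
        self._box = box
        self.dim = dim  # metadata, readable without loading anything

    def copy(self) -> "LazyIndexSketch":
        # Shares the box with the original.
        return LazyIndexSketch(self._box, self.dim)

    def rename(self, new_dim: str) -> "LazyIndexSketch":
        # Still lazy, but with a new box that defers to the old one.
        return LazyIndexSketch(Box(self._box.get), new_dim)

    def equals(self, other: "LazyIndexSketch") -> bool:
        if other._box is self._box:
            return True  # same box means same index; no load required
        return self._box.get() == other._box.get()

    def __repr__(self) -> str:
        state = "loaded" if self._box.loaded else "lazy"
        return f"LazyIndexSketch(dim={self.dim!r}, {state})"


calls = []


def load_x() -> Values:
    calls.append("x")
    return (10, 20, 30)


index = LazyIndexSketch(Box(load_x), dim="x")
same = index.equals(index.copy())  # decided by box identity alone
renamed = index.rename("lon")
text = repr(renamed)
loads_before = len(calls)

different_box_equal = index.equals(renamed)  # boxes differ, so this loads
loads_after = len(calls)
```

Only the last comparison triggers a load, and because the renamed box defers to the original one, the loader still runs a single time.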

This works well enough in my code base, it's about 400 lines of code, but it depends on xarray internals:
* It has to duck type PandasIndexingAdapter quite closely.
* It has to inherit from PandasIndexingAdapter to pass a few isinstance checks.
* A few `Self` typed methods have to accept `Self | PandasIndex`, and return `PandasIndex`, subtly changing some contracts.
* When pretty printing, it does get the same `*` marker that normal indices do, even after materialization.

If I were to incorporate this into xarray, I'd probably make a base class AbstractPandasIndex, from which LazyPandasIndex and PandasIndex inherit, and interoperate.
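A rough sketch of what that hierarchy might look like, using only the standard library. Everything here is an assumption for illustration: `EagerPandasIndex` stands in for xarray's real `PandasIndex`, tuples stand in for the pandas index, and `join` is a simplified inner join rather than xarray's actual signature.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional, Tuple

Values = Tuple[int, ...]


class AbstractPandasIndex(ABC):
    """Shared contract: methods accept any subclass, lazy or eager."""

    @abstractmethod
    def values(self) -> Values: ...

    def equals(self, other: "AbstractPandasIndex") -> bool:
        return self.values() == other.values()

    def join(self, other: "AbstractPandasIndex") -> "EagerPandasIndex":
        # Simplified inner join: keep values present in both, in our order.
        theirs = set(other.values())
        return EagerPandasIndex(tuple(v for v in self.values() if v in theirs))


class EagerPandasIndex(AbstractPandasIndex):
    """Stand-in for the existing PandasIndex: values are held up front."""

    def __init__(self, values: Values) -> None:
        self._values = tuple(values)

    def values(self) -> Values:
        return self._values


class LazyPandasIndex(AbstractPandasIndex):
    """Loads its values on first use, then caches them."""

    def __init__(self, loader: Callable[[], Values]) -> None:
        self._loader = loader
        self._cached: Optional[Values] = None

    @property
    def materialized(self) -> bool:
        return self._cached is not None

    def values(self) -> Values:
        if self._cached is None:
            self._cached = tuple(self._loader())
        return self._cached


eager = EagerPandasIndex((1, 2, 3))
lazy = LazyPandasIndex(lambda: (2, 3, 4))
was_materialized = lazy.materialized  # still lazy at this point

joined = eager.join(lazy)  # mixing the two kinds forces the lazy one to load
```

With a common base class, `isinstance(x, AbstractPandasIndex)` checks pass for both kinds, and methods can be typed against the base instead of widening `Self` to `Self | PandasIndex`.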

---

I'm happy to share my code / incorporate it into xarray if this seems like a good direction. But xarray's semantics are a bit tricky, so I thought it best to discuss first.
