
Simplify backed internal checking #2369

Description

@ilan-gold

Please describe your wishes and possible alternatives to achieve the desired result.

Right now AnnData.isbacked (i.e., X is None and AnnData.filename is present) is used to gate a variety of behaviors/conditions beyond just getting X:

  1. It prevents making a view-of-a-view in backed mode (refactor: allow multi-indexed backed mode #2407)
  2. Custom repr
  3. Preventing string->categorical conversion when AnnData.is_view. This is meant to prevent accidentally reading a backed array into memory, I guess, since it will materialize X
  4. AnnData.T, although this is actually just broken in general - you can't transpose something backed and will get an AttributeError about .T on h5py.Dataset (fix: disallow transpose for "raw" backed array objects #2399)
  5. AnnData.concatenate: chore!: remove AnnData.concatenate #2370
  6. A bunch of stuff in Raw
  7. AnnData.__{del,set}item__: fix!: remove __delitem__ and __setitem__ from the AnnData object #2367
  8. Making a copy of AnnData (fix: disallow copy for "backed" objects #2406)

I think these cases can succinctly be broken down into two categories:

  • Broken/unnecessary behavior: I think 5 (because it is deprecated) and 2 fall into this category. To solve this I would propose just removing the behavior.
  • Behavior that doesn't work anyway on h5py.Dataset, BaseCompressedSparseDataset or zarr.Array, i.e., things that are broken even if isbacked is False when the underlying class of X or a different elem is the same as or similar to what it would be if isbacked were True. This is 1, 3, 4, 6, and 8. For this we probably need an AnnData.predicate function or similar that crawls the AnnData object and checks a condition at each "leaf node", i.e., whether it is an instance of a class that has a certain attribute (like .T in case 4 above, or copy, where making a copy of e.g. a zarr.Array probably doesn't make sense). Here we would probably refactor {read,write}_dispatched to rely on this behavior as well and just factor out the "what do you do at a node" part to either read/write or return a boolean.
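To make the "crawl and check each leaf" idea concrete, here is a rough sketch in plain Python. It is not the anndata API: find_leaves, lacks, FakeOnDiskArray and FakeInMemoryArray are all hypothetical names, a nested dict stands in for the AnnData object, and the two fake classes stand in for h5py.Dataset/zarr.Array and an in-memory array respectively.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any


class FakeOnDiskArray:
    """Hypothetical stand-in for an on-disk array: no `.T` and no `.copy()`."""

    def __init__(self, shape: tuple[int, ...]) -> None:
        self.shape = shape


class FakeInMemoryArray:
    """Hypothetical stand-in for an in-memory array: has `.T` and `.copy()`."""

    def __init__(self, rows: list[list[int]]) -> None:
        self.rows = rows

    @property
    def T(self) -> FakeInMemoryArray:
        return FakeInMemoryArray([list(col) for col in zip(*self.rows)])

    def copy(self) -> FakeInMemoryArray:
        return FakeInMemoryArray([list(row) for row in self.rows])


def find_leaves(
    elem: Any, condition: Callable[[Any], bool], path: str = ""
) -> list[str]:
    """Crawl nested mappings; return the paths of leaves where `condition` holds."""
    if isinstance(elem, Mapping):
        hits: list[str] = []
        for key, child in elem.items():
            hits.extend(find_leaves(child, condition, f"{path}/{key}"))
        return hits
    # Anything that is not a mapping is treated as a leaf node.
    return [path or "/"] if condition(elem) else []


def lacks(attr: str) -> Callable[[Any], bool]:
    """Build a leaf condition: true when the leaf does not expose `attr`."""
    return lambda leaf: not hasattr(leaf, attr)


# A nested dict standing in for an AnnData object (assumption for illustration).
adata_like = {
    "X": FakeOnDiskArray((3, 2)),
    "layers": {"counts": FakeInMemoryArray([[1, 2], [3, 4], [5, 6]])},
    "obsm": {"pca": FakeOnDiskArray((3, 5))},
}

not_transposable = find_leaves(adata_like, lacks("T"))
can_transpose = not not_transposable  # the boolean "predicate" form
```

With this shape, case 4 becomes "refuse if find_leaves(adata, lacks("T")) is non-empty" and case 8 the same with "copy", without consulting isbacked at all. Returning paths rather than a bare boolean is only a convenience for error messages.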

The goal of this is to simplify things to the point where isbacked either doesn't exist or runs on the proposed predicate function. This would make #2357 much simpler because we can just take actions depending on whether any of the objects are "backed".
