Skip to content

fix: X shape check slices to first 2 shape values#2432

Merged
ilan-gold merged 18 commits into
scverse:mainfrom
eroell:x-shape-slice-ndim
Jun 23, 2026
Merged

fix: X shape check slices to first 2 shape values#2432
ilan-gold merged 18 commits into
scverse:mainfrom
eroell:x-shape-slice-ndim

Conversation

@eroell

@eroell eroell commented May 10, 2026

Copy link
Copy Markdown
Contributor
  • Release note not necessary because:

Before

higher-than-2D X was rejected at construction; you could not hold a higher-dimensional X in memory:

import numpy as np
import anndata as ad
ad.AnnData(X=np.zeros((3, 4, 5)))
# ValueError: too many values to unpack (expected 2)
#   (from `n_obs, n_vars = X.shape`)

adata = ad.AnnData(X=np.zeros((3, 4)))
adata.X = np.zeros((3, 4, 5))
# FutureWarning: Automatic reshaping when setting X will be removed ...
# -> then ValueError, since (3, 4, 5) can't be reshaped to (3, 4)

higher-than-2D layers slipped through the in-memory shape check (only axes 0 and 1 are validated) and the writer happily wrote them to disk, silently violating the spec.

After

In-memory: higher-than-2D X and layers warns but still succeeds (the shape check uses X.shape[:2]).
Writing a non-2D X / layer now hard-fails.
Reading a non-conforming file warns but still succeeds:

adata = ad.AnnData(X=np.zeros((3, 4, 5)))   # now OK in memory
# UserWarning: X must be 2-dimensional, but got an array with shape (3, 4, 5) (ndim=3). Storing higher-dimensional arrays in `X` or `layers` violates the AnnData specification, and cannot be written to disk.

adata.X = np.zeros((3, 4, 5))               # also OK
# UserWarning: X must be 2-dimensional, but got an array with shape (3, 4, 5) (ndim=3). Storing higher-dimensional arrays in `X` or `layers` violates the AnnData specification, and cannot be written to disk.

adata.layers["L"] = np.zeros((3, 4, 5))     # also OK
# UserWarning: Layer 'L' must be 2-dimensional, but got an array with shape (3, 4, 5) (ndim=3). Storing higher-dimensional arrays in `X` or `layers` violates the AnnData specification, and cannot be written to disk.

adata.write_h5ad("out.h5ad")
# ValueError: X must be 2-dimensional, but got an array with shape (3, 4, 5) (ndim=3). Storing higher-dimensional arrays in `X` or `layers` violates the AnnData specification, and cannot be written to disk.

ad.read_h5ad("legacy_non_conforming.h5ad")
# UserWarning: X must be 2-dimensional, but got an array with shape ...
# -> still returns the AnnData

Same applies to layers["L"] (error/warning message says Layer 'L' instead of X), and to the zarr IO path.

@codecov

codecov Bot commented May 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.14815% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 85.70%. Comparing base (e176a68) to head (bcadf78).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/anndata/_core/storage.py 95.23% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2432      +/-   ##
==========================================
- Coverage   87.62%   85.70%   -1.92%     
==========================================
  Files          49       49              
  Lines        7690     7729      +39     
==========================================
- Hits         6738     6624     -114     
- Misses        952     1105     +153     
Files with missing lines Coverage Δ
src/anndata/_core/aligned_mapping.py 94.27% <100.00%> (+0.07%) ⬆️
src/anndata/_core/anndata.py 86.99% <100.00%> (+0.17%) ⬆️
src/anndata/_io/h5ad.py 93.30% <100.00%> (+0.06%) ⬆️
src/anndata/_io/specs/methods.py 91.39% <100.00%> (-0.31%) ⬇️
src/anndata/_core/storage.py 95.31% <95.23%> (-0.04%) ⬇️

... and 7 files with indirect coverage changes

@eroell eroell marked this pull request as ready for review May 11, 2026 12:57
@flying-sheep flying-sheep added this to the 0.12.15 milestone May 18, 2026
@ilan-gold ilan-gold modified the milestones: 0.12.15, 0.12.17 May 18, 2026
shape = getattr(value, "shape", None)
if shape is None:
return None
ndim = len(shape)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also means that an array of shape (m, n, 1) can't be written to disk anymore, before this was ok.

Do you want to allow this or not?

@eroell

eroell commented May 19, 2026

Copy link
Copy Markdown
Contributor Author

Seems I can't request reviews, but I'd be interested in your comments at this stage :)

Comment thread src/anndata/_io/specs/methods.py Outdated
# Older / non-conforming files may contain higher-dimensional `X` or
# `layers`. The on-disk spec forbids that; surface it as a warning so
# the user knows, but still construct the AnnData with what's there.
_warn_if_x_or_layers_3d_kwargs(d)

@ilan-gold ilan-gold May 19, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this warning into the construction of the AnnData object instead of reading (also in the other spots) so that people who do this in-memory will know their data is technically unwritable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted to this, I updated the first PR comment accordingly to this behaviour. Assignment to .X and .layers[<key>] raise this warning now, too.

@flying-sheep flying-sheep left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much to add to what Ilan said, thank you this looks clean!

Comment thread src/anndata/_core/anndata.py
Comment thread tests/test_x_layers_2d.py Outdated
@eroell eroell requested review from flying-sheep and ilan-gold June 4, 2026 10:38

@ilan-gold ilan-gold left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a release note and semantic commit title (see failing CI) :)

Comment thread src/anndata/_core/aligned_mapping.py Outdated
Comment thread src/anndata/_core/aligned_mapping.py Outdated
@ilan-gold ilan-gold modified the milestones: 0.12.17, 0.12.18 Jun 16, 2026
@flying-sheep flying-sheep changed the title X shape check slices to first 2 shape values feat: X shape check slices to first 2 shape values Jun 17, 2026
@flying-sheep flying-sheep changed the title feat: X shape check slices to first 2 shape values fix: X shape check slices to first 2 shape values Jun 17, 2026
@flying-sheep

flying-sheep commented Jun 17, 2026

Copy link
Copy Markdown
Member

OK, this doesn’t actually close #2430 yet, so I changed that in the description.

I also adapted the title.

@eroell eroell requested a review from ilan-gold June 19, 2026 09:24
Comment thread tests/test_x_layers_2d.py Outdated
@flying-sheep

flying-sheep commented Jun 19, 2026

Copy link
Copy Markdown
Member

OK @ilan-gold in 3ce3ead, I made it so validation only happens when setting aligned mappings, with the goal of allowing to raise warnings in there without the warnings being displayed on each access, and the side-benefit of increased performance by not re-validating everything on access.

This caused a very small breaking change illustrated by the change in 0e13d18: since we don’t go through all aligned mappings and re-validate everything whenever accessing one, we don’t aggressively re-sync obs_names.nameobsm[*].index.name on each obsm access. But since we don’t support intererior mutation of dataframes, and since I doubt that anyone even knows that this syncing happens, I think that’s more than fine.

@ilan-gold ilan-gold left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, just the one thing!

Comment thread src/anndata/_core/storage.py
@flying-sheep flying-sheep requested a review from ilan-gold June 23, 2026 07:17
@ilan-gold ilan-gold modified the milestones: 0.12.18, 0.13.0 Jun 23, 2026
@ilan-gold ilan-gold merged commit 21ed5a8 into scverse:main Jun 23, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants