fix: allow reading nullable uint arrays in `read_{,elem_}lazy` by flying-sheep · Pull Request #2287 · scverse/anndata

flying-sheep · 2026-01-08T10:56:04Z

Closes #
Tests added
Release note added (or unnecessary)

Use pandas.core.dtypes.dtypes.BaseMaskedDtype.from_numpy_dtype (with fallback to array constructors) instead of string manipulation to get MaskedArray’s dtype.

Extracted and improved from #2279

codecov · 2026-01-08T10:57:45Z

Codecov Report

❌ Patch coverage is 64.86486% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.58%. Comparing base (4376302) to head (587da01).
⚠️ Report is 47 commits behind head on main.

Files with missing lines	Patch %	Lines
src/anndata/_io/utils.py	44.44%	10 Missing ⚠️
src/anndata/experimental/backed/_lazy_arrays.py	76.92%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2287      +/-   ##
==========================================
- Coverage   86.74%   84.58%   -2.17%     
==========================================
  Files          46       46              
  Lines        7204     7218      +14     
==========================================
- Hits         6249     6105     -144     
- Misses        955     1113     +158

Files with missing lines	Coverage Δ
src/anndata/tests/helpers.py	`83.36% <100.00%> (-9.49%)`	⬇️
src/anndata/experimental/backed/_lazy_arrays.py	`92.24% <76.92%> (+0.50%)`	⬆️
src/anndata/_io/utils.py	`73.54% <44.44%> (-3.99%)`	⬇️

... and 6 files with indirect coverage changes

ilan-gold · 2026-01-08T11:01:31Z

Is this a public API? https://pandas.pydata.org/docs/search.html?q=BaseMaskedDtype

flying-sheep · 2026-01-08T11:07:59Z

Hm, no: https://pandas.pydata.org/docs/reference/index.html

The pandas.core, pandas.compat, and pandas.util top-level modules are PRIVATE. Stable functionality in such modules is not guaranteed.

I’ll fall back to the previous approach then.

ilan-gold

Why are we changing the implementation of dtype? I am not sure the issue you opened in pandas will ever be fixed. As I mentioned in my comment, what we had wasn't random "string manipulation" but the actual on-disk file format names. It was super clear, if not the cleanest thing on earth (it could use a match or something, maybe), which in-memory array was created from which on-disk array. I do like the construct_array_type usage though!

flying-sheep · 2026-01-08T13:59:41Z

> what we had wasn't random "string manipulation" but the actual on-disk file format names

That code is not what I referred to! I’m talking about this fragile construct:

anndata/src/anndata/experimental/backed/_lazy_arrays.py, lines 194 to 197 in 088fa16:

    return pd.array(
        [],
        dtype=str(pd.api.types.pandas_dtype(self._values.dtype)).capitalize(),
    ).dtype

What you mean is that the previous code dispatched on _dtype_string whereas mine dispatches on values.dtype.kind.

I believe that what I do is perfectly safe and robust since the anndata version with this code only supports nullable-integer 0.1.0, nullable-boolean 0.1.0, and nullable-string-array 0.1.0. A hypothetical future anndata version that supports constructing a masked array from integers or bools in a different way will gain support for the new protocols together with new code dispatching on that.

My changes just make sure that we can confidently say we support UInt64 without looking into our tests (will str.capitalize create that? Probably not? maybe?)

flying-sheep · 2026-01-08T14:03:43Z

Nope, it doesn’t:

import pandas as pd, numpy as np
nullable_dtype = lambda dtype: pd.array([], dtype=str(pd.api.types.pandas_dtype(dtype)).capitalize()).dtype
nullable_dtype(np.dtype(np.uint64))

TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 nullable_dtype(np.dtype(np.uint64))

Cell In[2], line 1, in <lambda>(dtype)
----> 1 nullable_dtype = lambda dtype: pd.array([], dtype=str(pd.api.types.pandas_dtype(dtype)).capitalize()).dtype 

File /usr/lib/python3.13/site-packages/pandas/core/construction.py:311, in array(data, dtype, copy)
    309 # this returns None for not-found dtypes.
    310 if dtype is not None:
--> 311     dtype = pandas_dtype(dtype)
    313 if isinstance(data, ExtensionArray) and (dtype is None or data.dtype == dtype):
    314     # e.g. TimedeltaArray[s], avoid casting to NumpyExtensionArray
    315     if copy:

File /usr/lib/python3.13/site-packages/pandas/core/dtypes/common.py:1663, in pandas_dtype(dtype)
   1656     with warnings.catch_warnings():
   1657         # TODO: warnings.catch_warnings can be removed when numpy>2.3.0
   1658         # is the minimum version
   1659         # GH#51523 - Series.astype(np.integer) doesn't show
   1660         # numpy deprecation warning of np.integer
   1661         # Hence enabling DeprecationWarning
   1662         warnings.simplefilter("always", DeprecationWarning)
-> 1663         npdtype = np.dtype(dtype)
   1664 except SyntaxError as err:
   1665     # np.dtype uses `eval` which can raise SyntaxError
   1666     raise TypeError(f"data type '{dtype}' not understood") from err

TypeError: data type 'Uint64' not understood
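To make the failure mode concrete, here is a minimal sketch of a mapping that dispatches on `dtype.kind` rather than on `str.capitalize`. The helper name `nullable_dtype` and its exact shape are assumptions for illustration, not the code merged in this PR; it relies only on the public string aliases (`"Int64"`, `"UInt64"`, `"boolean"`) that `pd.api.types.pandas_dtype` accepts.

```python
import numpy as np
import pandas as pd


def nullable_dtype(dtype) -> pd.api.extensions.ExtensionDtype:
    """Map a NumPy bool/int/uint dtype to a pandas nullable dtype.

    Hypothetical helper for illustration; not the implementation from this PR.
    """
    dtype = np.dtype(dtype)
    if dtype.kind == "b":
        alias = "boolean"
    elif dtype.kind in ("i", "u"):
        # Signed widths are spelled "Int<bits>", unsigned ones "UInt<bits>".
        prefix = "Int" if dtype.kind == "i" else "UInt"
        alias = f"{prefix}{dtype.itemsize * 8}"
    else:
        raise TypeError(f"no nullable counterpart assumed for {dtype!r}")
    return pd.api.types.pandas_dtype(alias)


# str.capitalize upper-cases only the first character, so it produces a valid
# alias for signed integers but not for unsigned ones.
print(str(np.dtype(np.int64)).capitalize())   # Int64
print(str(np.dtype(np.uint64)).capitalize())  # Uint64, but pandas expects "UInt64"
print(nullable_dtype(np.uint64))              # UInt64
```

Building the alias from `kind` and `itemsize` keeps the signed and unsigned cases symmetric, which is why the signed case working with `capitalize` could hide the unsigned bug.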

ilan-gold

Nice, great change then!

flying-sheep · 2026-01-08T14:38:01Z

OK, I’ll change this into a fix because it’s an actual bug:

import anndata as ad, numpy as np, pandas as pd
ad.AnnData(np.zeros((4, 6)), dict(nint=pd.array([1, 0, None, 1], dtype=pd.UInt32Dtype()))).write_zarr("/tmp/test.ad.zarr")
ad.experimental.read_lazy("/tmp/test.ad.zarr")

TypeError: data type 'Uint32' not understood
Error raised while reading key 'obs' of <class 'zarr.core.group.Group'> from /

ilan-gold · 2026-01-08T14:49:49Z

We should add a test. Ideally it wouldn't even be a new test but some setting or something. I'm looking into how this ever even slipped through, because in theory it should be tested.

ilan-gold · 2026-01-08T15:11:46Z

Thanks!

flying-sheep · 2026-01-08T15:19:12Z

It probably slipped through because the nullable-integer branch in our test utils was a bit inflexible, but I fixed that.
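As a rough illustration of what a more flexible nullable-integer test helper could look like, the sketch below generates a masked integer array for every signed and unsigned width. The names `NULLABLE_INT_DTYPES` and `gen_nullable_ints` are hypothetical and are not taken from anndata's test helpers.

```python
import numpy as np
import pandas as pd

# Public pandas aliases for all signed and unsigned nullable integer widths.
NULLABLE_INT_DTYPES = [
    f"{sign}Int{bits}" for sign in ("", "U") for bits in (8, 16, 32, 64)
]


def gen_nullable_ints(n: int, dtype: str, *, seed: int = 0):
    """Return a length-n nullable integer array with some missing entries.

    Hypothetical test helper; values stay in [0, 100) so they fit every width.
    """
    rng = np.random.default_rng(seed)
    arr = pd.array(rng.integers(0, 100, size=n), dtype=dtype)
    mask = rng.random(n) < 0.3
    if n:
        mask[0] = True  # guarantee at least one missing value
    arr[mask] = pd.NA
    return arr


for alias in NULLABLE_INT_DTYPES:
    print(alias, gen_nullable_ints(5, alias).dtype)
```

Parametrizing a round-trip test over a list like this would exercise the unsigned dtypes that the single-dtype branch missed.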

flying-sheep added 2 commits January 8, 2026 11:41

chore: cleaner nullable dtype inference

8749233

better

6ae2427

flying-sheep added the run-gpu-ci label Jan 8, 2026

flying-sheep added this to the 0.12.8 milestone Jan 8, 2026

flying-sheep requested a review from ilan-gold January 8, 2026 10:59

github-actions Bot removed the run-gpu-ci label Jan 8, 2026

public APIs

ac5e442

flying-sheep added the run-gpu-ci label Jan 8, 2026

github-actions Bot removed the run-gpu-ci label Jan 8, 2026

ilan-gold requested changes Jan 8, 2026

Comment thread src/anndata/_io/utils.py Outdated

remove unused fallback

56a5e93

flying-sheep requested a review from ilan-gold January 8, 2026 14:00

ilan-gold approved these changes Jan 8, 2026

flying-sheep changed the title ~~chore: cleaner nullable dtype inference~~ fix: fix uint support for read_{,elem_}lazy Jan 8, 2026

flying-sheep changed the title ~~fix: fix uint support for read_{,elem_}lazy~~ fix: allow reading nullable uint arrays in read_{,elem_}lazy Jan 8, 2026

relnote

177c07b

test uints

915d709

flying-sheep requested a review from ilan-gold January 8, 2026 15:07

ilan-gold approved these changes Jan 8, 2026

flying-sheep added 2 commits January 8, 2026 16:42

fix dtype comparison

d1ed943

fix view shape test

b317298

flying-sheep added the run-gpu-ci label Jan 8, 2026

Merge branch 'main' into pa/clean-infer-nullable-dtype

1e90bf0

github-actions Bot removed the run-gpu-ci label Jan 8, 2026

flying-sheep enabled auto-merge (squash) January 8, 2026 16:08

fix test_concat_to_memory_obs_dtypes

587da01

flying-sheep added the skip-gpu-ci label Jan 8, 2026

flying-sheep merged commit 981174b into main Jan 8, 2026
22 of 24 checks passed

flying-sheep deleted the pa/clean-infer-nullable-dtype branch January 8, 2026 16:40

lumberbot-app Bot added the Still Needs Manual Backport label Jan 8, 2026

flying-sheep added a commit that referenced this pull request Jan 8, 2026

Backport PR #2287: fix: allow reading nullable uint arrays in `read_{…

b695559

…,elem_}lazy`

flying-sheep mentioned this pull request Jan 8, 2026

Backport PR #2287: fix: allow reading nullable uint arrays in read_{,elem_}lazy #2292

Merged

scverse deleted a comment from lumberbot-app Bot Jan 8, 2026

flying-sheep removed the Still Needs Manual Backport label Jan 8, 2026

flying-sheep added a commit that referenced this pull request Jan 8, 2026

Backport PR #2287: fix: allow reading nullable uint arrays in `read_{…

d9ec2e4

…,elem_}lazy` (#2292)

flying-sheep merged 10 commits into main from pa/clean-infer-nullable-dtype

2 participants
