pandas>=3 breaks with xarray.Dataset in obsm/varm when using anndata.concat and load_annotation_index #2475

Description

@ilan-gold

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the main branch of anndata.

Report

Code:

# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "anndata[lazy]@git+https://github.com/scverse/anndata.git",
#   "pytest",
# ]
# ///

from pathlib import Path
from tempfile import TemporaryDirectory

import anndata as ad
import numpy as np
import pandas as pd
from anndata.tests.helpers import gen_typed_df

paths = []
M = 100
N = 50
n_datasets = 3
with TemporaryDirectory() as tmp_dir:
    for dataset_index in range(n_datasets):
        orig_path = Path(tmp_dir) / f"{dataset_index}.zarr"
        paths.append(orig_path)
        obs_names = pd.Index(f"cell_{dataset_index}_{i}" for i in range(M))
        var_names = pd.Index(f"gene_{i}{f'_{dataset_index}_ds' if (i % 2) else ''}" for i in range(N))
        obs = gen_typed_df(M, obs_names)
        var = gen_typed_df(N, var_names)
        orig = ad.AnnData(
            obs=obs,
            var=var,
            X=np.random.binomial(100, 0.005, (M, N)).astype(np.float32),
            varm={"df": var},
            obsm={"df": obs},
        )
        orig.write_zarr(orig_path)
    print(ad.concat([ad.experimental.read_lazy(p, load_annotation_index=False) for p in paths], join="outer"))

Traceback:

  File "/Users/ilangold/Projects/Theis/annbatch/tester.py", line 37, in <module>
    print(ad.concat([ad.experimental.read_lazy(p, load_annotation_index=False) for p in paths], join="outer"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/anndata/_core/merge.py", line 1787, in concat
    {k: r(v, axis=0) for k, v in getattr(a, f"{alt_axis_name}m").items()}
        ^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/anndata/_core/merge.py", line 552, in __call__
    return self.apply(el, axis=axis, fill_value=fill_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/anndata/_core/merge.py", line 563, in apply
    return self._apply_to_df_like(el, axis=axis, fill_value=fill_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/anndata/_core/merge.py", line 581, in _apply_to_df_like
    return el.reindex(self.new_idx, axis=axis, fill_value=fill_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/anndata/_core/xarray.py", line 423, in reindex
    el = el.reindex({index_dim: index}, method=None, fill_value=fill_value)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/core/dataset.py", line 3700, in reindex
    return alignment.reindex(
           ^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/structure/alignment.py", line 1084, in reindex
    aligner.align()
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/structure/alignment.py", line 667, in align
    self.reindex_all()
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/structure/alignment.py", line 638, in reindex_all
    self.results = tuple(
                   ^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/structure/alignment.py", line 627, in _reindex_one
    return obj._reindex_callback(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/core/dataset.py", line 3406, in _reindex_callback
    reindexed_vars = alignment.reindex_variables(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/structure/alignment.py", line 84, in reindex_variables
    new_var = var._getitem_with_mask(indxr, fill_value=fill_value_)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/core/variable.py", line 873, in _getitem_with_mask
    data = duck_array_ops.where(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/core/duck_array_ops.py", line 417, in where
    promoted_x, promoted_y = as_shared_dtype([x, y], xp=xp)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/core/duck_array_ops.py", line 311, in as_shared_dtype
    dtype = dtypes.result_type(*scalars_or_arrays, xp=xp)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/core/dtypes.py", line 338, in result_type
    return array_api_compat.result_type(*map(maybe_promote, arrays_and_dtypes), xp=xp)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ilangold/.cache/uv/environments-v2/tester-b444df72e492a8eb/lib/python3.12/site-packages/xarray/compat/array_api_compat.py", line 44, in result_type
    return xp.result_type(*arrays_and_dtypes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
numpy.exceptions.DTypePromotionError: The DType <class 'numpy.dtypes.StringDType'> could not be promoted by <class 'numpy.dtypes._PyFloatDType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.StringDType'>, <class 'numpy.dtypes._PyFloatDType'>)

Versions

anndata 0.14.0.dev3+g3d01ac122
numpy 2.4.6
pandas 3.0.3


pydantic-settings 2.14.1
annotated-types 0.7.0
pytest 9.0.3
python-dotenv 1.2.2
iniconfig 2.3.0
xarray 2026.4.0
six 1.17.0
session-info2 0.4.1
h5py 3.16.0
scipy 1.17.1
typing-inspection 0.4.2
cloudpickle 3.1.2
Pygments 2.20.0
scverse-misc 0.0.7
toolz 1.1.0
pydantic 2.13.4
numcodecs 0.16.5
natsort 8.4.0
google-crc32c 1.8.0
pydantic_core 2.46.4
dask 2026.3.0
packaging 26.2
charset-normalizer 3.4.7
fsspec 2026.4.0
pluggy 1.6.0
donfig 0.8.1.post1
zarr 3.2.1
typing_extensions 4.15.0
python-dateutil 2.9.0.post0
PyYAML 6.0.3
legacy-api-wrap 1.5


Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ]
OS macOS-15.1-arm64-arm-64bit
