
feat: make obs/var backend-agnostic with a DataFrameLike contract#2516

Open
srivarra wants to merge 4 commits into
scverse:mainfrom
srivarra:dataframe-backends

@srivarra srivarra commented Jun 26, 2026

Contributor

Uses narwhals as the conversion layer between DataFrame implementations and makes Dataset2D "dataframe-like" in the eyes of narwhals.

  • Release note not necessary because: dunno about this one

I built a thin narwhals translation layer between AnnData.obs/AnnData.var and the DataFrame backends narwhals supports. Now .obs/.var accept a Polars DataFrame, among others; cuDF should work too, though I haven't tested it.

The DataFrameLike contract comes from #2328. A small narwhals plugin converts a Dataset2D to any other DataFrame: it materializes the Dataset2D in memory as a pandas DataFrame, which narwhals knows how to handle. The conversion is therefore no longer lazy. There may be a way to preserve laziness, since narwhals has lazy backends such as Dask and Polars' LazyFrame, but I haven't looked into that yet.
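
For readers who haven't followed #2328, here is a minimal sketch of what a runtime-checkable contract of this kind can look like. Only the name `DataFrameLike` comes from this PR; the members (`columns`, `shape`, `__getitem__`) and the `TinyFrame` class are assumptions made for illustration, not the actual definition in #2328 or on this branch.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DataFrameLike(Protocol):
    """Hypothetical minimal contract; the real member list lives in #2328."""

    @property
    def columns(self) -> Any: ...

    @property
    def shape(self) -> tuple[int, int]: ...

    def __getitem__(self, key: str) -> Any: ...


class TinyFrame:
    """Toy column store (hypothetical), used only to exercise the check."""

    def __init__(self, data: dict[str, list]) -> None:
        self._data = data

    @property
    def columns(self) -> list[str]:
        return list(self._data)

    @property
    def shape(self) -> tuple[int, int]:
        n_rows = len(next(iter(self._data.values()), []))
        return (n_rows, len(self._data))

    def __getitem__(self, key: str) -> list:
        return self._data[key]


frame = TinyFrame({"cell_type": ["B", "T"], "n_genes": [120, 340]})

# isinstance() against a runtime-checkable Protocol only verifies that the
# members exist; it does not validate signatures or return types.
is_frame = isinstance(frame, DataFrameLike)
# A plain dict has __getitem__ but no `columns` or `shape`, so it fails.
is_dict_frame = isinstance({"cell_type": ["B", "T"]}, DataFrameLike)
```

Because the check is structural, `TinyFrame` passes without inheriting from anything. Under this toy definition, any backend exposing the same three members would pass as well, which is the property that lets .obs/.var stay backend-agnostic.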

I also added a couple of convenience methods, AnnData.obs_as(...) and AnnData.var_as(...), which convert .obs/.var to another eager DataFrame type (pandas, Polars, Modin, cuDF, PyArrow). We can discuss whether that's the right approach.

I've deferred the concat and merge API for now; we could try to find a way to add it here or leave it for later.

I remember seeing something in the scverse Zulip about documenting whether AI was used for PRs, but I'm not sure it's in the PR template. I did use AI throughout the process for this PR.

Remaining TODOs:

  • Play around to see if it works well with cuDF
  • Fix poor test coverage
  • Dataset2D -> Narwhals LazyFrame somehow?

srivarra and others added 2 commits June 26, 2026 10:32
Using narwhals as the conversion layer between DataFrame implementations, makes Dataset2D "dataframe-like" in the eyes of narwhals.

Signed-off-by: Sricharan Reddy Varra <sricharanvarra@gmail.com>

codecov Bot commented Jun 26, 2026


Codecov Report

❌ Patch coverage is 84.69388% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.66%. Comparing base (0569b0c) to head (e29c03a).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/anndata/_core/_dataframe_backend.py | 90.00% | 5 Missing ⚠️ |
| src/anndata/_core/aligned_df.py | 80.00% | 5 Missing ⚠️ |
| src/anndata/_core/anndata.py | 63.63% | 4 Missing ⚠️ |
| src/anndata/_io/specs/registry.py | 88.88% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2516      +/-   ##
==========================================
- Coverage   87.66%   85.66%   -2.01%     
==========================================
  Files          49       50       +1     
  Lines        7729     7797      +68     
==========================================
- Hits         6776     6679      -97     
- Misses        953     1118     +165     

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/anndata/_core/index.py | 95.25% <100.00%> (+0.01%) | ⬆️ |
| src/anndata/_io/specs/lazy_methods.py | 96.29% <ø> (ø) | |
| src/anndata/acc/__init__.py | 96.64% <100.00%> (ø) | |
| src/anndata/utils.py | 86.49% <ø> (-0.85%) | ⬇️ |
| src/anndata/_io/specs/registry.py | 94.59% <88.88%> (-0.27%) | ⬇️ |
| src/anndata/_core/anndata.py | 86.55% <63.63%> (-0.44%) | ⬇️ |
| src/anndata/_core/_dataframe_backend.py | 90.00% <90.00%> (ø) | |
| src/anndata/_core/aligned_df.py | 92.18% <80.00%> (-4.54%) | ⬇️ |

... and 7 files with indirect coverage changes

srivarra added 2 commits June 26, 2026 11:06
Signed-off-by: Sricharan Reddy Varra <sricharanvarra@gmail.com>
Signed-off-by: Sricharan Reddy Varra <sricharanvarra@gmail.com>

Development

Successfully merging this pull request may close these issues.

DataFrame API for obs and var keys via runtime-checkable Protocol
