perf: speed up backed sparse integer indexing by tippered1-debug · Pull Request #2506 · scverse/anndata

tippered1-debug · 2026-06-17T23:15:45Z

Summary

Integer indexing of backed sparse datasets currently reads every selected row or column separately, even when most of the indices form contiguous runs.

This PR groups those runs and reads them as slices. Shuffled indices and duplicates are restored to their original order after reading. Highly fragmented indexers continue to use the existing path, since slicing does not help in that case.
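As an illustration only (this is not the code in the PR), the run grouping described above can be sketched with NumPy. The names `group_into_runs` and `take_rows_by_runs`, the `max_run_fraction` threshold, and the `read_slice` callback are all hypothetical, and a dense array stands in for the backed sparse store.

```python
import numpy as np


def group_into_runs(idx):
    """Split integer indices into contiguous slices over their sorted unique values.

    Returns (slices, inverse), where `inverse` maps each position of the
    original indexer to its row in the concatenated slice reads.
    """
    idx = np.asarray(idx, dtype=np.int64)
    uniq, inverse = np.unique(idx, return_inverse=True)
    inverse = inverse.reshape(-1)
    if uniq.size == 0:
        return [], inverse
    # A new run starts wherever neighbouring sorted indices are not adjacent.
    breaks = np.flatnonzero(np.diff(uniq) != 1) + 1
    starts = np.concatenate(([0], breaks))
    stops = np.concatenate((breaks, [uniq.size]))
    slices = [
        slice(int(uniq[a]), int(uniq[b - 1]) + 1) for a, b in zip(starts, stops)
    ]
    return slices, inverse


def take_rows_by_runs(read_slice, idx, max_run_fraction=0.25):
    """Read rows via slices; return None when the indexer is too fragmented.

    `max_run_fraction` is an assumed heuristic, not a value from the PR.
    Returning None signals the caller to use its existing per-index path.
    """
    slices, inverse = group_into_runs(idx)
    if not slices or len(slices) > max_run_fraction * len(inverse):
        return None
    block = np.concatenate([read_slice(s) for s in slices], axis=0)
    # Undo the sort and re-expand duplicates to match the requested order.
    return block[inverse]


# Dense stand-in for a backed matrix: row i holds [4i, 4i+1, 4i+2, 4i+3].
data = np.arange(200).reshape(50, 4)
shuffled_with_dup = [12, 10, 11, 10, 30, 31, 32, 33]
result = take_rows_by_runs(lambda s: data[s], shuffled_with_dup)
fragmented = take_rows_by_runs(lambda s: data[s], [0, 5, 10, 15])
```

Here `shuffled_with_dup` collapses to two slice reads (rows 10 to 12 and rows 30 to 33), while the fragmented indexer produces one run per index and is handed back to the fallback path.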

Related to #1224.

Benchmark

Benchmarked on a 10000 x 10000 sparse matrix with density 0.01. Each indexer contains 2048 elements. Results are medians of three runs.
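The PR text does not say how the benchmark indexers were constructed. The sketch below shows one plausible way to build indexers of each named shape with 2048 elements on an axis of length 10000. The run count of 8, the start offset of 1000, the stride of 4, and all function names are assumptions made for illustration.

```python
import numpy as np

N = 10_000  # axis length from the benchmark description
K = 2048    # elements per indexer from the benchmark description
rng = np.random.default_rng(0)


def clustered(n_runs, total):
    """`total` indices split into `n_runs` equally long, well-separated runs."""
    run_len = total // n_runs
    starts = np.arange(n_runs) * (N // n_runs)
    return np.concatenate([np.arange(s, s + run_len) for s in starts])


def count_runs(idx):
    """Number of contiguous runs among the sorted unique indices."""
    uniq = np.unique(idx)
    if uniq.size == 0:
        return 0
    return 1 + int(np.count_nonzero(np.diff(uniq) != 1))


indexers = {
    "single run": np.arange(1000, 1000 + K),
    "multiple runs": clustered(8, K),
    "clustered, shuffled": rng.permutation(clustered(8, K)),
    # Half as many unique indices, each requested twice.
    "clustered with duplicates": np.repeat(clustered(8, K // 2), 2),
    # Every fourth index: no two selected indices are adjacent.
    "fragmented": np.arange(0, 4 * K, 4),
}

for name, idx in indexers.items():
    print(f"{name}: {idx.size} elements, {count_runs(idx)} runs")
```

Under these assumptions the clustered cases need only 8 slice reads for 2048 requested elements, whereas the fragmented case has as many runs as elements.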

HDF5 CSR

Single run: 7.10 ms → 0.73 ms — 9.7x faster
Multiple runs: 6.81 ms → 0.74 ms — 9.2x faster
Clustered, shuffled: 7.04 ms → 1.03 ms — 6.8x faster
Clustered with duplicates: 7.02 ms → 0.83 ms — 8.5x faster

Zarr CSR

Single run: 16.60 ms → 3.70 ms — 4.5x faster
Multiple runs: 16.27 ms → 13.92 ms — 1.2x faster
Clustered, shuffled: 18.91 ms → 14.30 ms — 1.3x faster
Clustered with duplicates: 19.19 ms → 8.53 ms — 2.2x faster

Fragmented indexers continue to use the existing path and stayed close to the previous timings.

Closes #
Tests added

Release note not necessary because:

codecov · 2026-06-17T23:17:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.71%. Comparing base (6d70d77) to head (653f5b6).
⚠️ Report is 8 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2506      +/-   ##
==========================================
- Coverage   87.61%   85.71%   -1.91%     
==========================================
  Files          49       49              
  Lines        7693     7719      +26     
==========================================
- Hits         6740     6616     -124     
- Misses        953     1103     +150

Files with missing lines	Coverage Δ
src/anndata/_core/sparse_dataset.py	`92.15% <100.00%> (+0.01%)`	⬆️

... and 7 files with indirect coverage changes

ilan-gold

Can this reuse subset_by_major_axis_mask? It looks eerily familiar. Or could that method be removed in favor of this code path? I'm not sure what the performance considerations are here, but I think they are probably trivial, i.e., operations on a small 1D array.

Thanks for the contribution!

perf-backed-sparse-integer-indexing

f5e76c8

drop dead sparse branches

653f5b6

ilan-gold reviewed Jun 23, 2026

tippered1-debug wants to merge 2 commits into scverse:main from tippered1-debug:perf-backed-sparse-integer-indexing
