perf: avoid redundant sorting in h5py indexing by tippered1-debug · Pull Request #2496 · scverse/anndata

tippered1-debug · 2026-06-13T15:04:44Z

Summary

Avoid sorting HDF5 indices twice when they are already unique.

Duplicate indices still use the existing sorting and reconstruction path. Tests cover sorted and unsorted unique indices, duplicates, boolean masks, empty indices, and 1D/2D indexing.
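For readers without the diff at hand, the idea can be sketched as follows. This is an illustration only, not the code in this PR: the helper name `split_index_for_sorted_read` is hypothetical, and it assumes a 1D array of non-negative integer indices (boolean masks and 2D indexing are left out). It relies on the fact that h5py fancy indexing expects index lists in increasing order.

```python
import numpy as np


def split_index_for_sorted_read(idx):
    """Hypothetical helper: return (read_idx, restore) for a 1D integer index.

    read_idx is strictly increasing, and data[read_idx][restore]
    reproduces data[idx]. restore is None when no reordering is needed.
    """
    idx = np.asarray(idx, dtype=np.intp)
    if idx.size < 2 or np.all(idx[1:] > idx[:-1]):
        # Already strictly increasing: unique and sorted, no sort at all.
        return idx, None

    order = np.argsort(idx, kind="stable")  # the only sort on the unique path
    sorted_idx = idx[order]
    if np.all(sorted_idx[1:] > sorted_idx[:-1]):
        # Unique but unsorted: invert the permutation without sorting again.
        restore = np.empty_like(order)
        restore[order] = np.arange(order.size)
        return sorted_idx, restore

    # Duplicates present: deduplicate and record where each entry maps back to.
    unique_idx, inverse = np.unique(idx, return_inverse=True)
    return unique_idx, inverse
```

In this sketch the unique cases pay for at most one sort, while the duplicate case falls back to `np.unique`, which mirrors the split described above.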

Benchmark

Benchmarks were run from benchmarks/benchmarks/h5py_indexing.py. Before and after measurements used the same index arrays in the same process. Results are medians of 25 runs after 5 warmups.
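The measurement protocol described above (5 untimed warmups, then the median of 25 timed runs) could look roughly like this. It is a standalone sketch using only the standard library, not the contents of benchmarks/benchmarks/h5py_indexing.py; the function name `median_runtime_ms` is made up for illustration.

```python
import statistics
import time


def median_runtime_ms(func, *args, runs=25, warmups=5):
    """Median wall-clock time of func(*args), in milliseconds."""
    for _ in range(warmups):
        func(*args)  # warmup calls are executed but not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)
```

Calling it once with the old implementation and once with the new one, on the same index array and in the same process, gives a before/after pair comparable to one row of the table below.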

| Size | Scenario | Before | After | Speedup |
|---|---|---|---|---|
| 10,000 | `sorted_unique` | 0.158 ms | 0.077 ms | 2.04× |
| 10,000 | `unsorted_unique` | 0.822 ms | 0.270 ms | 3.04× |
| 10,000 | `duplicate_heavy` | 0.252 ms | 0.251 ms | 1.01× |
| 100,000 | `sorted_unique` | 1.839 ms | 0.878 ms | 2.10× |
| 100,000 | `unsorted_unique` | 13.168 ms | 4.621 ms | 2.85× |
| 100,000 | `duplicate_heavy` | 4.469 ms | 4.437 ms | 1.01× |
| 1,000,000 | `sorted_unique` | 23.180 ms | 11.251 ms | 2.06× |
| 1,000,000 | `unsorted_unique` | 171.516 ms | 62.111 ms | 2.76× |
| 1,000,000 | `duplicate_heavy` | 59.565 ms | 60.078 ms | 0.99× |

Unique-index cases are around 2–3× faster. Duplicate-heavy cases are unchanged.

Checklist

Closes #
Tests added

Release note not necessary because:

codecov · 2026-06-13T15:06:00Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.61%. Comparing base (58712ff) to head (1e6a575).
⚠️ Report is 13 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2496      +/-   ##
==========================================
- Coverage   87.61%   85.61%   -2.00%     
==========================================
  Files          49       49              
  Lines        7684     7677       -7     
==========================================
- Hits         6732     6573     -159     
- Misses        952     1104     +152

Files with missing lines	Coverage Δ
src/anndata/_core/index.py	`95.11% <100.00%> (-0.13%)`	⬇️

... and 8 files with indirect coverage changes

ilan-gold · 2026-06-15T11:57:51Z

@sjfleming Could you take a look at this?

sjfleming · 2026-06-19T05:54:35Z

From what I can tell, I think this looks good, and I think it makes sense.

Selfishly, I'm looking at related tests I've written in cellarium-ml. All my tests pass using anndata 0.12.17, using the current main branch of anndata, and using anndata from this PR branch as well. Those tests were my original impetus for #2066. So this PR breaks nothing.

For some reason when I install from this PR branch the version shows up as 0.1.0.dev1746+g1e6a5757e which I thought was a bit strange, but the diff here seems like it's based on current main. I have not confirmed the above speedups, but it seems plausible.

Incidentally, is @tippered1-debug a bot? Is it okay to ask that? :)

tippered1-debug · 2026-06-19T13:41:22Z

No, I'm not a bot :) I want to help make things better, especially tools that are used in research and medicine. I think open source is one of the best ways to do that.

sjfleming · 2026-06-19T18:20:03Z

Totally agree @tippered1-debug, my bad! ;) I think it’s a nice PR.

ilan-gold · 2026-06-22T08:49:13Z

+    def time_process_index_for_h5py(self, size, scenario):
+        _process_index_for_h5py(self.idx)


We shouldn't be benchmarking private methods. Please either remove the benchmark or run it against a public method.

perf: avoid redundant sorting in h5py indexing

1e6a575

ilan-gold added the skip-gpu-ci label Jun 15, 2026

ilan-gold added this to the 0.12.17 milestone Jun 15, 2026

ilan-gold modified the milestones: 0.12.17, 0.12.18 Jun 16, 2026

ilan-gold reviewed Jun 22, 2026

tippered1-debug wants to merge 1 commit into scverse:main from tippered1-debug:perf-process-index-h5py-unique
