test: add Dask chunk grid benchmark scaffold by ehsanestaji · Pull Request #2465 · scverse/anndata

ehsanestaji · 2026-05-21T11:11:05Z

Summary

This adds an exploratory benchmark scaffold for #2036 so we can compare virtual Dask chunk choices against HDF5/Zarr on-disk chunk layouts before changing AnnData defaults.

The benchmark runner:

creates dense X arrays with controlled HDF5/Zarr chunks and optional Zarr v3 shards
reads X lazily through anndata.experimental.read_elem_lazy
varies virtual Dask chunks, worker counts, and thread/process settings
records runtime/package metadata, store size, task count, elapsed time, result shape/size, and coarse process/worker RSS readings
includes array-level workloads and a Scanpy-style scanpy_normalize_log1p workload
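
The runner varies virtual Dask chunks relative to on-disk chunks and records task counts. The sketch below is a rough illustration of why that matters. It is hypothetical and not code from benchmarks/scripts/dask_chunk_grid.py. It counts Dask blocks for a chunk spec, using Dask's convention that -1 means the full axis, and it counts how many on-disk chunks each virtual block covers. The helper names are made up for this example. Block count is only a proxy for the graph task count the runner actually records.

```python
import math


def normalize_chunks(shape, chunks):
    """Expand a per-axis chunk spec such as (1000, -1) into block sizes.

    Follows Dask's convention that -1 means "the whole axis"; other forms
    that dask.array accepts (e.g. "auto", explicit tuples) are not handled.
    """
    blocks = []
    for dim, chunk in zip(shape, chunks):
        size = dim if chunk == -1 else chunk
        full, remainder = divmod(dim, size)
        blocks.append((size,) * full + ((remainder,) if remainder else ()))
    return tuple(blocks)


def n_blocks(shape, chunks):
    """Total number of blocks, a rough proxy for per-block read tasks."""
    return math.prod(len(axis) for axis in normalize_chunks(shape, chunks))


def disk_chunks_per_block(shape, virtual, disk):
    """On-disk chunks covered by one virtual block.

    Assumes virtual chunk sizes are whole multiples of the on-disk chunk
    sizes, so blocks line up with on-disk chunk boundaries.
    """
    virtual_sizes = [dim if c == -1 else c for dim, c in zip(shape, virtual)]
    return math.prod(math.ceil(v / d) for v, d in zip(virtual_sizes, disk))


shape = (3000, 800)
print(n_blocks(shape, (250, 800)))                            # 12
print(n_blocks(shape, (1000, -1)))                            # 3
print(disk_chunks_per_block(shape, (1000, -1), (250, 800)))   # 4
```

Consider a 3000x800 array with 250x800 on-disk chunks. If each on-disk chunk became its own block, there would be 12 blocks. Virtual chunks of 1000x-1 give 3 blocks instead, and each of those spans 4 on-disk chunks.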

This also adds a small notebook that summarizes the generated CSV, plus README instructions for smoke and larger-grid runs. Generated benchmark outputs under benchmarks/results are ignored.
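
One comparison a summary notebook might make is to pair each candidate virtual-chunk setting with the default for the same store, worker count, and workload. The sketch below shows that comparison under several assumptions. The CSV column names (format, disk_chunks, n_workers, workload, virtual_chunks, elapsed_s) and the "1000x-1" label are hypothetical and should be renamed to match the real header. Speedup is defined here as baseline elapsed time divided by candidate elapsed time. When a configuration was repeated, the median is used.

```python
from collections import defaultdict
from statistics import median

# Hypothetical column names; adjust to the CSV the runner actually writes.
GROUP_COLUMNS = ("format", "disk_chunks", "n_workers", "workload")


def speedups(rows, baseline="default", candidate="1000x-1"):
    """Map each group to median(baseline time) / median(candidate time).

    Groups missing either setting are skipped. A value above 1 means the
    candidate virtual chunking was faster than the baseline.
    """
    times = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[col] for col in GROUP_COLUMNS)
        times[key][row["virtual_chunks"]].append(float(row["elapsed_s"]))
    return {
        key: median(by_chunks[baseline]) / median(by_chunks[candidate])
        for key, by_chunks in times.items()
        if baseline in by_chunks and candidate in by_chunks
    }
```

In a notebook, `rows` could be `csv.DictReader` over a results file or the output of pandas' `DataFrame.to_dict("records")`.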

Local signal

A modest local grid (3000x800 X; HDF5 and Zarr; on-disk chunks of 250x800 and 1000x800; default vs. 1000x-1 virtual chunks, i.e. 1000 rows by the full column axis; 1 and 2 workers; sum_axis0 and scanpy_normalize_log1p) produced 32 rows. For small on-disk chunks (250x800), 1000x-1 reduced task counts. In the 1-worker Scanpy-style case, it was also faster, by about 1.16x for HDF5 and 1.26x for Zarr.
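
The 32 rows follow directly from the grid. There are five axes with two values each, so there are 2^5 = 32 configurations. The sketch below enumerates such a grid with itertools.product. The dictionary keys and value labels are illustrative and are not the script's actual parameter names.

```python
from itertools import product

# The local grid described above; keys and labels are illustrative only.
GRID = {
    "format": ["hdf5", "zarr"],
    "disk_chunks": [(250, 800), (1000, 800)],
    "virtual_chunks": ["default", (1000, -1)],
    "n_workers": [1, 2],
    "workload": ["sum_axis0", "scanpy_normalize_log1p"],
}


def configurations(grid):
    """Yield one dict per combination, varying the last key fastest."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(configurations(GRID))
print(len(configs))  # 32
```

Adding a third worker count or a thread/process axis multiplies the row count accordingly. This is worth keeping in mind before a larger-grid run.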

These numbers are only an initial local smoke signal; the intent is to make the benchmark/review path available before proposing default behavior changes.

Checks

ruff check benchmarks/scripts/dask_chunk_grid.py tests/test_dask_chunk_grid_script.py
.venv/bin/python -m pytest tests/test_dask_chunk_grid_script.py -q
python3 -m json.tool benchmarks/notebooks/dask_chunk_grid_analysis.ipynb
git diff --check
tiny end-to-end benchmark smoke run wrote 8 CSV result rows

review-notebook-app · 2026-05-21T11:11:11Z

Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks.


codecov · 2026-05-21T11:12:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.60%. Comparing base (829abb6) to head (95bf7a1).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2465   +/-   ##
=======================================
  Coverage   85.60%   85.60%           
=======================================
  Files          49       49           
  Lines        7671     7671           
=======================================
  Hits         6567     6567           
  Misses       1104     1104

ehsanestaji · 2026-05-21T11:13:19Z

I updated the title to a semantic PR title. I don't seem to have permission to add labels on this repo, but the current validation failures look like missing triage metadata rather than code problems. Could a maintainer please add the appropriate labels, likely "no milestone" and "skip-gpu-ci"? The "benchmark", "type: dask array", and "performance 🐌" labels may also fit this benchmark-only PR.

ilan-gold · 2026-06-08T13:07:18Z

> These numbers are only an initial local smoke signal; the intent is to make the benchmark/review path available before proposing default behavior changes.

So to this end, it might make sense for you, @ehsanestaji, to create a separate repo with runnable benchmarks. That repo would produce a graph/table at the end and then propose changes, along with the reasons for them. Does that sound reasonable? I don't really see why anndata itself should carry this code around.

Then there would be a PR to update the defaults, link to the benchmarking effort, and perhaps write a small documentation note/page explaining things here.

benchmarks: add dask chunk grid exploration (8f71d0e)

ehsanestaji mentioned this pull request May 21, 2026 in feat: sensible chunk defaults for dask #2036 (Open, 3 tasks)

[pre-commit.ci] auto fixes from pre-commit.com hooks (95bf7a1)
for more information, see https://pre-commit.ci

ehsanestaji changed the title from "Benchmarks for Dask chunk default tradeoffs" to "test: add Dask chunk grid benchmark scaffold" May 21, 2026

Zethson added this to the 0.12.17 milestone May 21, 2026

Zethson added the performance 🐌, skip-gpu-ci, and type: dask array labels May 21, 2026

ilan-gold changed the milestone from 0.12.17 to 0.12.18 Jun 16, 2026

ehsanestaji wants to merge 2 commits into scverse:main from ehsanestaji:explore/anndata-2036-dask-chunk-defaults

3 participants
