Anndata and zarr

### Question

Hey all,

I have a few anndata datasets with sparse CSR `X` matrices (each has ~10M cells and 40K genes, with a sparsity of about 5%).

I want to be able to quickly load whole rows from these datasets (for example, given a query, load all rows that match a condition on the `obs` table).
Currently I convert the AnnData object to TileDB, but I recently came across the Zarr file format, and specifically the Zarr v3 support in anndata.
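
To make the access pattern concrete, here is a minimal in-memory sketch with toy sizes. The `cell_type` column and its values are hypothetical, and nothing here touches Zarr or TileDB; it only illustrates the kind of "filter on `obs`, then fetch whole rows of `X`" query described above.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_cells, n_genes = 1_000, 200  # toy sizes; the real data is ~10M x 40K

# Sparse CSR matrix standing in for X (5% of entries stored).
X = sp.random(n_cells, n_genes, density=0.05, format="csr",
              random_state=0, dtype=np.float32)

# Hypothetical obs table with a single categorical column.
obs = pd.DataFrame(
    {"cell_type": rng.choice(["T cell", "B cell", "NK cell"], size=n_cells)}
)

# The query: a boolean condition on obs, turned into row indices.
mask = (obs["cell_type"] == "T cell").to_numpy()
row_idx = np.flatnonzero(mask)

# Fetch whole rows. In CSR, row i occupies the contiguous slice
# indptr[i]:indptr[i + 1] of the data and indices arrays.
X_sel = X[row_idx]

# Number of stored values in each selected row, read straight from indptr.
nnz_per_row = np.diff(X.indptr)[row_idx]
```

Because each CSR row is one contiguous range of `data` and `indices`, the cost of such a query on disk depends largely on how those two arrays are chunked and compressed, which is what the questions below are about.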

I have a few questions regarding Zarr:

1. Would Zarr v3 be a good fit for our use case? Should I expect an improvement over TileDB?
2. Are there guidelines on which codec to use? On chunk sizes?
3. Are there guidelines on how to benefit from concurrency? I see Dask being used together with Zarr in many places.

Thanks!

Anndata and zarr #2145
