Fix 870: Significant memory and runtime improvements for Ripley's L by jberg5 · Pull Request #1236 · scverse/squidpy

jberg5 · 2026-06-30T21:15:29Z

Description

Headline number: at 250,000 cells in a single cluster, peak memory drops from ~1.8 TB to under 0.2 GB, and runtime drops from an estimated ~40 minutes (extrapolated, because I don't have 1.8 TB of RAM lol) to 6 minutes on a GCP n2-highmem-8.

Peak memory

| n (cells) | `main` (pdist) | `branch` (KDTree) | reduction |
| --- | --- | --- | --- |
| 20,000 | 11.5 GB | 0.13 GB | ~90x |
| 30,000 | 25.6 GB | 0.13 GB | ~200x |
| 40,000 | 45.4 GB | 0.13 GB | ~350x |
| 100,000 | ~290 GB* | 0.13 GB | ~2,200x |
| 250,000 | ~1.8 TB* | 0.14 GB | ~13,000x |
| 500,000 | ~7.2 TB* | 0.15 GB | ~50,000x |

Runtime

| n (cells) | `main` (pdist) | `branch` (KDTree) | speedup |
| --- | --- | --- | --- |
| 20,000 | 15.5 s | 5.7 s | 2.7x |
| 30,000 | 34.0 s | 10.8 s | 3.2x |
| 40,000 | 60.1 s | 19.2 s | 3.1x |
| 100,000 | ~6.4 min* | 77.5 s | ~5x |
| 250,000 | ~40 min* | 6.0 min | ~7x |
| 500,000 | ~2.7 hr* | 20.8 min | ~8x |

(* extrapolated by Claude, because the process OOMed on `main`)

Previously, the Ripley's L calculation materialized O(n^2) pairwise distances (via `pdist`) and then broadcast them across the number of steps in `support`. In issue #870, at 250,000 cells, that is n * (n - 1) / 2 = 31,249,875,000 unordered pairs. With 50 steps, `distances < support.reshape(-1, 1)` is a 2D bool array of 50 * 31,249,875,000 bytes, so roughly 1.5 TB of memory, assuming all cells are in the same cluster. The `pdist` intermediate still exists alongside it and adds another ~250 GB (8 bytes per pair). This would OOM on pretty much any reasonable hardware.
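The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the helper name `pdist_memory_bytes` is made up for illustration, and it assumes float64 distances (8 bytes each) and a 1-byte NumPy bool per mask element.

```python
def pdist_memory_bytes(n_cells: int, n_steps: int = 50) -> dict[str, int]:
    """Estimate memory for the pdist-based approach (hypothetical helper).

    Assumes float64 distances (8 bytes) and a NumPy bool mask (1 byte per element).
    """
    n_pairs = n_cells * (n_cells - 1) // 2  # unordered pairs
    return {
        "pairs": n_pairs,
        "distances_bytes": n_pairs * 8,    # condensed pdist output
        "mask_bytes": n_pairs * n_steps,   # distances < support.reshape(-1, 1)
    }


est = pdist_memory_bytes(250_000)
print(f"pairs:     {est['pairs']:,}")
print(f"distances: {est['distances_bytes'] / 1e9:.0f} GB")
print(f"bool mask: {est['mask_bytes'] / 1e12:.2f} TB")
```

The two terms together land at about 1.8 TB, which matches the extrapolated peak in the table.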

Fortunately, we can skip all of that by using a KDTree. Long story short: we build the tree once, using O(n) memory, and then `two_point_correlation` traverses this structure to count the points within each radius without materializing every pairwise distance. This gives us O(n) memory usage instead of O(n^2).
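A minimal sketch of the idea, not the code in this PR: it compares a `pdist`-style count against `sklearn.neighbors.KDTree.two_point_correlation` on a small random point set. The coordinates and `support` values are made up. One assumption worth flagging: `two_point_correlation` counts ordered pairs including each point paired with itself, so the sketch subtracts n and halves the result to get unordered pair counts.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(500, 2))  # made-up 2D coordinates
support = np.linspace(0, 50, 50)             # made-up radii, sorted ascending
n = coords.shape[0]

# O(n^2) approach: all pairwise distances, broadcast against every radius.
distances = pdist(coords)
counts_pdist = (distances < support.reshape(-1, 1)).sum(axis=1)

# KDTree approach: build the tree once, then count pairs within each radius.
tree = KDTree(coords)
counts_ordered = tree.two_point_correlation(coords, support)

# Remove the n self-pairs and halve to convert ordered to unordered pairs.
counts_tree = (counts_ordered - n) // 2

print(np.array_equal(counts_pdist, counts_tree))
```

The tree-based count uses `<=` rather than `<`, which only matters when a distance equals a radius exactly.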

One thing to note: this narrows the list of valid metrics down to:

>>> KDTree.valid_metrics
['euclidean', 'l2', 'minkowski', 'p', 'manhattan', 'cityblock', 'l1', 'chebyshev', 'infinity']

whereas previously pdist would have accepted any of

['braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalsneath', 'sqeuclidean', 'yule']

but I don't think any of the dropped ones were valid / sensical metrics for the kind of spatial stats that are happening here.
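For illustration, an up-front metric check could look roughly like the following. The function name `check_kdtree_metric` and the error wording are hypothetical and not taken from this PR; the only library feature relied on is the `KDTree.valid_metrics` attribute shown above.

```python
from sklearn.neighbors import KDTree


def check_kdtree_metric(metric: str) -> str:
    """Return `metric` if KDTree supports it, else raise (hypothetical helper)."""
    valid = list(KDTree.valid_metrics)
    if metric not in valid:
        raise ValueError(
            f"Metric {metric!r} is not supported by KDTree. "
            f"Valid options are: {sorted(valid)}."
        )
    return metric


print(check_kdtree_metric("euclidean"))
```

Failing early like this gives users who previously passed a `pdist`-only metric such as `'cosine'` a clear message instead of an error from deep inside the tree construction.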

How has this been tested?

- Running existing tests
- Extensive benchmarking (both memory and runtime) across various problem sizes while verifying correctness against `main`.
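A rough sketch of how a peak-memory comparison like this could be set up, assuming `tracemalloc` as the measurement tool (the PR does not say which tool produced the numbers above). The helper names and the small problem size are made up for illustration.

```python
import tracemalloc

import numpy as np
from scipy.spatial.distance import pdist
from sklearn.neighbors import KDTree


def peak_bytes(fn, *args) -> int:
    """Run fn(*args) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def count_with_pdist(coords, support):
    # Materializes all pairwise distances plus a (steps x pairs) bool mask.
    return (pdist(coords) < support.reshape(-1, 1)).sum(axis=1)


def count_with_kdtree(coords, support):
    # Builds the tree once and counts pairs per radius.
    return KDTree(coords).two_point_correlation(coords, support)


rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(2_000, 2))  # made-up problem size
support = np.linspace(0, 50, 50)

peak_pdist = peak_bytes(count_with_pdist, coords, support)
peak_tree = peak_bytes(count_with_kdtree, coords, support)
print(f"pdist:  {peak_pdist / 1e6:.1f} MB")
print(f"KDTree: {peak_tree / 1e6:.1f} MB")
```

`tracemalloc` only sees allocations routed through Python's allocator hooks, so process-level RSS measurements may differ from what this reports.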

Closes

closes #870

codecov · 2026-06-30T21:28:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.81%. Comparing base (a9966fd) to head (c76ef30).
⚠️ Report is 12 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1236      +/-   ##
==========================================
+ Coverage   75.32%   76.81%   +1.49%     
==========================================
  Files          56       63       +7     
  Lines        7936     9270    +1334     
  Branches     1295     1566     +271     
==========================================
+ Hits         5978     7121    +1143     
- Misses       1447     1547     +100     
- Partials      511      602      +91

Files with missing lines	Coverage Δ
src/squidpy/gr/_ripley.py	`96.52% <100.00%> (+0.03%)`	⬆️

... and 24 files with indirect coverage changes


Use KDTree for better ripley L mem usage

eea36eb

jberg5 added 2 commits June 30, 2026 14:46

better docs and more helpful error messages

61e74c5

cleanup

c76ef30

jberg5 wants to merge 3 commits into scverse:main from jberg5:ripley-L-mem-usage
