
perf(radial): vectorize get_radial_zernikes via shared _zernike_scores (~2x) #75

Open
timtreis wants to merge 1 commit into main from perf/radial-zernike-vectorize

@timtreis timtreis commented Jun 6, 2026

Collaborator

What

get_radial_zernikes already built the Zernike basis on the masked foreground vectors (cheap), but reduced it with 2·K separate scipy.ndimage.sum_labels calls (one per moment, for each of the real and imaginary parts) and re-gathered pixels[ijv] inside every one. Profiling showed that the reduction, not the basis construction, dominated the runtime.

This delegates the intensity-weighted moment sums to the shared cp_measure.utils._zernike_scores (added in #74 for get_zernike) with the pixel image as the per-pixel weight. The helper keeps the basis on the foreground vectors and segment-sums each moment by label with a single numpy.bincount, collapsing the whole reduction into one vectorised pass. The caller then normalises by each object's pixel count (radial Zernikes divide by pixel count, not the enclosing-circle area) and forms magnitude/phase.

This uses the weight argument that #74 deliberately left on _zernike_scores as a reuse hook.
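
A minimal sketch of the reduction described above, not the actual `_zernike_scores` code. The names `labels_fg`, `basis` and `weight`, the random toy data, and the per-moment loop are assumptions for illustration. It checks that a `numpy.bincount` segment-sum over the foreground vectors matches the `scipy.ndimage.sum_labels` baseline, then normalises by pixel count and forms magnitude and phase. The phase here uses NumPy's standard complex convention, which may differ from the library's.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Toy foreground vectors (hypothetical): one entry per foreground pixel.
n_objects, n_moments, n_pixels = 3, 4, 50
labels_fg = np.concatenate([
    np.arange(1, n_objects + 1),  # every label occurs at least once
    rng.integers(1, n_objects + 1, size=n_pixels - n_objects),
])
weight = rng.random(n_pixels)  # stands in for pixel intensities
basis = (rng.standard_normal((n_pixels, n_moments))
         + 1j * rng.standard_normal((n_pixels, n_moments)))  # stands in for the Zernike basis


def reduce_sum_labels(labels_fg, basis, weight, n_objects):
    """Baseline: 2*K scipy.ndimage.sum_labels calls (real and imaginary per moment)."""
    index = np.arange(1, n_objects + 1)
    out = np.empty((n_objects, basis.shape[1]), dtype=complex)
    for k in range(basis.shape[1]):
        col = basis[:, k] * weight
        out[:, k] = (ndimage.sum_labels(col.real, labels_fg, index)
                     + 1j * ndimage.sum_labels(col.imag, labels_fg, index))
    return out


def reduce_bincount(labels_fg, basis, weight, n_objects):
    """Segment-sum by label with numpy.bincount, assuming contiguous labels 1..N."""
    seg = labels_fg - 1  # label -> row index
    weighted = basis * weight[:, None]
    out = np.empty((n_objects, basis.shape[1]), dtype=complex)
    for k in range(basis.shape[1]):
        out[:, k] = (np.bincount(seg, weights=weighted[:, k].real, minlength=n_objects)
                     + 1j * np.bincount(seg, weights=weighted[:, k].imag, minlength=n_objects))
    return out


sums = reduce_bincount(labels_fg, basis, weight, n_objects)
counts = np.bincount(labels_fg - 1, minlength=n_objects)  # pixels per object
scores = sums / counts[:, None]  # normalise by pixel count, not circle area
magnitude, phase = np.abs(scores), np.angle(scores)
```

Both reductions operate on the same 1-D foreground vectors, so the comparison isolates the cost of the reduction step itself.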

@timtreis timtreis force-pushed the perf/radial-zernike-vectorize branch from 6d79c10 to 3082e2c Compare June 6, 2026 16:12
@timtreis timtreis added the numpy label Jun 9, 2026
timtreis added a commit that referenced this pull request Jun 9, 2026
Reuse primitives.segment.label_to_idx_lut for the label->row map (correct
sizing, find_objects-based) instead of a hand-rolled reverse map keyed on
masks.max(); derive labels internally so get_zernike no longer needs its own
unique() pass. Single foreground gather, skip the identically-zero imaginary
segment-sum for m==0 moments, and precompute the azimuthal powers once.

Return (real_sums, imag_sums, radii, counts): radii feeds get_zernike's
pi*r**2 normalisation, counts the intensity-weighted radial Zernikes (PR #75),
which reuse this via the restored `weight` arg. Add weighted + count golden
tests vs centrosome so no path ships untested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timtreis timtreis force-pushed the perf/radial-zernike-vectorize branch 2 times, most recently from b4a8d5f to ee7ed1e Compare June 9, 2026 23:01
afermg pushed a commit that referenced this pull request Jun 17, 2026
`python -m cp_measure._bench.targets --base <ref> --head <ref>` resolves a PR
diff to exactly the measurement functions it changes, for the benchmark action.

Resolution is SYMBOL-level, not file-level: it builds a static symbol-reference
graph over the package (AST, resolving intra-package imports incl. submodule and
relative imports) and selects a feature iff its call graph transitively reaches a
changed symbol. So a shared-helper edit (e.g. utils._zernike_scores) selects only
the features that actually use it — verified on the real PRs: #74 -> {zernike},
#75 -> {radial_zernikes}, where file-closure would have over-selected the ~6
features whose modules merely import utils.

- Rooted at an explicit entry-point table (the get_* registry) so bulk.py's lazy
  numba/multimask imports can't cause an entry-point to be missed; a test
  cross-checks the table against the live registries by function identity.
- Reads everything from git refs (git show), diffing against the merge-base, so
  it matches CI and is correct for stacked PRs given the PR's real base.
- Three distinct states: benchmarked / skipped-unsupported (multimask, numba) /
  empty — a multimask-only PR is never mistaken for "no measurement change".
- Tolerates the get_ferret->get_feret cross-branch rename via name candidates.

Hermetic tests build a throwaway git repo + mini package to prove symbol-level
precision and the three states; a guarded test checks the real #74/#75 refs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
afermg added a commit that referenced this pull request Jun 17, 2026
* feat(synth): deterministic synthetic cell-image generator for benchmarking

Add `cp_measure.synth.generate(image_size, n_objects, n_channels, seed)` — the
shipped, importable generator for the PR-benchmark action (build step 1).

Produces a cell-like contiguous label mask (organic star-shaped cells placed by
gap-respecting dart-throwing, log-normal sizes, no degenerate ~1px objects) plus
intensity channels built from a shared smooth envelope + shared/independent
multi-scale Gaussian splats, so area, intensity, texture AND colocalisation
features all carry real signal. Output is a pure function of the inputs (version
stamped via `__version__`); placement is capacity-checked and raises loudly
rather than silently under-placing.

test/test_synth.py replaces the design's "eyeball the examples" gate with
programmatic acceptance asserts at the matrix corners (min-size×max-count,
max-size×min-count): determinism, contiguous exact count, no degenerate objects,
shape/texture/intensity signal, and a controlled sub-unity channel correlation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(synth): review hardening — single cell-extent, sturdier tests, determinism

Apply the "fix now" set from the max-effort review of the generator (no
behavioural bugs were found; these harden maintainability, the test net, and
cross-version reproducibility):

- Extract `_cell_extent(base_r, amps)` as the single definition of a cell's
  radial reach, used by both the packing radius (worst-case amps) and the
  rasterisation window (actual amps). Removes the reach-vs-bulge drift risk that
  could silently break the no-overlap guarantee if one formula were edited.
- Strengthen the two toothless tests: texture now asserts median per-object std
  is well ABOVE the read-noise floor (a splat-removed regression collapses to
  ~noise and fails); organic-shape now asserts a boundary radial-roughness CV
  that plain disks fail (the old solidity<0.99 passed for pixelated disks).
  Both verified to fail on their intended regressions.
- Determinism: stable sort for tied radii; replace rng.choice(p=...) with
  inverse-CDF sampling on rng.random (version-stable draw count) so two
  separately-installed envs can't diverge. Bump __version__ 0.1.0 -> 0.2.0.
- Widen the brittle seed-averaged correlation band (0.4-0.7 -> 0.35-0.8) so a
  legitimate constant re-tune doesn't flip it.
- Per decisions: keep realistic PSF splat bleed; drop the unimplemented
  "clusters" docstring claim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
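
A minimal sketch of the inverse-CDF sampling mentioned in the determinism bullet above, assuming a plain discrete distribution. The helper name `sample_index` is hypothetical, not the generator's code. Each call consumes exactly one `rng.random()` draw, which is the property the commit message relies on.

```python
import numpy as np


def sample_index(rng, p):
    """Draw one index from the discrete weights p via inverse-CDF sampling.

    Uses exactly one rng.random() call per draw.
    """
    cdf = np.cumsum(np.asarray(p, dtype=float))
    u = rng.random() * cdf[-1]  # scale so p need not be normalised
    idx = np.searchsorted(cdf, u, side="right")  # first index with cdf > u
    return int(min(idx, len(cdf) - 1))  # clamp against float round-off


rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draws_a = [sample_index(rng_a, [0.2, 0.5, 0.3]) for _ in range(10)]
draws_b = [sample_index(rng_b, [0.2, 0.5, 0.3]) for _ in range(10)]
```

With the same seed, `draws_a` and `draws_b` are identical, and a zero-probability entry is never selected.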

* feat(bench): symbol-level PR target mapper (build step 2)

`python -m cp_measure._bench.targets --base <ref> --head <ref>` resolves a PR
diff to exactly the measurement functions it changes, for the benchmark action.

Resolution is SYMBOL-level, not file-level: it builds a static symbol-reference
graph over the package (AST, resolving intra-package imports incl. submodule and
relative imports) and selects a feature iff its call graph transitively reaches a
changed symbol. So a shared-helper edit (e.g. utils._zernike_scores) selects only
the features that actually use it — verified on the real PRs: #74 -> {zernike},
#75 -> {radial_zernikes}, where file-closure would have over-selected the ~6
features whose modules merely import utils.

- Rooted at an explicit entry-point table (the get_* registry) so bulk.py's lazy
  numba/multimask imports can't cause an entry-point to be missed; a test
  cross-checks the table against the live registries by function identity.
- Reads everything from git refs (git show), diffing against the merge-base, so
  it matches CI and is correct for stacked PRs given the PR's real base.
- Three distinct states: benchmarked / skipped-unsupported (multimask, numba) /
  empty — a multimask-only PR is never mistaken for "no measurement change".
- Tolerates the get_ferret->get_feret cross-branch rename via name candidates.

Hermetic tests build a throwaway git repo + mini package to prove symbol-level
precision and the three states; a guarded test checks the real #74/#75 refs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* revert(bench): drop the change-detection mapper; benchmark all exposed functions

Per design decision: the benchmark compares at the main exposed-function level —
run every public get_* feature base-vs-head and let the speedup table show what
changed (~1.0x = untouched). This removes the static AST symbol-graph mapper
(build step 2) entirely, along with its edge-case surface; benchmark cost is
controlled by the matrix size / per-function budget instead of pre-selection.

- Remove src/cp_measure/_bench/targets.py and test/test_targets.py (keep the
  _bench package for the upcoming runner).
- Remove accidentally-committed __pycache__/*.pyc and add a .gitignore for
  Python bytecode (the repo had none).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(bench): fixture/runner/comparator — benchmark all get_* head-vs-main

Build step 2 (v3): the benchmark core, three composable pieces.

- fixtures.py: build the (image_size x object_count x seed) matrix once from the
  pinned synth generator, serialise to .npz with a manifest + per-array sha256
  (stamps synth.__version__). Both envs load identical, checksum-verified inputs.
- run.py: `python -m cp_measure._bench.run` times EVERY public get_* function
  (core arity-1, correlation arity-2, plus a [legacy] variant where a `legacy`
  param exists) over the fixtures in one environment -> JSON. Channels normalised
  to [0,1] (the pipeline convention; get_texture requires it). Per-call warmup +
  reps (min), SIGALRM per-call timeout, thread-pinning set before numpy import.
  Functions enumerated from the live registry at HEAD; a function that errors on
  synth input is recorded, not fatal.
- compare.py: `python -m cp_measure._bench.compare` diffs two run JSONs into a
  speedup table. speedup = main/head (>1 faster); per cell takes the min then the
  median across seeds; classifies faster/slower/within-noise/new/removed/no-data.
  Untouched functions land at ~1.0x — the "what changed" signal, no mapper needed.

Validated end-to-end on a smoke matrix (all 12 functions time ok incl. texture;
self-compare is 1.00x). The two-worktree/two-env orchestration is step 3 (workflow).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
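
The aggregation described for compare.py (min over reps per cell, median across seeds, speedup = main/head) can be sketched as below. The function names and the `{seed: [rep times]}` layout are assumptions for illustration, not the module's actual data model.

```python
from statistics import median


def cell_time(reps_by_seed):
    """Min over reps for each seed, then median across seeds."""
    return median(min(reps) for reps in reps_by_seed.values())


def speedup(main, head):
    """speedup = main/head, so values above 1 mean head is faster."""
    head_t = cell_time(head)
    return cell_time(main) / head_t if head_t > 0 else float("nan")


# Hypothetical timings in seconds for one (size, count) cell.
main_runs = {0: [2.0, 2.2], 1: [4.0, 4.1], 2: [3.0, 3.5]}
head_runs = {0: [1.0, 1.2], 1: [2.0, 2.5], 2: [1.5, 1.6]}
ratio = speedup(main_runs, head_runs)
```

Taking the min over reps discards warm-up and scheduling noise within a seed, while the median across seeds keeps one unusual fixture from dominating the cell.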

* feat(bench): two-job benchmark workflow + orchestration driver (build step 3)

Wires fixtures -> run(head) + run(main) -> compare -> sticky PR comment.

- .github/workflows/benchmark.yml: triggered by the `benchmark` label (labels
  need write access, so the trigger is maintainer-gated) or workflow_dispatch.
  Two-job split: `build` runs untrusted PR code with `permissions: {}` (no token
  to steal, persist-credentials off); `report` holds pull-requests/issues:write
  but never checks out PR code — it only renders the artifact into a sticky
  `<!-- cp-bench -->` comment and removes the label. fetch-depth: 0 so `main` is
  present; concurrency cancels superseded runs.
- .github/scripts/run_benchmark.sh: installs head + main in two isolated uv envs,
  VENDORS head's synth.py + _bench/ into the main worktree so the generator and
  tooling are identical across both runs (only cp_measure.core.* differs), builds
  the fixtures once, runs both, compares.
- fixtures.py: add CI_MATRIX (bounded for hosted-runner limits, the workflow
  default; full DEFAULT via dispatch) + a `python -m cp_measure._bench.fixtures`
  build CLI.

Validated locally: script bash-syntax, YAML structure (tokenless build, gated
report), fixtures CLI, full test suite. End-to-end CI run is via workflow_dispatch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(bench): review cleanup — leaner comments/docstrings + small fixes

Elegance/LOC pass over the benchmark PR (net ~-50 lines, mostly verbose
docstrings) plus the real findings from the review:

Fixes:
- workflow: `gh api --paginate | head` SIGPIPE under pipefail could abort the
  comment post on PRs with many comments — use a single `?per_page=100` page +
  `--jq 'first'` instead. Add `if: always()` to upload-artifact so a failed run
  still surfaces partial output. Drop the redundant matrix default + useless cat.
- run.py: build call-args INSIDE the guarded path so an input a function can't
  handle (e.g. a 1-channel fixture) is recorded per-cell, not fatal. Record the
  matrix + fixture count in meta so the comment shows which sweep ran; note the
  shared-fn JIT caveat for [legacy] variants.
- compare.py: label the status column (was a blank header); guard head_t==0;
  surface the matrix scope in the header.
- run_benchmark.sh: trap-based cleanup of the temp dir/worktree/venvs (was leaked).
- .gitignore: ignore local benchmark artifacts (bench-out/, *.npz).

Cleanup: trim the synth/bench/test module docstrings and synth's per-constant
comments to their load-bearing facts; collapse generate()'s numpydoc block; drop
the unused load_fixture(verify=...) flag; de-clever _norm01's constant-image path.
Kept the _cell_extent single-source helper (an earlier review's no-overlap fix).

31 tests pass, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(bench): trim to a lean regression set

Consolidate the over-exhaustive acceptance tests (356 -> 110 lines, 7 tests):
- synth: one invariants test (shape/dtype/contiguous count, no degenerate
  objects, shape+size variety, intensity/texture/coloc signal) + determinism +
  edges, all at a single representative config instead of parametrising every
  check over both matrix corners. Drop the radial-roughness disk-vs-organic
  discriminator (eccentricity spread + size variety still catch a broken gen).
- bench: merge the fixture build/load/determinism cases, fold enumerate into the
  run integration test, and collapse the compare classification/render cases into
  one. Drop the trivial _norm01 and standalone CLI tests.

* style: ruff-format with current ruff (88-col line wraps)

* chore: stop tracking scratch tasks/ and .claude/ (added by mistake)

* refactor(bench): report raw main/head timings, drop noise-band classification

Per single-function resolution: show each function's main vs head time as
mean (min-max) over reps x seeds plus the raw main/head ratio, and let the
maintainer read their function directly. Removes the faster/slower/within-noise
band (which a noisy/sequential run could mislabel) and any normalisation; run.py
now stores just the rep times.

* Revert "Merge pull request #80 from afermg/feat/synth-bench-generator"

This reverts commit 3809218, reversing
changes made to 7f67606.

* feat(bench): synthetic-image PR performance benchmark action

Single revertable unit; re-introduces the benchmark mechanic (reverted #80) with
the harness-source fix folded in.

- cp_measure/synth.py: deterministic synthetic cell-image generator.
- cp_measure/_bench/{fixtures,run,compare}.py: build the (size x count x seed)
  fixture matrix, time every get_* head-vs-main, report raw mean (min-max) timings.
- .github/workflows/benchmark.yml + scripts/run_benchmark.sh: label-triggered
  two-job workflow. The harness is checked out from main (not the PR head, which a
  perf PR does not carry); the PR head is fetched as a worktree, main's synth.py +
  _bench/ vendored in, and only cp_measure.core differs between the timed runs.

* Revert "feat(bench): synthetic-image PR performance benchmark action"

* demo: self-contained PR benchmark action (simplified)

Everything lives on this branch (nothing on main). on: pull_request runs the
workflow from the PR branch on every commit, times every public get_* on the PR
head vs main, and posts a sticky comment with the timing table.

- synth.py: minimal generator — n ellipses on a regular grid + a few random
  Gaussian blobs per channel.
- _bench/{fixtures,run,compare}.py: build fixtures, time all get_* head-vs-main,
  raw timings table.
- .github: single-job pull_request workflow (no label, no pull_request_target) +
  head-based driver that vendors the tooling into a main worktree.
- includes the granularity speedup (#76) so the demo table shows a real delta.

* demo: move the whole benchmark into .github/scripts (no package module)

Remove src/cp_measure/{synth.py,_bench/} and their tests. Everything now lives in
.github/scripts/benchmark.py — a single self-contained script (generator + runner
+ comparator); each env regenerates the same seeded inputs, so nothing is shared
or vendored. Table now references the commit and emits one grid per affected
function (speedup >= 1.1x) with image size as rows and object count as columns.

* demo: extend benchmark matrix to 4 sizes x 2 counts (256–2048)

Grid now spans image sizes 256/512/1024/2048 (rows) x object counts 16/64
(cols); bump the job timeout to 45m for the larger sizes.

* demo: median per cell, 3 seeds x 3 counts, dynamic affected-threshold caption

- per-cell aggregate is now the median (over seeds x reps); speedup = median/median
- matrix: sizes 256-2048 (rows) x counts 16/64/256 (cols) x 3 seeds = 36 cells
- caption derives the cutoff from AFFECTED (≥1.1x) instead of hardcoding >1
- job timeout 60m for the larger matrix

* demo: drop 256px image size (unrealistically small)

Sizes now 512/1024/2048 x counts 16/64/256 x 3 seeds = 27 cells (3x3 grid).

* demo: shift matrix down to 256-1024 (drop slow 2048)

Sizes 256/512/1024 x counts 16/64/256 x 3 seeds — 2048 was too slow per commit.

* demo: report regressions too — flag functions that moved >=1.05x either way

Was speedup>=1.1x only (regression-blind: a slowdown reported 'no change'). Now a
function is shown if any cell is >=1.05x faster OR <=1/1.05x slower; header notes
>1 faster / <1 slower.
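
A minimal sketch of the two-sided rule described above, assuming per-cell speedups are available as a flat list. The name `moved` is hypothetical.

```python
THRESHOLD = 1.05


def moved(speedups, threshold=THRESHOLD):
    """True if any cell is >= threshold faster or <= 1/threshold slower."""
    return any(s >= threshold or s <= 1 / threshold for s in speedups)


faster = moved([2.26, 2.12, 2.29])    # clear speedup
slower = moved([0.94, 1.00, 1.01])    # 0.94 is below 1/1.05 (about 0.952)
unchanged = moved([0.98, 1.00, 1.02])  # every cell within the band
```

The symmetric bound `1 / threshold` is what makes a slowdown visible. The earlier one-sided check would have reported the second case as unchanged.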

* demo: slim benchmark.py — drop unused bits

- remove the n_channels param (always 2: ch0 for core, ch0+ch1 for coloc)
- drop 'from __future__ import annotations' (unneeded on the 3.12 runner)
- .gitignore: drop *.npz (no fixture files are written anymore)

* revert(granularity): it has an independent PR, was used as test

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Alán F. Muñoz <afer.mg@gmail.com>
timtreis added a commit that referenced this pull request Jun 26, 2026
Reuse primitives.segment.label_to_idx_lut for the label->row map (correct
sizing, find_objects-based) instead of a hand-rolled reverse map keyed on
masks.max(); derive labels internally so get_zernike no longer needs its own
unique() pass. Single foreground gather, skip the identically-zero imaginary
segment-sum for m==0 moments, and precompute the azimuthal powers once.

Return (real_sums, imag_sums, radii, counts): radii feeds get_zernike's
pi*r**2 normalisation, counts the intensity-weighted radial Zernikes (PR #75),
which reuse this via the restored `weight` arg. Add weighted + count golden
tests vs centrosome so no path ships untested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timtreis timtreis force-pushed the perf/zernike-vectorize branch from 35ca892 to 5d081ea Compare June 26, 2026 21:45
timtreis added a commit that referenced this pull request Jun 26, 2026
#89 centralized label sanitation at the entry points and deleted
primitives/segment.py (label_to_idx_lut), which _zernike_scores imported.

- Drop the deleted lookup table: under the contiguous 1..N contract the segment
  index is label - 1, so it is a plain arange (guard masks.max() on size-0).
  _zernike_scores stays in utils.py (reused by #75) — no new module.
- Import centrosome explicitly as `from centrosome import zernike` (no
  centrosome.zernike.* attribute access) in utils and the sizeshape caller.
- test_zernike: drop the non-contiguous test (unsupported), drop redundant
  single/multi-object tests (the irregular case covers them) and parametrize it
  over zernike numbers 5/9/14, drop the weighted test + _weighted_reference
  helper (more test code than the branch it checked), rename the generator to
  _generate_square_objects.
- test_sanitize: the vectorized get_zernike no longer raises IndexError on gapped
  IDs (it pads to max-label rows), so assert raw output differs from the
  sanitized wrapper instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
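
A minimal sketch of the contiguous-label indexing described above, assuming integer label masks with 0 as background. The helper name `segment_counts` is hypothetical. Under the 1..N contract the row index is simply `label - 1`, and the `masks.max()` call is guarded for zero-size input.

```python
import numpy as np


def segment_counts(masks):
    """Pixel count per object, with rows indexed by label - 1."""
    n = int(masks.max()) if masks.size else 0  # max() raises on size-0 arrays
    fg = masks[masks > 0]
    return np.bincount(fg - 1, minlength=n)


contiguous = np.array([[0, 1, 1], [2, 2, 2], [0, 0, 3]])
gapped = np.array([[1, 1, 0, 3]])  # label 2 missing
empty = np.zeros((0, 0), dtype=np.int64)

counts_contiguous = segment_counts(contiguous)
counts_gapped = segment_counts(gapped)
counts_empty = segment_counts(empty)
```

On the gapped mask the result has one row per label up to the maximum, with a zero count for the missing label. This matches the padding behaviour the test_sanitize note describes, without raising.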
@timtreis timtreis force-pushed the perf/zernike-vectorize branch from 7bf4c98 to a7b3278 Compare June 26, 2026 22:08
afermg added a commit that referenced this pull request Jun 30, 2026
* perf(sizeshape): vectorize get_zernike on foreground pixels (~8x)

`get_zernike` delegated to `centrosome.zernike.zernike`, which scatters the
per-pixel Zernike basis into a full `(H, W, K)` complex array (~560 MB at
1080^2, K~30) and scores via ~60 full-image `scipy.ndimage.sum` calls. Both
costs scale with image area rather than object pixels, so most work lands on
background.

Replace it with a pure-numpy `_zernike_scores` helper that keeps the basis on
the masked foreground vectors and segment-sums each moment by label with a
single `numpy.bincount`. The Horner basis evaluation is copied verbatim from
`centrosome.construct_zernike_polynomials` (same lookup-table coefficients,
`r**2 > 1` cutoff and `z = y + i*x` convention), so results track the installed
centrosome to floating-point round-off (bit-identical in practice).

The helper lives in `cp_measure.utils` (alongside `masks_to_ijv`) and takes an
optional pixel `weight` so the sibling `get_radial_zernikes` can reuse it for
intensity-weighted moments in a follow-up PR.

Measured: 8.6x at typical density (782->98 ms large tier), 3.7-44x depending on
foreground fraction. No new deps. Also switches from `range(1, n+1)` to the
actual unique labels (identical on contiguous masks, correct for
non-contiguous).

Adds golden + edge tests (empty, single-pixel r=0, non-contiguous labels,
edge-touching, non-default zernike_numbers) asserting parity with centrosome.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
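
The ~560 MB figure above can be reproduced with a short calculation, assuming the dense basis is stored as complex128 (16 bytes per element). The 10% foreground fraction is a hypothetical value chosen only to illustrate the saving.

```python
import numpy as np

h = w = 1080
k = 30  # number of Zernike moments (K~30 in the commit message)
itemsize = np.dtype(np.complex128).itemsize  # 16 bytes

dense_bytes = h * w * k * itemsize  # full (H, W, K) scatter

n_foreground = (h * w) // 10  # hypothetical: 10% of pixels belong to objects
masked_bytes = n_foreground * k * itemsize  # basis kept on foreground vectors only

print(f"dense:  {dense_bytes / 1e6:.0f} MB")
print(f"masked: {masked_bytes / 1e6:.0f} MB")
```

The dense cost depends only on image area, while the masked cost scales with the number of object pixels.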

* refactor(sizeshape): make _zernike_scores a complete reusable primitive

Reuse primitives.segment.label_to_idx_lut for the label->row map (correct
sizing, find_objects-based) instead of a hand-rolled reverse map keyed on
masks.max(); derive labels internally so get_zernike no longer needs its own
unique() pass. Single foreground gather, skip the identically-zero imaginary
segment-sum for m==0 moments, and precompute the azimuthal powers once.

Return (real_sums, imag_sums, radii, counts): radii feeds get_zernike's
pi*r**2 normalisation, counts the intensity-weighted radial Zernikes (PR #75),
which reuse this via the restored `weight` arg. Add weighted + count golden
tests vs centrosome so no path ships untested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(zernike): suppress expected single-pixel divide warning in _zernike_scores

Single-pixel objects have an enclosing-circle radius of 0, so the unit-disk
coordinate division is 0/0 -> NaN (discarded later by the r**2 > 1 cutoff,
matching centrosome). Wrap it in numpy.errstate so the expected RuntimeWarning
isn't emitted from this shared helper (it would otherwise crash callers running
under -W error::RuntimeWarning).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
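
A minimal sketch of the warning suppression described above. The function name `unit_disk_coords` and its arguments are hypothetical. It shows only that `numpy.errstate` keeps the expected 0/0 from emitting a `RuntimeWarning`, even when warnings are promoted to errors.

```python
import warnings

import numpy as np


def unit_disk_coords(offsets, radius):
    """Scale pixel offsets by the enclosing-circle radius; 0/0 yields NaN silently."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return offsets / radius


# Single-pixel object: offset 0 from its centre, radius 0.
with warnings.catch_warnings():
    warnings.simplefilter("error")  # mimic running under -W error
    single_pixel = unit_disk_coords(np.array([0.0]), np.array([0.0]))
    regular = unit_disk_coords(np.array([1.0, -2.0]), np.array([2.0, 2.0]))
```

Scoping the suppression to the single division keeps unrelated floating-point warnings elsewhere in the caller visible.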

* style(test): Wrap long lines in `_weighted_reference` function

* fix(zernike): adapt to post-#89 contract; address review

#89 centralized label sanitation at the entry points and deleted
primitives/segment.py (label_to_idx_lut), which _zernike_scores imported.

- Drop the deleted lookup table: under the contiguous 1..N contract the segment
  index is label - 1, so it is a plain arange (guard masks.max() on size-0).
  _zernike_scores stays in utils.py (reused by #75) — no new module.
- Import centrosome explicitly as `from centrosome import zernike` (no
  centrosome.zernike.* attribute access) in utils and the sizeshape caller.
- test_zernike: drop the non-contiguous test (unsupported), drop redundant
  single/multi-object tests (the irregular case covers them) and parametrize it
  over zernike numbers 5/9/14, drop the weighted test + _weighted_reference
  helper (more test code than the branch it checked), rename the generator to
  _generate_square_objects.
- test_sanitize: the vectorized get_zernike no longer raises IndexError on gapped
  IDs (it pads to max-label rows), so assert raw output differs from the
  sanitized wrapper instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(sanitize): simpler raw-vs-sanitized assertion, no padding specifics

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(zernike): drop redundant 3D-empty test

The ndim==3 guard is a #35 band-aid; dimensionality dispatch belongs at the entry
points, not asserted in a per-measurement test. test_core_measurements already
covers the 3D-empty case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Alán F. Muñoz <afer.mg@gmail.com>
Base automatically changed from perf/zernike-vectorize to main June 30, 2026 16:55
@timtreis timtreis force-pushed the perf/radial-zernike-vectorize branch from ee7ed1e to da10e77 Compare June 30, 2026 17:49
@timtreis timtreis marked this pull request as ready for review June 30, 2026 17:53
@timtreis timtreis force-pushed the perf/radial-zernike-vectorize branch from 4c49dfb to 9b29640 Compare June 30, 2026 18:01
…s (~2x)

Delegate the intensity-weighted moment sums to cp_measure.utils._zernike_scores
(the masked-basis + segment-sum machinery shared with get_zernike): keep the
basis on foreground pixels and segment-sum each moment with numpy.bincount
instead of centrosome's full (H, W, K) scatter + 2K scipy.ndimage.sum_labels
passes. Normalise by each object's pixel count, and drop the old empty-case
branch (empty input now yields (0, K) arrays naturally).

Assumes the contiguous 1..N label contract (see cp_measure._sanitize); imports
centrosome explicitly as `from centrosome import zernike`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timtreis timtreis force-pushed the perf/radial-zernike-vectorize branch from 9b29640 to 80a01c1 Compare June 30, 2026 18:07
@github-actions


Benchmark — 80a01c1 vs main

speedup = main/head (>1 faster, <1 slower) · median per cell · showing functions that moved ≥1.05× either way

granularity

| size \ objects | 16 | 64 | 256 |
|---|---|---|---|
| 256 | 0.98× | 1.00× | 1.01× |
| 512 | 0.94× | 0.91× | 0.98× |
| 1024 | 0.97× | 0.97× | 1.00× |

manders_fold

| size \ objects | 16 | 64 | 256 |
|---|---|---|---|
| 256 | 1.02× | 1.02× | 1.00× |
| 512 | 1.02× | 1.01× | 1.00× |
| 1024 | 0.95× | 0.95× | 1.01× |

radial_distribution

| size \ objects | 16 | 64 | 256 |
|---|---|---|---|
| 256 | 0.96× | 0.96× | 0.96× |
| 512 | 0.98× | 0.95× | 0.96× |
| 1024 | 1.01× | 0.97× | 0.93× |

radial_zernikes

| size \ objects | 16 | 64 | 256 |
|---|---|---|---|
| 256 | 2.26× | 2.12× | 2.29× |
| 512 | 2.04× | 2.10× | 2.20× |
| 1024 | 2.33× | 2.40× | 2.58× |

texture

| size \ objects | 16 | 64 | 256 |
|---|---|---|---|
| 256 | 0.96× | 1.06× | 1.01× |
| 512 | 1.02× | 1.03× | 1.02× |
| 1024 | 0.99× | 1.00× | 0.99× |

zernike

| size \ objects | 16 | 64 | 256 |
|---|---|---|---|
| 256 | 0.91× | 0.95× | 0.96× |
| 512 | 0.88× | 0.87× | 0.87× |
| 1024 | 1.00× | 0.96× | 0.97× |

@timtreis timtreis requested a review from afermg June 30, 2026 18:16