feat(numba/sizeshape): unified numba sizeshape backend (~3x, bit-exact)#78

Draft
timtreis wants to merge 8 commits into integration/all-numba from feat/numba-sizeshape
@timtreis timtreis commented Jun 6, 2026

Collaborator

Stacked / WIP base. Reviewable scope: the 4 sizeshape commits. The base integration/all-numba is a local integration of the numba stack; the real base is main once #77 (numpy default sizeshape + primitives/_moments.py) and the numba stack (#59 bzyx, #60 coloc's labels_to_offsets, and the rest) have merged. On that base, the _moments.py copy can be dropped (it comes from #77) and the registry edit becomes a one-line addition.

What

sizeshape was the one numba NO-GO and the largest remaining component of the numba pipeline. This adds a full numba backend by reimplementing every numba-addressable primitive bit-exact and keeping only the genuine import-boundary work (scipy Euclidean EDT) on host.

| primitive | source | bit-exact | speedup |
|---|---|---|---|
| moments (raw/central/normalized/Hu) + inertia | fused _moment_kernel + shared derivation | raw/inertia exact, rest ~1e-13 | 3.5× |
| area_convex / solidity | numba monotone-chain hull (boundary pixels) + skimage grid_points_in_poly | 142/142 exact | 2.8× |
| perimeter / perimeter_crofton / euler | label-aware neighbour-pattern kernels | exact / ~1e-13 | 5.6× |
| axis lengths / eccentricity / orientation | derived from kernel central moments (option B) | ~1e-13, symmetric fallback exact | |
| max/mean/median radius | scipy Euclidean EDT (kept) | exact | |
| area, bbox, centroid, extent, area_filled | cheap regionprops (moment-free) | exact | |

Option B (deriving axes/ecc/orientation from the central moments) makes the regionprops call fully moment-free — verified 0 ms einsum.

Performance

get_sizeshape: 305 → 100 ms (~3×) on the large tile (1080²/142 obj). All 78 features match the numpy backend (raw moments / area_convex / euler exact; everything else ≤ 4e-7).

Design

  • One shared foreground-pixel pass (labels_to_offsets) feeds the kernels. to_bzyx is 2D-only; 3D volumes fall back to the numpy baseline.
  • The convex hull keeps skimage's exact grid_points_in_poly rasteriser — no risky pnpoly port — and only replaces the slow QHull construction with a numba monotone-chain over boundary pixels (hull(boundary) == hull(object)).
  • axes_eccentricity_orientation added to primitives/_moments.py so the numpy lane (#77, "perf(sizeshape): scatter-based spatial moments + inertia, replacing regionprops einsum (~1.6x)") can later adopt the same einsum elimination.
  • Rewrote bulk._numba_registries (the integration merge had left it un-parseable) to compose all numba backends and register sizeshape; set_accelerator("numba") now routes it.

Tests

24 golden tests: each kernel vs regionprops (multi / non-contiguous / edge-touching / degenerate / holed-object euler / empty), end-to-end get_sizeshape vs the numpy backend (incl. calculate_advanced/new_features variants), 3D fallback, and dispatch routing. ruff clean.

timtreis and others added 4 commits June 6, 2026 21:59
First piece of the unified numba sizeshape lane. `regionprops_table` computes the
spatial/central/normalized/Hu moments (and inertia) per-region via an einsum whose
contraction path is re-derived for every object. This adds a fused numba kernel
(`core/numba/_sizeshape.py::_moment_kernel`) that computes the 16 raw + 16 central
moments (one 4x4 matrix each) in two passes over the foreground pixels (bbox-min tracked inline),
using `labels_to_offsets` for the label->row map.

The derived quantities (normalized / Hu / inertia) live in the shared
`primitives/_moments.py` (refactored to expose `derive_normalized_hu` /
`normalized_from_central` / `hu_from_normalized` / `inertia_2d`), so the numpy
scatter and the numba kernel share one derivation. `spatial_moments_2d` (numba) is
a drop-in for the numpy accumulator: same object order, bit-identical raw/central
(0.00), normalized/Hu/inertia to round-off.
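
For orientation, the two-pass structure described above can be sketched in plain NumPy for a single object. This is only an illustration under stated assumptions, not the fused `_moment_kernel`: the function name `raw_and_central_moments` is hypothetical, the convention `M[p, q] = sum(row**p * col**q)` is assumed, and it processes one binary mask rather than all labels at once.

```python
import numpy as np


def raw_and_central_moments(mask, order=3):
    """Return (raw, central) moment matrices of shape (order+1, order+1).

    Assumed convention: M[p, q] = sum(row**p * col**q), with coordinates
    measured from the object's bounding-box minimum.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask has no foreground pixels")
    # Shift to the bbox origin so the raw moments are translation-independent.
    r = (rows - rows.min()).astype(np.float64)
    c = (cols - cols.min()).astype(np.float64)
    powers = np.arange(order + 1)

    # Pass 1: raw moments about the bbox origin.
    raw = (r[:, None] ** powers).T @ (c[:, None] ** powers)

    # Pass 2: central moments about the centroid derived from the raw moments.
    r_bar = raw[1, 0] / raw[0, 0]
    c_bar = raw[0, 1] / raw[0, 0]
    central = ((r - r_bar)[:, None] ** powers).T @ ((c - c_bar)[:, None] ** powers)
    return raw, central
```

With `order=3` each matrix has 4x4 = 16 entries, which matches the "16 raw + 16 central" count above.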

Measured: spatial moments 59 -> 17 ms (3.5x), large tile. Golden tests vs the numpy
accumulator AND regionprops (multi / non-contiguous / edge-touching / single-pixel /
inertia / empty).

NOT yet wired: the full `get_sizeshape` numba wrapper + dispatch registration, and
the convex-hull / perimeter / euler kernels (next phases). Built on the integration
numba stack; rebases onto main+#77 when those land.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…~2.8x)

Adds `convex_area_2d` to the numba sizeshape lane. skimage's `area_convex` is the
pixel count inside each object's convex hull (pixels offset by ±0.5 diamond points,
counted by `grid_points_in_poly`). The slow part is the per-region hull construction
(scipy QHull + python overhead); the rasteriser is fast and convention-specific.

So we replace ONLY the hull construction with a fused numba monotone-chain
(`_hull_kernel`) over each object's BOUNDARY pixels (hull(boundary) == hull(object),
~6x fewer points), and KEEP skimage's exact `grid_points_in_poly` for the raster.
The result is bit-exact (verified on 142/142 objects) without porting the risky pnpoly. Coordinates
are x2-scaled so the offset points are integers and the monotone-chain hull is exact;
each hull is rasterised in its bbox-local frame. Degenerate objects (single pixel /
1-wide line) fall back to the pixel count, matching skimage.
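
A rough pure-Python sketch of the hull step follows, to make the `hull(boundary) == hull(object)` claim and the x2 integer scaling concrete. It is an assumption-laden illustration, not the numba `_hull_kernel`: the helper names (`boundary_mask`, `diamond_points_x2`, `monotone_chain`) are hypothetical, it handles one boolean mask, and the `grid_points_in_poly` rasterisation is left out.

```python
import numpy as np


def boundary_mask(mask):
    """Foreground pixels with at least one background 4-neighbour."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return mask & ~interior


def diamond_points_x2(mask):
    """Each pixel's four +/-0.5 diamond points, scaled by 2 so they are integers."""
    rows, cols = np.nonzero(mask)
    pts = []
    for r, c in zip(rows.tolist(), cols.tolist()):
        pts += [(2 * r - 1, 2 * c), (2 * r + 1, 2 * c),
                (2 * r, 2 * c - 1), (2 * r, 2 * c + 1)]
    return pts


def _cross(o, a, b):
    # Integer cross product of (a - o) and (b - o); exact for integer inputs.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def monotone_chain(points):
    """Andrew's monotone chain; returns hull vertices, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

The boundary shortcut works because a diamond point shared with a foreground 4-neighbour is the midpoint of two other diamond points, so it can never be a hull vertex; every hull vertex therefore belongs to a boundary pixel.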

Measured: area_convex 112 -> 40 ms (~2.8x), large tile, 142/142 bit-exact. Golden
tests vs regionprops (multi / irregular / non-contiguous / edge-touching /
degenerate / empty).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ct, ~5.6x)

Adds the three deterministic neighbour-pattern features. skimage runs them per-region
with C convolutions; a whole-image label-aware numba pass reproduces them bit-exact
(each object's pattern histogram equals running skimage on the isolated object — other
labels read as background — and skimage's per-region 1px pad is the whole-image edge pad).

- `perimeter_2d` (4-connectivity): border image (fg with a non-same-label 4-neighbour),
  then per border pixel value = 1 + 2*(same-object 4-conn border) + 10*(same-object
  diagonal border), weighted by skimage's LUT. Only border-centre (odd) values carry
  weight, so non-border pixels are skipped.
- `crofton_euler_2d`: shares the 2x2 binary-config histogram (skimage's XF kernel, 16
  bins); perimeter_crofton = crofton_coefs @ h, euler_number = euler_8conn_coefs @ h.
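
To illustrate the 16-bin 2x2 configuration histogram and the "other labels read as background" behaviour, here is a small NumPy sketch for the Euler number. Note the swap: instead of skimage's coefficient table it uses the textbook bit-quad formula for 8-connectivity, E = (Q1 - Q3 - 2*QD) / 4, which yields the same topological quantity. The names `quad_histogram` and `euler_8conn` are hypothetical, and the per-label loop implied here is a simplification of the single whole-image pass described above.

```python
import numpy as np

# Bit-quad coefficients for 8-connectivity, indexed by the 4-bit quad code
# (top-left=1, top-right=2, bottom-left=4, bottom-right=8).
_COEFS_8 = np.zeros(16)
_COEFS_8[[1, 2, 4, 8]] = 0.25       # exactly one foreground pixel (Q1)
_COEFS_8[[7, 11, 13, 14]] = -0.25   # exactly three foreground pixels (Q3)
_COEFS_8[[6, 9]] = -0.5             # the two diagonal patterns (QD)


def quad_histogram(labels, label):
    """16-bin histogram of 2x2 configurations for one label.

    Every other label is treated as background, and the 1px zero pad plays
    the role of the image-edge padding.
    """
    fg = np.pad(labels == label, 1).astype(np.int64)
    code = fg[:-1, :-1] + 2 * fg[:-1, 1:] + 4 * fg[1:, :-1] + 8 * fg[1:, 1:]
    return np.bincount(code.ravel(), minlength=16)


def euler_8conn(labels, label):
    """Euler number (components minus holes) under 8-connectivity."""
    return int(round(_COEFS_8 @ quad_histogram(labels, label)))
```

A ring whose hole is occupied by a different label still reports an Euler number of 0, because the inner label reads as background for the ring.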

Measured: perimeter 3.5 ms, crofton+euler 3.8 ms vs skimage's 41 ms for all three
(~5.6x). Agreement with regionprops: perimeter/crofton to ~1e-13, euler difference exactly 0. Golden
tests incl. a holed object (euler topology), non-contiguous, edge-touching, irregular.

Lane status: moments+inertia, convex hull, perimeter/crofton/euler all implemented &
bit-exact. Remaining: the get_sizeshape wrapper + dispatch registration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion B, ~3x)

Ties the kernels into the full `get_sizeshape` feature dict and registers the numba
sizeshape backend (previously a NO-GO). Option B: axes / eccentricity / orientation are
derived from the kernel central moments (`primitives/_moments.axes_eccentricity_orientation`,
bit-exact incl. the symmetric ±pi/4 fallback), so the regionprops call is fully moment-free
(verified 0 ms einsum) — only cheap props (area/bbox/centroid/extent/area_filled/image) and
the scipy Euclidean EDT radius loop stay on host.
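
As a hedged sketch of what option B amounts to, the axis lengths and eccentricity can be obtained from the second-order central moments via the closed-form eigenvalues of a 2x2 symmetric matrix, using the `4 * sqrt(eigenvalue)` convention that regionprops uses for axis lengths. The function name `axes_and_eccentricity` is hypothetical, orientation is omitted, and the clipping to zero is an assumption made here so that thin objects do not produce NaN; this is not the code in `primitives/_moments.py`.

```python
import math

import numpy as np


def axes_and_eccentricity(mu):
    """Return (axis_major, axis_minor, eccentricity) from central moments.

    `mu` is a central-moment matrix with mu[p, q] = sum(dr**p * dc**q).
    """
    mu00 = mu[0, 0]
    a = mu[2, 0] / mu00
    b = mu[1, 1] / mu00
    c = mu[0, 2] / mu00
    # Eigenvalues of [[a, b], [b, c]]; the sign of b does not affect them.
    half_trace = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    l1 = max(half_trace + radius, 0.0)  # clip round-off negatives to zero
    l2 = max(half_trace - radius, 0.0)
    eccentricity = 0.0 if l1 == 0 else math.sqrt(1.0 - l2 / l1)
    return 4.0 * math.sqrt(l1), 4.0 * math.sqrt(l2), eccentricity
```

For a 5-pixel horizontal line (`mu[0, 0] = 5`, `mu[0, 2] = 10`, everything else zero) this gives a major axis of `4 * sqrt(2)`, a minor axis of 0 and an eccentricity of 1.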

Sources: moments/inertia + axes/ecc/orientation -> moment kernel; area_convex/solidity ->
hull kernel; perimeter/crofton/euler -> pattern kernels; radii -> scipy EDT (kept). `to_bzyx`
2D-only; 3D volumes fall back to the numpy baseline.

Also fixes `bulk._numba_registries`, which the integration merge had left as un-parseable
garbage (interleaved docstrings / em-dashes / duplicate returns): rewritten cleanly to
compose all numba backends (intensity, granularity, zernike, radial_zernikes,
radial_distribution, texture, feret, sizeshape, coloc+costes) and register `sizeshape`.
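
The composition idea can be pictured with a toy registry. Everything in this sketch is hypothetical (the dictionaries, the stand-in backend functions, `compose_registry` and the `other_feature` key); it only illustrates the pattern of overlaying accelerated backends on the numpy defaults and says nothing about how `bulk._numba_registries` is actually written.

```python
from typing import Callable, Dict, Optional


def numpy_sizeshape(masks):
    return "numpy:sizeshape"


def numpy_other_feature(masks):
    return "numpy:other_feature"


def numba_sizeshape(masks):
    return "numba:sizeshape"


# Hypothetical registries: feature name -> backend callable.
NUMPY_BACKENDS: Dict[str, Callable] = {
    "sizeshape": numpy_sizeshape,
    "other_feature": numpy_other_feature,
}
NUMBA_BACKENDS: Dict[str, Callable] = {
    "sizeshape": numba_sizeshape,
}


def compose_registry(accelerator: Optional[str] = None) -> Dict[str, Callable]:
    """Start from the numpy defaults and overlay the accelerated backends."""
    registry = dict(NUMPY_BACKENDS)
    if accelerator == "numba":
        registry.update(NUMBA_BACKENDS)
    elif accelerator is not None:
        raise ValueError(f"unknown accelerator: {accelerator!r}")
    return registry
```

In this toy version, features without an accelerated backend keep their numpy implementation, so registering a new backend is a single dictionary entry.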

Measured: get_sizeshape 305 -> 100 ms (~3x), all 78 features match the numpy backend
(raw moments / area_convex / euler exact; the rest <=4e-7). Tests: end-to-end vs numpy
get_sizeshape (multi / non-contiguous / calculate_advanced+new_features variants), 3D
fallback, and set_accelerator("numba") dispatch routing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
timtreis and others added 2 commits June 6, 2026 23:25
…ives

The four sizeshape primitives (spatial_moments_2d, convex_area_2d,
perimeter_2d, crofton_euler_2d) each recomputed labels_to_offsets over
the full raster, so it ran 4x per get_sizeshape call. Add a _Prep
NamedTuple wrapping the labels_to_offsets result (lut, n, offsets) — the
only prep every primitive shares — plus a _foreground_prep() helper; the
primitives now take an optional `prep` arg and _sizeshape_2d threads one
prep into all four. Each primitive still computes its own prep when
called standalone (prep=None), so the public functions and their tests
are unchanged.
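
The optional-`prep` pattern can be sketched as below. This is an illustration only: `Prep`, `foreground_prep`, `area_2d`, `row_span_2d` and `features_2d` are hypothetical stand-ins, and the meaning given to `lut` / `n` / `offsets` (label-to-row map, object count, cumulative pixel counts) is an assumption about what `labels_to_offsets` returns, not a description of it.

```python
from typing import NamedTuple, Optional

import numpy as np


class Prep(NamedTuple):
    lut: np.ndarray      # assumed: label value -> dense row index (-1 if absent)
    n: int               # number of objects
    offsets: np.ndarray  # assumed: cumulative pixel counts, length n + 1


def foreground_prep(labels: np.ndarray) -> Prep:
    """One pass over the raster that every primitive can share."""
    fg = labels[labels > 0]
    ids = np.unique(fg)
    lut = np.full(int(ids.max()) + 1 if ids.size else 1, -1, dtype=np.int64)
    lut[ids] = np.arange(ids.size)
    counts = np.bincount(lut[fg], minlength=ids.size)
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return Prep(lut, int(ids.size), offsets)


def area_2d(labels: np.ndarray, prep: Optional[Prep] = None) -> np.ndarray:
    prep = foreground_prep(labels) if prep is None else prep  # standalone path
    return np.diff(prep.offsets)


def row_span_2d(labels: np.ndarray, prep: Optional[Prep] = None) -> np.ndarray:
    prep = foreground_prep(labels) if prep is None else prep
    rows, cols = np.nonzero(labels > 0)
    idx = prep.lut[labels[rows, cols]]
    lo = np.full(prep.n, labels.shape[0], dtype=np.int64)
    hi = np.full(prep.n, -1, dtype=np.int64)
    np.minimum.at(lo, idx, rows)
    np.maximum.at(hi, idx, rows)
    return hi - lo + 1


def features_2d(labels: np.ndarray) -> dict:
    prep = foreground_prep(labels)  # computed once, threaded into each primitive
    return {"area": area_2d(labels, prep), "row_span": row_span_2d(labels, prep)}
```

Calling a primitive without `prep` gives the same result as the shared path, which is why standalone tests do not need to change under this design.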

The full foreground pixel list (rows/cols/obj nonzero) is needed only by
the moment kernel, so it stays inside spatial_moments_2d rather than the
shared prep — standalone perimeter_2d/crofton_euler_2d/convex_area_2d no
longer pay for a full-raster nonzero they never read. convex_area_2d
does its own nonzero over the boundary mask (sparse, not redundant).

Removes 3 redundant labels_to_offsets passes: ~9 ms / ~5% on the
1080^2 / 132-object tile. 24 tests unchanged (bit-exactness, dispatch,
edge cases); ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…apper

Address /code-review notes on the sizeshape lane:

- Extract `moment_feature_dict` into `primitives/_moments.py` as the single
  source of truth for the 53 `calculate_advanced` moment/inertia feature
  names and the (p,q) orders exposed. The numba `_sizeshape_2d` now calls it
  instead of building those keys with inline f-string loops, removing the
  duplication of the numpy assembly and the f-string/constant drift risk.
  The numpy `get_sizeshape` adopts the same helper when #77's moment rewrite
  rebases onto this lane (the sizeshape golden test cross-checks the keys vs
  the numpy F_* constants meanwhile).
- Clarify the wrapper: `pixels` is accepted only for dispatch-signature
  parity and is unused (sizeshape is purely geometric, like the numpy
  backend); pass `masks` to `to_bzyx` for axis normalisation and drop the
  computed-then-discarded `_pixels_zyx`.
- Add an end-to-end empty-mask test exercising the full dict assembly with
  zero objects (the kernels had empty tests; the wrapper assembly did not).

25 tests pass (incl. new empty case); ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timtreis timtreis added the numba label Jun 9, 2026
timtreis and others added 2 commits June 10, 2026 01:07
…_dict order

Mirror the numpy-side fixes so the shared _moments.py converges:
- inertia_2d clips eigenvalues to >=0 like skimage, so thin/oblique objects get
  axis_minor=0 / eccentricity=1 instead of NaN (axes_eccentricity_orientation
  reuses inertia_2d, so the fix propagates to the axis features too).
- moment_feature_dict emits the moment keys in the grouped order of the PyPI
  0.1.19 release (all Spatial, then Central, then Normalized, ...) instead of
  interleaving Spatial/Central by (p, q) — the numba get_sizeshape output column
  order now matches the numpy backend and the release.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ssertion

A missing ')' in test_set_accelerator_numba_composes_with_numpy made the whole
test_backend_correctness.py file uncollectable, which masked a stale assertion:
it expected sizeshape to stay on the numpy backend, but this branch adds a numba
sizeshape backend, so under the numba accelerator it composes
cp_measure.core.numba._sizeshape. Close the paren and assert the numba module.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>