Skip to content

fix(radial): per-object measurement, label-independent (closes #22)#68

Open
timtreis wants to merge 3 commits into
mainfrom
fix/radial-per-object-22
Open

fix(radial): per-object measurement, label-independent (closes #22)#68
timtreis wants to merge 3 commits into
mainfrom
fix/radial-per-object-22

Conversation

@timtreis

@timtreis timtreis commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Closes #22 (get_radial_distribution should give same results per object independent of other labels).

The bug

get_radial_distribution computes its geometry (centrosome.distance_to_edge + propagate over color_labels) on the whole label image, so an object's radial features can be perturbed by neighbouring labels. The numba lane (#63) fixes this for the numba backend only; the default numpy path was still affected.

The fix

Measure each object on its own cropped + 1px-padded mask (labels[sl] == label), reusing the exact same numpy/centrosome algorithm. Because the crop contains only this object (neighbours become background), each object's result equals the baseline run on that object in isolation — by construction.

legacy behaviour
False (new default) per-object crop → field-independent (#22 fixed)
True original whole-image behaviour, byte-for-byte

Notes

timtreis and others added 3 commits June 5, 2026 03:59
…(Issue #22)

get_radial_distribution computed its geometry (distance_to_edge / propagate) over
the whole label image, so an object's radial features could be perturbed by
neighbouring labels (Issue #22). Measure each object on its own cropped + 1px-padded
mask instead, reusing the exact same numpy/centrosome algorithm — so each object's
result equals the baseline run on that object in isolation, by construction.

- new default (legacy=False): per-object crop, field-independent.
- legacy=True: the original whole-image behaviour, byte-for-byte.

The original whole-image algorithm is preserved verbatim as the private
_radial_distribution_image; the public function dispatches to it per isolated crop
or once on the whole image. +tests: per-object independence (#22 property, bit-exact),
legacy-leaks-where-new-doesn't, and odd-centre object new==legacy.

Closes #22.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- derive present labels from find_objects (drops the redundant numpy.unique and
  the `for sl in (slices[...],)` comprehension trick)
- assemble results with numpy.concatenate over the (1,) per-object arrays instead
  of numpy.array([obj[key][0] ...])

No behaviour change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…image result

Make the contract explicit: legacy=True reproduces CellProfiler / centrosome
(the Issue #22 leak is rooted upstream in scipy), while the default deliberately
diverges from CellProfiler to give per-object-independent values.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@afermg afermg left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the solution is to bypass the problem by getting the objects and calculating the measurement for each one individually, right? I would just like a benchmark to evaluate that we are not slowing it down significantly.

@timtreis

timtreis commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator Author

Yeah exactly, it definitely will slow things down for larger number of objects (actually better for small), but the older version is simply incorrect since the numbers for a given cell change depending on its neighbours.

Here a benchmark on a 1080x1080 img

objects   legacy ms   perobj ms   ratio   ms/obj
      1      756.32      767.62   1.01x  767.619
      4      905.92      719.37   0.79x  179.843
     16      933.86      632.00   0.68x   39.500
     64      970.97      755.67   0.78x   11.807
    144     1013.02      999.64   0.99x    6.942
    400     1057.14     1804.88   1.71x    4.512
   1024     1043.58     3686.60   3.53x    3.600
   1600     1091.35     5464.12   5.01x    3.415
   4096      988.64    13142.43  13.29x    3.209

@timtreis timtreis requested a review from afermg June 5, 2026 21:10
@afermg

afermg commented Jun 6, 2026

Copy link
Copy Markdown
Owner

My point is that the issue with the neighbours presence/absence is being worked on right now scipy/scipy#25293, but as we discussed via slack, we may have to increase the lower boundary of the scipy dependency.

Before changing the implementation I would like to consider the impact of the scipy fixes on performance and code simplicity. As it stands it is a numerical bug, but it is shared with CellProfiler (and thus present in our benchmark dataset).

@afermg afermg left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Noted a couple of considerations
  • Remove a test and add a note on another one for future removal
  • Add the benchmarking to understand the inflection point for this "per-object" approach (probably a new PR adding it as CI and we rebase to it)

boundaries up to the maximum radius. Parts of the object that are beyond this
radius will be counted in an overflow bin. The radius is measured in pixels.

legacy : bool

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out things won't be as simple. We should define what "legacy" means:

(3) Cellprofiler before the MAD fix was merged CellProfiler/CellProfiler@02a8b51
(2) scipy before the fix for #22 was merged scipy/scipy#25293

I think "legacy" should mean "before the MAD fix, but not necessarily before the scipy change". It's still unclear how the scipy change will affect the benchmark dataset, but I don't think it will do so significantly. We can add a flag to increase the tolerance for that case /at the test level/.

return labels, pixels


def test_radial_per_object_independent_of_neighbours():

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assume that "legacy" has always covered this (because this is sort-of orthogonal to the legacy implementation since it is due to a dependency)

)


def test_radial_legacy_leaks_where_new_does_not():

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a note to remove this once github.com/scipy/scipy/pull/25293 is merged.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe the legacy flag for this measurement altogether (?) This one is a tricky one.

assert any(not np.allclose(new[k], legacy[k], equal_nan=True) for k in new)


def test_radial_unique_centre_object_new_equals_legacy():

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is unnecessary, we have already identified the cause of the divergence. Not worth having this test if it will be merged in the future anyways. I'm becoming more keen on raising the lower bound of scipy if it doesn't break other python versions or dependencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get_radial_distribution should give same results per object independent of other labels in input

2 participants