fix(radial): per-object measurement, label-independent (closes #22)#68
fix(radial): per-object measurement, label-independent (closes #22)#68timtreis wants to merge 3 commits into
Conversation
…(Issue #22) get_radial_distribution computed its geometry (distance_to_edge / propagate) over the whole label image, so an object's radial features could be perturbed by neighbouring labels (Issue #22). Measure each object on its own cropped + 1px-padded mask instead, reusing the exact same numpy/centrosome algorithm — so each object's result equals the baseline run on that object in isolation, by construction. - new default (legacy=False): per-object crop, field-independent. - legacy=True: the original whole-image behaviour, byte-for-byte. The original whole-image algorithm is preserved verbatim as the private _radial_distribution_image; the public function dispatches to it per isolated crop or once on the whole image. +tests: per-object independence (#22 property, bit-exact), legacy-leaks-where-new-doesn't, and odd-centre object new==legacy. Closes #22. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- derive present labels from find_objects (drops the redundant numpy.unique and the `for sl in (slices[...],)` comprehension trick) - assemble results with numpy.concatenate over the (1,) per-object arrays instead of numpy.array([obj[key][0] ...]) No behaviour change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…image result Make the contract explicit: legacy=True reproduces CellProfiler / centrosome (the Issue #22 leak is rooted upstream in scipy), while the default deliberately diverges from CellProfiler to give per-object-independent values. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
afermg
left a comment
There was a problem hiding this comment.
If I understand correctly, the solution is to bypass the problem by getting the objects and calculating the measurement for each one individually, right? I would just like a benchmark to evaluate that we are not slowing it down significantly.
|
Yeah exactly, it definitely will slow things down for larger number of objects (actually better for small), but the older version is simply incorrect since the numbers for a given cell change depending on its neighbours. Here a benchmark on a 1080x1080 img |
|
My point is that the issue with the neighbours presence/absence is being worked on right now scipy/scipy#25293, but as we discussed via slack, we may have to increase the lower boundary of the scipy dependency. Before changing the implementation I would like to consider the impact of the scipy fixes on performance and code simplicity. As it stands it is a numerical bug, but it is shared with CellProfiler (and thus present in our benchmark dataset). |
afermg
left a comment
There was a problem hiding this comment.
- Noted a couple of considerations
- Remove a test and add a note on another one for future removal
- Add the benchmarking to understand the inflection point for this "per-object" approach (probably a new PR adding it as CI and we rebase to it)
| boundaries up to the maximum radius. Parts of the object that are beyond this | ||
| radius will be counted in an overflow bin. The radius is measured in pixels. | ||
|
|
||
| legacy : bool |
There was a problem hiding this comment.
Turns out things won't be as simple. We should define what "legacy" means:
(3) Cellprofiler before the MAD fix was merged CellProfiler/CellProfiler@02a8b51
(2) scipy before the fix for #22 was merged scipy/scipy#25293
I think "legacy" should mean "before the MAD fix, but not necessarily before the scipy change". It's still unclear how the scipy change will affect the benchmark dataset, but I don't think it will do so significantly. We can add a flag to increase the tolerance for that case /at the test level/.
| return labels, pixels | ||
|
|
||
|
|
||
| def test_radial_per_object_independent_of_neighbours(): |
There was a problem hiding this comment.
Let's assume that "legacy" has always covered this (because this is sort-of orthogonal to the legacy implementation since it is due to a dependency)
| ) | ||
|
|
||
|
|
||
| def test_radial_legacy_leaks_where_new_does_not(): |
There was a problem hiding this comment.
We should add a note to remove this once github.com/scipy/scipy/pull/25293 is merged.
There was a problem hiding this comment.
And maybe the legacy flag for this measurement altogether (?) This one is a tricky one.
| assert any(not np.allclose(new[k], legacy[k], equal_nan=True) for k in new) | ||
|
|
||
|
|
||
| def test_radial_unique_centre_object_new_equals_legacy(): |
There was a problem hiding this comment.
This one is unnecessary, we have already identified the cause of the divergence. Not worth having this test if it will be merged in the future anyways. I'm becoming more keen on raising the lower bound of scipy if it doesn't break other python versions or dependencies.
Closes #22 (
get_radial_distribution should give same results per object independent of other labels).The bug
get_radial_distributioncomputes its geometry (centrosome.distance_to_edge+propagateovercolor_labels) on the whole label image, so an object's radial features can be perturbed by neighbouring labels. The numba lane (#63) fixes this for the numba backend only; the default numpy path was still affected.The fix
Measure each object on its own cropped + 1px-padded mask (
labels[sl] == label), reusing the exact same numpy/centrosome algorithm. Because the crop contains only this object (neighbours become background), each object's result equals the baseline run on that object in isolation — by construction.legacyFalse(new default)TrueNotes
legacy=Truerestores the old values.