Refactor applications/DynaCell into a reusable benchmark package#404
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors applications/dynacell from a thin training app into a reusable dynacell benchmark package, reorganizing configs and bringing evaluation/reporting/preprocess/data modules (with tests) into VisCy while updating shared runtime utilities to support the new package shape.
Changes:
- Reorganize Dynacell configs into
configs/recipes/+configs/examples/, remove paper-/HPC-specific SEC61B artifacts, and update docs/tests for config discovery. - Add reusable
dynacell.data,dynacell.evaluation,dynacell.reporting, anddynacell.preprocessmodules plus substantial test coverage. - Update shared VisCy components (CLI checkpoint merging, prediction writer overwrite behavior, affine augmentation safety options, HCS training shape checks) to support the new workflows.
Reviewed changes
Copilot reviewed 74 out of 85 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/viscy-utils/src/viscy_utils/cli.py | Adjust checkpoint-hparam vs user-config precedence during LightningCLI parsing. |
| packages/viscy-utils/src/viscy_utils/callbacks/prediction_writer.py | Add overwrite option to allow reusing existing output stores. |
| packages/viscy-transforms/src/viscy_transforms/_affine.py | Add configurable padding_mode and safe-crop scale clamping for GPU affine aug. |
| packages/viscy-transforms/tests/test_affine.py | Extend affine tests to cover scale-floor computation and safe-crop behavior. |
| packages/viscy-data/src/viscy_data/hcs.py | Skip strict training shape check when GPU augmentations handle cropping. |
| applications/dynacell/src/dynacell/main.py | Route evaluate/report to Hydra entry points; keep fit/predict via viscy_utils. |
| applications/dynacell/src/dynacell/engine.py | Add ckpt_path init-time loading to decouple predict-time settings from ckpt hparams. |
| applications/dynacell/src/dynacell/data/_yaml.py | Add shared OmegaConf→Pydantic YAML loader. |
| applications/dynacell/src/dynacell/data/manifests.py | Add dataset manifest + split schemas and path-based loaders. |
| applications/dynacell/src/dynacell/data/collections.py | Add frozen benchmark collection schemas and loader. |
| applications/dynacell/src/dynacell/data/specs.py | Add benchmark-spec schema/loader for tying pipeline stages together. |
| applications/dynacell/src/dynacell/data/init.py | Re-export dynacell.data public API. |
| applications/dynacell/src/dynacell/evaluation/init.py | Initialize evaluation package. |
| applications/dynacell/src/dynacell/evaluation/pipeline.py | Add Hydra-based evaluation pipeline with caching + output writing. |
| applications/dynacell/src/dynacell/evaluation/metrics.py | Add pixel/mask/feature metric computations (optional deps guarded). |
| applications/dynacell/src/dynacell/evaluation/utils.py | Add feature extractors + plotting and distribution-distance helpers. |
| applications/dynacell/src/dynacell/evaluation/io.py | Add eval I/O dispatch helpers (Zarr vs TIFF-like) and preprocessing utilities. |
| applications/dynacell/src/dynacell/evaluation/segmentation.py | Add segmentation workflows with optional dependency guards. |
| applications/dynacell/src/dynacell/evaluation/formatting.py | Add DataFrame formatting utilities for evaluation outputs. |
| applications/dynacell/src/dynacell/evaluation/torch_ssim.py | Add torch-native SSIM implementation used by metrics. |
| applications/dynacell/src/dynacell/evaluation/spectral_pcc/init.py | Initialize spectral_pcc subpackage. |
| applications/dynacell/src/dynacell/evaluation/spectral_pcc/plot_shading_analysis.py | Add analysis plotting script for shading artifact comparison. |
| applications/dynacell/src/dynacell/evaluation/spectral_pcc/plot_combined.py | Add combined metrics plotting + summary script. |
| applications/dynacell/src/dynacell/evaluation/spectral_pcc/diagnostic_real.py | Add Hydra diagnostic workflow for real-data spectra/metrics analysis. |
| applications/dynacell/src/dynacell/reporting/init.py | Re-export reporting public API. |
| applications/dynacell/src/dynacell/reporting/tables.py | Add table aggregation + LaTeX rendering utilities. |
| applications/dynacell/src/dynacell/reporting/figures.py | Add matplotlib figure generation for aggregated metrics. |
| applications/dynacell/src/dynacell/reporting/cli.py | Add Hydra reporting CLI entry point and config path wiring. |
| applications/dynacell/src/dynacell/preprocess/init.py | Re-export preprocess utilities. |
| applications/dynacell/src/dynacell/preprocess/config.py | Add OmegaConf-based preprocess config loader (with fallback). |
| applications/dynacell/src/dynacell/preprocess/zarr_utils.py | Add OME-Zarr rewrite utility (rechunk/reshard). |
| applications/dynacell/configs/examples/unext2/fit.yml | Add generic UNeXt2 fit example in new config layout. |
| applications/dynacell/configs/examples/unext2/predict.yml | Add generic UNeXt2 predict example with init-time ckpt_path. |
| applications/dynacell/configs/examples/fnet3d/fit.yml | Update FNet3D fit example paths to new layout. |
| applications/dynacell/configs/examples/fnet3d/predict.yml | Add generic FNet3D predict example with init-time ckpt_path. |
| applications/dynacell/configs/examples/unetvit3d/fit.yml | Update UNetViT3D fit example paths to new layout. |
| applications/dynacell/configs/examples/unetvit3d/predict.yml | Update UNetViT3D predict example paths and use init-time ckpt_path. |
| applications/dynacell/configs/examples/celldiff/fit.yml | Update CellDiff fit example paths to new layout. |
| applications/dynacell/configs/examples/celldiff/predict.yml | Add CellDiff predict example with init-time ckpt_path and overlap config. |
| applications/dynacell/configs/recipes/trainer/fit_1gpu.yml | Add reusable 1-GPU trainer recipe. |
| applications/dynacell/configs/recipes/trainer/fit_4gpu.yml | Add reusable 4-GPU trainer recipe. |
| applications/dynacell/configs/recipes/trainer/fit_fm_4gpu.yml | Add reusable 4-GPU flow-matching trainer recipe. |
| applications/dynacell/configs/recipes/trainer/predict_gpu.yml | Remove ckpt_path from trainer recipe (moved to model init args). |
| applications/dynacell/configs/recipes/models/unext2_3d.yml | Add reusable UNeXt2 model recipe (z=15). |
| applications/dynacell/configs/recipes/models/unext2_3d_z8.yml | Add reusable UNeXt2 model recipe (z=8). |
| applications/dynacell/configs/recipes/models/fnet3d.yml | Add reusable FNet3D model recipe. |
| applications/dynacell/configs/recipes/models/fnet3d_z8.yml | Add reusable FNet3D model recipe (z=8). |
| applications/dynacell/configs/recipes/models/unetvit3d.yml | Add reusable UNetViT3D model recipe. |
| applications/dynacell/configs/recipes/models/celldiff_fm.yml | Add reusable CellDiff flow-matching model recipe. |
| applications/dynacell/configs/recipes/modes/spotlight.yml | Add Spotlight mode recipe for foreground-aware loss. |
| applications/dynacell/configs/recipes/data/hcs_phase_fluor_3d.yml | Adjust data recipe defaults (e.g., preload false). |
| applications/dynacell/configs/evaluate/eval.yaml | Add Hydra eval base config. |
| applications/dynacell/configs/evaluate/spectral_pcc/base.yaml | Add Hydra config for spectral PCC workflows. |
| applications/dynacell/configs/evaluate/spectral_pcc/simulate.yaml | Add simulation config for metric validation. |
| applications/dynacell/configs/evaluate/spectral_pcc/diagnostic_real.yaml | Add real-data diagnostic config. |
| applications/dynacell/configs/report/base.yaml | Add Hydra report config. |
| applications/dynacell/tests/test_training_integration.py | Update config discovery test to new configs/examples location. |
| applications/dynacell/tests/test_cli_routing.py | Add tests for CLI routing between Lightning and Hydra subcommands. |
| applications/dynacell/tests/test_data_manifests.py | Add tests for dataset manifest/collection/spec schemas and loaders. |
| applications/dynacell/tests/test_evaluation_io.py | Add tests for eval I/O dispatching based on path type. |
| applications/dynacell/tests/test_evaluation_metrics.py | Add regression tests for shared-scale pixel metrics behavior. |
| applications/dynacell/tests/test_evaluation_pipeline.py | Add caching regression test for evaluation pipeline. |
| applications/dynacell/tests/test_reporting_tables.py | Add tests for reporting tables loading/aggregation/LaTeX output. |
| applications/dynacell/tests/test_reporting_tables_extended.py | Add extended reporting table tests (caption/label, lower-is-better). |
| applications/dynacell/tests/test_reporting_figures.py | Add tests for figure creation/saving and empty-data behavior. |
| applications/dynacell/tests/test_preprocess_config.py | Add tests for preprocess config loader fallback behavior. |
| applications/dynacell/tests/test_preprocess_zarr_utils.py | Add tests for zarr rewrite correctness (data, chunks, metadata, shards). |
| applications/dynacell/README.md | Update usage + config layout documentation; remove paper-specific SEC61B instructions. |
| applications/dynacell/.gitignore | Ignore runtime artifacts (lightning_logs, outputs, pycache). |
| applications/dynacell/pyproject.toml | Add optional extras for eval/report/preprocess and allow direct references. |
| applications/dynacell/examples/configs/sec61b/run_unext2_continue.slurm | Remove paper/HPC-specific SEC61B SLURM script. |
| applications/dynacell/examples/configs/sec61b/run_unext2.slurm | Remove paper/HPC-specific SEC61B SLURM script. |
| applications/dynacell/examples/configs/sec61b/run_fnet3d_paper.slurm | Remove paper/HPC-specific SEC61B SLURM script. |
| applications/dynacell/examples/configs/sec61b/run_fnet3d.slurm | Remove paper/HPC-specific SEC61B SLURM script. |
| applications/dynacell/examples/configs/sec61b/fit_unext2.yml | Remove paper-specific SEC61B config. |
| applications/dynacell/examples/configs/sec61b/fit_fnet3d_paper.yml | Remove paper-specific SEC61B config. |
| applications/dynacell/examples/configs/sec61b/fit_fnet3d.yml | Remove paper-specific SEC61B config. |
| applications/dynacell/examples/configs/sec61b/fit_celldiff.yml | Remove paper-specific SEC61B config. |
| applications/dynacell/examples/configs/recipes/data/hcs_sec61b_3d.yml | Remove hard-coded SEC61B data recipe. |
| applications/dynacell/examples/configs/fnet3d/predict.yml | Remove old predict example path (superseded by new configs layout). |
| applications/dynacell/examples/configs/celldiff/predict.yml | Remove old predict example path (superseded by new configs layout). |
| CLAUDE.md | Update repo/dev workflow documentation and conventions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 78 out of 91 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses two PR #404 review findings: 1. `_CacheContext._manifest_dirty` was mutated directly from helper call sites, leaking implementation detail. Adds `mark_manifest_dirty` and `consume_manifest_dirty` methods and routes every external touch through them. Only the dataclass itself now references the private field. 2. `resolve_dynaclr_encoder_cfg` used `except Exception` to detect a missing nested config key — wider than needed and against CLAUDE.md guidance. Replaced with `OmegaConf.select(..., default=None)`, which handles missing keys natively without a try/except. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses a PR #404 review finding: the split GT/pred feature API had structural tests (empty inputs, column-drop, shape mismatch) but no pinned-value regression guard. Adds two tests that seed deterministic synthetic inputs and assert exact output values for CP_FID / CP_KID / CP_Median_Cosine_Similarity and the DINOv3 equivalents. If anyone later changes the column-drop, per-side z-score, or FID/KID/cosine pairing logic — or a dependency shifts numerics — these tests will fail rather than silently drifting metrics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 131 out of 143 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Restructure applications/dynacell/configs/benchmarks/virtual_staining/
so train, predict, and eval leaves for the same benchmark cell live
under one directory `<org>/<train_set>/<model>/`, with shared building
blocks split by consumer:
- `shared/model/` — train/predict building blocks (train_sets,
predict_sets, targets, model_overlays, launcher_profiles)
- `shared/eval/` — HPC-bound Hydra eval groups (target, feature_extractor/dynaclr)
- `<org>/<train_set>/<model>/{train.yml, predict/<predset>.yml, eval/<predset>.yaml}`
Fold `configs/evaluation/` into the benchmark tree and delete the
sibling root. The eval selector renames from `benchmark=<path>` to
`leaf=<path>` to reflect the new filesystem model; leaves are addressed
through a committed symlink tree at `virtual_staining/leaf/` because
Hydra's group resolution requires a physical `leaf/` prefix. Eval
leaves keep `.yaml` (Hydra only discovers `.yaml` for groups); train
and predict leaves stay `.yml`.
`dynacell.__main__` now injects two `hydra.searchpath` roots — the
benchmark tree (for `leaf/`) and `shared/eval/` (for `target/` and
`feature_extractor/dynaclr/`). The `_EXTERNAL_CONFIGS_SUBPATH` constant
becomes `_EXTERNAL_SEARCHPATHS`, and the helper is renamed to
`_external_configs_dirs` (plural, returning a list).
Also fixes a pre-existing bug: the three unetvit3d train leaves under
mito/nucleus/membrane referenced a non-existent `runtime_single_gpu.yml`
launcher profile; they now use `runtime_shared.yml`. Model-overlay
`base:` entries targeting `recipes/` gain an extra `..` segment to
account for the new `shared/model/` depth.
Tests:
- test_benchmark_config_composition.py: path literals, three new
(unetvit3d × {mito,nucleus,membrane}) pairs, new
test_eval_leaf_symlink_resolves parametrised over all 8 eval leaves.
- test_cli_routing.py: three `_external_configs_dir` patches renamed
(plural) and argv selector strings updated to `leaf=`.
- test_submit_benchmark_job.py: parametrised leaf paths.
Docs: virtual_staining/README.md (full rewrite to match new tree),
evaluation/README.md (groups table, external-users section, benchmark
eval leaves section), UNEXT2_VS_FCMAE_CLASSES.md (stale path), dynacell/
README.md (sbatch example path), BENCHMARK_CONFIG_SCHEMA.md (Hydra
search-path contract note about the `leaf/` symlink adapter).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l/ move
The preceding refactor commit renamed train and predict leaves to
<org>/<train_set>/<model>/{train.yml, predict/<predset>.yml} but the
content updates that prepend the shared/model/ segment to every base:
line were unstaged when that commit landed. This catches the 25 train
and predict leaves up with the new shared/model/ layout, and tightens a
prose comment in fnet3d_paper_fit.yml that still referenced the
pre-refactor shared/targets/ path.
Without this fix a fresh checkout of the reorg commit has train and
predict leaves pointing at shared/train_sets/, shared/targets/, etc. —
paths that no longer exist — so `load_composed_config` would raise
FileNotFoundError on every leaf.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim the WHAT-narration from docstrings and comments in dynacell.__main__: drop the redundant "two roots are injected" enumeration (the tuple is self-describing), fix the "repo root" claim in _external_configs_dirs (the walk stops at the app's pyproject.toml, not the workspace root), and drop the "joined with commas" line from _inject_external_configs' docstring. Bind ``p = parent / sub`` via walrus inside the existence filter so the path is computed once instead of twice per subpath. In the composition tests, delete the ``EVAL_LEAVES = PREDICT_LEAVES`` alias (the aliased list was the same object, causing silent coupling) and drive the symlink-integrity test off ``PREDICT_LEAVES`` directly. Drop the ``@package _global_`` header assertion from the symlink test — that check belongs to composition, not to symlink integrity. Add a ``REPO_ROOT`` sanity assert so drift of the ``parents[3]`` magic count fails loudly. Tighten ``TestInjectExternalConfigs`` class docstring to a single line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the "in-package / repo checkout" shorthand in the config-groups table with concrete source paths, so readers see exactly where each group lives after the reorg: shared/eval/ for HPC-bound groups, leaf/ for the eval-leaf symlink tree, _configs/ for packaged schema. Tighten the "Running an evaluation" intro so it no longer implies all groups live under _configs/, and spell out the two hydra.searchpath roots dynacell.__main__ injects. Add a footgun entry for external users about Hydra's .yaml-only group resolution — saving a group as .yml makes the selector silently fail with MissingConfigException. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep the top level of virtual_staining/ biology-only: `er/`, `membrane/`,
`mito/`, `nucleus/` plus README.md. Everything else — shared building
blocks and the leaf/ Hydra symlink adapter — now lives under a single
_internal/ directory whose leading underscore signals "implementation
detail; don't browse here for science."
Layout is now:
virtual_staining/
<org>/<train_set>/<model>/{train.yml, predict/, eval/}
_internal/
shared/
model/ # train/predict building blocks
eval/ # target/, feature_extractor/dynaclr/
leaf/ # symlink tree aliasing canonical eval leaves
dynacell.__main__ injects two hydra.searchpath roots under _internal/
(the tree root for leaf/, and shared/eval/ for target + dynaclr) —
same cardinality as before, just under the hidden support tree.
Train/predict leaves get `_internal/` prepended to every base: path.
Model overlays gain one more `../` to reach configs/recipes/ (now 6
segments deep instead of 5). The 8 leaf/ symlinks are recreated with
6 relative-dot segments (was 5 at virtual_staining/leaf/, now 6 at
virtual_staining/_internal/leaf/).
Docs refreshed across BENCHMARK_CONFIG_SCHEMA.md, virtual_staining/
README.md, and evaluation/README.md: every layout tree, base: example,
group-source column, and search-path reference points at _internal/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related bugs in predict-path tiling: P1. DynacellFlowMatching.predict_step routed predict_method='sliding_window' to CELLDiff3DVS.generate_sliding_window(), which is the non-overlapping tiler and does not consume predict_overlap. Any config that set both predict_method='sliding_window' and predict_overlap=[...] silently got non-overlapping tiles, producing different predictions than the user expected. Now raise ValueError when sliding_window is combined with a non-zero overlap, pointing users to predict_method='iterative' (the overlap-anchored velocity tiler) instead. Also refresh the docstring to spell out which methods consume predict_overlap. Fix the membrane predict leaf to match its peers (er, mito): predict_method=iterative. Example config at configs/examples/celldiff/predict.yml gets the same fix plus an inline comment clarifying which methods overlap. P2. DynacellUNet.predict_sliding_window allocated both accumulator buffers (prediction_sum, prediction_count) with zeros_like(source), so their channel dimension was always in_channels. For models whose out_channels != in_channels (common: 1 phase in -> 2 target out), the first `prediction_sum[slicer] += patch_out` raised a broadcast error. Allocate the accumulators lazily from the first patch output so they take the model's actual out_channels and dtype. Regression tests added for both paths: - sliding_window + non-zero overlap must raise ValueError - predict_sliding_window returns the correct shape when out_channels=2 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier "always append" strategy broke argparse when users placed
diagnostic flags (-c job, --info, --help, ...) after positional
overrides:
$ dynacell evaluate leaf=er/... -c job
dynacell: error: unrecognized arguments: hydra.searchpath=[file://...]
Hydra's argparse declares ``overrides`` as a single positional with
nargs="*". Once that nargs run is interrupted by a flag, any later
positional is reported unrecognized. Appending the injected token at
the end scatters a positional across the flag in the override-first
layout; prepending scatters it across the flag in the flag-first layout.
Fix: insert the token adjacent to an existing positional override when
one is present, so positionals stay contiguous whichever side the
flags sit on. Fall back to append when the user passed no positional
overrides (e.g. bare --help).
Three regression tests cover the three layouts: flag-leading,
flag-trailing, and no-positional. The former "appends always" test is
replaced by the layout-specific pair.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten the "Running an evaluation" lead so readers see up front that repo-checkout groups live under virtual_staining/_internal/ (the hidden support tree), not loose under virtual_staining/. The detailed bullet list below already spelled this out; the intro now matches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 191 out of 201 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Lightning's SLURMEnvironment rejects --ntasks at runtime (slurm.py:_validate_srun_variables) and demands --ntasks-per-node. Yesterday's fix emitted the wrong directive; all four FCMAE jobs failed within 50 s on the cluster. Switch the sbatch generator, hardware profiles, pre-submit validator, and invariant test to use ntasks_per_node throughout. The enforced invariant is now devices == ntasks_per_node and gpus == nodes × devices. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gated HF repos (DINOv3) need per-user access grants, but the downloaded weights are identical. Before this change every team member who ran `dynacell evaluate compute_feature_metrics=true` re-downloaded ~2 GB of DINOv3 weights into their own `~/.cache/huggingface/`. Even ignoring the network cost, only one team member has HF gate access right now — every other member's runs 403 at model fetch. Fix: point HF_HOME at a team-shared project-storage dir. First user with gated access downloads once; everyone else reads the shared copy. Two layers so the behavior is automatic regardless of invocation: 1. `dynacell.__main__._maybe_set_shared_hf_cache()` runs before Hydra subcommands (evaluate, precompute-gt, report). It defaults HF_HOME to `/hpc/projects/comp.micro/virtual_staining/models/dynacell/evaluation/hf_cache/` when three conditions hold: the user hasn't set HF_HOME themselves, we're on a repo checkout (same signal as external configs), and the shared dir exists on this machine. Wheel installs and non-HPC machines fall through to the normal ~/.cache/huggingface default. 2. `runtime_shared.yml` also sets HF_HOME for sbatch-rendered fit / predict jobs. Those jobs don't currently load HF models, but having the env var declared in the launcher profile future-proofs any later workflow that does and keeps the team convention visible in config. Four regression tests cover the auto-setter: user-set wins, wheel install skips, missing shared dir skips, repo checkout + existing dir sets. README documents the mechanism and first-time download contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three defaults were required on every eval / precompute-gt invocation
even though they are effectively fixed for our benchmark set:
feature_extractor/dinov3=lvd1689m (only DINOv3 we evaluate with)
feature_extractor/dynaclr=default (team DynaCLR checkpoint)
io.gt_cache_dir=/hpc/.../eval_cache/<TARGET> (one dir per target)
Hoist them so the minimal command
dynacell precompute-gt target=er_sec61b predict_set=ipsc_confocal
composes everything needed for all four cache families, and
dynacell evaluate leaf=<...>
carries the same defaults without the redundant override directives.
- eval.yaml defaults list switches
optional feature_extractor/dinov3: null -> lvd1689m
optional feature_extractor/dynaclr: null -> default
The `optional` keyword means wheel installs that don't ship
dynaclr=default (it lives in the repo's shared/eval/ tree) silently
skip the default — wheel users supply their own via --config-dir.
- Each target YAML now carries io.gt_cache_dir for its target's cache
path (SEC61B, TOMM20, nucleus, membrane).
- The 8 canonical eval leaves drop the redundant
override /feature_extractor/dinov3: lvd1689m
override /feature_extractor/dynaclr: default
and the per-leaf io.gt_cache_dir — all now inherited.
- README updated: "Enable feature metrics" and "Priming the cache"
examples show the one-line minimal forms.
To swap variants at call-time nothing changes:
dynacell evaluate ... feature_extractor/dinov3=<other_variant>
dynacell evaluate ... feature_extractor.dynaclr.checkpoint=/other.ckpt
dynacell evaluate ... feature_extractor/dinov3=null # disable
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues from /simplify agents:
- Hardcoded HPC path in _SHARED_HF_CACHE makes the module CZ Biohub-
specific; other sites couldn't override without editing source. Read
DYNACELL_SHARED_HF_CACHE at module load and fall back to the CZ
Biohub default only when the env var is absent. Extension story is
now: set DYNACELL_SHARED_HF_CACHE in your shell / sbatch env block
and the rest of the mechanism just works.
- Docstring on _maybe_set_shared_hf_cache named the sibling helper
_external_configs_dirs; rephrase in semantic terms ("external Hydra
searchpaths resolve") so the docstring doesn't rot if the detection
mechanism ever changes.
- Four test methods in TestMaybeSetSharedHfCache imported `os` inline
instead of at module top (violates CLAUDE.md). Plus a redundant
monkeypatch.delenv() cleanup at the end of the last test that
pytest's monkeypatch fixture already handles on teardown. Lift
`import os` to module top, drop the inline imports, drop the
redundant cleanup.
Skipped /simplify nitpicks:
- functools.cache on _external_configs_dirs (6-8 stat calls saved
per CLI invocation; not worth the cognitive cost of a cache that
has to coexist with tests that patch the function reference)
- Parametrize the 4 tests (4 is small enough to leave explicit)
- Inline comment on opaque group names (marginal signal)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Setting HF_HOME to the team-shared project-storage dir inadvertently broke per-user gated-repo authentication: HF_HOME relocates the entire HF directory tree including the auth token file. The first precompute attempt with DINOv3 enabled came back 401 "Access to model ... is restricted. You must have access to it and be authenticated to access it" — not a missing grant (the user had been granted access, which would have returned 403), but a missing token. HF was looking under $HF_HOME/token, where no token file lives. Switch to HF_HUB_CACHE, which only relocates the weights/datasets cache and leaves auth state at its default ~/.cache/huggingface/token. Each user's personal token now enforces gate ACLs on their own account; the cache of downloaded weights is shared on project storage as intended. The _maybe_set_shared_hf_cache helper and its four tests flip from HF_HOME → HF_HUB_CACHE. The runtime_shared.yml launcher env block does the same. README gains a paragraph explaining *why* we chose HF_HUB_CACHE over HF_HOME, and clarifies that subsequent reads from the shared cache are plain disk reads that don't round-trip to HF and so don't need a token at all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous 4-GPU FCMAE runs OOM-killed before first training step. On gpu-c-1 sstat showed MaxVMSize=103G/rank at startup — 4 ranks × 103G ≈ 412G of virtual memory, plus mmap_preload /dev/shm staging, pushed past the 512G cgroup cap. num_workers=8 per rank meant 32 dataloader workers inflating per-rank VM before the first step. Halving to 4 cuts startup VM roughly in half while keeping prefetch above the single-worker bottleneck. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collapse the 5-kwarg ``(gt_plate_path, gt_channel_name, ..., pred_plate_path, pred_channel_name)`` signatures on ``check_cache_identity`` and ``seed_cache_identity`` into a single ``source: Literal["gt", "pred"] | None`` + ``plate_path`` + ``channel_name`` triple. The paired-None runtime guards collapse to one check and the public surface no longer encodes the GT/pred axis in parameter names. The new ``seed_cache_identity`` accepts only one source per call; the test that previously seeded both sides in a single invocation splits into two sequential calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collapse ``init_cache_context`` and ``init_pred_cache_context`` into one function with a required ``side: Literal["gt", "pred"]`` kwarg. The two near-duplicate ~100-LOC bodies become one ~80-LOC body; ``side`` selects which manifest key, which cache dir, and which identity fields to check. The cross-side contamination guard (``manifest.get(other_side) is not None``) now runs in both directions. Today only the pred path rejected GT-contaminated manifests; the GT path will now also reject pred-side contamination — a config error in either direction fails fast instead of silently returning a half-valid cache. ``side`` has no default — callers must spell out their intent. The two GT call sites (``pipeline.py:153`` and ``precompute_cli.py:82``) and the one pred call site (``pipeline.py:160``) all pass it explicitly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop six 1-line trampolines and ``_ensure_cache_side``. Export three side-agnostic helpers instead — ``fov_masks``, ``fov_cp_features``, ``fov_deep_features`` — each reading ``ctx.side`` for the per-side artifact label, force-recompute key, and manifest source tag. The shared private helpers (``_fov_masks``, ``_fov_deep_features``, ``_load_or_compute_feature_timepoints``) were already side-agnostic; this just removes the redundant public layer above them. The compute backends (``cp_target_regionprops``, ``deep_target_features``) are body-identical to their pred-side twins — only their parameter name differs. The new ``fov_cp_features`` and ``fov_deep_features`` call the target-side functions on both sides as a stop-gap; a follow-up commit drops the unused pred-side variants and renames the survivors to ``cp_regionprops`` / ``deep_features``. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The three ``*_target_*`` / ``*_pred_*`` metric pairs in ``metrics.py`` are body-identical — they only differ by parameter name. Drop the duplicates and rename the survivors: - ``cp_target_regionprops`` + ``cp_pred_regionprops`` → ``cp_regionprops`` - ``build_target_crops`` + ``build_pred_crops`` → ``build_crops`` - ``deep_target_features`` + ``deep_pred_features`` → ``deep_features`` The argument becomes ``image`` (was ``target`` / ``prediction``); the ``Shape mismatch:`` error now reads ``image`` in its message. The eval pipeline's cache-disabled fallback in ``pipeline.py`` and the ``fov_cp_features`` / ``fov_deep_features`` compute backends all switch to the unified names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hoist the per-FOV pred-feature dispatch into ``_fov_pred_features_per_t``,
which returns a ``{kind: list[T]}`` dict regardless of whether the pred
cache is enabled. The per-timepoint reads inside the inner loop become a
straight indexing pass with no ``if pred_cache_ctx.enabled`` branch.
The cache-disabled branch keeps the crop-sharing optimization (one set
of 2-D crops per timepoint, reused across DINOv3/DynaCLR/CellDINO) so
the max-projection + cell-iteration + crop construction still runs once
per (FOV, t) instead of once per backbone.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ties
Four functions in ``pipeline_cache.py`` independently recomputed
``label_prefix = "pred_" if ctx.side == "pred" else ""`` and
``source = {"source": "prediction"} if ctx.side == "pred" else {}`` from
``ctx.side``. Lift both into ``_CacheContext.label_prefix`` and
``source_tag`` properties so the side-encoding lives next to ``side`` and
the call sites just read it.
Two more ternaries in ``init_cache_context`` (``cache_dir_key``,
``other_side``, and the ``io.{gt,pred}_path`` / ``io.{gt,pred}_channel_name``
selection) collapse into ``_SIDE_IO_KEYS`` and ``_OTHER_SIDE`` lookup
tables. Net: same behavior, one place to edit per axis.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…point The cache-disabled branch of ``_fov_pred_features_per_t`` previously materialized a ``t_count``-long ``crops`` list, then walked it three more times in separate comprehensions for DINOv3, DynaCLR, and CellDINO. All ``t_count`` 2-D crop tensors sat in RAM until the function returned. Fuse into one loop: build crops for one timepoint, fan out to every deep backbone, then drop them before the next timepoint. Same total work; peak crops memory drops from ``t_count`` copies to one. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cross-FOV/cross-timepoint accumulator for deep features that batches DINOv3/DynaCLR/CellDINO forwards across the whole dataset, splits results by per-(pos, t) row counts, and writes through the existing zarr cache contract. Force flags are snapshotted at construction so mid-scan flushes don't flip semantics for later positions; ctx.force is only cleared in drain() for kinds actually flushed. Lock-tag construction for the per-(family, position) write lock is hoisted to _kind_lock_tag so _load_or_compute_feature_timepoints and _flush_kind cannot drift. No call sites yet; per-FOV path (fov_deep_features) keeps working for the cache-disabled dev path. The wiring lands in subsequent commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three tests for the cross-FOV batched precompute path: - writes_all_slots: every (pos, t) lands in the cache with the expected (n_cells, feature_dim) shape. - matches_per_fov: precompute_deep_features and fov_deep_features produce bit-exact identical cache contents (golden equivalence with a deterministic stub extractor). - skips_cached_slots: pre-seeded (pos, t) slots are not re-extracted and the original cache contents survive. Helpers (_write_tiny_hcs_plate, _write_tiny_seg_plate, _BatchedConstantExtractor) are local to this test module per the project's no-conftest-import rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run precompute_deep_features once over the whole position set before the per-FOV loop so DINOv3/DynaCLR/CellDINO forwards amortize across the dataset instead of paying Python/CUDA dispatch per (FOV, t). After precompute, the existing fov_deep_features / pred per-t calls hit the cache. Paired pos-name validation is hoisted so precompute cannot write to mismatched cache slots; the in-loop checks stay as defense in depth. Sides whose ctx.require_complete is true are skipped at precompute time — the main loop's _raise_if_require_complete keeps enforcing the fail-loud miss contract on those sides. Threshold is exposed via feature_metrics.deep_feature_batch_threshold (default 256 when the key is absent; explicit default lands in the config-knob commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the per-kind fov_deep_features calls in precompute-gt's per-FOV
loop with a single streaming push to DeepFeatureBatcher. The existing
loop already loaded the GT plate FOV once for masks + CP; deep features
now reuse that read and batch forwards across FOVs/timepoints.
build.{masks,cp} still go through the per-FOV helpers because they're
not part of the batched pipeline. flush_manifest stays in the loop to
preserve crash-recovery progress; batcher.drain() runs after the loop
to land any pending crops + reset force flags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Set feature_metrics.deep_feature_batch_threshold = 256 so the cross- FOV/cross-timepoint batched precompute path has a clear default that shows up in the resolved Hydra config. Both evaluate_predictions and precompute-gt read this key via OmegaConf.select with a fallback of 256 — explicit default lets downstream leaves (e.g. benchmark configs) override per-run without touching code. CellDINO's internal batch_size is 32, so threshold=256 amortizes 8 internal chunks per flush — comfortable on a single H100. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ecks _flush_kind never read its `side` arg (ctx.side was authoritative), hence the `del side` workaround. Remove the parameter and update both batcher call sites. The per-iteration position-name checks inside evaluate_predictions' main loop are subsumed by the upfront hoisted validation block that walks the same zip; nothing mutates pred_positions / gt_positions / seg_positions between the hoist and the loop. Underscore the unused position names in the unpacking instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous push API took a pre-stacked (n_cells, H, W) ndarray plus a (0, 0, 0) sentinel for zero-cell slots; _flush_kind then destacked it back into a per-cell list to feed features_from_crops. The round trip + the sentinel were pure ceremony — features_from_crops already handles an empty list and the per-cell list shape it needs. Take the raw list[np.ndarray] from build_crops end to end. Both callers (precompute_deep_features, precompute-gt CLI) drop their local stack-or-sentinel expression; _flush_kind drops the per-row Python loop and uses a flat comprehension over items. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_flush_kind called flush_manifest(ctx) after every batched write, which under typical settings is 3 backbones × ceil(N_cells / flush_threshold) round trips per session — each one locks the global manifest, reloads YAML, merges, and writes. The "crash-resilience" case the inline flush nominally protected was already covered: zarr slot writes are durable per slot, so on resume the lockless prefetch finds them whether or not the manifest is current. Move the flush to the caller, after batcher.drain(). evaluate_predic- tions flushes every active side; precompute-gt flushes once after the loop. Manifest write cadence drops to O(active_sides) for eval and O(1) for precompute, matching the durability the caller wants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Torch metrics (PCC/SSIM/NRMSE/PSNR) ran on CUDA but then both tensors were force-moved to CPU numpy and fed to cubic's spectral_pcc / fsc_resolution — round-tripping through host memory every timepoint. cubic's spectral metrics are array-module-agnostic and accept cupy inputs directly, so the host bounce is pure overhead. Hand the tensors off via cubic.cuda.ascupy / asnumpy (zero-copy on CUDA via the CUDA Array Interface, per cubic AGENTS.md) so PCC/SSIM/ NRMSE/PSNR/Spectral_PCC/FSC all stay on one device. Also defer _require_cubic to the path that actually needs it, so torch-only runs no longer require the cubic/cucim/cupy extras to be installed. Smoke: on a 15x512x512 volume with spectral PCC enabled, per-call wall time drops from ~2.95 s (CPU bounce) to ~70 ms (GPU-resident). CPU and GPU paths agree to ~1e-7 relative on PCC/NRMSE/PSNR/Spectral_PCC and ~6e-5 on SSIM (float32 cuDNN noise). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…doff Three small cleanups in compute_pixel_metrics from the simplify pass: 1. fsc_resolution mean-centers internally before every FFT (see cubic/metrics/spectral/frc.py lines 171/251/696/877), so the outer ``target - target.mean()`` / ``prediction - prediction.mean()`` was wasted: two extra full-volume allocations + reductions per (FOV, t). 2. Hoist ``to_xp = ascupy if cuda else asnumpy`` so the two-line branch collapses to one converter call per tensor. 3. Drop the ``_xp`` suffix; reuse the ``prediction, target`` names after conversion. cubic's AGENTS.md explicitly steers away from importing the ``xp`` convention into callers. Docstring restructured to lead with the load-bearing WHY (host bounce dominated wall time on long timelapses) instead of restating the implementation. Numerics unchanged: CPU and GPU paths still agree to ~1e-7 on PCC/NRMSE/PSNR/Spectral_PCC and ~6e-5 on SSIM on a 15x256x256 smoke run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(viscy-models): add 3D PatchGAN discriminator and LSGAN losses
Introduces a new gan subpackage with PatchGAN3D and MultiScalePatchGAN3D
(pix2pixHD-style, 2-scale by default), plus lsgan_d_loss / lsgan_g_loss
list-based helpers. These will pair with UNetViT3D to form a 3D pix2pix
adversarial baseline alongside the existing regression and flow-matching
DynaCell baselines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): add DynacellGAN Lightning module for 3D pix2pix baseline
DynacellGAN composes the existing UNetViT3D generator with the new
MultiScalePatchGAN3D discriminator, training with LSGAN + L1 (λ=100) via
manual optimization (automatic_optimization=False). A requires_grad
toggle separates the D step from the G step so each backward only
populates the active side's grads; D grads are also explicitly cleared
between optimizers so the test invariant (no D grads after a G step)
holds for Phase 2 DDP correctness.
Validation logs the existing loss/validate weighted-mean alias, so
ModelCheckpoint(monitor="loss/validate") keys off the same metric used
across the other DynaCell baselines. configure_optimizers builds two
AdamW + WarmupCosine pairs directly to fit the two-optimizer return
shape Lightning's manual mode expects.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): add pix2pix3d_unetvit recipe, model overlays, and smoke leaf
The recipe declares dynacell.engine.DynacellGAN with UNetViT3D's
production generator config and a 2-scale spectral-normalized PatchGAN3D.
fit / predict model overlays mirror the unetvit3d ones with lr_g/lr_d
split (no scalar lr) and bf16-mixed precision.
The single-GPU smoke leaf for er/pix2pix3d_unetvit/ipsc_confocal sets
benchmark.dataset_ref: null to disable the compose-hook resolver so it
can point data_path at SEC61B_test48.zarr directly, and uses batch_size=2
to match the inherited num_samples=2 (HCSDataModule divisibility rule).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): add ER iPSC production train leaf for pix2pix3d_unetvit
Mirrors er/unetvit3d/ipsc_confocal/train.yml with the model overlay and
all model_name / experiment_id / run_root / job_name strings retargeted
to pix2pix3d_unetvit. Keeps ModelCheckpoint.monitor: loss/validate
unchanged — DynacellGAN logs that key via on_validation_epoch_end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): add pix2pix3d_unetvit single-set train leaves across organelles
Adds 7 production single-GPU training leaves for pix2pix3d_unetvit:
- er/pix2pix3d_unetvit/a549_mantis
- {membrane, mito, nucleus}/pix2pix3d_unetvit/{ipsc_confocal, a549_mantis}
Each leaf mirrors the matching unetvit3d cell with model_overlay,
model_name, experiment_id, logger name/save_dir, ckpt dirpath, launcher
run_root/job_name retargeted to pix2pix3d_unetvit. ModelCheckpoint
monitor=loss/validate preserved.
Joint training leaves (joint_ipsc_confocal_a549_mantis) deferred — still
gated on the open BatchedConcatDataModule DDP deadlock (handoff
2026-04-26).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): add iPSC predict + eval leaves for er/pix2pix3d_unetvit
Mirrors er/unetvit3d/ipsc_confocal/{predict,eval}__ipsc_confocal.* with
the model overlay, model_name, experiment_id, ckpt_path, output_store
filename, and eval pred_path / save_dir all retargeted to
pix2pix3d_unetvit. predict_method stays full_image (the only mode
DynacellGAN.predict_step supports for v1).
ckpt_path uses an explicit REPLACE_ME_WITH_PRODUCTION_CHECKPOINT_PATH
placeholder so torch.load raises FileNotFoundError with the placeholder
visible if the user forgets to override it — running predict against an
untrained generator would silently produce garbage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): add a549_mantis predict + eval triplet for er/pix2pix3d_unetvit/ipsc
Adds the 3 a549_mantis test splits (mock, denv, zikv) for the
iPSC-trained pix2pix3d_unetvit ER cell — enables cross-dataset transfer
evaluation. Mirrors the unetvit3d ipsc_confocal predict/eval triplet
with the same retargeting rule already used for the iPSC pair.
ckpt_path uses the REPLACE_ME_WITH_PRODUCTION_CHECKPOINT_PATH placeholder
so any forgotten override fails loudly via FileNotFoundError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(dynacell): /simplify pass — reuse configure_adamw_scheduler, dedupe _log_samples
DynacellGAN.configure_optimizers used to rebuild AdamW + WarmupCosineSchedule
inline despite the class docstring claiming it called the shared helper.
Now calls configure_adamw_scheduler twice and concatenates, dropping the
redundant schedule guard and the unused monai/torch.optim imports.
_log_samples was a byte-identical instance method on DynacellUNet,
DynacellFlowMatching, and DynacellGAN. Promoted to a module-level free
function and updated the five call sites across all three classes.
Updated test_dynacell_gan_validate_logs_alias to monkeypatch the module
attribute instead of the instance method.
Also trimmed three narrating comments inside DynacellGAN (the plan/PR
context comments belong in the PR description, not the source) and five
per-layer banner comments in patchgan3d.py (the conv literals state the
shape and the class docstring covers the layer table).
Safe under the running production training's Slurm auto-requeue: no
constructor signatures change, AdamW + WarmupCosine pairs are built with
the same args, optimizer state shape is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): Phase 2 — DDP topology + 4-GPU smoke + production leaves
Adds the DDP infrastructure for pix2pix3d_unetvit so the GAN baseline
can train on 4 H200s in ~13 h instead of the ~50 h single-GPU pace
measured on job 33132605 (now cancelled).
- recipes/topology/ddp_4gpu_gan.yml — DDP variant that wraps the strategy
in find_unused_parameters: true. Required because DynacellGAN runs
manual_optimization with two optimizers, and the per-step requires_grad
toggle leaves one side's params "unused" from the reducer's perspective
on each backward. Plain ddp_4gpu.yml stays single-strategy for the
other models.
- model_overlays/pix2pix3d_unetvit_fit_ddp.yml — mirrors the single-GPU
fit overlay but composes ddp_4gpu_gan.yml topology + 4-GPU hardware.
- er/pix2pix3d_unetvit/ipsc_confocal/train_smoke_4gpu.yml — 4-GPU smoke
variant. Same SEC61B_test48 store, batch_size=2 matching the inherited
num_samples=2 divisibility rule. Validates DDP sharding + the
find_unused_parameters DDP reducer behavior under the GAN's manual-opt
pattern before committing the 4-GPU production run.
- er/pix2pix3d_unetvit/ipsc_confocal/train_4gpu.yml — 4-GPU production
leaf. Keeps per-rank batch_size=4 (effective 16 across 4 GPUs) and
unchanged lr_g/lr_d so the DDP run is comparable to the single-GPU
reference at the per-rank optimization settings. logger name and
job_name suffixed with _4gpu to disambiguate in W&B/slurm; run_root
and ckpt dirpath unchanged so Phase 4 predict leaves keep working.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dynacell): retarget membrane pix2pix3d_unetvit save paths
The membrane single-set train leaves landed with placeholder paths under
.../ipsc/memb_temp/ — diverging from the celldiff convention (memb, not
memb_temp) and conflating the a549_mantis run dir with the iPSC tree.
Both leaves now point at the canonical .../ipsc/memb/ and
.../a549_mantis/memb/ trees, matching the celldiff layout so all
organelles can be cross-referenced uniformly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): expand pix2pix3d_unetvit predict/eval matrix
Mirrors the er/pix2pix3d_unetvit/ipsc_confocal/ predict+eval set across
membrane, mito, and nucleus so every iPSC-trained pix2pix3d_unetvit run
has an apples-to-apples inference + evaluation path against both the
ipsc_confocal self-target and the three a549_mantis conditions (mock,
denv, zikv). Predict leaves swap target-inherited normalizations for a
source-only NormalizeSampled (Phase3D) and clear the RandWeightedCropd
augmentation. Eval leaves enable compute_feature_metrics only on the
ipsc_confocal self-target, matching the existing er convention. Ckpt
paths are placeholders pending the corresponding production runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): add pix2pix3d_unetvit joint train + predict leaves
Adds the joint_ipsc_confocal_a549_mantis variant for all four
organelles (er, membrane, mito, nucleus): one train.yml authoring
BatchedConcatDataModule with two HCSDataModule children plus the four
predict leaves (self iPSC + three a549_mantis conditions). The joint
train differs from the celldiff joint by using NormalizeSampled
(fov_statistics) instead of MinMaxSampled so joint and single-set
pix2pix3d_unetvit runs share the same normalization contract for
apples-to-apples ablations. batch_size=2 × num_samples=2 matches the
single-set 4 GPU-samples/step (per the BatchedConcatDataModule rule
in CLAUDE.md that joint does NOT divide by num_samples).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(viscy-models): drop speculative num_layers from PatchGAN3D
PatchGAN3D and MultiScalePatchGAN3D accepted a num_layers constructor
parameter that only worked with the value 5 — any other value raised
ValueError. The argument was justified as future-proofing for ablations
that don't exist yet, which is exactly the speculative flexibility
CLAUDE.md prohibits: "No 'flexibility' or 'configurability' that wasn't
requested. ... No error handling for impossible scenarios." Re-add it
only if and when a non-5 variant lands. Recipe + test fixture updated
in lockstep; uv run pytest packages/viscy-models/tests/test_gan/ and
the DynacellGAN init/forward tests pass (11/11).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): apply TTUR — drop pix2pix3d_unetvit lr_d to lr_g/10
The 1:1 LR schedule collapsed to D-dominance on DDP run 33176091
(wandb 20260520-002629): loss/d_train fell from 1.3e-2 (ep1) to
~1e-4 (ep12+), loss/g_adv pinned at ~1.0 (saturated MSE(d_fake, 1)),
loss/g_l1 flat at ~0.45 from epoch ~5 onward — the adversarial
gradient was effectively zero, generator was training as pure L1.
Slowing the discriminator 10x (lr_d = lr_g / 10 = 3e-5) is the
canonical pix2pix-style fix for D-dominance. Applies to both the
single-GPU and DDP fit overlays so the entire Phase 3 baseline
(all 4 organelles × {iPSC, A549, joint}) inherits the same
adversarial balance regime.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Alexandr Kalinin <alxndrkalinin@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): add eval runtime module + thread budgeting (C1)
Adds dynacell.evaluation.runtime with three layered thread-cap entry points:
early_apply_env_caps() sets BLAS/OMP env vars before C-extension load (called
from __main__.main_cli); apply_thread_budget(threads) is the in-process safety
net (env + torch.set_num_threads + threadpool_limits); resolve_runtime(config)
materializes the new runtime config block.
The runtime block (eval.yaml) is opt-in for parallelism (default executor=serial,
fov_workers=1) but unconditionally tightens BLAS/OMP threads to cpu_count by
default. On SLURM nodes where --cpus-per-task < host CPU count, this corrects
previously-loose BLAS threading. Set DYNACELL_THREADS_PER_WORKER=N to override
at the env layer for cases where in-process caps come too late.
precompute_gt_artifacts validates that runtime.executor='serial' and
fov_workers=1 — FOV-level parallelism is restricted to evaluate_predictions
in this commit; precompute parallelism is out of scope (DeepFeatureBatcher
accumulates state across FOVs).
resolve_runtime supports two-phase resolution for evaluate_predictions:
top-of-function for parent BLAS cap, then post-position-list with
freeze_threads_per_worker= to keep parent and worker BLAS budgets consistent.
Worker primitives (_process_one_fov, ProcessPoolExecutor) land in C3+C4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): per-T memory hygiene + region timing instrumentation (C2)
Adds always-on region timers (RegionTimer context manager) that record
wall time of FOV-level and per-T regions to <save_dir>/eval_timing.csv.
Region taxonomy matches actual execution placement at pipeline.py:341-505:
FOV-level (t=None): mask_gt, mask_pred, cp_gt, deep_gt_dinov3,
deep_gt_dynaclr, deep_gt_celldino, features_pred_per_t, microssim, seg_write
Per-T: pixel_metrics, mask_metrics, feature_pairwise
Timing overhead is ~0.1 ms per region per timepoint; ~120 ms total on a
9-h eval (~4e-6 of wall time) — no gating knob.
Per-T memory hygiene knobs (runtime.cuda_empty_cache_every_n_timepoints,
runtime.gc_collect_every_n_fovs) ship at 0 (no behavior change) in this
commit. The plan calls for measuring and flipping defaults in C2 — that
measurement requires a real eval run on the test48 zarr that exceeds this
session's scope. Defaults stay at 0; observed late-T degradation in
prior A549 ER evals (18 -> 61 s/T over 10 timepoints) is the documented
motivation but not the measured fix in this commit.
DYNACELL_FORCE_PER_T_HYGIENE=1 env var flips both knobs on at runtime
regardless of YAML defaults — operator escape hatch for post-ship
mitigation without a code change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(dynacell): extract per-FOV body into _process_one_fov + FovResult (C3)
Pure refactor: extract the per-FOV body of evaluate_predictions into a
module-level function _process_one_fov(...) -> FovResult and an
_aggregate_fov_result(...) parent-side aggregator. Serial-path behavior
is identical to the prior code (verified: 384/48 pass/fail parity with
HEAD~2; the 48 failures are all pre-existing).
FovResult carries everything one FOV contributes to the run as a
picklable dataclass:
- per_t_pixel_rows (with microssim already merged in per pipeline.py:471
semantics — no separate microssim_rows field),
- per_t_mask_rows, per_t_feature_rows,
- seg_array of shape (T, 2, D, H, W) bool (channel 0 = pred, channel 1 = GT),
- 4 _BackboneLists for cp/dinov3/dynaclr/celldino — each holds the per-T
pred_feats/gt_feats/pred_fovs/gt_fovs/pred_ts/gt_ts arrays. Workers
preserve the existing `if size > 0: append` guard so empty per-T slots
never enter the lists; aggregator extends parent lists unconditionally.
- timings slice from the per-process region_timer collector.
_aggregate_fov_result writes the seg array to the HCS plate (still in the
parent — concurrent create_position from workers would race plate
metadata), extends per-T row lists, and extends the 24 dataset-level
accumulators.
This refactor positions C4 to add ProcessPoolExecutor + worker primitives
without further changes to the FOV body. flush_manifest stays in the
parent's loop for now; under process mode (C4) each worker will flush
its own ctx at FOV end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): wire ProcessPoolExecutor + GPU serialization lock (C4)
Adds worker primitives in runtime.py (_worker_initializer,
gpu_serialization_lock, make_fov_executor) and the worker entry point
_worker_run_fov in pipeline.py. evaluate_predictions now branches on
runtime.executor: serial (default) runs _process_one_fov inline
identical to today; process spawns a ProcessPoolExecutor over FOVs.
Worker design:
- Spawn context avoids inheriting parent CUDA. _worker_initializer
sets BLAS/OMP env caps BEFORE apply_thread_budget lazy-imports torch.
- _worker_setup lazy-loads seg_model + extractors under
gpu_serialization_lock(): N workers don't init CUDA + load weights
concurrently. Plate handles cached per-worker.
- GPU touch sites inside _process_one_fov are wrapped in
gpu_serialization_lock(): mask_gt/pred, cp_gt, deep_gt_{dinov3,dynaclr,
celldino}, features_pred_per_t, pixel_metrics (per-T, GPU-bound
end-to-end including cubic spectral), and on-demand cellpose fallback
inside mask_metrics. In serial mode the lock has no path set and yields
immediately — bit-for-bit-identical behavior.
- Lock file at /tmp/dynacell_gpu_<SLURM_JOB_ID-or-PID>.lock (node-local,
not NFS). One lock per SLURM job, shared across workers.
Phase-2 runtime resolution: after pred_positions is built,
resolve_runtime is re-called with n_positions= to clamp "auto"
fov_workers to min(provisional, n_positions) and freeze
threads_per_worker at the Phase-1 value so worker initializers match
the parent's already-applied BLAS cap.
Order: results collected via as_completed (arrival order) but aggregated
in original pred_positions order so per-FOV CSV + embedding NPZ row
order is deterministic regardless of executor.
Workers flush their own ctx manifest at FOV end (global manifest lock at
_pos_write_lock(ctx, "manifest", "global") fcntl serializes flushes
across workers); interrupted process-mode runs preserve cache progress
the same way serial mode did.
Tests: full dynacell suite 384/48 pass/fail (identical to baseline; all
48 failures pre-existing). No new spawn-mode test in this commit —
covered by C5 CPU integration test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(dynacell): runtime + parallel tests, CI wiring, SLURM env caps, docs (C5)
Adds test coverage for the C1-C4 runtime module and FOV-parallelism wiring:
test_runtime.py (23 tests): apply_thread_budget no-crash semantics under
Hydra's main; resolve_runtime auto/freeze/validate paths; region timing
collector; DYNACELL_FORCE_PER_T_HYGIENE env-var hook; spawn-context
ProcessPoolExecutor smoke via stdlib os.getpid.
test_evaluation_pipeline_parallel.py (3 tests): FovResult pickle round-trip
with realistic per-backbone shapes (the path that ProcessPoolExecutor takes
for Future.result transport); _aggregate_fov_result extends all 24
dataset-level accumulator lists correctly. Re-imports the live pipeline
module inside test bodies because test_lazy_init.py pops dynacell.* modules
from sys.modules, which would invalidate pickle resolution of a
module-level imported FovResult.
CI wiring:
- Root pyproject.toml registers the "slow" marker and appends -m "not slow"
to existing addopts.
- test-dynacell-configs runs test_runtime.py + test_evaluation_pipeline_parallel.py
alongside existing benchmark-schema/submit-tool tests. No new heavy deps
needed (these tests don't import iohub, cubic, transformers).
SLURM env caps:
- run_eval_jointtrained_nucleus.slurm and
run_eval_unext2_jointtrained_membrane.slurm export
DYNACELL_THREADS_PER_WORKER=\$SLURM_CPUS_PER_TASK before invoking eval.
Docs:
- applications/dynacell/CLAUDE.md gains "Eval runtime / parallelism"
section documenting the config block, env-var hooks, timing
instrumentation, and process-mode caveats.
Full dynacell test suite: 410 passed / 48 failed / 65 skipped — the 48
failures are the same set that fails on dynacell-models@HEAD (verified
parity across all 5 commits in this branch).
End-to-end serial-vs-process parity test on a real iohub+cache fixture is
a follow-up (see .claude/plans/eval-parallelism.md §C5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dynacell): drop parent seg_model under executor=process to avoid duplicate residency
Under executor=process, workers lazy-load their own seg_model copies (under
the GPU lock) — but the parent had been holding a duplicate copy GPU-resident
since the unconditional prepare_segmentation_model(config) call at the top of
evaluate_predictions. On an 80 GB A100 with fov_workers=4, this meant
(N+1) × model_weights resident instead of N.
After Phase-2 runtime resolution (where the final executor is known), drop
the parent's copy when executor=process. The pre-warm side effect
(segmenter_model_zoo's validate_model populating the checkpoint dir to
avoid N workers racing on a cold quilt download) is already done by then.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(dynacell): resolve config interpolations in-place before worker submit
Belt-and-suspenders safety net suggested in PR #426 review: call
OmegaConf.resolve(config) in-place after Phase-1 runtime resolution and
before any worker pickle.dumps the DictConfig across the spawn boundary.
OmegaConf supports lazy interpolation resolution natively, so workers
*would* resolve ${...} refs on first attribute access — but doing it once
in the parent eliminates a tiny per-worker hot-path cost and guards against
the (unlikely) case where an interpolation depends on parent-only state.
??? MISSING fields are untouched: verified OmegaConf 2.3 `_resolve` walks
only nodes where `_is_interpolation()` is true, so bare ??? values pass
through. Confirmed: 63 tests pass across pipeline / compose / parallel /
runtime suites.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(dynacell): silence inner per-T tqdm in spawn workers
Adds a runtime.is_worker() helper that returns True iff
_worker_initializer has set _GPU_LOCK_PATH. Wire it into
_process_one_fov's inner-T tqdm via disable=is_worker(), so under
fov_workers>1 the N concurrent per-T bars don't interleave on the
parent's stderr. Outer per-FOV tqdm (in the parent) stays visible
because the parent's _GPU_LOCK_PATH is never set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dynacell): don't double-count timing rows under executor=serial
Bug: _process_one_fov writes region timings to the global _TIMINGS
collector via region_timer, and returns the slice in FovResult.timings.
_aggregate_fov_result then calls extend_timings(result.timings)
unconditionally. In serial mode this re-appends each row to the SAME
parent collector — every timed region appears twice in eval_timing.csv.
In process mode the second append is correct (workers have separate
per-process _TIMINGS, parent's is empty until aggregated).
Fix: add extend_worker_timings kwarg to _aggregate_fov_result. Pass
True in process mode, False in serial. Verified via 63-test suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dynacell): context-manage worker plate handles per FOV
_worker_setup no longer caches pred/gt/seg plate handles in _WORKER_STATE.
Instead, _worker_run_fov opens them via `with open_ome_zarr(...) as ...`
scoped to one FOV — iohub file descriptors close between FOVs and the
worker has nothing to clean up on ProcessPoolExecutor shutdown.
Models + cache contexts stay cached in _WORKER_STATE (their load cost
~5-10 s per worker is too high to repeat per FOV; ProcessPoolExecutor
process exit reclaims them).
Position lookup by name uses a linear scan of plate.positions() (iohub
has no name->position index). Cheap for the ≤ tens of positions per
typical eval — sub-ms per scan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dynacell): skip prepare_segmentation_model under require_complete_cache
When io.require_complete_cache=true, the caller asserts that all needed
masks live in the cache — so loading SuperModel (which hits a quilt
download for nucleus/membrane on a cold node) is wasted work. Return
None unconditionally in that mode. If a cache miss happens despite the
assertion, the cache layer raises StaleCacheError before any segment()
call, so the None seg_model never gets dereferenced.
Enables the cache-only/no-model path that C5's integration test design
relies on (run evaluate_predictions end-to-end against prebuilt mask
caches without instantiating cellpose/SuperModel).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(dynacell): add CPU integration test for serial-vs-process parity
Drives evaluate_predictions end-to-end on a tiny iohub fixture (3 positions
× T=2 × D=4 × H=W=32) with prebuilt mask caches. Runs the pipeline twice —
once with runtime.executor=serial, once with runtime.executor=process
runtime.fov_workers=2 — and asserts:
1. pixel_metrics + mask_metrics rows are byte-equal after sort by FOV.
2. segmentation_results.zarr per-position content is byte-equal across
both channels.
Cache-only design (matches plan §C5): target_name=er +
require_complete_cache=true + compute_feature_metrics=false +
compute_microssim=false + pixel_metrics.fsc=null +
pixel_metrics.spectral_pcc=null. The full pipeline runs without
instantiating cellpose, SuperModel, DinoV3, DynaCLR, CellDino, or cubic.
What this validates: spawn boot, DictConfig pickling, FovResult.timings +
arrays round-trip through pickle, _aggregate_fov_result extends parent
accumulators correctly, parent-side segmentation_results.zarr writes
don't race, sort-by-pos_name aggregation, and the new
extend_worker_timings semantics doesn't double-count timing rows.
_live_pipeline_module() clears dynacell.evaluation.* from sys.modules
before importing — necessary because test_evaluation_pipeline.py stubs
metrics/utils via monkeypatch.setitem(sys.modules, ...), and without the
clear my pipeline's compute_pixel_metrics resolves to the empty-returning
stub.
CI: runs ~16 s on the existing test-dynacell-configs job.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci(dynacell): include CPU integration test in dynacell-configs CI job
Adds tests/test_evaluation_pipeline_parallel_cpu.py to the dynacell
pytest invocation. The test runs end-to-end evaluate_predictions in
both serial and spawn-process modes against a tiny iohub fixture +
prebuilt mask cache — no eval extras (cellpose, transformers, cubic)
needed because target_name=er + io.require_complete_cache=true
short-circuit segmenter and feature-extractor loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dynacell): harden runtime resolution + gate GPU lock on use_gpu
Four Copilot-review findings on the runtime layer:
1. apply_thread_budget threadpool_limits re-arm: the old handle becomes
unreachable and is GC'd. The previous comment said "stays alive"
which was misleading. Document the actual behavior: we WANT the new
limit active, dropping the old handle is intentional. Also remove
the dead if-branch since both arms did the same assignment.
2. resolve_runtime accepts literal fov_workers=0 / threads_per_worker=0
which produces divide-by-zero in the "auto" math or nonsensical
thread counts. Add explicit >= 1 validation with a clear error.
3. _cpu_count() prefers os.sched_getaffinity(0) over os.cpu_count() so
SLURM cgroup limits (cpuset etc.) are honored — otherwise auto-
resolved fov_workers/threads_per_worker can oversize past the
cgroup-visible CPU budget. Falls back to os.cpu_count() on non-Linux.
4. gpu_serialization_lock now takes a gate: bool = True kwarg. Callers
in pipeline.py pass gate=use_gpu so under config.use_gpu=false the
lock is a no-op even in process mode. Previously the lock fired
unconditionally, defeating cross-worker parallelism for CPU-only
eval runs. The lock still gates on _GPU_LOCK_PATH for the
parent/serial no-op (unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf(dynacell): stream aggregation under executor=process to bound peak memory
Previously buffered every FovResult in results_by_pos until ALL N
worker futures completed, then aggregated in pos_names_in_order. Peak
parent memory: N × seg_array (~N × 440 MB for T=10 D=21 H=W=1024 evals)
all live simultaneously.
Switch to streaming aggregation: drain the buffer opportunistically in
pos_names_in_order as futures complete. Only out-of-order FovResults
stay buffered; once the next-expected position lands we aggregate
(plate write + accumulator extend) and drop the ref. Peak buffer is
bounded by the number of out-of-order FOVs in flight, not by N.
Deterministic per-FOV CSV / embedding NPZ row order is preserved
because aggregation still happens in pos_names_in_order — early
completions just wait their turn in the buffer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): grouped multi-condition eval driver (#427)
* feat(dynacell): grouped multi-condition eval driver
Add ``dynacell evaluate-grouped``: run the same (model, organelle) across
multiple I/O variants (the canonical case is the three A549 treatment
plates mock/denv/zikv) in one Python process, paying the segmenter +
DINOv3 + DynaCLR + CELL-DINO cold-start once instead of N times. On
serial-mode interactive jobs and on per-condition SLURM jobs this is
30-90s of weight load + checkpoint warmup amortized across all
conditions. Under runtime.executor=process, each condition still spawns
its own pool whose workers re-load model copies, so only the parent-side
load is amortized; the driver prints a note when it detects this.
Mechanism:
- ``dynacell.evaluation.model_loader.load_eval_models`` extracts the
segmenter + extractor instantiation that previously lived inline in
``evaluate_predictions`` and ``_worker_setup``. The bundle carries
identity tags (dinov3_model_name, dynaclr_ckpt source paths, encoder
cfg, celldino weights) so ``init_cache_context`` callers get identical
cache-identity values across conditions.
- ``evaluate_predictions(config, *, models=None)`` accepts a pre-loaded
bundle and skips its inline load when supplied. ``models=None``
preserves the historical single-condition behavior; existing callers
and tests are unchanged.
- ``evaluate_predictions_grouped(config)`` reads a top-level
``conditions:`` list, deep-merges each overlay into a copy of the
base, and runs them sequentially with the shared models. Per-condition
``_final_metrics_cache_valid`` short-circuits are honored independently.
- ``_check_grouped_field_invariants`` rejects overlays that touch
``target_name``, ``feature_extractor.*``, ``compute_feature_metrics``,
or ``use_gpu`` -- those determine which models get loaded and must be
identical across conditions. ``io.*``, ``save.*``, ``runtime.*``,
``limit_positions``, ``force_recompute.*`` are free to vary.
- Hydra entry point ``evaluate_model_grouped`` with new
``_configs/eval_grouped.yaml`` (inherits ``eval.yaml`` via defaults).
- New CLI subcommand registered in ``__main__._HYDRA_COMMANDS``.
Tests: ``tests/test_evaluation_grouped.py`` builds a 2-condition
cache-only fixture (separate pred plates + cache dirs) and asserts
byte-equal parity against sequential per-condition runs, plus rejection
cases for empty ``conditions`` and forbidden model-loading-field
overrides. Cache-only path means no models actually load -- the test
validates the loop structure, not the speedup. Reuses the iohub fixture
builders from ``test_evaluation_pipeline_parallel_cpu.py``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(dynacell): simplify grouped eval driver + fold load + cache-context unification
Followup polish on d69d970 (feat: grouped multi-condition eval driver).
Addresses /simplify findings + the two related cleanups the user asked to
fold in:
Pipeline driver (pipeline.py):
- Lazy model load — defer ``load_eval_models`` until the first condition
that misses ``_final_metrics_cache_valid``. Restarts where every
condition is cache-hit pay no 30–90 s cold start.
- Invariant baseline is now the conditions-stripped input config, not
the conditions[0]-merged config. A model-loading override smuggled
into condition 0 is rejected symmetrically with conditions 1..N
(previously it was silently re-baselined). New test guards the
regression: ``test_grouped_rejects_condition0_model_field_override``.
- ``_merge_condition`` drops the manual ``to_container`` round-trip
deep-copy; ``OmegaConf.merge`` already returns a non-aliased config.
- ``_check_grouped_field_invariants`` switched to ``OmegaConf.is_config``
in place of the ``type(OmegaConf.create([]))`` idiom; baseline
snapshot is computed once and reused per iteration.
- Extracted ``_load_cached_final_metrics`` so the three-NPY reload block
is shared between ``evaluate_model`` and the grouped driver.
- Removed redundant first-iteration merge + ``apply_dataset_ref`` for
condition 0; the loop body now does both for every condition,
including idx=0.
Model loader (model_loader.py) and precompute-gt:
- Added ``LoadFlags`` so ``load_eval_models`` can be called from both
``evaluate_predictions`` (all extractors gated by
``compute_feature_metrics``) and ``precompute_gt_artifacts``
(per-extractor ``build.masks`` / ``build.dinov3`` / ``build.dynaclr``
/ ``build.celldino`` flags). ``LoadFlags.for_evaluate`` recovers the
prior default. The 30-line inline load block in ``precompute_cli.py``
collapses to a single ``load_eval_models(config, flags=LoadFlags(...))``
call. Stricter celldino weights gate is preserved as an explicit
pre-load raise.
- Added ``init_cache_contexts(config, models)`` and
``init_gt_cache_context(config, models)`` so the four identical
``init_cache_context(..., side=..., **identity_tags)`` call sites in
``evaluate_predictions``, ``_worker_setup``, and
``precompute_gt_artifacts`` consolidate behind the EvalModels bundle.
Test fixtures (tests/_eval_fixtures.py):
- Promoted the cache-only fixture builders (``build_eval_config``,
``build_fixture``, ``make_hcs_plate``, ``make_mask_cache``,
``live_pipeline_module``, ``read_position_arrays``,
``N_POSITIONS``/``T``/``D``/``H``/``W``) out of
``test_evaluation_pipeline_parallel_cpu.py`` into a private
``_eval_fixtures.py`` module (underscore-prefixed so pytest skips
collection). Both ``test_evaluation_pipeline_parallel_cpu.py`` and
``test_evaluation_grouped.py`` now import normally — no
``sys.path.insert`` (project's "no ``sys.path`` edits" rule).
CI:
- ``.github/workflows/test.yml`` adds ``test_evaluation_grouped.py`` to
the ``test-dynacell-configs`` matrix so the grouped driver gets
exercised on every push.
All 104 tests in the affected suites pass. Pre-existing 48-test failure
baseline (test_benchmark_config_composition + one test_evaluation_metrics
case) is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dynacell): tier grouped-eval process-mode message by cache state
Split the existing ``[grouped] note:`` print into two tiers so users get
proportional signal for the configuration they actually picked:
- ``executor=process`` + ``require_complete_cache=true`` (cache-only,
n_conditions > 1) keeps the mild informational note — workers re-init
per condition but skip ``prepare_segmentation_model`` (returns None)
and don't instantiate the deep extractors, so the only overhead is
N pool spawns.
- ``executor=process`` + ``require_complete_cache=false`` (cache-miss,
n_conditions > 1) gets a loud multi-line WARNING. This is the actual
worst case: each condition's worker pool independently loads
SuperModel + DINOv3 + DynaCLR + CELL-DINO, so total wasted cold-start
scales as ~30-90 s × N_workers × N_conditions. The warning explicitly
points to ``executor=serial`` as the fix, since the parent's
pre-loaded ``EvalModels`` is shared across conditions in serial mode
but lost across pool spawns in process mode.
Two new tests in ``test_evaluation_grouped.py``:
- ``test_grouped_warns_loudly_on_process_plus_cache_miss`` confirms the
WARNING text fires (executor=process + require_complete_cache=false +
2 conditions) with all the expected keywords.
- ``test_grouped_mild_note_on_process_plus_cache_only`` confirms the
cache-only fixture (require_complete_cache=true) gets the mild note
and does NOT trip the loud WARNING.
CLAUDE.md "Grouped multi-condition eval" section updated with the
tiered behavior so users searching for the WARNING text find context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dynacell): grouped eval struct-mode + require_complete_cache invariant
Three fixes triggered by Copilot's review on PR #427 plus an independent
struct-mode bug that would have crashed every real Hydra invocation of
``dynacell evaluate-grouped``:
1. ``_merge_condition`` was over-simplified during the /simplify pass
(commit c271d3f). ``OmegaConf.merge`` propagates struct mode from
its first argument, and Hydra composition always produces struct-mode
configs. Result: merging an overlay carrying a ``name`` label (which
isn't in the schema) raises ``ConfigKeyError``, and ``del
merged["conditions"]`` raises ``ConfigTypeError`` even when the merge
succeeds. Restore the ``to_container`` round-trip on the base — it
escapes struct mode while still producing an independent config. Same
fix applied to the ``models_base`` snapshot a few lines above.
Existing tests didn't catch this because they construct configs via
``OmegaConf.create({...})`` directly (non-struct); real users going
through ``@hydra.main`` would have hit the crash immediately.
2. Add ``io.require_complete_cache`` to ``_MODEL_LOADING_FIELDS`` so
per-condition overlays cannot override it. The flag flips
``prepare_segmentation_model`` between "load SuperModel" and "return
None" — letting a condition flip it would mean the shared model
bundle is wrong for that condition (None when cache-miss expects a
real segmenter, or loaded but never used). Treat as a grouped
invariant. (Copilot thread on pipeline.py:1227.)
3. Add ``applications/dynacell/tests/__init__.py``. The new test files
use ``from ._eval_fixtures import ...`` relative imports, which work
under the project's ``--import-mode=importlib`` setting but are
fragile under other pytest import modes. Adding ``__init__.py``
makes ``tests/`` a proper package and the relative imports work
regardless of which import mode pytest is invoked with. (Copilot
threads on test_evaluation_grouped.py:32 and
test_evaluation_pipeline_parallel_cpu.py:42.)
Two new regression tests in ``tests/test_evaluation_grouped.py``:
- ``test_grouped_rejects_require_complete_cache_override`` covers the
new invariant.
- ``test_grouped_works_on_struct_mode_config`` wraps the parity fixture
in ``OmegaConf.set_struct(grouped_cfg, True)`` to simulate a
Hydra-composed config; would have failed before the
``_merge_condition`` fix.
All 108 tests in affected suites pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(dynacell): fix two CLAUDE.md doc nits flagged in PR #427 self-review
Two 75-confidence findings from the parallel review on PR #427 that
scored below the 80-threshold for the inline review comment but are
worth fixing now while context is fresh:
1. "Reality check for process mode" subsection referenced
``pipeline.py:171`` as the parent seg-model load site. After the
eval-parallelism refactor that line is a dataclass field
(``pred_ts: list[np.ndarray] = ...``); the actual load now lives in
``load_eval_models(config)`` called from the top of
``evaluate_predictions``. Switch the reference from a brittle line
number to a function-name pointer.
2. "Grouped multi-condition eval" subsection said the per-condition
cache-load short-circuit fires "with ``force_recompute.final_metrics
=false``", omitting the second gate: ``_final_metrics_cache_valid``
also returns False (forcing recompute) when ``force_recompute.all
=true``, regardless of the ``final_metrics`` flag. Make both gates
explicit so users who set ``force_recompute.all=true`` for unrelated
reasons aren't surprised by unexpectedly recomputed metrics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Alexandr Kalinin <alxndrkalinin@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(dynacell): keep SuperModel under require_complete_cache when pred cache off
`prepare_segmentation_model` was returning None unconditionally
whenever `io.require_complete_cache=true`, on the assumption that all
masks would be served from cache. That assumption breaks for
nucleus/membrane targets when `io.pred_cache_dir=None`: the per-T loop
in `_process_one_fov` falls back to `segment(predict[t],
seg_model=seg_model)`, which raises `ValueError` because `seg_model` is
None.
Narrow the early return to cases where no path can reach `segment()`
without a cached mask — organelle targets (workflow-based, no
`seg_model` needed) or nucleus/membrane runs that also have
`io.pred_cache_dir` configured. Reported by Copilot on PR #426.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(dynacell): align gpu_serialization_lock docstring with implementation
The docstring claimed "no CUDA available in the worker" was one of the
no-op conditions, but the implementation only checks `gate` and
`_GPU_LOCK_PATH`. On a CPU-only node with `use_gpu=true` the lock would
still serialize across workers. Rather than add a runtime CUDA probe
(callers already pass `gate=use_gpu` to opt out for CPU-only regions),
correct the docstring to describe what the code actually does.
Reported by Copilot on PR #426.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(dynacell): rewrite eval README — runtime/grouped/cache-only + trim bloat
The eval README had grown stale relative to the runtime/grouped/cache
work landed on this branch and PR #427: no mention of
`io.require_complete_cache`'s cache-only fast path, no `runtime.*`
block, no `dynacell evaluate-grouped`, no CELL-DINO.
This rewrite folds those in and restructures: FOV parallelism + grouped
eval move out of "Artifact caches" (where they were mis-classified)
into a new top-level "Scaling" section; a new "Quick start" sits above
Configuration; HF-cache operational details go to a new "HPC notes"
section at the end.
Bloat trim along the way: components table compressed (related modules
combined, trivial files pushed to a note), HF-cache + external-users
sections shortened, the 11-row force_recompute table collapsed to a
YAML block plus 5 grouped descriptions, redundant Hydra
group-discovery paragraphs removed. No flags, footguns, or CLI
options dropped — every config knob and env var from the old version
is still documented. 466 → 265 lines.
Public-repo hygiene: replaced the hard-coded
`/hpc/projects/comp.micro/...` default for the team-shared HF cache
with a description that points at the `_DEFAULT_SHARED_HF_CACHE`
constant in `dynacell/__main__.py` and the `DYNACELL_SHARED_HF_CACHE`
env-var override.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(dynacell): address PR #426 code-review findings
Five fixes from a code-review pass on the controlled FOV-parallel eval
work:
- Restore the seg_model invariant for grouped multi-condition runs.
After 5829175, ``prepare_segmentation_model``'s load decision
depends on ``io.pred_cache_dir`` in addition to
``io.require_complete_cache``, but ``_MODEL_LOADING_FIELDS`` only
caught the latter. A per-condition overlay setting
``io.pred_cache_dir=None`` against a base that has it set (for
nucleus/membrane + ``require_complete_cache=true``) would crash
at runtime: the shared ``seg_model`` is None but the per-T loop
falls back to ``segment(predict[t], seg_model=None)``. New
``_seg_model_required()`` helper mirrors the load decision and
``_check_grouped_field_invariants`` rejects the dangerous direction
only, leaving canonical per-condition pred caches free to differ.
- Fix the process-mode progress bar stalling. ``pbar.update(1)`` only
fired inside the in-order drain loop, so if FOV 0 was slow and later
FOVs completed first, the bar stayed at 0% until FOV 0 drained.
Update on each worker completion instead.
- Correct the misleading ``[grouped] note:`` for ``executor=process`` +
``require_complete_cache=true``. The message claimed "no models
actually load" but deep extractors do load whenever
``compute_feature_metrics=true``, and the segmenter only short-
circuits in a subset of cases. Rewrite to match reality.
- Move ``runtime.*`` imports (12 symbols) from four inline
per-function blocks to one top-of-file import. No spawn-ordering or
circular-import constraint blocks them; the inline pattern just
violated the project's "no inline imports without strong reason"
rule and duplicated a ``flush_manifest`` import that already lives
at module top.
New regression test ``test_grouped_rejects_pred_cache_dir_flip_for_nucleus``
guards the dangerous direction; an existing test's assertion updates to
the corrected note wording.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Alexandr Kalinin <alxndrkalinin@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All other dynacell-era models (FNet3D, FCMAE family) write to /hpc/projects/.../models/dynacell/...; only the legacy comparison- vs-viscy models (celldiff, unetvit3d) and my new pix2pix3d_unetvit baseline (which copied the celldiff/unetvit3d leaf templates) kept the legacy /models/cell_diff_vs_viscy/ prefix. This commit normalizes pix2pix3d_unetvit alone — 31 leaves under benchmarks/virtual_staining/<organelle>/pix2pix3d_unetvit/ — so future Phase 3/4 runs land in the canonical dynacell tree. The in-flight DDP run (33190465) is unaffected: it reads from its already-rendered resolved YAML and writes to the legacy path; that run's outputs will be migrated manually after it completes. Legacy celldiff + unetvit3d leaves are intentionally left alone in this commit (already-trained models, would need on-disk mv too). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed models 16 new predict configs under */celldiff/a549_mantis/ (4 organelles × 4 predict sets: denv/mock/zikv/ipsc_confocal). A549 test sets use z_window_size=48 with a549trained zarr naming; iPSC confocal OOD eval uses z_window_size=40 with a549trained suffix in ipsc/predictions/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The new metric suite — torch-fidelity FID/KID/PRC/MIND, real-vs-pred linear probes (CP/DINOv3/DynaCLR/CellDINO), and the CellDINO encoder baseline — is the intended default for the post-refactor re-eval campaign. Previously the base default was false, so any leaf that did not explicitly set true silently skipped DINOv3+DynaCLR+CellDINO loading and emitted an 11-column feature_metrics.csv instead of the 74-column full schema. Flip the base default in _configs/eval.yaml; add explicit compute_feature_metrics: true to the 72 leaf eval__*.yaml configs that previously relied on the default; flip the three fnet3d mix-trained nucleus shell scripts that hard-coded false. Leaves restate the flag so intent stays visible at the leaf without chasing the defaults graph. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
defaults: [eval, _self_] caused eval's expansion (which ends with the ``leaf`` group) to apply BEFORE eval_grouped.yaml's own ``_self_`` body, so ``conditions: []`` from _self_ silently clobbered any leaf-provided ``conditions:`` list. ``evaluate-grouped leaf=<path>`` would then exit immediately with "evaluate_predictions_grouped requires a non-empty top-level 'conditions' list." Swapping to defaults: [_self_, eval] runs the schema-declaring _self_ first, then lets eval (and its trailing ``optional leaf: null``) apply on top, so the leaf's conditions land last and survive composition. Verified on all 12 generated grouped leaves: ``uv run dynacell evaluate-grouped leaf=<path> --cfg job`` now reports the leaf's full conditions list (16-28 entries) with target_name set correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The old example ``uv run dynacell evaluate-grouped -c eval_grouped_a549_mantis_er`` was wrong on two counts: * ``-c`` is Hydra's ``--cfg`` flag, which only accepts ``job``, ``hydra``, or ``all`` for config display. Passing a config name there is a no-op (or argparse error, depending on placement). * The grouped leaves live under ``configs/benchmarks/virtual_staining/_internal/leaf/grouped/`` and are discovered via the ``leaf=`` group override the same way single-condition leaves work — that's the actual entry point baked into the dynacell CLI's _EXTERNAL_SEARCHPATHS. Replace with the correct ``leaf=grouped/<bucket>/eval_grouped`` form and add a one-line note explaining why ``-c`` doesn't work, so the next person who follows the docs gets a runnable command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lets one tool cover both serial chained predicts (existing behavior,
default) and SLURM array submissions (new), so hand-written
``run_*_pred_*.slurm`` scripts can be retired in favor of a single
launcher with topology validation + resolved-config artifacts.
New flags (all opt-in, no change to existing serial path):
* ``--array`` — render ``sbatch_template_array.sbatch`` with
``#SBATCH --array=0-(N-1)``, per-task logs at ``%A_%a.{out,err}``,
and a bash ``CONFIGS=(...)`` block dispatched by
``$SLURM_ARRAY_TASK_ID``.
* ``--max-array-concurrency N`` — append ``%N`` to the array spec
to cap concurrent tasks. Requires ``--array``.
* ``--allow-mixed-directives`` — bucket leaves by ``(SBATCH directive
identity, run_root)`` and submit one sbatch array per bucket. Each
bucket writes artifacts into its own ``run_root``; job names get a
``_g{N}`` suffix. Without this flag, ``--array`` keeps the strict
uniform-hardware/uniform-run_root invariant of serial mode.
Internals:
* Factored out ``_compose_leaves``, ``_validate_uniform_sbatch``,
``_validate_devices``, ``_write_resolved_configs``,
``_render_serial_sbatch``, ``_render_array_sbatch``, and
``_sbatch_submit`` from the monolithic ``submit()`` so the
serial/array dispatch reads as a thin top-level loop.
* ``_render_sbatch_directives`` learned an ``array_spec`` arg that
inserts ``#SBATCH --array=`` and switches the log filenames from
``%j`` to ``%A_%a``.
* ``_directive_bucket_key`` produces a stable hashable identity for
a leaf's SBATCH block (``nodes``, ``gpus``, ``mem``, ``constraint``,
...), used by ``--allow-mixed-directives`` to group leaves.
The new ``sbatch_template_array.sbatch`` mirrors
``sbatch_template_batch.sbatch`` (umask, group-writable mkdirs,
cleanup trap, ``cd @@repo_root``, ``ml uv``, ``nvidia-smi``) and ends
with ``srun uv run python -m dynacell predict --config "$CFG"`` where
``$CFG`` is selected from ``CONFIGS`` by ``$SLURM_ARRAY_TASK_ID``.
``predict_batch.sh`` already forwards extra args via ``"$@"``; the
docstring is updated to advertise the new flags as supported
passthrough with worked examples.
Tests cover: ``--max-array-concurrency`` / ``--allow-mixed-directives``
rejected without ``--array``; negative concurrency rejected; mixed
``run_root`` rejected without ``--allow-mixed-directives``; serial
mode unchanged (no array directive, no ``CONFIGS=``); ``--array``
emits the array directive and ``CONFIGS`` block; ``--max-array-concurrency``
appends ``%N``; ``--allow-mixed-directives`` buckets 4 ER leaves
(3 a549-test + 1 ipsc-test) into two sbatch arrays with distinct
``_g0``/``_g1`` job-name suffixes; ``_directive_bucket_key``
distinguishes by constraint and ignores ``time``.
Verified end-to-end against the 16 celldiff_r2 a549trained leaves:
one invocation auto-buckets into ``a549/predictions/`` (12 tasks)
+ ``ipsc/predictions/`` (4 tasks) sbatch arrays.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the tooling that drives a 12-bucket re-eval campaign for the
post-refactor metric suite (torch-fidelity FID/KID/PRC/MIND, real-vs-pred
linear probes, CellDINO baseline, updated DynaCLR ckpt) across every
(organelle × train_set) tuple that currently has a prediction zarr on
disk.
Generator (``tools/generate_grouped_eval_configs.py``)
======================================================
Walks ``{ipsc,a549}/{predictions,joint_predictions}/`` and parses every
prediction zarr into a canonical identity ``(organelle, model, variant,
train_set, test_set, condition)``. Seven filename patterns are
recognized, including the legacy ``<org>_<model>__<gene>_<cond>`` form
that pre-dates the modern A549 naming — these are still live on disk for
sec61b + tomm20 deterministic and CellDiff_r2 iPSC-trained predictions
and must be consumed. ``celldiff_r2`` in ``joint_predictions/`` without a
variant suffix is recognized as the joint-default form. Skip allowlist
for the known-stale May-5 ``memb_celldiff_*`` zarrs and the
``sec61b_fnet3d``/``sec61b_unext2`` legacy aliases. The original
non-r2 CellDiff family is also skipped (only ``celldiff_r2`` is in
scope per the 2026-05-21 instruction).
Per condition, derives:
* ``benchmark.dataset_ref: {dataset, target}`` as a dict (resolver
silently returns None on string refs or partial dicts). Manifest
slugs are hyphen-form (``aics-hipsc``, ``a549-mantis-<gene>-<cond>``);
target keys are gene markers on A549 (``h2b``/``caax``) vs logical
organelle names on iPSC.
* ``io.pred_path`` (the walked zarr) and
``io.pred_cache_dir`` under
``{dataset}/eval_cache_pred/{train_set}/{model_variant}/{condition}/``.
* ``save.save_dir`` at the canonical re-eval path
(``{dataset}/{evaluations_with_embeddings,evaluations_a549trained_with_embeddings,evaluations_jointtrained_with_embeddings}/eval_<paper>[_<infix>]_<organelle>[_<cond>]``).
* ``name`` — stable per-condition label used by the grouped driver.
Pre-checks before emission: every pred_path exists; every gt_cache_dir
contains both DynaCLR (sha12 ``e409a5a079aa``) and CellDINO (sha12
``ef7c17ffb0aa``) feature dirs; no duplicate save_dirs within a bucket.
``--skip-fresh`` filters out conditions whose canonical save_dir
already has a ≥70-col ``feature_metrics.csv`` (used after canceling
+ resubmitting the array to avoid redoing completed work).
Emits 12 production leaves (one per organelle × train_set) plus a 13th
``_probe`` leaf used by the Step-5 sanity check (6 conditions × 2
positions, all save_dirs under ``/tmp/reeval_probe/`` so partial runs
can't corrupt production), plus a README summarizing per-bucket
condition counts. Leaves are pure ``# @Package _global_`` overlays
(no ``defaults: [eval_grouped, _self_]`` — that creates a composition
cycle since ``evaluate-grouped`` already starts from
``config_name="eval_grouped"``).
Tests (``tools/generate_grouped_eval_configs_test.py``)
=======================================================
* Grammar dispatch: 14 parametrized cases covering the seven canonical
patterns + variants + dedupe-preference-for-joint-dir.
* Failure cases raise ``ValueError`` with informative messages
(unknown organelle, unknown model, unknown CellDiff variant).
* Save_dir/pred_cache_dir/dataset_ref derivation matches the canonical
layout (incl. the A549 ``h2b``/``caax`` vs iPSC ``nucleus``/``membrane``
target-key split).
* Live-data tests (marked ``requires_data``) walk the actual training
tree, assert every emitted ``pred_path`` exists, and compose +
resolve every generated leaf via ``OmegaConf.merge`` +
``apply_dataset_ref`` + ``_check_grouped_field_invariants`` — catches
what ``--cfg job`` does NOT (Hydra-only composition vs eval-time
resolver splice + invariant checks).
Campaign runners
================
``tools/run_reeval_campaign.slurm`` — SLURM array (0-11%4) over the 12
grouped leaves. ``constraint=a100|h100|h200`` (eval-time GPU load is
~15 GB; ``pix2pix3d_needs_h200`` applies to predict-time DDP only).
``--time=48h`` (2× the probe-based projection for the largest 28-cond
bucket). ``DYNACELL_THREADS_PER_WORKER=32`` set before any C-extension
loads so BLAS caps bite at first thread-allocator load (per
``dynacell/CLAUDE.md`` "Eval runtime / parallelism").
``tools/verify_reeval_outputs.py`` — post-run sanity walks every
emitted save_dir + pred_cache_dir and runs the 5 plan-defined checks:
``feature_metrics.csv`` has ≥70 cols (target 74), 8 ``{gt,pred}_*``
embedding npz files, ``eval_timing.csv`` present, CellDINO/DynaCLR
probe + fidelity columns present, ``pred_cache/features/celldino/``
populated. Prints a pass/fail table; exits non-zero on any fail.
Grouped leaves
==============
12 leaves + ``_probe`` leaf + README under
``configs/benchmarks/virtual_staining/_internal/leaf/grouped/``,
auto-generated by the script above. Committed as a campaign snapshot
(generator can regenerate at any time given the prediction zarrs + GT
caches on disk). Current state reflects the post-``--skip-fresh`` set
after the first array's partial completion: 16/8/1/16/16/6/22/12/9/22/16/6
= 150 production conditions + 6 probe conditions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…style) (#428) * feat(viscy-models): add 3D PatchGAN discriminator and LSGAN losses Introduces a new gan subpackage with PatchGAN3D and MultiScalePatchGAN3D (pix2pixHD-style, 2-scale by default), plus lsgan_d_loss / lsgan_g_loss list-based helpers. These will pair with UNetViT3D to form a 3D pix2pix adversarial baseline alongside the existing regression and flow-matching DynaCell baselines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): add DynacellGAN Lightning module for 3D pix2pix baseline DynacellGAN composes the existing UNetViT3D generator with the new MultiScalePatchGAN3D discriminator, training with LSGAN + L1 (λ=100) via manual optimization (automatic_optimization=False). A requires_grad toggle separates the D step from the G step so each backward only populates the active side's grads; D grads are also explicitly cleared between optimizers so the test invariant (no D grads after a G step) holds for Phase 2 DDP correctness. Validation logs the existing loss/validate weighted-mean alias, so ModelCheckpoint(monitor="loss/validate") keys off the same metric used across the other DynaCell baselines. configure_optimizers builds two AdamW + WarmupCosine pairs directly to fit the two-optimizer return shape Lightning's manual mode expects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): add pix2pix3d_unetvit recipe, model overlays, and smoke leaf The recipe declares dynacell.engine.DynacellGAN with UNetViT3D's production generator config and a 2-scale spectral-normalized PatchGAN3D. fit / predict model overlays mirror the unetvit3d ones with lr_g/lr_d split (no scalar lr) and bf16-mixed precision. The single-GPU smoke leaf for er/pix2pix3d_unetvit/ipsc_confocal sets benchmark.dataset_ref: null to disable the compose-hook resolver so it can point data_path at SEC61B_test48.zarr directly, and uses batch_size=2 to match the inherited num_samples=2 (HCSDataModule divisibility rule). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): add ER iPSC production train leaf for pix2pix3d_unetvit Mirrors er/unetvit3d/ipsc_confocal/train.yml with the model overlay and all model_name / experiment_id / run_root / job_name strings retargeted to pix2pix3d_unetvit. Keeps ModelCheckpoint.monitor: loss/validate unchanged — DynacellGAN logs that key via on_validation_epoch_end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): add pix2pix3d_unetvit single-set train leaves across organelles Adds 7 production single-GPU training leaves for pix2pix3d_unetvit: - er/pix2pix3d_unetvit/a549_mantis - {membrane, mito, nucleus}/pix2pix3d_unetvit/{ipsc_confocal, a549_mantis} Each leaf mirrors the matching unetvit3d cell with model_overlay, model_name, experiment_id, logger name/save_dir, ckpt dirpath, launcher run_root/job_name retargeted to pix2pix3d_unetvit. ModelCheckpoint monitor=loss/validate preserved. Joint training leaves (joint_ipsc_confocal_a549_mantis) deferred — still gated on the open BatchedConcatDataModule DDP deadlock (handoff 2026-04-26). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): add iPSC predict + eval leaves for er/pix2pix3d_unetvit Mirrors er/unetvit3d/ipsc_confocal/{predict,eval}__ipsc_confocal.* with the model overlay, model_name, experiment_id, ckpt_path, output_store filename, and eval pred_path / save_dir all retargeted to pix2pix3d_unetvit. predict_method stays full_image (the only mode DynacellGAN.predict_step supports for v1). ckpt_path uses an explicit REPLACE_ME_WITH_PRODUCTION_CHECKPOINT_PATH placeholder so torch.load raises FileNotFoundError with the placeholder visible if the user forgets to override it — running predict against an untrained generator would silently produce garbage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): add a549_mantis predict + eval triplet for er/pix2pix3d_unetvit/ipsc Adds the 3 a549_mantis test splits (mock, denv, zikv) for the iPSC-trained pix2pix3d_unetvit ER cell — enables cross-dataset transfer evaluation. Mirrors the unetvit3d ipsc_confocal predict/eval triplet with the same retargeting rule already used for the iPSC pair. ckpt_path uses the REPLACE_ME_WITH_PRODUCTION_CHECKPOINT_PATH placeholder so any forgotten override fails loudly via FileNotFoundError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(dynacell): /simplify pass — reuse configure_adamw_scheduler, dedupe _log_samples DynacellGAN.configure_optimizers used to rebuild AdamW + WarmupCosineSchedule inline despite the class docstring claiming it called the shared helper. Now calls configure_adamw_scheduler twice and concatenates, dropping the redundant schedule guard and the unused monai/torch.optim imports. _log_samples was a byte-identical instance method on DynacellUNet, DynacellFlowMatching, and DynacellGAN. Promoted to a module-level free function and updated the five call sites across all three classes. Updated test_dynacell_gan_validate_logs_alias to monkeypatch the module attribute instead of the instance method. Also trimmed three narrating comments inside DynacellGAN (the plan/PR context comments belong in the PR description, not the source) and five per-layer banner comments in patchgan3d.py (the conv literals state the shape and the class docstring covers the layer table). Safe under the running production training's Slurm auto-requeue: no constructor signatures change, AdamW + WarmupCosine pairs are built with the same args, optimizer state shape is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): Phase 2 — DDP topology + 4-GPU smoke + production leaves Adds the DDP infrastructure for pix2pix3d_unetvit so the GAN baseline can train on 4 H200s in ~13 h instead of the ~50 h single-GPU pace measured on job 33132605 (now cancelled). - recipes/topology/ddp_4gpu_gan.yml — DDP variant that wraps the strategy in find_unused_parameters: true. Required because DynacellGAN runs manual_optimization with two optimizers, and the per-step requires_grad toggle leaves one side's params "unused" from the reducer's perspective on each backward. Plain ddp_4gpu.yml stays single-strategy for the other models. - model_overlays/pix2pix3d_unetvit_fit_ddp.yml — mirrors the single-GPU fit overlay but composes ddp_4gpu_gan.yml topology + 4-GPU hardware. - er/pix2pix3d_unetvit/ipsc_confocal/train_smoke_4gpu.yml — 4-GPU smoke variant. Same SEC61B_test48 store, batch_size=2 matching the inherited num_samples=2 divisibility rule. Validates DDP sharding + the find_unused_parameters DDP reducer behavior under the GAN's manual-opt pattern before committing the 4-GPU production run. - er/pix2pix3d_unetvit/ipsc_confocal/train_4gpu.yml — 4-GPU production leaf. Keeps per-rank batch_size=4 (effective 16 across 4 GPUs) and unchanged lr_g/lr_d so the DDP run is comparable to the single-GPU reference at the per-rank optimization settings. logger name and job_name suffixed with _4gpu to disambiguate in W&B/slurm; run_root and ckpt dirpath unchanged so Phase 4 predict leaves keep working. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dynacell): retarget membrane pix2pix3d_unetvit save paths The membrane single-set train leaves landed with placeholder paths under .../ipsc/memb_temp/ — diverging from the celldiff convention (memb, not memb_temp) and conflating the a549_mantis run dir with the iPSC tree. Both leaves now point at the canonical .../ipsc/memb/ and .../a549_mantis/memb/ trees, matching the celldiff layout so all organelles can be cross-referenced uniformly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): expand pix2pix3d_unetvit predict/eval matrix Mirrors the er/pix2pix3d_unetvit/ipsc_confocal/ predict+eval set across membrane, mito, and nucleus so every iPSC-trained pix2pix3d_unetvit run has an apples-to-apples inference + evaluation path against both the ipsc_confocal self-target and the three a549_mantis conditions (mock, denv, zikv). Predict leaves swap target-inherited normalizations for a source-only NormalizeSampled (Phase3D) and clear the RandWeightedCropd augmentation. Eval leaves enable compute_feature_metrics only on the ipsc_confocal self-target, matching the existing er convention. Ckpt paths are placeholders pending the corresponding production runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): add pix2pix3d_unetvit joint train + predict leaves Adds the joint_ipsc_confocal_a549_mantis variant for all four organelles (er, membrane, mito, nucleus): one train.yml authoring BatchedConcatDataModule with two HCSDataModule children plus the four predict leaves (self iPSC + three a549_mantis conditions). The joint train differs from the celldiff joint by using NormalizeSampled (fov_statistics) instead of MinMaxSampled so joint and single-set pix2pix3d_unetvit runs share the same normalization contract for apples-to-apples ablations. batch_size=2 × num_samples=2 matches the single-set 4 GPU-samples/step (per the BatchedConcatDataModule rule in CLAUDE.md that joint does NOT divide by num_samples). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(viscy-models): drop speculative num_layers from PatchGAN3D PatchGAN3D and MultiScalePatchGAN3D accepted a num_layers constructor parameter that only worked with the value 5 — any other value raised ValueError. The argument was justified as future-proofing for ablations that don't exist yet, which is exactly the speculative flexibility CLAUDE.md prohibits: "No 'flexibility' or 'configurability' that wasn't requested. ... No error handling for impossible scenarios." Re-add it only if and when a non-5 variant lands. Recipe + test fixture updated in lockstep; uv run pytest packages/viscy-models/tests/test_gan/ and the DynacellGAN init/forward tests pass (11/11). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): apply TTUR — drop pix2pix3d_unetvit lr_d to lr_g/10 The 1:1 LR schedule collapsed to D-dominance on DDP run 33176091 (wandb 20260520-002629): loss/d_train fell from 1.3e-2 (ep1) to ~1e-4 (ep12+), loss/g_adv pinned at ~1.0 (saturated MSE(d_fake, 1)), loss/g_l1 flat at ~0.45 from epoch ~5 onward — the adversarial gradient was effectively zero, generator was training as pure L1. Slowing the discriminator 10x (lr_d = lr_g / 10 = 3e-5) is the canonical pix2pix-style fix for D-dominance. Applies to both the single-GPU and DDP fit overlays so the entire Phase 3 baseline (all 4 organelles × {iPSC, A549, joint}) inherits the same adversarial balance regime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(viscy-models): add nonsat / RpGAN losses + R1/R2 grad penalties Adds three loss families and two gradient penalties to viscy_models.gan, extending the existing multi-scale 3D PatchGAN convention to the modern StyleGAN2 / R3GAN recipe: - nonsat_d_loss / nonsat_g_loss: softplus non-saturating loss, verbatim from stylegan2-ada-pytorch/training/loss.py. - rpgan_d_loss / rpgan_g_loss: relativistic pairing loss, verbatim from brownvc/R3GAN/Trainer.py. - r1_penalty / r2_penalty: Mescheder zero-centered gradient penalty on real / fake input. Multi-scale aggregation = mean of per-scale ||grad||^2 (NOT ||grad of sum||^2, which would mix cross-scale derivatives through MultiScalePatchGAN3D's pool-derived second scale). All losses preserve the existing list-of-per-scale-Tensors convention. LSGAN losses are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): wire DynacellGAN modernization knobs (StyleGAN2/R3GAN recipe) Adds opt-in modernization to DynacellGAN with legacy-safe defaults (loss_type=lsgan, r1_gamma=0, ema_kimg=None, lecam_gamma=0). The 14 existing leaves composing pix2pix3d_unetvit_fit.yml are unaffected because none of them set the new knobs and the new defaults preserve pre-modernization behavior. New constructor knobs: - loss_type {lsgan, nonsat, rpgan} + lambda_adv - r1_gamma / r2_gamma / r1_every (lazy schedule with StyleGAN2-style * r1_every unbiased rescaling). RpGAN raises ValueError unless r1_gamma > 0 (R3GAN Theorem 3.1 convergence requirement). - ema_kimg + use_ema_at_predict for the generator EMA shadow. Decay derived per-step as 0.5 ** (global_batch / (ema_kimg * 1000)). - lecam_gamma + lecam_decay (decay=0.9 matches google/lecam-gan reference). Training-step structural changes: - D-step counter (_d_step_count buffer) drives the lazy R1 schedule; self.global_step would advance per opt.step() (2 per training_step with manual_optimization) so it can't be used directly. - R1 / R2 path is outer-guarded on (r1_gamma > 0 or r2_gamma > 0) so legacy leaves skip the grad-of-grad compute and the spectral-norm power-iteration advance. - R1 / R2 compute is wrapped in autocast(enabled=False); Lightning's bf16-mixed plugin would otherwise put bf16 into D forwards which is fragile under create_graph=True grad-of-grad. - RpGAN G step recomputes fresh d_real against the post-update discriminator (matches R3GAN Trainer.py convention). - LeCam EMA update uses sync_dist'd batch means via self.all_gather so all ranks update the buffers identically without explicit all_reduce. - Generator EMA update after opt_g.step() (params are DDP-synced at that point, so per-rank EMA stays in lockstep). Inference paths: - _inference_generator helper returns EMA when available AND use_ema_at_predict; used by forward and predict_step. - validation_step runs both raw and EMA generator forwards (when EMA exists); emits both loss/validate (back-compat) and loss/validate_ema aliases. Checkpoint loading at engine.py:802 changed to strict=False with filtered warning logs (expected_missing for generator_ema.* / _d_step_count / _lecam_ema_*). When loading a pre-modernization checkpoint with EMA enabled, the EMA shadow is seeded from the loaded generator weights so inference paths return the loaded weights instead of the random-init deepcopy. 16 DynacellGAN tests cover legacy-default safety, modernized smoke, the R1/R2 gamma guard, lazy-R1 step counter, RpGAN constructor validation, EMA seeding on legacy load, LeCam buffer registration gating, two-pass validation logging, and inference-helper EMA selection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dynacell): add modernized pix2pix3d overlays + SEC61B Run A leaf New opt-in configs for the StyleGAN2/R3GAN-style modernized pix2pix3d recipe: - pix2pix3d_unetvit_fit_modernized.yml: single-GPU overlay with loss_type=nonsat, r1_gamma=10, r1_every=16, ema_kimg=10, lr_g=lr_d=2e-4. Existing pix2pix3d_unetvit_fit.yml stays untouched so the 14 legacy leaves continue running LSGAN unchanged. - pix2pix3d_unetvit_fit_ddp_modernized.yml: 4-GPU DDP sibling, same model knobs, composes ddp_4gpu_gan topology (find_unused_parameters=True). - er/pix2pix3d_unetvit/ipsc_confocal/train_4gpu_modernized.yml: Run A of the plan. SEC61B headline experiment. ModelCheckpoint monitors loss/validate_ema (the EMA generator's val loss = inference path). Checkpoints land in a distinct subtree from the LSGAN baseline so there is no collision risk with in-flight runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dynacell): harden DynacellGAN against silent failures (PR #428 review) Addresses findings from the PR review subagents: - ema_kimg=0 raises ValueError (silent EMA freeze prevention). - RpGAN now requires both r1_gamma > 0 AND r2_gamma > 0 (R3GAN Theorem 3.1 requires both penalties for sharp-distribution convergence; previously only r1_gamma was validated). - ckpt_path strict=False load: missing-but-not-expected keys now RAISE RuntimeError instead of warnings.warn (silent half-loaded models burn hours of training before users notice). Unexpected-keys path switched from warnings.warn to _logger.warning. - Partial-EMA checkpoints (some but not all generator_ema.* keys) now RAISE — silently half-seeding the EMA shadow would produce nonsense outputs in the missing layers. - on_predict_start logs which generator (EMA vs raw) will be used at inference, so users notice silent fallbacks (predict overlay forgot ema_kimg, use_ema_at_predict=False, etc.). New tests: - test_dynacell_gan_rpgan_requires_r2 - test_dynacell_gan_ema_kimg_zero_raises - test_dynacell_gan_rpgan_smoke (end-to-end trainer.fit with rpgan + R1+R2; exercises the fresh-d_real RpGAN G-step path) - test_dynacell_gan_ema_update_math (closed-form EMA decay verification) - test_dynacell_gan_use_ema_at_predict_false (predict_step returns raw output when EMA is gated off) - test_dynacell_gan_ckpt_unexpected_missing_raises (corrupted ckpt surfaces as RuntimeError, not a swallowed warning) Existing test_dynacell_gan_ema_seed_on_legacy_load updated to use use_spectral_norm=False on both sides — with the stricter unexpected-missing check, SN parametrization name mismatches would otherwise trip the raise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dynacell): validate r1_every >= 1 in DynacellGAN.__init__ r1_every is the modulo period for the lazy R1/R2 schedule and is also used as the unbiased rescaling factor on the penalty term. Without an init-time guard, r1_every=0 would ZeroDivisionError on the first D step when R1/R2 is enabled, and a negative value would silently fire the gate on the wrong cadence. Raise at construction so the bad config fails before any training time is burned. Surfaced by Copilot review on PR #428. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dynacell): gate val sample logging on use_ema_at_predict DynacellGAN.validation_step assigned pred_for_samples=pred_ema whenever the EMA submodule existed, regardless of use_ema_at_predict. This contradicted predict_step / forward (both routed through _inference_generator, which respects the flag) — the dashboard could show EMA-generator samples while real predictions were emitted by the raw generator. Now matches the inference path. Regression test stubs out both generators with distinguishable constant modules and reads back validation_step_outputs to confirm which one fed the dashboard for each value of use_ema_at_predict. Surfaced by Copilot review on PR #428. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(dynacell): keep _d_step_count as Python int to drop per-step sync Lazy-reg gate previously read `int(self._d_step_count)` each training step where the counter was a CUDA buffer. Each .item() call forced a device→host sync. Switch to a plain Python int (incremented in training_step, modulo-gated against r1_every) and persist it via on_save_checkpoint / on_load_checkpoint hooks. All DDP ranks start at 0 and advance deterministically, so the buffer-was-broadcast invariant is preserved without per-step sync overhead. Side benefit: `_d_step_count` is no longer a state_dict key, so the strict-load filter drops one expected-missing prefix. Legacy checkpoints without `_d_step_count` resume cleanly at 0 via the dict.get fallback. Surfaced by Copilot review on PR #428. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(dynacell): point modernized overlay header at PR #428, not local path The header comment referenced /home/alex.kalinin/.claude/plans/..., which won't exist for other contributors or in CI. Swap for the PR URL so the recipe rationale is reachable from the file. Surfaced by Copilot review on PR #428. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(dynacell): point SEC61B Run A leaf header at PR #428, not local path Same fix as the modernized overlay header — replace /home/alex.kalinin/.claude/plans/... reference with the PR URL so it resolves for everyone. Surfaced by Copilot review on PR #428. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Alexandr Kalinin <alxndrkalinin@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g/dynacell-dynadtw PR #432 base modular-viscy-staging diverged from staging/dynacell-dynadtw when #404 squashed all of dynacell-models history into a single commit (393abdf). dynacell-models has 30 newer commits since dynadtw's last sync at f1df36b — re-eval campaign generator, controlled FOV-parallel eval (#426), 3D pix2pix GAN baseline (#425), DeepFeatureBatcher / precompute deep features, cache identity refactors, pix2pix3d_unetvit modernization (#428), backbone bookkeeping refactor onto _BackboneLists. Conflict triage (92 conflicts): - 89 files where staging/dynacell-dynadtw carried stale copies of dynacell-models content (no post-branch edits on dynadtw side): 84 AA conflicts (eval YAML configs, dynacell evaluation sources/tests, CLAUDE.md, tools) + 5 UU conflicts (engine.py, __init__.py, __main__.py, test_engine.py, .github/workflows/test.yml). Resolved by taking the modular-viscy-staging post-#404 version. - 3 files with genuine concurrent edits, resolved per direction: * .gitignore — dynadtw side is a strict superset (added local evaluation_matrix.md exclusion). Take dynadtw. * packages/viscy-models/.../foundation/cell_dino.py — dynadtw added CellDinoModel wrapper + per-image normalize flag; modular-viscy-staging didn't touch post-branch. Take dynadtw. * pyproject.toml — modular-viscy-staging added pytest "slow" marker + addopts "-m not slow" (no-overlap, kept); both sides identically added aics* git sources, py312 bump, pythonpath. iohub: modular-viscy-staging kept a git override pin at afe11f6a (~0.3.3); dynadtw dropped it for iohub>=0.3.5 from PyPI. Took dynadtw's strategy (PyPI 0.3.5 is released; viscy-data dep pin already at >=0.3.5 on this side). - uv.lock regenerated from the merged pyproject.toml (iohub 0.3.5 from PyPI registry). Sanity check: uv sync --all-packages --all-extras + uv run pytest -m "not slow" applications/dynacell/tests/ shows 51 failed / 459 passed / 78 skipped. All 51 failures are pre-existing on dynacell-models HEAD (test_benchmark_config_composition strategy/leaf composition issues + test_evaluation_metrics gain-and-offset) — verified by running the same suite on the clean backbone-refactor worktree. The merge introduces no new regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the 258 conflicts blocking PR #429 (252 AA from independent adds across both branches' dynacell scaffolding, plus 6 UU). AA resolution: - 250 files → take ours (backbone-refactor). The diffs are uniformly "ours is the newer evolution": cache_dir organelle-namespacing (fdf71bb), any-GPU launcher profile (7943915), Phase B preprocess_version (6c23471), _BackboneLists refactor (PR #429 scope), microssim → cubic (PR #429 scope), pixel-metric unification (PR #429 scope), tests targeting all of the above. Theirs is the older PR #404 squash + PR #432 absorption state. - 2 files → take theirs: * packages/viscy-models/src/viscy_models/gan/__init__.py * packages/viscy-models/src/viscy_models/gan/losses.py Base side has the pix2pix3d_unetvit GAN modernization (lsgan/nonsat/rpgan losses, R1/R2 penalties, EMA, LeCam). Job 33232131 depends on this — must not be regressed. UU resolution: - .gitignore → theirs (add evaluation_matrix.md path). - pyproject.toml (root) → theirs (drop temp cubic/iohub git overrides; both are released on PyPI now). - applications/dynacell/pyproject.toml → ours (cubic==0.7.0a5, no microssim — this IS PR #429's stated change). - applications/dynacell/src/dynacell/engine.py → theirs (DynacellGAN modernization, same reason as the GAN package files). - applications/dynacell/tests/test_engine.py → theirs (GAN modernization test suite). - uv.lock → regenerated via uv sync after fixing pyprojects. Sanity: cross-module imports (dynacell.evaluation + DynacellGAN + viscy_models.gan including new nonsat/rpgan/r1/r2 losses) succeed. pytest applications/dynacell/tests/test_pipeline_cache.py applications/dynacell/tests/test_engine.py → 83/83 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Makes
applications/dynacellinto the reusabledynacellpackage in VisCy.Splits config responsibilities more cleanly:
dynacellcode plus generic model recipes/examples.dynacell-paper.This PR also adds opt-in validation loss support for CellDiff flow-matching so benchmark reruns can log a scalar validation curve without forcing extra validation work on every flow-matching job.
What changed
Config and package reorg
applications/dynacell/examples/configs/intoapplications/dynacell/configs/.configs/recipes/for reusable fragments andconfigs/examples/for generic runnable examples.lightning_logs/, SEC61B paper configs, and the hard-codedhcs_sec61b_3d.ymlrecipe.Absorb reusable benchmark modules from
dynacell-paperdynacell.datawith path-based manifest/collection/spec loaders and Pydantic schemas.dynacell.reportingwith Hydra config, tables, figures, and tests.dynacell.evaluationwith configs, spectral-PCC tooling, I/O helpers, segmentation, metrics, and tests.dynacell.preprocesswith the reusableconfig.pyandzarr_utils.pyutilities.Package and CLI shape
dynacellCLI soevaluateandreportroute to Hydra entry points whilefit/predict/test/validatestill go throughviscy_utils._configsinside the Python package so installeddynacell evaluate/reportcommands resolve config from package data instead of the source tree.dynacell.__init__lazy so lightweight imports do not pull in the full training stack.eval,report, andpreprocess, plus clearer missing-extra install hints from the CLI router.Runtime, training, and transform fixes
overwrite=support toHCSPredictionWriter, including explicit handling/logging when prediction channels already exist.VisCyCLIcheckpoint hparam precedence sopredict/test/validatehonor user config over stale checkpoint values, whilefitresume still keeps checkpoint training hparams.ckpt_pathloading toDynacellFlowMatchingso predict-time settings come from config rather than checkpoint metadata.compute_validation_losstoDynacellFlowMatching. When enabled it logsloss/val/<idx>and aggregateloss/validatewhile preserving epoch-end ODE sample generation; when disabled it keeps the previous cheaper validation path._aggregate_validation_losses()helper so UNet and flow-matching use the same weighted aggregation logic.BatchedRandAffinedwith configurablepadding_mode,safe_crop_size, andsafe_crop_coverage, plus warnings/tests around unsupported X/Y-rotation guarantees.Evaluation and reporting follow-up fixes
corr_coef/PCC computation and add regression tests.evaluate_model()returns consistent with fresh evaluation output types.use_gpuin evaluation instead of always taking the CUDA path when visible.FileNotFoundErrorfor missing preprocess configs instead of failing less clearly.Benchmark schema + launcher (landed earlier on this branch)
configs/benchmarks/virtual_staining/layout with shared axes (train_sets/,targets/,model_overlays/,launcher_profiles/,predict_sets/) and per-leaf train/predict files composed viaviscy_utils.compose.load_composed_config.applications/dynacell/tools/submit_benchmark_job.pywith--dry-run,--print-script,--print-resolved-config, and--override. It composes a leaf, strips thelauncher:+benchmark:reserved keys, freezes the resolved config to{run_root}/resolved/, and renders an sbatch script to{run_root}/slurm/._maybe_compose_configinviscy_utils/cli.pyso directdynacell fit -c <leaf>also strips those reserved keys.ckpt_sha256_12sidecar to the evaluation cache so repeated lookups skip full-file hashing.Evaluation pipeline scale-up
DeepFeatureBatcher+ upfrontprecompute_deep_featurespass amortize deep-extractor forwards across FOVs and timepoints; per-sideDeepFeatureBatcher.pushstreams raw crops with a single flush per side.runtime.executor=process/fov_workers/threads_per_worker) via spawn-contextProcessPoolExecutor, fcntl GPU serialization lock, layered BLAS/OMP thread caps, and region timers dumped toeval_timing.csv(feat(dynacell): controlled FOV-parallel eval + thread budgeting #426).compute_pixel_metricstensors device-resident (zero-copyascupy/asnumpyhandoff to cubic) so spectral-PCC and FSC stay on GPU on long timelapses.Grouped multi-condition eval + re-eval campaigns
dynacell evaluate-groupeddriver loads SuperModel + DINOv3 + DynaCLR + CELL-DINO once and loops over N I/O variants (typically A549 mock/denv/zikv) sharing the same target. Per-condition overlays may overrideio.*/save.*/runtime.*/ cache /limit_positions/force_recompute; model-loading fields are guarded against override at runtime.submit_benchmark_batch.py --arraymode for cluster-scale re-eval submission; re-eval campaign generator emits grouped leaves underconfigs/benchmarks/virtual_staining/_internal/leaf/grouped/.dynacell precompute-gt;compute_feature_metrics=trueis now the default in eval configs.Feature metrics migration to
torch-fidelity+ linear probestorch-fidelity's eigvals-based composition; ship per-(FOV, t) FID/KID/median-cosine and dataset-level FID/KID/PRC/MIND/cosine (feat(dynacell): migrate eval metrics to torch-fidelity, add probes #423).[gt; pred]with GroupKFold and per-foldMADScalerfit, no leakage) + indistinguishability mapping; cross-condition probe scaffolding.N < 2cells; auto-shrink KID subset size for small cohorts.3D pix2pix GAN baseline + modernization
pix2pix3d_unetvit) with UNetViT3D generator + PatchGAN discriminator; benchmark train + predict leaves added under the existing schema (feat(dynacell): add 3D pix2pix GAN baseline (pix2pix3d_unetvit) #425).Topology / trainer-recipe ownership cleanup
Four commits (
9734d07 → f9b8f1e → 5b7eaae → 3fdb7cf) untangle three layers that were previously mingled inrecipes/trainer/fit_*.yml:runtime_single_gpu.yml → runtime_shared.yml: the file's content (srun +PYTHONUNBUFFERED/NCCL_DEBUG/PYTHONFAULTHANDLERenv vars) has nothing single-GPU-specific, and every leaf — including the 4-GPU UNeXt2 leaf — composed it. Rename + 14 leaf-base updates + schema-doc references.recipes/topology/single_gpu.yml+ddp_4gpu.yml) ownstrainer.accelerator/devices/strategy/num_nodes. Duplicated independently underapplications/dynacell/configs/recipes/topology/andapplications/cytoland/examples/configs/recipes/topology/because CLAUDE.md forbids cross-application imports.recipes/trainer/fit.yml+predict.ymlper app, replacingfit_1gpu/fit_4gpu/fit_fm_4gpu/predict_gpu. Mode invariants only (seed, log cadence, callbacks,WandbLoggerclass pinned with per-app project). Topology and precision are owned by the topology recipes and model overlays respectively. Every benchmark model overlay and example leaf now composes[fit.yml + topology/*.yml]and sets precision + max_epochs explicitly.hardware_h200_single,hardware_gpu_any_long,hardware_4gpu) drop theirtrainer:block; they now own onlylauncher.sbatch.*.strategy: ddp → autoon every single-GPU leaf (Lightning-equivalent at devices=1, just intent-honest); dynacell examplesfnet3d/unetvit3d/unext2/fit.ymland benchmarkunext2.ymlnow pinWandbLogger project=dynacellinstead of falling through to Lightning's default TensorBoard;fit_1gpu-derived paths move fromsave_top_k: 4tosave_top_k: 5.precision: 32-true;examples/celldiff/fit.ymlkeeps FM-style epoch-based checkpointing (save_top_k: -1,every_n_epochs: 10, no monitor); cytoland FCMAE pretraining keepsstrategy: ddp_find_unused_parameters_true; cytoland→dynacell bridge configs keepproject: dynacelloverride over cytoland's default.applications/dynacell/tools/LEGACY/examples_configs/(pre-schema CellDiff configs) is removed outright per CLAUDE.md's "avoid backwards-compatibility hacks" rule. The equivalence tests that composed LEGACY (test_*_leaf_matches_legacy,test_fnet3d_paper_leaf_matches_ran_config,test_byte_equivalence_sec61b_train_leaf) are replaced with forward-looking composition sanity tests.~/.claude/plans/vectorized-sleeping-clock.md; commits ran through/review-planand/verify-plansubagent rounds before landing.Dependency and packaging updates
microssimto a compatible Git revision and refreshuv.lockso the newdynacell[eval]extra resolves with the current workspace.Why
This is the VisCy side of the
dynacell-papersplit:dynacellin VisCy becomes the reusable runtime.dynacell-paperowns benchmark-instance configs, orchestration, and manuscript-specific assets.That gives us a cleaner boundary for future work like multi-experiment collections and benchmark-specific runners without keeping paper state in the shared application package.
The flow-matching validation-loss change is intentionally opt-in because it adds extra validation compute and the resulting metric is still stochastic; default flow-matching checkpointing remains epoch-based.
The topology / trainer-recipe cleanup at the end removes a silent-drop-DDP trap: before this change, pairing a mismatched trainer recipe and benchmark hardware profile let the hardware profile's
trainer.deviceswin (both layers redundantly set it), which could silently run a DDP training on 1 GPU. Topology is now the single source of truth.Verification
uvx ruff check applications/dynacell/uvx ruff format --check applications/dynacell/.venv/bin/python -m pytest applications/dynacell/tests -q.venv/bin/python -m pytest applications/dynacell/tests/test_data_manifests.py applications/dynacell/tests/test_reporting_tables.py applications/dynacell/tests/test_reporting_tables_extended.py applications/dynacell/tests/test_reporting_figures.py applications/dynacell/tests/test_evaluation_metrics.py applications/dynacell/tests/test_evaluation_pipeline.py applications/dynacell/tests/test_evaluation_io.py applications/dynacell/tests/test_preprocess_config.py applications/dynacell/tests/test_preprocess_zarr_utils.py applications/dynacell/tests/test_cli_routing.py -q.venv/bin/python -m pytest applications/dynacell/tests/test_engine.py -q.venv/bin/python -m pytest applications/dynacell/tests/test_training_integration.py::test_celldiff_fm_validation_loss_keeps_generation applications/dynacell/tests/test_training_integration.py::test_celldiff_fm_warmup_cosine_fast_dev_run applications/dynacell/tests/test_training_integration.py::test_celldiff_fm_constant_schedule_fast_dev_run -q.venv/bin/dynacell fit --print_config -c applications/dynacell/configs/examples/unext2/fit.ymluv run pytest applications/dynacell/tests/test_benchmark_config_composition.py applications/dynacell/tests/test_submit_benchmark_job.py -v(benchmark schema + submit-tool tests)ddp → autoat devices=1,WandbLoggeradded where previously default TB).uv run dynacell fit -c <benchmark-unext2-leaf> --trainer.fast_dev_run=true --trainer.devices=1 --trainer.strategy=autoran one training step cleanly on an A40 interactive node (loss 1.048).submit_benchmark_job.py <benchmark-unext2-leaf>queued a real 4-GPU DDP job (31122607) via the new schema path.