It would be beneficial to introduce a new status like `reproducibility-failed` to `bioimageio.core.test_model()`. Currently, models that pass the initial metadata check but fail later tests are categorized under the same broad status `valid-format`.
Adding this intermediate status would allow us to differentiate between two distinct failure cases:
- `valid-format`: The model passes metadata checks but fails core execution (e.g., environment setup or Python code errors).
- `reproducibility-failed` (new): The model has a working environment and executes successfully, but the output does not meet the required reproducibility threshold.
- `passed`: The model passes all checks, including reproducibility.
This change would help identify models that are "functional" even if they aren't perfectly reproducible.
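The proposed three-way distinction could be sketched as a simple triage over the individual check outcomes. This is a minimal illustration only; `triage_status` and its boolean inputs are hypothetical and not part of the `bioimageio.core` API.

```python
def triage_status(metadata_ok: bool, execution_ok: bool, reproducible: bool) -> str:
    """Map check outcomes onto the proposed statuses (hypothetical sketch)."""
    if not metadata_ok:
        # the metadata check itself failed
        return "failed"
    if not execution_ok:
        # metadata is fine, but environment setup or inference code errors out
        return "valid-format"
    if not reproducible:
        # the model runs, but outputs exceed the reproducibility tolerance
        return "reproducibility-failed"
    # all checks pass, including reproducibility
    return "passed"
```

Under this scheme, a model that executes but produces out-of-tolerance outputs would report `reproducibility-failed` instead of being lumped together with models that cannot run at all.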
For BioImage.IO: The inference test CI is currently used as a temporary workaround to identify which models work in the test-run functionality.