Skip to content

Modernization, speed, memory usage, and flexibility improvements#188

Merged
rvandewater merged 121 commits into
masterfrom
development
Jun 8, 2026
Merged

Modernization, speed, memory usage, and flexibility improvements#188
rvandewater merged 121 commits into
masterfrom
development

Conversation

@rvandewater

@rvandewater rvandewater commented Jun 3, 2026

Copy link
Copy Markdown
Owner

Summary by CodeRabbit

  • New Features

    • Added automated CI/CD workflows for code quality checks and E2E integration testing.
    • Introduced icu-benchmarks CLI entry point for streamlined command execution.
    • Added new SLURM cluster job submission scripts for batch processing.
  • Build & Installation

    • Migrated project to modern pyproject.toml-based build system with uv package manager.
    • Explicit Python 3.10+ support with improved dependency management.
    • Removed legacy setup.py and environment configuration files.
  • Documentation

    • Updated README with refreshed installation and usage guidance.
    • Added comprehensive CONTRIBUTING.md with development setup instructions.
    • Updated PAPER.md with new experiment framework and training workflows.
  • Infrastructure

    • Added pre-commit hooks for automated code quality and formatting checks.
    • Enhanced data processing with Polars as primary backend for improved performance.
    • Improved type annotations throughout codebase for better code reliability.

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch development

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/adding_model/instructions.md (1)

20-27: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix shell continuation examples to be copy-paste safe.

The command examples use \ with inline comments on the same line, which can break shell continuation when users copy/paste. Move comments to separate lines (or remove them) so the command executes as documented.

Proposed doc fix
-icu-benchmarks train \
-    -d demo_data/mortality24/mimic_demo \ # Insert cohort dataset here
+icu-benchmarks train \
+    # Insert cohort dataset here
+    -d demo_data/mortality24/mimic_demo \
     -n mimic_demo \
-    -t BinaryClassification \ # Insert task name here
+    # Insert task name here
+    -t BinaryClassification \
     -tn Mortality24 \
     --log-dir ../yaib_logs/ \
-    -m RNN \ # Insert model here
+    # Insert model here
+    -m RNN \

Also applies to: 200-207

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/adding_model/instructions.md` around lines 20 - 27, The shell example
using the multi-line "icu-benchmarks train \" block contains inline comments
after backslashes which break copy/paste; update the example so comments are on
separate lines (or removed) and ensure each line that ends with a backslash
contains no trailing text or comment and no trailing spaces so the continuation
works. Specifically edit the example lines containing the command
"icu-benchmarks train" and its flags (-d, -n, -t, -tn, --log-dir, -m, -s) to
move explanatory notes to their own lines above/below the command or delete
them, and mirror the same fix for the other occurrence mentioned (lines
200-207).
🟡 Minor comments (8)
experiments/experiment_submission.sh-2-3 (1)

2-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add input validation for required positional arguments.

The script uses $1 and $2 without validation. If called without arguments, the script will fail with unclear errors or create incorrectly named jobs.

🛡️ Proposed fix to add validation
 #!/bin/bash
+if [ $# -ne 2 ]; then
+  echo "Usage: $0 TASK_NAME MODEL_NAME"
+  exit 1
+fi
 export TASK_NAME=$1
 export MODEL_NAME=$2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@experiments/experiment_submission.sh` around lines 2 - 3, The script assigns
TASK_NAME and MODEL_NAME from positional args without checks; add validation at
the start of experiment_submission.sh to ensure both $1 and $2 are provided: if
either TASK_NAME or MODEL_NAME is empty, print a clear usage/error message and
exit with a non-zero status. Update references to TASK_NAME and MODEL_NAME to
rely on these validated variables so the script fails fast with a helpful
message when arguments are missing.
docs/imputation_methods.md-33-34 (1)

33-34: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the config directory path in the docs.

This points to configs/imputation, but the repo’s config layout in this PR uses configs/imputation_models. Please align the path to avoid broken onboarding instructions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/imputation_methods.md` around lines 33 - 34, Update the docs string in
docs/imputation_methods.md to point to the actual config directory used in this
PR: replace the reference to "configs/imputation" with
"configs/imputation_models" (so the sentence reads that you must create a gin
file named "newmethod.gin" in configs/imputation_models to match the name used
in the gin.configurable decorator).
README.md-283-283 (1)

283-283: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix case of CONTRIBUTING link target.

CONTRIBUTING.MD does not match the actual file name CONTRIBUTING.md.

Suggested fix
-We appreciate contributions to the project. Please read the [contribution guidelines](CONTRIBUTING.MD) before submitting a pull request.
+We appreciate contributions to the project. Please read the [contribution guidelines](CONTRIBUTING.md) before submitting a pull request.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 283, Update the markdown link target in README: replace
the incorrect case "[contribution guidelines](CONTRIBUTING.MD)" with the
correctly-cased filename "[contribution guidelines](CONTRIBUTING.md)" so the
link points to the actual CONTRIBUTING.md file; ensure the link text remains the
same and only the URI's case is changed.
README.md-9-9 (1)

9-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Python badge link target is inconsistent with displayed version.

Badge says Python 3.12 but link points to the Python 3.10.0 release page.

Suggested fix
-[![python](https://img.shields.io/badge/-Python_3.12-blue?logo=python&logoColor=white)](https://www.python.org/downloads/release/python-3100/)
+[![python](https://img.shields.io/badge/-Python_3.12-blue?logo=python&logoColor=white)](https://www.python.org/downloads/release/python-3120/)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 9, The Python badge markdown "[![python](...)](...)" shows
3.12 but links to the 3.10 release; make them consistent by either changing the
link target to the Python 3.12 release page (e.g., update the URL in the second
parentheses to the 3.12 release) or change the badge label to 3.10 so the
displayed version matches the link; update the badge markdown accordingly.
PAPER.md-29-34 (1)

29-34: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align documented Python version with current project baseline.

This file documents Python 3.10 while the updated project setup elsewhere uses 3.12.

Suggested fix
-Then install Python 3.10, pin it for the repository, and sync the environment:
+Then install Python 3.12, pin it for the repository, and sync the environment:
@@
-uv python install 3.10
-uv python pin 3.10
+uv python install 3.12
+uv python pin 3.12
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@PAPER.md` around lines 29 - 34, Update the documented Python version from
3.10 to 3.12 in the instructions that show the uv commands: change the examples
for `uv python install 3.10` and `uv python pin 3.10` to use 3.12 (keep `uv
sync` the same) so the PAPER.md matches the project's current Python baseline.
.github/workflows/e2e-integration.yml-52-54 (1)

52-54: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix mismatched path in failure message.

The check validates yaib_logs/mimic_demo, but the error message says ../yaib_logs/....

Suggested fix
-            echo "ERROR: Expected log directory ../yaib_logs/mimic_demo was not created."
+            echo "ERROR: Expected log directory yaib_logs/mimic_demo was not created."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-integration.yml around lines 52 - 54, The failure
message checks for the directory "yaib_logs/mimic_demo" but prints a mismatched
"../yaib_logs/mimic_demo"; update the echo in the if block so the error text
references the same path string used in the condition (i.e.,
"yaib_logs/mimic_demo") — locate the echo line that currently prints
"../yaib_logs/mimic_demo" and change it to match the checked path.
icu_benchmarks/run_utils.py-66-66 (1)

66-66: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in CLI help text.

verbosly should be verbosely in the --verbose help string.

Suggested fix
-        help="Set to log verbosly. Disable for clean logs.",
+        help="Set to log verbosely. Disable for clean logs.",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@icu_benchmarks/run_utils.py` at line 66, Typo in the CLI help string for the
--verbose argument: change "verbosly" to "verbosely" in the add_argument call
that defines the --verbose flag (the help parameter on the
parser.add_argument/ArgumentParser entry for "--verbose"); update the help text
to "Set to log verbosely. Disable for clean logs." so the CLI message is
correct.
icu_benchmarks/data/loader.py-230-241 (1)

230-241: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Emit the deprecation warning only once.

The same DeprecationWarning is raised twice back-to-back, which creates duplicate noise for callers.

Proposed fix
         warnings.warn(
             "CommonPandasDataset is deprecated. Use CommonPolarsDataset instead.",
             DeprecationWarning,
             stacklevel=2,
         )
         if not isinstance(vars, dict):
             raise ValueError(f"Expected vars to be of type dict, got {type(vars)} instead")
-        warnings.warn(
-            "CommonPandasDataset is deprecated. Use CommonPolarsDataset instead.",
-            DeprecationWarning,
-            stacklevel=2,
-        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@icu_benchmarks/data/loader.py` around lines 230 - 241, The duplicate
DeprecationWarning for CommonPandasDataset should be emitted only once: remove
the second warnings.warn call and ensure the single warning stays in the code
path that executes before any validation (the block surrounding the
CommonPandasDataset handling in loader.py); keep the isinstance(vars, dict)
check and the ValueError as-is but do not repeat warnings.warn for
CommonPandasDataset a second time so callers only see one deprecation message.
🧹 Nitpick comments (5)
configs/tasks/BinaryClassificationNoDynamic.gin (2)

1-50: ⚡ Quick win

Clarify the filename: "NoDynamic" is misleading.

The filename suggests dynamic features are excluded, but the config includes dynamic data mappings (line 17) and defines 42 dynamic features (lines 26-31). The "NoDynamic" appears to mean "no feature generation from dynamic data" (line 42: generate_features = False), not "no dynamic features."

Consider renaming to BinaryClassificationNoGeneratedFeatures.gin or adding a clarifying comment at the top.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/tasks/BinaryClassificationNoDynamic.gin` around lines 1 - 50, The
config filename "NoDynamic" is misleading because the file still maps and uses
dynamic features (vars DYNAMIC, preprocess.file_names) while only disabling
feature generation via default_preprocessor.generate_features = False; update
the repo by either renaming the config to something like
BinaryClassificationNoGeneratedFeatures.gin (or
BinaryClassification_NoFeatureGeneration.gin) or add a clarifying top-of-file
comment explaining that dynamic variables (vars["DYNAMIC"],
preprocess.file_names) are present but feature generation is disabled via
default_preprocessor.generate_features = False; ensure any documentation or CI
references to the old name are updated if you rename.

16-33: ⚖️ Poor tradeoff

Consider extracting common config to reduce duplication.

The file_names mapping (lines 16-20) and vars structure (lines 22-33) are duplicated in BinaryClassificationNoStatic.gin. Extracting these to a shared base config (e.g., common/BinaryClassificationBase.gin) would improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/tasks/BinaryClassificationNoDynamic.gin` around lines 16 - 33, Create
a new shared config (e.g., common/BinaryClassificationBase.gin) that contains
the duplicated preprocess.file_names mapping and vars dictionary, then remove
those blocks from BinaryClassificationNoDynamic.gin and
BinaryClassificationNoStatic.gin and include or import the new common config at
the top of both files; specifically extract the preprocess.file_names and vars
symbols (including keys "DYNAMIC"/"OUTCOME"/"STATIC", GROUP, LABEL, SEQUENCE,
and the DYNAMIC/STATIC feature lists) into BinaryClassificationBase.gin so both
configs reference the same definitions.
configs/tasks/BinaryClassificationNoStatic.gin (1)

32-34: 💤 Low value

Question: Why define static features when they're disabled?

The config defines static features (line 32, line 19) but immediately disables them via static_features = False (line 34). While this provides a template for easy toggling, it adds unused configuration. Consider either:

  • Omitting static feature definitions entirely, or
  • Adding a comment explaining they're defined as a template for easy enabling.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/tasks/BinaryClassificationNoStatic.gin` around lines 32 - 34, The
config defines a STATIC list and immediately disables it via static_features =
False; either remove the STATIC key/value entirely to avoid unused config or
keep it but add a one-line comment above STATIC explaining it's a template for
easy toggling (e.g., "template for enabling static features; set static_features
= True to use"); update the block containing "STATIC" and the static_features
variable accordingly and ensure any references to STATIC elsewhere are reviewed
before removal.
experiments/slurm_base_features.sh (1)

14-14: 💤 Low value

Consider parameterizing the hard-coded path.

The absolute path /dhc/home/robin.vandewater/projects/yaib_updated is user-specific. For broader reusability, consider making this an environment variable or documenting that users must update this path.

♻️ Suggested approach
-YAIB_PATH=/dhc/home/robin.vandewater/projects/yaib_updated `#/dhc/home/robin.vandewater/projects/yaib`
+YAIB_PATH=${YAIB_PATH:-/dhc/home/robin.vandewater/projects/yaib_updated}

This allows users to override via environment variable while keeping a default.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@experiments/slurm_base_features.sh` at line 14, Replace the hard-coded
YAIB_PATH assignment with a parameterized approach: read YAIB_PATH from the
environment if set (fall back to the current default value), and update
references that use YAIB_PATH accordingly; specifically change the YAIB_PATH
variable initialization in experiments/slurm_base_features.sh so it prefers an
existing environment variable (e.g., $YAIB_PATH) and documents in a comment that
users can override it, ensuring scripts that rely on YAIB_PATH keep the same
variable name.
icu_benchmarks/models/__init__.py (1)

55-78: 💤 Low value

Consider exporting the type aliases in __all__.

The type aliases DLModel, MLModelClassifier, and MLModelRegression are not included in __all__, which means they won't be accessible via from icu_benchmarks.models import * or auto-complete in some IDEs. If these type aliases are intended for public use in type hints elsewhere in the codebase, consider adding them.

📝 Proposed addition
 __all__ = [
     "GRUNet",
     "RNNet",
     "LSTMNet",
     "TemporalConvNet",
     "BaseTransformer",
     "Transformer",
     "LocalTransformer",
     "CBClassifier",
     "RUSBClassifier",
     "BRFClassifier",
     "LGBMClassifier",
     "LGBMRegressor",
     "XGBClassifier",
     "LogisticRegression",
     "LinearRegression",
     "ElasticNet",
     "RFClassifier",
     "SVMClassifier",
     "SVMRegressor",
     "MLPRegressor",
     "MLPClassifier",
     "PerceptronClassifier",
+    "DLModel",
+    "MLModelClassifier",
+    "MLModelRegression",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@icu_benchmarks/models/__init__.py` around lines 55 - 78, The __all__ list is
missing the public type aliases DLModel, MLModelClassifier, and
MLModelRegression; update the module export list to include these names so they
are exported on from icu_benchmarks.models import * and show up in IDE
autocompletion—locate the __all__ definition in this file (the list containing
"GRUNet", "RNNet", etc.) and add "DLModel", "MLModelClassifier", and
"MLModelRegression" to that list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/code-quality.yml:
- Around line 5-7: The workflow triggers list currently includes branches:
[main, "release/*", "development"] in both the push and pull_request blocks;
update both arrays to also include "master" so CI runs for pushes and PRs
targeting master (e.g., change branches to [main, master, "release/*",
"development"] in the push and pull_request sections).

In @.github/workflows/e2e-integration.yml:
- Around line 18-32: The "Set up Python" step is hardcoded to install 3.12;
update it to use the matrix value so each job uses its intended Python version.
Change the run command in the "Set up Python" step (currently "uv python install
3.12") to reference the matrix variable (e.g. use ${{ matrix.python-version }})
and keep the existing python-version matrix definition so the matrix entries
("python-version") drive the installed version.

In @.github/workflows/python-build.yml:
- Around line 9-10: The workflow tag filter currently uses both "v*" and
"[0-9]+.[0-9]+.[0-9]+", which makes semver enforcement ineffective because "v*"
already matches all v-prefixed tags and the numeric pattern lacks the leading
"v"; update the tag pattern to require a leading v (e.g. replace the numeric
entry with v[0-9]+.[0-9]+.[0-9]+) and remove the broad "v*" entry so only
v-prefixed semver tags trigger the job, or alternatively keep "v*" removed and
enforce stricter semver validation inside the job conditional/script (update the
tag filter entries accordingly in the workflow file).

In @.pre-commit-config.yaml:
- Line 4: The global YAML key exclude currently set to '\.ipynb$' prevents the
nbstripout hook from running; remove or narrow this global exclude and instead
apply any exclusion at the specific hook level so the nbstripout hook (id:
nbstripout) can run; update the global exclude entry and any duplicate
occurrences (the same pattern around lines 80-84) so that notebooks are not
globally excluded, or add a per-hook exclude under the nbstripout hook to
preserve intended exceptions without disabling the hook entirely.

In `@experiments/slurm_base_template.sh`:
- Line 23: Fix the typo on the line that prints the task type: replace the
misspelled command `echi "Task type:" ${TASK}` with the correct shell builtin
`echo` so the script prints the TASK variable properly (look for the line
containing the `echi` invocation that references `${TASK}`).

In `@icu_benchmarks/data/loader.py`:
- Around line 454-461: The code assumes grouping_column is always set but the
signature allows grouping_column: Optional[str] so update the selection and
subsequent groupby usage to handle None: when building the columns list for
self.dyn_df (the block that currently does self.dyn_df =
self.dyn_df[list(select_columns) + [grouping_column]]), only append
grouping_column if grouping_column is not None and exists in the dataframe;
likewise, guard any later self.dyn_df.groupby(grouping_column) or references to
grouping_column (lines ~463-470) so that if grouping_column is None you skip the
groupby path and use the non-grouped logic or a suitable aggregation, avoiding
any use of None as a column name. Ensure you check presence in
self.dyn_df.columns before selecting.

In `@icu_benchmarks/data/preprocessor.py`:
- Around line 171-173: The three calls that run .unique() on
data[DataSplit.*][DataSegment.features] after splitting (the lines using
DataSplit.train/val/test and DataSegment.features) can drop feature rows
independently and silently misalign features and outcomes; remove these
post-split .unique() calls and instead perform deduplication before splitting or
deduplicate features and outcomes together while preserving row alignment (e.g.,
dedupe on the combined frame or apply the same mask/index to both
DataSegment.features and DataSegment.outcomes). Update the preprocessing flow
where the variable data is prepared so deduplication happens once in a place
that affects all relevant segments consistently rather than only on
DataSegment.features after split.
- Around line 241-242: The StepImputeModel is being constructed with the wrong
callable reference: currently passing self.model_impute while the actual
implemented method is _model_impute; update the constructor calls (e.g., the
occurrence in the dynamic segment where StepImputeModel(model=self.model_impute,
sel=...) is added) to pass the real callable self._model_impute instead, and
make the same replacement for the other occurrences noted (around the blocks at
463-466 and 495-496) so the Pandas preprocessor path invokes the implemented
_model_impute function.

In `@icu_benchmarks/data/split_process_data.py`:
- Around line 161-165: The modality filtering result is assigned to local
variable data but subsequent preprocessing still uses sanitized_data, so the
selection is ignored; call modality_selection and then propagate its results by
assigning back to sanitized_data (and update vars) so later steps operate on the
filtered dataset—i.e., after modality_selection(sanitized_data,
modality_mapping, selected_modalities, vars) set sanitized_data = data (and keep
vars as returned) instead of leaving sanitized_data unchanged.
- Around line 216-218: The call to caching(...) inside the `if generate_cache:`
branch incorrectly passes `load_cache` as the write flag so generated caches are
never persisted; change the argument passed to the caching function (the write
gate) to use `generate_cache` (or an explicit True) instead of `load_cache` so
that when `generate_cache` is true the function actually writes the cache for
`cache_dir`, `cache_file`, and `sanitized_data`.
- Around line 103-106: The current code replaces vars[VarType.label] with a
single-element list when an explicit label is provided, which changes its type
for downstream code expecting a scalar; update the branch in
split_process_data.py so that when label is not None you assign
vars[VarType.label] = label (a scalar) instead of [label], keeping the existing
behavior when label is None and the original list-handling intact (use the
existing checks around vars, VarType.label and the local variable label to
locate and change the assignment).
- Around line 133-135: The hookup of the pretrained imputation model is
constrained to PandasClassificationPreprocessor, causing Polars or other
preprocessors to miss it; update the logic in split_process_data.py to call
set_imputation_model on preprocessor_instance more generally (e.g., if
hasattr(preprocessor_instance, "set_imputation_model") or by including
PolarsClassificationPreprocessor in the isinstance check) so that any
preprocessor implementation exposing set_imputation_model receives
pretrained_imputation_model; ensure you reference preprocessor_instance and the
set_imputation_model method when changing the conditional.

In `@icu_benchmarks/imputation/amputations.py`:
- Around line 222-223: The mask initialization uses torch.zeros_like(data) which
fails if data is a DataFrame-like object; change the initialization to create
the mask from the tensor used for imputation (e.g. X) instead. Specifically,
replace the torch.zeros_like(data) call with torch.zeros_like(X) (or
torch.zeros(X.shape, dtype=X.dtype, device=X.device)) so the mask variable
aligns with downstream calls like data.mask(mask) and any branching on mechanism
(e.g. MAR) works without a TypeError; update the place where mask is defined
(mask = ...) in the amputation logic accordingly.

In `@icu_benchmarks/imputation/diffwave.py`:
- Around line 387-393: The residual stack currently reinitializes h and runs the
inner loop twice, causing duplicated computation and double-counting skip_n; to
fix, initialize h = noise once before a single loop over
range(self.num_res_layers) and inside that loop call h, skip_n =
self.residual_blocks[n]((h, conditional, diffusion_step_embed)) then accumulate
skip += skip_n only once, removing the outer loop and the extra skip += skip_n
after the loop so the residual blocks execute sequentially and skip values are
counted correctly.

In `@icu_benchmarks/models/train.py`:
- Around line 101-103: The fallback dataset_names = {} can cause KeyError when
code later indexes dataset_names["train"], ["val"], or ["test"]; update the
initialization to ensure those keys exist (e.g., set dataset_names = {"train":
None, "val": None, "test": None}) or change all direct indexing to safe lookups
(dataset_names.get("train") / .get("val") / .get("test") with sensible
defaults). Locate and fix occurrences of the variable dataset_names in this file
(the initial block shown plus the other uses around the sections noted at
116-117 and 216) to either initialize required keys or replace direct indexing
with .get to avoid KeyError.
- Around line 254-257: The code writes SHAP test values to
"shap_values_test.parquet" but logs and downstream aggregation expect
"test_shap_values.parquet"; update the write and log to use the same filename
that the aggregator expects (use "test_shap_values.parquet") — modify the block
that opens (log_dir / "shap_values_test.parquet") and the logging.info call so
both reference (log_dir / "test_shap_values.parquet") and ensure this aligns
with how aggregation reads the file; keep references to shaps_test.write_parquet
and the trainer.lightning_module.train_shap_values check unchanged.

In `@pyproject.toml`:
- Around line 32-60: Update pyproject.toml to resolve the version mismatches
with requirements.txt by aligning the dependency pins: change "numpy==1.26.0" to
"numpy==1.24.3", add an explicit pin for "matplotlib==3.7.1", change
"pandas>=2.2.2" to "pandas==2.2.2", change "polars>=1.9.0" to "polars==1.9.0",
change "wandb>=0.17.3" to "wandb==0.17.3", and change "tensorboard>=2.12.2" to
"tensorboard==2.12.2"; alternatively if you intend pyproject.toml to be
authoritative, remove or update requirements.txt so both sources match (ensure
consistency for numpy, matplotlib, pandas, polars, wandb, tensorboard).

---

Outside diff comments:
In `@docs/adding_model/instructions.md`:
- Around line 20-27: The shell example using the multi-line "icu-benchmarks
train \" block contains inline comments after backslashes which break
copy/paste; update the example so comments are on separate lines (or removed)
and ensure each line that ends with a backslash contains no trailing text or
comment and no trailing spaces so the continuation works. Specifically edit the
example lines containing the command "icu-benchmarks train" and its flags (-d,
-n, -t, -tn, --log-dir, -m, -s) to move explanatory notes to their own lines
above/below the command or delete them, and mirror the same fix for the other
occurrence mentioned (lines 200-207).

---

Minor comments:
In @.github/workflows/e2e-integration.yml:
- Around line 52-54: The failure message checks for the directory
"yaib_logs/mimic_demo" but prints a mismatched "../yaib_logs/mimic_demo"; update
the echo in the if block so the error text references the same path string used
in the condition (i.e., "yaib_logs/mimic_demo") — locate the echo line that
currently prints "../yaib_logs/mimic_demo" and change it to match the checked
path.

In `@docs/imputation_methods.md`:
- Around line 33-34: Update the docs string in docs/imputation_methods.md to
point to the actual config directory used in this PR: replace the reference to
"configs/imputation" with "configs/imputation_models" (so the sentence reads
that you must create a gin file named "newmethod.gin" in
configs/imputation_models to match the name used in the gin.configurable
decorator).

In `@experiments/experiment_submission.sh`:
- Around line 2-3: The script assigns TASK_NAME and MODEL_NAME from positional
args without checks; add validation at the start of experiment_submission.sh to
ensure both $1 and $2 are provided: if either TASK_NAME or MODEL_NAME is empty,
print a clear usage/error message and exit with a non-zero status. Update
references to TASK_NAME and MODEL_NAME to rely on these validated variables so
the script fails fast with a helpful message when arguments are missing.

In `@icu_benchmarks/data/loader.py`:
- Around line 230-241: The duplicate DeprecationWarning for CommonPandasDataset
should be emitted only once: remove the second warnings.warn call and ensure the
single warning stays in the code path that executes before any validation (the
block surrounding the CommonPandasDataset handling in loader.py); keep the
isinstance(vars, dict) check and the ValueError as-is but do not repeat
warnings.warn for CommonPandasDataset a second time so callers only see one
deprecation message.

In `@icu_benchmarks/run_utils.py`:
- Line 66: Typo in the CLI help string for the --verbose argument: change
"verbosly" to "verbosely" in the add_argument call that defines the --verbose
flag (the help parameter on the parser.add_argument/ArgumentParser entry for
"--verbose"); update the help text to "Set to log verbosely. Disable for clean
logs." so the CLI message is correct.

In `@PAPER.md`:
- Around line 29-34: Update the documented Python version from 3.10 to 3.12 in
the instructions that show the uv commands: change the examples for `uv python
install 3.10` and `uv python pin 3.10` to use 3.12 (keep `uv sync` the same) so
the PAPER.md matches the project's current Python baseline.

In `@README.md`:
- Line 283: Update the markdown link target in README: replace the incorrect
case "[contribution guidelines](CONTRIBUTING.MD)" with the correctly-cased
filename "[contribution guidelines](CONTRIBUTING.md)" so the link points to the
actual CONTRIBUTING.md file; ensure the link text remains the same and only the
URI's case is changed.
- Line 9: The Python badge markdown "[![python](...)](...)" shows 3.12 but links
to the 3.10 release; make them consistent by either changing the link target to
the Python 3.12 release page (e.g., update the URL in the second parentheses to
the 3.12 release) or change the badge label to 3.10 so the displayed version
matches the link; update the badge markdown accordingly.

---

Nitpick comments:
In `@configs/tasks/BinaryClassificationNoDynamic.gin`:
- Around line 1-50: The config filename "NoDynamic" is misleading because the
file still maps and uses dynamic features (vars DYNAMIC, preprocess.file_names)
while only disabling feature generation via
default_preprocessor.generate_features = False; update the repo by either
renaming the config to something like
BinaryClassificationNoGeneratedFeatures.gin (or
BinaryClassification_NoFeatureGeneration.gin) or add a clarifying top-of-file
comment explaining that dynamic variables (vars["DYNAMIC"],
preprocess.file_names) are present but feature generation is disabled via
default_preprocessor.generate_features = False; ensure any documentation or CI
references to the old name are updated if you rename.
- Around line 16-33: Create a new shared config (e.g.,
common/BinaryClassificationBase.gin) that contains the duplicated
preprocess.file_names mapping and vars dictionary, then remove those blocks from
BinaryClassificationNoDynamic.gin and BinaryClassificationNoStatic.gin and
include or import the new common config at the top of both files; specifically
extract the preprocess.file_names and vars symbols (including keys
"DYNAMIC"/"OUTCOME"/"STATIC", GROUP, LABEL, SEQUENCE, and the DYNAMIC/STATIC
feature lists) into BinaryClassificationBase.gin so both configs reference the
same definitions.

In `@configs/tasks/BinaryClassificationNoStatic.gin`:
- Around line 32-34: The config defines a STATIC list and immediately disables
it via static_features = False; either remove the STATIC key/value entirely to
avoid unused config or keep it but add a one-line comment above STATIC
explaining it's a template for easy toggling (e.g., "template for enabling
static features; set static_features = True to use"); update the block
containing "STATIC" and the static_features variable accordingly and ensure any
references to STATIC elsewhere are reviewed before removal.

In `@experiments/slurm_base_features.sh`:
- Line 14: Replace the hard-coded YAIB_PATH assignment with a parameterized
approach: read YAIB_PATH from the environment if set (fall back to the current
default value), and update references that use YAIB_PATH accordingly;
specifically change the YAIB_PATH variable initialization in
experiments/slurm_base_features.sh so it prefers an existing environment
variable (e.g., $YAIB_PATH) and documents in a comment that users can override
it, ensuring scripts that rely on YAIB_PATH keep the same variable name.

In `@icu_benchmarks/models/__init__.py`:
- Around line 55-78: The __all__ list is missing the public type aliases
DLModel, MLModelClassifier, and MLModelRegression; update the module export list
to include these names so they are exported on from icu_benchmarks.models import
* and show up in IDE autocompletion—locate the __all__ definition in this file
(the list containing "GRUNet", "RNNet", etc.) and add "DLModel",
"MLModelClassifier", and "MLModelRegression" to that list.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7aca635c-af4a-469e-b5c9-e1a38d32161c

📥 Commits

Reviewing files that changed from the base of the PR and between 5333dae and a69c586.

⛔ Files ignored due to path filters (1)
  • docs/figures/yaib_flow_combined.svg is excluded by !**/*.svg
📒 Files selected for processing (100)
  • .github/workflows/ci.yml
  • .github/workflows/code-quality.yml
  • .github/workflows/e2e-integration.yml
  • .github/workflows/python-build.yml
  • .gitignore
  • .pre-commit-config.yaml
  • CONTRIBUTING.MD
  • CONTRIBUTING.md
  • LICENSE
  • PAPER.md
  • README.md
  • configs/imputation_models/BRNN.gin
  • configs/imputation_models/Diffusion.gin
  • configs/imputation_models/MICE.gin
  • configs/imputation_models/NP.gin
  • configs/imputation_models/RNN.gin
  • configs/imputation_models/SSSDSA.gin
  • configs/models/LogisticRegressionDebug.gin
  • configs/prediction_models/BRFClassifier.gin
  • configs/prediction_models/CBClassifier.gin
  • configs/prediction_models/GRU.gin
  • configs/prediction_models/LSTM.gin
  • configs/prediction_models/LogisticRegression.gin
  • configs/prediction_models/RFClassifier.gin
  • configs/prediction_models/RUSBClassifier.gin
  • configs/prediction_models/Transformer.gin
  • configs/prediction_models/XGBClassifier.gin
  • configs/prediction_models/common/DLCommon.gin
  • configs/prediction_models/common/DLTuning.gin
  • configs/prediction_models/common/MLCommon.gin
  • configs/prediction_models/common/MLTuning.gin
  • configs/tasks/BinaryClassification.gin
  • configs/tasks/BinaryClassificationNoDynamic.gin
  • configs/tasks/BinaryClassificationNoStatic.gin
  • configs/tasks/DatasetImputation.gin
  • configs/tasks/Regression.gin
  • configs/tasks/common/CrossValidation.gin
  • configs/tasks/common/Dataloader.gin
  • configs/tasks/common/Imports.gin
  • configs/tasks/common/PredictionTaskVariables.gin
  • docs/adding_model/instructions.md
  • docs/adding_model/rnn.py
  • docs/development.md
  • docs/imputation_methods.md
  • docs/wanddb.md
  • environment.yml
  • experiments/benchmark_classification.yml
  • experiments/benchmark_regression.yml
  • experiments/demo_benchmark_classification.yml
  • experiments/demo_benchmark_regression.yml
  • experiments/experiment_eval_pooled.yml
  • experiments/experiment_full_training.yml
  • experiments/experiment_submission.sh
  • experiments/slurm_base_features.sh
  • experiments/slurm_base_template.sh
  • icu_benchmarks/__init__.py
  • icu_benchmarks/cross_validation.py
  • icu_benchmarks/data/loader.py
  • icu_benchmarks/data/pooling.py
  • icu_benchmarks/data/preprocessor.py
  • icu_benchmarks/data/split_process_data.py
  • icu_benchmarks/imputation/amputations.py
  • icu_benchmarks/imputation/baselines.py
  • icu_benchmarks/imputation/csdi.py
  • icu_benchmarks/imputation/diffusion.py
  • icu_benchmarks/imputation/diffwave.py
  • icu_benchmarks/imputation/layers/s4layer.py
  • icu_benchmarks/imputation/mlp.py
  • icu_benchmarks/imputation/np.py
  • icu_benchmarks/imputation/rnn.py
  • icu_benchmarks/imputation/simple_diffusion.py
  • icu_benchmarks/imputation/sssds4.py
  • icu_benchmarks/imputation/sssdsa.py
  • icu_benchmarks/models/__init__.py
  • icu_benchmarks/models/constants.py
  • icu_benchmarks/models/custom_metrics.py
  • icu_benchmarks/models/dl_models/layers.py
  • icu_benchmarks/models/dl_models/rnn.py
  • icu_benchmarks/models/dl_models/tcn.py
  • icu_benchmarks/models/dl_models/transformer.py
  • icu_benchmarks/models/ml_models/lgbm.py
  • icu_benchmarks/models/ml_models/sklearn.py
  • icu_benchmarks/models/train.py
  • icu_benchmarks/models/utils.py
  • icu_benchmarks/models/wrappers.py
  • icu_benchmarks/run.py
  • icu_benchmarks/run_utils.py
  • icu_benchmarks/tuning/hyperparameters.py
  • pyproject.toml
  • requirements.txt
  • scripts/evaluate_results/aggregrate_experiment.py
  • scripts/plotting/plotting.py
  • scripts/sample_usage/random_search/LGBM.sh
  • scripts/sample_usage/random_search/LSTM.sh
  • scripts/sweep_configs/01_experiment_1.yaml
  • scripts/sweep_configs/hyperparameter_sweeps/brnn_sweep.yml
  • scripts/sweep_configs/hyperparameter_sweeps/np_sweep.yml
  • scripts/sweep_configs/hyperparameter_sweeps/saits_sweep.yml
  • setup.cfg
  • setup.py
💤 Files with no reviewable changes (16)
  • .github/workflows/ci.yml
  • CONTRIBUTING.MD
  • LICENSE
  • environment.yml
  • configs/prediction_models/LogisticRegression.gin
  • setup.py
  • configs/prediction_models/RUSBClassifier.gin
  • configs/prediction_models/GRU.gin
  • setup.cfg
  • configs/tasks/Regression.gin
  • configs/tasks/BinaryClassification.gin
  • configs/prediction_models/RFClassifier.gin
  • configs/tasks/DatasetImputation.gin
  • configs/prediction_models/BRFClassifier.gin
  • configs/prediction_models/LSTM.gin
  • configs/prediction_models/Transformer.gin

Comment thread .github/workflows/code-quality.yml Outdated
Comment thread .github/workflows/e2e-integration.yml Outdated
Comment thread .github/workflows/python-build.yml Outdated
Comment thread .pre-commit-config.yaml
Comment thread experiments/slurm_base_template.sh Outdated
Comment thread icu_benchmarks/imputation/amputations.py Outdated
Comment thread icu_benchmarks/imputation/diffwave.py
Comment thread icu_benchmarks/models/train.py
Comment thread icu_benchmarks/models/train.py Outdated
Comment thread pyproject.toml
Fix duplicated nested loop and state reset in residual stack.

Lines 388-393 currently re-run the residual stack inside a nested loop, reset h to noise each outer iteration, and double-count skip_n. This breaks the intended iterative residual flow.
random feedback improvements:



    Bug Fixes
        Fixed a SLURM script syntax error and corrected a CLI help typo.

    Chores
        CI workflows now run push/pull_request on more branches; E2E tests adjusted to run on selected Python versions and updated setup/log message; removed pinned dependencies from requirements; loosened several dev/runtime dependency pins.

    Documentation
        Updated Python version badge and contribution link in the README.

    Refactor
        Adjusted imputation mask initialization, training dataset defaults and SHAP output name, residual block iteration behavior, and minor dataset init cleanup.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
.github/workflows/e2e-integration.yml (1)

59-65: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Artifact name collision across matrix jobs.

All matrix jobs will try to upload to the same artifact name yaib-e2e-logs, causing overwrites or failures with actions/upload-artifact@v4. Include the Python version in the artifact name.

🐛 Proposed fix
       - name: Upload run logs as artifact
         if: always()
         uses: actions/upload-artifact@v4
         with:
-          name: yaib-e2e-logs
+          name: yaib-e2e-logs-py${{ matrix.python-version }}
           path: yaib_logs/
           retention-days: 7
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-integration.yml around lines 59 - 65, The artifact
name is colliding across matrix jobs because every job uses the fixed name
"yaib-e2e-logs"; change the upload step (the actions/upload-artifact@v4
invocation) to include the matrix Python version in the artifact name (e.g.,
incorporate matrix.python-version) so each matrix job uploads a unique artifact
name; update the "name" field for the artifact upload to interpolate the Python
version (using the workflow expression for matrix.python-version) to avoid
overwrites and collisions.
icu_benchmarks/models/train.py (1)

117-122: ⚠️ Potential issue | 🔴 Critical

Fix type mismatch: assure_minimum_length is passed a dataset object, not a pl.DataFrame (train.py:117-122)

assure_minimum_length is annotated as assure_minimum_length(dataset: pl.DataFrame) -> pl.DataFrame and calls pl.concat([dataset, dataset]) when len(dataset) < 2. At lines 117-122, train_dataset/val_dataset are instantiated from dataset_class (for polars=True, this is PredictionPolarsDataset, which is a torch.utils.data.Dataset, not a Polars DataFrame). This will raise at runtime when the split has fewer than 2 samples, and the function’s contract doesn’t match its call sites.

Update either assure_minimum_length to support dataset objects, or pass the underlying Polars DataFrame that pl.concat expects.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@icu_benchmarks/models/train.py` around lines 117 - 122, assure_minimum_length
currently expects a polars DataFrame but is being called with dataset objects
(train_dataset / val_dataset, e.g. PredictionPolarsDataset), causing a type
mismatch; update assure_minimum_length to accept both polars.DataFrame and
torch.utils.data.Dataset by adding a type union in its signature and branching:
if the input is a torch.utils.data.Dataset use
torch.utils.data.ConcatDataset([dataset, dataset]) or otherwise duplicate items
appropriately, and if the input is a pl.DataFrame keep using pl.concat; ensure
the function returns the same concrete type it was given and update its type
annotation accordingly so calls from train.py (train_dataset, val_dataset) no
longer fail at runtime.
experiments/slurm_base_template.sh (1)

39-48: ⚠️ Potential issue | 🟠 Major

Replace -c with --complete-train (current -c is ignored)

experiments/slurm_base_template.sh passes -c to icu-benchmarks train (line 45), but icu_benchmarks/run_utils.py only defines --complete-train (no -c short flag). Because icu_benchmarks/run.py uses parse_known_args, the -c is silently ignored, so args.complete_train stays false (training uses held-out splits). Replace -c with --complete-train.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@experiments/slurm_base_template.sh` around lines 39 - 48, The SLURM job
template is passing the short flag `-c` to the CLI but the parser in
icu_benchmarks/run_utils.py only defines `--complete-train`, and because run.py
uses parse_known_args the `-c` is ignored; update the command invocation in the
SLURM template (the `icu-benchmarks train ...` line) to replace `-c` with
`--complete-train` so that args.complete_train is set correctly when parsed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In @.github/workflows/e2e-integration.yml:
- Around line 59-65: The artifact name is colliding across matrix jobs because
every job uses the fixed name "yaib-e2e-logs"; change the upload step (the
actions/upload-artifact@v4 invocation) to include the matrix Python version in
the artifact name (e.g., incorporate matrix.python-version) so each matrix job
uploads a unique artifact name; update the "name" field for the artifact upload
to interpolate the Python version (using the workflow expression for
matrix.python-version) to avoid overwrites and collisions.

In `@experiments/slurm_base_template.sh`:
- Around line 39-48: The SLURM job template is passing the short flag `-c` to
the CLI but the parser in icu_benchmarks/run_utils.py only defines
`--complete-train`, and because run.py uses parse_known_args the `-c` is
ignored; update the command invocation in the SLURM template (the
`icu-benchmarks train ...` line) to replace `-c` with `--complete-train` so that
args.complete_train is set correctly when parsed.

In `@icu_benchmarks/models/train.py`:
- Around line 117-122: assure_minimum_length currently expects a polars
DataFrame but is being called with dataset objects (train_dataset / val_dataset,
e.g. PredictionPolarsDataset), causing a type mismatch; update
assure_minimum_length to accept both polars.DataFrame and
torch.utils.data.Dataset by adding a type union in its signature and branching:
if the input is a torch.utils.data.Dataset use
torch.utils.data.ConcatDataset([dataset, dataset]) or otherwise duplicate items
appropriately, and if the input is a pl.DataFrame keep using pl.concat; ensure
the function returns the same concrete type it was given and update its type
annotation accordingly so calls from train.py (train_dataset, val_dataset) no
longer fail at runtime.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c1c06115-3606-41a7-b095-8e90d8c3e031

📥 Commits

Reviewing files that changed from the base of the PR and between a69c586 and 2b51e1b.

📒 Files selected for processing (12)
  • .github/workflows/code-quality.yml
  • .github/workflows/e2e-integration.yml
  • .github/workflows/python-build.yml
  • README.md
  • experiments/slurm_base_template.sh
  • icu_benchmarks/data/loader.py
  • icu_benchmarks/imputation/amputations.py
  • icu_benchmarks/imputation/diffwave.py
  • icu_benchmarks/models/train.py
  • icu_benchmarks/run_utils.py
  • pyproject.toml
  • requirements.txt
💤 Files with no reviewable changes (3)
  • requirements.txt
  • icu_benchmarks/imputation/amputations.py
  • icu_benchmarks/data/loader.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • .github/workflows/code-quality.yml
  • icu_benchmarks/run_utils.py
  • .github/workflows/python-build.yml
  • pyproject.toml
  • icu_benchmarks/imputation/diffwave.py
  • README.md

@rvandewater rvandewater changed the title Major update to main Modernization, speed, memory usage, and flexibility improvements Jun 8, 2026
@rvandewater rvandewater merged commit 8cb3e4c into master Jun 8, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants