Modernization, speed, memory usage, and flexibility improvements#188
Conversation
Reduced time-outs of cluster
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 17
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/adding_model/instructions.md (1)
20-27:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix shell continuation examples to be copy-paste safe.
The command examples use
\with inline comments on the same line, which can break shell continuation when users copy/paste. Move comments to separate lines (or remove them) so the command executes as documented.Proposed doc fix
-icu-benchmarks train \ - -d demo_data/mortality24/mimic_demo \ # Insert cohort dataset here +icu-benchmarks train \ + # Insert cohort dataset here + -d demo_data/mortality24/mimic_demo \ -n mimic_demo \ - -t BinaryClassification \ # Insert task name here + # Insert task name here + -t BinaryClassification \ -tn Mortality24 \ --log-dir ../yaib_logs/ \ - -m RNN \ # Insert model here + # Insert model here + -m RNN \Also applies to: 200-207
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/adding_model/instructions.md` around lines 20 - 27, The shell example using the multi-line "icu-benchmarks train \" block contains inline comments after backslashes which break copy/paste; update the example so comments are on separate lines (or removed) and ensure each line that ends with a backslash contains no trailing text or comment and no trailing spaces so the continuation works. Specifically edit the example lines containing the command "icu-benchmarks train" and its flags (-d, -n, -t, -tn, --log-dir, -m, -s) to move explanatory notes to their own lines above/below the command or delete them, and mirror the same fix for the other occurrence mentioned (lines 200-207).
🟡 Minor comments (8)
experiments/experiment_submission.sh-2-3 (1)
2-3:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd input validation for required positional arguments.
The script uses
$1and$2without validation. If called without arguments, the script will fail with unclear errors or create incorrectly named jobs.🛡️ Proposed fix to add validation
#!/bin/bash +if [ $# -ne 2 ]; then + echo "Usage: $0 TASK_NAME MODEL_NAME" + exit 1 +fi export TASK_NAME=$1 export MODEL_NAME=$2🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@experiments/experiment_submission.sh` around lines 2 - 3, The script assigns TASK_NAME and MODEL_NAME from positional args without checks; add validation at the start of experiment_submission.sh to ensure both $1 and $2 are provided: if either TASK_NAME or MODEL_NAME is empty, print a clear usage/error message and exit with a non-zero status. Update references to TASK_NAME and MODEL_NAME to rely on these validated variables so the script fails fast with a helpful message when arguments are missing.docs/imputation_methods.md-33-34 (1)
33-34:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCorrect the config directory path in the docs.
This points to
configs/imputation, but the repo’s config layout in this PR usesconfigs/imputation_models. Please align the path to avoid broken onboarding instructions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/imputation_methods.md` around lines 33 - 34, Update the docs string in docs/imputation_methods.md to point to the actual config directory used in this PR: replace the reference to "configs/imputation" with "configs/imputation_models" (so the sentence reads that you must create a gin file named "newmethod.gin" in configs/imputation_models to match the name used in the gin.configurable decorator).README.md-283-283 (1)
283-283:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix case of CONTRIBUTING link target.
CONTRIBUTING.MDdoes not match the actual file nameCONTRIBUTING.md.Suggested fix
-We appreciate contributions to the project. Please read the [contribution guidelines](CONTRIBUTING.MD) before submitting a pull request. +We appreciate contributions to the project. Please read the [contribution guidelines](CONTRIBUTING.md) before submitting a pull request.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` at line 283, Update the markdown link target in README: replace the incorrect case "[contribution guidelines](CONTRIBUTING.MD)" with the correctly-cased filename "[contribution guidelines](CONTRIBUTING.md)" so the link points to the actual CONTRIBUTING.md file; ensure the link text remains the same and only the URI's case is changed.README.md-9-9 (1)
9-9:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPython badge link target is inconsistent with displayed version.
Badge says Python 3.12 but link points to the Python 3.10.0 release page.
Suggested fix
-[](https://www.python.org/downloads/release/python-3100/) +[](https://www.python.org/downloads/release/python-3120/)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` at line 9, The Python badge markdown "[](...)" shows 3.12 but links to the 3.10 release; make them consistent by either changing the link target to the Python 3.12 release page (e.g., update the URL in the second parentheses to the 3.12 release) or change the badge label to 3.10 so the displayed version matches the link; update the badge markdown accordingly.PAPER.md-29-34 (1)
29-34:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign documented Python version with current project baseline.
This file documents Python 3.10 while the updated project setup elsewhere uses 3.12.
Suggested fix
-Then install Python 3.10, pin it for the repository, and sync the environment: +Then install Python 3.12, pin it for the repository, and sync the environment: @@ -uv python install 3.10 -uv python pin 3.10 +uv python install 3.12 +uv python pin 3.12🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@PAPER.md` around lines 29 - 34, Update the documented Python version from 3.10 to 3.12 in the instructions that show the uv commands: change the examples for `uv python install 3.10` and `uv python pin 3.10` to use 3.12 (keep `uv sync` the same) so the PAPER.md matches the project's current Python baseline..github/workflows/e2e-integration.yml-52-54 (1)
52-54:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix mismatched path in failure message.
The check validates
yaib_logs/mimic_demo, but the error message says../yaib_logs/....Suggested fix
- echo "ERROR: Expected log directory ../yaib_logs/mimic_demo was not created." + echo "ERROR: Expected log directory yaib_logs/mimic_demo was not created."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/e2e-integration.yml around lines 52 - 54, The failure message checks for the directory "yaib_logs/mimic_demo" but prints a mismatched "../yaib_logs/mimic_demo"; update the echo in the if block so the error text references the same path string used in the condition (i.e., "yaib_logs/mimic_demo") — locate the echo line that currently prints "../yaib_logs/mimic_demo" and change it to match the checked path.icu_benchmarks/run_utils.py-66-66 (1)
66-66:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix typo in CLI help text.
verboslyshould beverboselyin the--verbosehelp string.Suggested fix
- help="Set to log verbosly. Disable for clean logs.", + help="Set to log verbosely. Disable for clean logs.",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@icu_benchmarks/run_utils.py` at line 66, Typo in the CLI help string for the --verbose argument: change "verbosly" to "verbosely" in the add_argument call that defines the --verbose flag (the help parameter on the parser.add_argument/ArgumentParser entry for "--verbose"); update the help text to "Set to log verbosely. Disable for clean logs." so the CLI message is correct.icu_benchmarks/data/loader.py-230-241 (1)
230-241:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winEmit the deprecation warning only once.
The same
DeprecationWarningis raised twice back-to-back, which creates duplicate noise for callers.Proposed fix
warnings.warn( "CommonPandasDataset is deprecated. Use CommonPolarsDataset instead.", DeprecationWarning, stacklevel=2, ) if not isinstance(vars, dict): raise ValueError(f"Expected vars to be of type dict, got {type(vars)} instead") - warnings.warn( - "CommonPandasDataset is deprecated. Use CommonPolarsDataset instead.", - DeprecationWarning, - stacklevel=2, - )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@icu_benchmarks/data/loader.py` around lines 230 - 241, The duplicate DeprecationWarning for CommonPandasDataset should be emitted only once: remove the second warnings.warn call and ensure the single warning stays in the code path that executes before any validation (the block surrounding the CommonPandasDataset handling in loader.py); keep the isinstance(vars, dict) check and the ValueError as-is but do not repeat warnings.warn for CommonPandasDataset a second time so callers only see one deprecation message.
🧹 Nitpick comments (5)
configs/tasks/BinaryClassificationNoDynamic.gin (2)
1-50: ⚡ Quick winClarify the filename: "NoDynamic" is misleading.
The filename suggests dynamic features are excluded, but the config includes dynamic data mappings (line 17) and defines 42 dynamic features (lines 26-31). The "NoDynamic" appears to mean "no feature generation from dynamic data" (line 42:
generate_features = False), not "no dynamic features."Consider renaming to
BinaryClassificationNoGeneratedFeatures.ginor adding a clarifying comment at the top.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/tasks/BinaryClassificationNoDynamic.gin` around lines 1 - 50, The config filename "NoDynamic" is misleading because the file still maps and uses dynamic features (vars DYNAMIC, preprocess.file_names) while only disabling feature generation via default_preprocessor.generate_features = False; update the repo by either renaming the config to something like BinaryClassificationNoGeneratedFeatures.gin (or BinaryClassification_NoFeatureGeneration.gin) or add a clarifying top-of-file comment explaining that dynamic variables (vars["DYNAMIC"], preprocess.file_names) are present but feature generation is disabled via default_preprocessor.generate_features = False; ensure any documentation or CI references to the old name are updated if you rename.
16-33: ⚖️ Poor tradeoffConsider extracting common config to reduce duplication.
The
file_namesmapping (lines 16-20) andvarsstructure (lines 22-33) are duplicated inBinaryClassificationNoStatic.gin. Extracting these to a shared base config (e.g.,common/BinaryClassificationBase.gin) would improve maintainability.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/tasks/BinaryClassificationNoDynamic.gin` around lines 16 - 33, Create a new shared config (e.g., common/BinaryClassificationBase.gin) that contains the duplicated preprocess.file_names mapping and vars dictionary, then remove those blocks from BinaryClassificationNoDynamic.gin and BinaryClassificationNoStatic.gin and include or import the new common config at the top of both files; specifically extract the preprocess.file_names and vars symbols (including keys "DYNAMIC"/"OUTCOME"/"STATIC", GROUP, LABEL, SEQUENCE, and the DYNAMIC/STATIC feature lists) into BinaryClassificationBase.gin so both configs reference the same definitions.configs/tasks/BinaryClassificationNoStatic.gin (1)
32-34: 💤 Low valueQuestion: Why define static features when they're disabled?
The config defines static features (line 32, line 19) but immediately disables them via
static_features = False(line 34). While this provides a template for easy toggling, it adds unused configuration. Consider either:
- Omitting static feature definitions entirely, or
- Adding a comment explaining they're defined as a template for easy enabling.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/tasks/BinaryClassificationNoStatic.gin` around lines 32 - 34, The config defines a STATIC list and immediately disables it via static_features = False; either remove the STATIC key/value entirely to avoid unused config or keep it but add a one-line comment above STATIC explaining it's a template for easy toggling (e.g., "template for enabling static features; set static_features = True to use"); update the block containing "STATIC" and the static_features variable accordingly and ensure any references to STATIC elsewhere are reviewed before removal.experiments/slurm_base_features.sh (1)
14-14: 💤 Low valueConsider parameterizing the hard-coded path.
The absolute path
/dhc/home/robin.vandewater/projects/yaib_updatedis user-specific. For broader reusability, consider making this an environment variable or documenting that users must update this path.♻️ Suggested approach
-YAIB_PATH=/dhc/home/robin.vandewater/projects/yaib_updated `#/dhc/home/robin.vandewater/projects/yaib` +YAIB_PATH=${YAIB_PATH:-/dhc/home/robin.vandewater/projects/yaib_updated}This allows users to override via environment variable while keeping a default.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@experiments/slurm_base_features.sh` at line 14, Replace the hard-coded YAIB_PATH assignment with a parameterized approach: read YAIB_PATH from the environment if set (fall back to the current default value), and update references that use YAIB_PATH accordingly; specifically change the YAIB_PATH variable initialization in experiments/slurm_base_features.sh so it prefers an existing environment variable (e.g., $YAIB_PATH) and documents in a comment that users can override it, ensuring scripts that rely on YAIB_PATH keep the same variable name.icu_benchmarks/models/__init__.py (1)
55-78: 💤 Low valueConsider exporting the type aliases in
__all__.The type aliases
DLModel,MLModelClassifier, andMLModelRegressionare not included in__all__, which means they won't be accessible viafrom icu_benchmarks.models import *or auto-complete in some IDEs. If these type aliases are intended for public use in type hints elsewhere in the codebase, consider adding them.📝 Proposed addition
__all__ = [ "GRUNet", "RNNet", "LSTMNet", "TemporalConvNet", "BaseTransformer", "Transformer", "LocalTransformer", "CBClassifier", "RUSBClassifier", "BRFClassifier", "LGBMClassifier", "LGBMRegressor", "XGBClassifier", "LogisticRegression", "LinearRegression", "ElasticNet", "RFClassifier", "SVMClassifier", "SVMRegressor", "MLPRegressor", "MLPClassifier", "PerceptronClassifier", + "DLModel", + "MLModelClassifier", + "MLModelRegression", ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@icu_benchmarks/models/__init__.py` around lines 55 - 78, The __all__ list is missing the public type aliases DLModel, MLModelClassifier, and MLModelRegression; update the module export list to include these names so they are exported on from icu_benchmarks.models import * and show up in IDE autocompletion—locate the __all__ definition in this file (the list containing "GRUNet", "RNNet", etc.) and add "DLModel", "MLModelClassifier", and "MLModelRegression" to that list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/code-quality.yml:
- Around line 5-7: The workflow triggers list currently includes branches:
[main, "release/*", "development"] in both the push and pull_request blocks;
update both arrays to also include "master" so CI runs for pushes and PRs
targeting master (e.g., change branches to [main, master, "release/*",
"development"] in the push and pull_request sections).
In @.github/workflows/e2e-integration.yml:
- Around line 18-32: The "Set up Python" step is hardcoded to install 3.12;
update it to use the matrix value so each job uses its intended Python version.
Change the run command in the "Set up Python" step (currently "uv python install
3.12") to reference the matrix variable (e.g. use ${{ matrix.python-version }})
and keep the existing python-version matrix definition so the matrix entries
("python-version") drive the installed version.
In @.github/workflows/python-build.yml:
- Around line 9-10: The workflow tag filter currently uses both "v*" and
"[0-9]+.[0-9]+.[0-9]+", which makes semver enforcement ineffective because "v*"
already matches all v-prefixed tags and the numeric pattern lacks the leading
"v"; update the tag pattern to require a leading v (e.g. replace the numeric
entry with v[0-9]+.[0-9]+.[0-9]+) and remove the broad "v*" entry so only
v-prefixed semver tags trigger the job, or alternatively keep "v*" removed and
enforce stricter semver validation inside the job conditional/script (update the
tag filter entries accordingly in the workflow file).
In @.pre-commit-config.yaml:
- Line 4: The global YAML key exclude currently set to '\.ipynb$' prevents the
nbstripout hook from running; remove or narrow this global exclude and instead
apply any exclusion at the specific hook level so the nbstripout hook (id:
nbstripout) can run; update the global exclude entry and any duplicate
occurrences (the same pattern around lines 80-84) so that notebooks are not
globally excluded, or add a per-hook exclude under the nbstripout hook to
preserve intended exceptions without disabling the hook entirely.
In `@experiments/slurm_base_template.sh`:
- Line 23: Fix the typo on the line that prints the task type: replace the
misspelled command `echi "Task type:" ${TASK}` with the correct shell builtin
`echo` so the script prints the TASK variable properly (look for the line
containing the `echi` invocation that references `${TASK}`).
In `@icu_benchmarks/data/loader.py`:
- Around line 454-461: The code assumes grouping_column is always set but the
signature allows grouping_column: Optional[str] so update the selection and
subsequent groupby usage to handle None: when building the columns list for
self.dyn_df (the block that currently does self.dyn_df =
self.dyn_df[list(select_columns) + [grouping_column]]), only append
grouping_column if grouping_column is not None and exists in the dataframe;
likewise, guard any later self.dyn_df.groupby(grouping_column) or references to
grouping_column (lines ~463-470) so that if grouping_column is None you skip the
groupby path and use the non-grouped logic or a suitable aggregation, avoiding
any use of None as a column name. Ensure you check presence in
self.dyn_df.columns before selecting.
In `@icu_benchmarks/data/preprocessor.py`:
- Around line 171-173: The three calls that run .unique() on
data[DataSplit.*][DataSegment.features] after splitting (the lines using
DataSplit.train/val/test and DataSegment.features) can drop feature rows
independently and silently misalign features and outcomes; remove these
post-split .unique() calls and instead perform deduplication before splitting or
deduplicate features and outcomes together while preserving row alignment (e.g.,
dedupe on the combined frame or apply the same mask/index to both
DataSegment.features and DataSegment.outcomes). Update the preprocessing flow
where the variable data is prepared so deduplication happens once in a place
that affects all relevant segments consistently rather than only on
DataSegment.features after split.
- Around line 241-242: The StepImputeModel is being constructed with the wrong
callable reference: currently passing self.model_impute while the actual
implemented method is _model_impute; update the constructor calls (e.g., the
occurrence in the dynamic segment where StepImputeModel(model=self.model_impute,
sel=...) is added) to pass the real callable self._model_impute instead, and
make the same replacement for the other occurrences noted (around the blocks at
463-466 and 495-496) so the Pandas preprocessor path invokes the implemented
_model_impute function.
In `@icu_benchmarks/data/split_process_data.py`:
- Around line 161-165: The modality filtering result is assigned to local
variable data but subsequent preprocessing still uses sanitized_data, so the
selection is ignored; call modality_selection and then propagate its results by
assigning back to sanitized_data (and update vars) so later steps operate on the
filtered dataset—i.e., after modality_selection(sanitized_data,
modality_mapping, selected_modalities, vars) set sanitized_data = data (and keep
vars as returned) instead of leaving sanitized_data unchanged.
- Around line 216-218: The call to caching(...) inside the `if generate_cache:`
branch incorrectly passes `load_cache` as the write flag so generated caches are
never persisted; change the argument passed to the caching function (the write
gate) to use `generate_cache` (or an explicit True) instead of `load_cache` so
that when `generate_cache` is true the function actually writes the cache for
`cache_dir`, `cache_file`, and `sanitized_data`.
- Around line 103-106: The current code replaces vars[VarType.label] with a
single-element list when an explicit label is provided, which changes its type
for downstream code expecting a scalar; update the branch in
split_process_data.py so that when label is not None you assign
vars[VarType.label] = label (a scalar) instead of [label], keeping the existing
behavior when label is None and the original list-handling intact (use the
existing checks around vars, VarType.label and the local variable label to
locate and change the assignment).
- Around line 133-135: The hookup of the pretrained imputation model is
constrained to PandasClassificationPreprocessor, causing Polars or other
preprocessors to miss it; update the logic in split_process_data.py to call
set_imputation_model on preprocessor_instance more generally (e.g., if
hasattr(preprocessor_instance, "set_imputation_model") or by including
PolarsClassificationPreprocessor in the isinstance check) so that any
preprocessor implementation exposing set_imputation_model receives
pretrained_imputation_model; ensure you reference preprocessor_instance and the
set_imputation_model method when changing the conditional.
In `@icu_benchmarks/imputation/amputations.py`:
- Around line 222-223: The mask initialization uses torch.zeros_like(data) which
fails if data is a DataFrame-like object; change the initialization to create
the mask from the tensor used for imputation (e.g. X) instead. Specifically,
replace the torch.zeros_like(data) call with torch.zeros_like(X) (or
torch.zeros(X.shape, dtype=X.dtype, device=X.device)) so the mask variable
aligns with downstream calls like data.mask(mask) and any branching on mechanism
(e.g. MAR) works without a TypeError; update the place where mask is defined
(mask = ...) in the amputation logic accordingly.
In `@icu_benchmarks/imputation/diffwave.py`:
- Around line 387-393: The residual stack currently reinitializes h and runs the
inner loop twice, causing duplicated computation and double-counting skip_n; to
fix, initialize h = noise once before a single loop over
range(self.num_res_layers) and inside that loop call h, skip_n =
self.residual_blocks[n]((h, conditional, diffusion_step_embed)) then accumulate
skip += skip_n only once, removing the outer loop and the extra skip += skip_n
after the loop so the residual blocks execute sequentially and skip values are
counted correctly.
In `@icu_benchmarks/models/train.py`:
- Around line 101-103: The fallback dataset_names = {} can cause KeyError when
code later indexes dataset_names["train"], ["val"], or ["test"]; update the
initialization to ensure those keys exist (e.g., set dataset_names = {"train":
None, "val": None, "test": None}) or change all direct indexing to safe lookups
(dataset_names.get("train") / .get("val") / .get("test") with sensible
defaults). Locate and fix occurrences of the variable dataset_names in this file
(the initial block shown plus the other uses around the sections noted at
116-117 and 216) to either initialize required keys or replace direct indexing
with .get to avoid KeyError.
- Around line 254-257: The code writes SHAP test values to
"shap_values_test.parquet" but logs and downstream aggregation expect
"test_shap_values.parquet"; update the write and log to use the same filename
that the aggregator expects (use "test_shap_values.parquet") — modify the block
that opens (log_dir / "shap_values_test.parquet") and the logging.info call so
both reference (log_dir / "test_shap_values.parquet") and ensure this aligns
with how aggregation reads the file; keep references to shaps_test.write_parquet
and the trainer.lightning_module.train_shap_values check unchanged.
In `@pyproject.toml`:
- Around line 32-60: Update pyproject.toml to resolve the version mismatches
with requirements.txt by aligning the dependency pins: change "numpy==1.26.0" to
"numpy==1.24.3", add an explicit pin for "matplotlib==3.7.1", change
"pandas>=2.2.2" to "pandas==2.2.2", change "polars>=1.9.0" to "polars==1.9.0",
change "wandb>=0.17.3" to "wandb==0.17.3", and change "tensorboard>=2.12.2" to
"tensorboard==2.12.2"; alternatively if you intend pyproject.toml to be
authoritative, remove or update requirements.txt so both sources match (ensure
consistency for numpy, matplotlib, pandas, polars, wandb, tensorboard).
---
Outside diff comments:
In `@docs/adding_model/instructions.md`:
- Around line 20-27: The shell example using the multi-line "icu-benchmarks
train \" block contains inline comments after backslashes which break
copy/paste; update the example so comments are on separate lines (or removed)
and ensure each line that ends with a backslash contains no trailing text or
comment and no trailing spaces so the continuation works. Specifically edit the
example lines containing the command "icu-benchmarks train" and its flags (-d,
-n, -t, -tn, --log-dir, -m, -s) to move explanatory notes to their own lines
above/below the command or delete them, and mirror the same fix for the other
occurrence mentioned (lines 200-207).
---
Minor comments:
In @.github/workflows/e2e-integration.yml:
- Around line 52-54: The failure message checks for the directory
"yaib_logs/mimic_demo" but prints a mismatched "../yaib_logs/mimic_demo"; update
the echo in the if block so the error text references the same path string used
in the condition (i.e., "yaib_logs/mimic_demo") — locate the echo line that
currently prints "../yaib_logs/mimic_demo" and change it to match the checked
path.
In `@docs/imputation_methods.md`:
- Around line 33-34: Update the docs string in docs/imputation_methods.md to
point to the actual config directory used in this PR: replace the reference to
"configs/imputation" with "configs/imputation_models" (so the sentence reads
that you must create a gin file named "newmethod.gin" in
configs/imputation_models to match the name used in the gin.configurable
decorator).
In `@experiments/experiment_submission.sh`:
- Around line 2-3: The script assigns TASK_NAME and MODEL_NAME from positional
args without checks; add validation at the start of experiment_submission.sh to
ensure both $1 and $2 are provided: if either TASK_NAME or MODEL_NAME is empty,
print a clear usage/error message and exit with a non-zero status. Update
references to TASK_NAME and MODEL_NAME to rely on these validated variables so
the script fails fast with a helpful message when arguments are missing.
In `@icu_benchmarks/data/loader.py`:
- Around line 230-241: The duplicate DeprecationWarning for CommonPandasDataset
should be emitted only once: remove the second warnings.warn call and ensure the
single warning stays in the code path that executes before any validation (the
block surrounding the CommonPandasDataset handling in loader.py); keep the
isinstance(vars, dict) check and the ValueError as-is but do not repeat
warnings.warn for CommonPandasDataset a second time so callers only see one
deprecation message.
In `@icu_benchmarks/run_utils.py`:
- Line 66: Typo in the CLI help string for the --verbose argument: change
"verbosly" to "verbosely" in the add_argument call that defines the --verbose
flag (the help parameter on the parser.add_argument/ArgumentParser entry for
"--verbose"); update the help text to "Set to log verbosely. Disable for clean
logs." so the CLI message is correct.
In `@PAPER.md`:
- Around line 29-34: Update the documented Python version from 3.10 to 3.12 in
the instructions that show the uv commands: change the examples for `uv python
install 3.10` and `uv python pin 3.10` to use 3.12 (keep `uv sync` the same) so
the PAPER.md matches the project's current Python baseline.
In `@README.md`:
- Line 283: Update the markdown link target in README: replace the incorrect
case "[contribution guidelines](CONTRIBUTING.MD)" with the correctly-cased
filename "[contribution guidelines](CONTRIBUTING.md)" so the link points to the
actual CONTRIBUTING.md file; ensure the link text remains the same and only the
URI's case is changed.
- Line 9: The Python badge markdown "[](...)" shows 3.12 but links
to the 3.10 release; make them consistent by either changing the link target to
the Python 3.12 release page (e.g., update the URL in the second parentheses to
the 3.12 release) or change the badge label to 3.10 so the displayed version
matches the link; update the badge markdown accordingly.
---
Nitpick comments:
In `@configs/tasks/BinaryClassificationNoDynamic.gin`:
- Around line 1-50: The config filename "NoDynamic" is misleading because the
file still maps and uses dynamic features (vars DYNAMIC, preprocess.file_names)
while only disabling feature generation via
default_preprocessor.generate_features = False; update the repo by either
renaming the config to something like
BinaryClassificationNoGeneratedFeatures.gin (or
BinaryClassification_NoFeatureGeneration.gin) or add a clarifying top-of-file
comment explaining that dynamic variables (vars["DYNAMIC"],
preprocess.file_names) are present but feature generation is disabled via
default_preprocessor.generate_features = False; ensure any documentation or CI
references to the old name are updated if you rename.
- Around line 16-33: Create a new shared config (e.g.,
common/BinaryClassificationBase.gin) that contains the duplicated
preprocess.file_names mapping and vars dictionary, then remove those blocks from
BinaryClassificationNoDynamic.gin and BinaryClassificationNoStatic.gin and
include or import the new common config at the top of both files; specifically
extract the preprocess.file_names and vars symbols (including keys
"DYNAMIC"/"OUTCOME"/"STATIC", GROUP, LABEL, SEQUENCE, and the DYNAMIC/STATIC
feature lists) into BinaryClassificationBase.gin so both configs reference the
same definitions.
In `@configs/tasks/BinaryClassificationNoStatic.gin`:
- Around line 32-34: The config defines a STATIC list and immediately disables
it via static_features = False; either remove the STATIC key/value entirely to
avoid unused config or keep it but add a one-line comment above STATIC
explaining it's a template for easy toggling (e.g., "template for enabling
static features; set static_features = True to use"); update the block
containing "STATIC" and the static_features variable accordingly and ensure any
references to STATIC elsewhere are reviewed before removal.
In `@experiments/slurm_base_features.sh`:
- Line 14: Replace the hard-coded YAIB_PATH assignment with a parameterized
approach: read YAIB_PATH from the environment if set (fall back to the current
default value), and update references that use YAIB_PATH accordingly;
specifically change the YAIB_PATH variable initialization in
experiments/slurm_base_features.sh so it prefers an existing environment
variable (e.g., $YAIB_PATH) and documents in a comment that users can override
it, ensuring scripts that rely on YAIB_PATH keep the same variable name.
In `@icu_benchmarks/models/__init__.py`:
- Around line 55-78: The __all__ list is missing the public type aliases
DLModel, MLModelClassifier, and MLModelRegression; update the module export list
to include these names so they are exported on from icu_benchmarks.models import
* and show up in IDE autocompletion—locate the __all__ definition in this file
(the list containing "GRUNet", "RNNet", etc.) and add "DLModel",
"MLModelClassifier", and "MLModelRegression" to that list.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7aca635c-af4a-469e-b5c9-e1a38d32161c
⛔ Files ignored due to path filters (1)
docs/figures/yaib_flow_combined.svgis excluded by!**/*.svg
📒 Files selected for processing (100)
.github/workflows/ci.yml.github/workflows/code-quality.yml.github/workflows/e2e-integration.yml.github/workflows/python-build.yml.gitignore.pre-commit-config.yamlCONTRIBUTING.MDCONTRIBUTING.mdLICENSEPAPER.mdREADME.mdconfigs/imputation_models/BRNN.ginconfigs/imputation_models/Diffusion.ginconfigs/imputation_models/MICE.ginconfigs/imputation_models/NP.ginconfigs/imputation_models/RNN.ginconfigs/imputation_models/SSSDSA.ginconfigs/models/LogisticRegressionDebug.ginconfigs/prediction_models/BRFClassifier.ginconfigs/prediction_models/CBClassifier.ginconfigs/prediction_models/GRU.ginconfigs/prediction_models/LSTM.ginconfigs/prediction_models/LogisticRegression.ginconfigs/prediction_models/RFClassifier.ginconfigs/prediction_models/RUSBClassifier.ginconfigs/prediction_models/Transformer.ginconfigs/prediction_models/XGBClassifier.ginconfigs/prediction_models/common/DLCommon.ginconfigs/prediction_models/common/DLTuning.ginconfigs/prediction_models/common/MLCommon.ginconfigs/prediction_models/common/MLTuning.ginconfigs/tasks/BinaryClassification.ginconfigs/tasks/BinaryClassificationNoDynamic.ginconfigs/tasks/BinaryClassificationNoStatic.ginconfigs/tasks/DatasetImputation.ginconfigs/tasks/Regression.ginconfigs/tasks/common/CrossValidation.ginconfigs/tasks/common/Dataloader.ginconfigs/tasks/common/Imports.ginconfigs/tasks/common/PredictionTaskVariables.gindocs/adding_model/instructions.mddocs/adding_model/rnn.pydocs/development.mddocs/imputation_methods.mddocs/wanddb.mdenvironment.ymlexperiments/benchmark_classification.ymlexperiments/benchmark_regression.ymlexperiments/demo_benchmark_classification.ymlexperiments/demo_benchmark_regression.ymlexperiments/experiment_eval_pooled.ymlexperiments/experiment_full_training.ymlexperiments/experiment_submission.shexperiments/slurm_base_features.shexperiments/slurm_base_template.shicu_benchmarks/__init__.pyicu_benchmarks/cross_validation.pyicu_benchmarks/data/loader.pyicu_benchmarks/data/pooling.pyicu_benchmarks/data/preprocessor.pyicu_benchmarks/data/split_process_data.pyicu_benchmarks/imputation/amputations.pyicu_benchmarks/imputation/baselines.pyicu_benchmarks/imputation/csdi.pyicu_benchmarks/imputation/diffusion.pyicu_benchmarks/imputation/diffwave.pyicu_benchmarks/imputation/layers/s4layer.pyicu_benchmarks/imputation/mlp.pyicu_benchmarks/imputation/np.pyicu_benchmarks/imputation/rnn.pyicu_benchmarks/imputation/simple_diffusion.pyicu_benchmarks/imputation/sssds4.pyicu_benchmarks/imputation/sssdsa.pyicu_benchmarks/models/__init__.pyicu_benchmarks/models/constants.pyicu_benchmarks/models/custom_metrics.pyicu_benchmarks/models/dl_models/layers.pyicu_benchmarks/models/dl_models/rnn.pyicu_benchmarks/models/dl_models/tcn.pyicu_benchmarks/models/dl_models/transformer.pyicu_benchmarks/models/ml_models/lgbm.pyicu_benchmarks/models/ml_models/sklearn.pyicu_benchmarks/models/train.pyicu_benchmarks/models/utils.pyicu_benchmarks/models/wrappers.pyicu_benchmarks/run.pyicu_benchmarks/run_utils.pyicu_benchmarks/tuning/hyperparameters.pypyproject.tomlrequirements.txtscripts/evaluate_results/aggregrate_experiment.pyscripts/plotting/plotting.pyscripts/sample_usage/random_search/LGBM.shscripts/sample_usage/random_search/LSTM.shscripts/sweep_configs/01_experiment_1.yamlscripts/sweep_configs/hyperparameter_sweeps/brnn_sweep.ymlscripts/sweep_configs/hyperparameter_sweeps/np_sweep.ymlscripts/sweep_configs/hyperparameter_sweeps/saits_sweep.ymlsetup.cfgsetup.py
💤 Files with no reviewable changes (16)
- .github/workflows/ci.yml
- CONTRIBUTING.MD
- LICENSE
- environment.yml
- configs/prediction_models/LogisticRegression.gin
- setup.py
- configs/prediction_models/RUSBClassifier.gin
- configs/prediction_models/GRU.gin
- setup.cfg
- configs/tasks/Regression.gin
- configs/tasks/BinaryClassification.gin
- configs/prediction_models/RFClassifier.gin
- configs/tasks/DatasetImputation.gin
- configs/prediction_models/BRFClassifier.gin
- configs/prediction_models/LSTM.gin
- configs/prediction_models/Transformer.gin
random feedback improvements:
Bug Fixes
Fixed a SLURM script syntax error and corrected a CLI help typo.
Chores
CI workflows now run push/pull_request on more branches; E2E tests adjusted to run on selected Python versions and updated setup/log message; removed pinned dependencies from requirements; loosened several dev/runtime dependency pins.
Documentation
Updated Python version badge and contribution link in the README.
Refactor
Adjusted imputation mask initialization, training dataset defaults and SHAP output name, residual block iteration behavior, and minor dataset init cleanup.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
.github/workflows/e2e-integration.yml (1)
59-65:⚠️ Potential issue | 🟠 Major | ⚡ Quick winArtifact name collision across matrix jobs.
All matrix jobs will try to upload to the same artifact name
yaib-e2e-logs, causing overwrites or failures withactions/upload-artifact@v4. Include the Python version in the artifact name.🐛 Proposed fix
- name: Upload run logs as artifact if: always() uses: actions/upload-artifact@v4 with: - name: yaib-e2e-logs + name: yaib-e2e-logs-py${{ matrix.python-version }} path: yaib_logs/ retention-days: 7🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/e2e-integration.yml around lines 59 - 65, The artifact name is colliding across matrix jobs because every job uses the fixed name "yaib-e2e-logs"; change the upload step (the actions/upload-artifact@v4 invocation) to include the matrix Python version in the artifact name (e.g., incorporate matrix.python-version) so each matrix job uploads a unique artifact name; update the "name" field for the artifact upload to interpolate the Python version (using the workflow expression for matrix.python-version) to avoid overwrites and collisions.icu_benchmarks/models/train.py (1)
117-122:⚠️ Potential issue | 🔴 CriticalFix type mismatch:
assure_minimum_lengthis passed a dataset object, not apl.DataFrame(train.py:117-122)
assure_minimum_lengthis annotated asassure_minimum_length(dataset: pl.DataFrame) -> pl.DataFrameand callspl.concat([dataset, dataset])whenlen(dataset) < 2. At lines 117-122,train_dataset/val_datasetare instantiated fromdataset_class(forpolars=True, this isPredictionPolarsDataset, which is atorch.utils.data.Dataset, not a PolarsDataFrame). This will raise at runtime when the split has fewer than 2 samples, and the function’s contract doesn’t match its call sites.Update either
assure_minimum_lengthto support dataset objects, or pass the underlying PolarsDataFramethatpl.concatexpects.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@icu_benchmarks/models/train.py` around lines 117 - 122, assure_minimum_length currently expects a polars DataFrame but is being called with dataset objects (train_dataset / val_dataset, e.g. PredictionPolarsDataset), causing a type mismatch; update assure_minimum_length to accept both polars.DataFrame and torch.utils.data.Dataset by adding a type union in its signature and branching: if the input is a torch.utils.data.Dataset use torch.utils.data.ConcatDataset([dataset, dataset]) or otherwise duplicate items appropriately, and if the input is a pl.DataFrame keep using pl.concat; ensure the function returns the same concrete type it was given and update its type annotation accordingly so calls from train.py (train_dataset, val_dataset) no longer fail at runtime.experiments/slurm_base_template.sh (1)
39-48:⚠️ Potential issue | 🟠 MajorReplace
-cwith--complete-train(current-cis ignored)
experiments/slurm_base_template.shpasses-ctoicu-benchmarks train(line 45), buticu_benchmarks/run_utils.pyonly defines--complete-train(no-cshort flag). Becauseicu_benchmarks/run.pyusesparse_known_args, the-cis silently ignored, soargs.complete_trainstays false (training uses held-out splits). Replace-cwith--complete-train.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@experiments/slurm_base_template.sh` around lines 39 - 48, The SLURM job template is passing the short flag `-c` to the CLI but the parser in icu_benchmarks/run_utils.py only defines `--complete-train`, and because run.py uses parse_known_args the `-c` is ignored; update the command invocation in the SLURM template (the `icu-benchmarks train ...` line) to replace `-c` with `--complete-train` so that args.complete_train is set correctly when parsed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In @.github/workflows/e2e-integration.yml:
- Around line 59-65: The artifact name is colliding across matrix jobs because
every job uses the fixed name "yaib-e2e-logs"; change the upload step (the
actions/upload-artifact@v4 invocation) to include the matrix Python version in
the artifact name (e.g., incorporate matrix.python-version) so each matrix job
uploads a unique artifact name; update the "name" field for the artifact upload
to interpolate the Python version (using the workflow expression for
matrix.python-version) to avoid overwrites and collisions.
In `@experiments/slurm_base_template.sh`:
- Around line 39-48: The SLURM job template is passing the short flag `-c` to
the CLI but the parser in icu_benchmarks/run_utils.py only defines
`--complete-train`, and because run.py uses parse_known_args the `-c` is
ignored; update the command invocation in the SLURM template (the
`icu-benchmarks train ...` line) to replace `-c` with `--complete-train` so that
args.complete_train is set correctly when parsed.
In `@icu_benchmarks/models/train.py`:
- Around line 117-122: assure_minimum_length currently expects a polars
DataFrame but is being called with dataset objects (train_dataset / val_dataset,
e.g. PredictionPolarsDataset), causing a type mismatch; update
assure_minimum_length to accept both polars.DataFrame and
torch.utils.data.Dataset by adding a type union in its signature and branching:
if the input is a torch.utils.data.Dataset use
torch.utils.data.ConcatDataset([dataset, dataset]) or otherwise duplicate items
appropriately, and if the input is a pl.DataFrame keep using pl.concat; ensure
the function returns the same concrete type it was given and update its type
annotation accordingly so calls from train.py (train_dataset, val_dataset) no
longer fail at runtime.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c1c06115-3606-41a7-b095-8e90d8c3e031
📒 Files selected for processing (12)
.github/workflows/code-quality.yml.github/workflows/e2e-integration.yml.github/workflows/python-build.ymlREADME.mdexperiments/slurm_base_template.shicu_benchmarks/data/loader.pyicu_benchmarks/imputation/amputations.pyicu_benchmarks/imputation/diffwave.pyicu_benchmarks/models/train.pyicu_benchmarks/run_utils.pypyproject.tomlrequirements.txt
💤 Files with no reviewable changes (3)
- requirements.txt
- icu_benchmarks/imputation/amputations.py
- icu_benchmarks/data/loader.py
🚧 Files skipped from review as they are similar to previous changes (6)
- .github/workflows/code-quality.yml
- icu_benchmarks/run_utils.py
- .github/workflows/python-build.yml
- pyproject.toml
- icu_benchmarks/imputation/diffwave.py
- README.md
Summary by CodeRabbit
New Features
icu-benchmarksCLI entry point for streamlined command execution.Build & Installation
pyproject.toml-based build system withuvpackage manager.setup.pyand environment configuration files.Documentation
Infrastructure