Merge some bugfixes for DIANN dataset into Development#193
Conversation
📝 WalkthroughWalkthroughCentralized preview reading and DIANN 2.0+ vs 1.x detection added; DIANN checkbox default flipped to FALSE and auto-synced to detected format (with mismatch warnings); preview/read errors now produce user notifications and return NULL; one import removed from NAMESPACE. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant UI as Load Page UI
participant Server as Load Page Server
participant Preview as Preview Reader
participant Detector as DIANN Detector
participant Notifier as Notification System
User->>UI: upload/select file
UI->>Server: request preview read
Server->>Preview: .read_preview(filepath)
Preview-->>Server: preview_df or NULL
Server->>Detector: .is_diann_2plus(preview_df)
Detector-->>Server: detected_format (TRUE/FALSE/NULL)
Server->>UI: updateCheckboxInput(diann_2plus) if detection changed
Server->>Notifier: notify detected format
Notifier-->>User: show format notification
Note over Server,UI: compare manual vs detected selection
Server->>Server: if mismatch -> warn
Server->>Notifier: show mismatch warning
Notifier-->>User: warn user
alt getData() error
Server->>Server: remove spinner
Server->>Notifier: notify failure
Server-->>UI: return NULL
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
R/utils.R (1)
29-37: Detection logic LGTM; consider also guarding against non-DIANN files.The precedence rule (legacy column wins when both patterns somehow coexist) and
NULL/empty handling look correct and are well covered by the new tests. One minor thought: a file with neither signature (e.g. an unrelated CSV) correctly returnsFALSE, but the server will then silently notify "Detected DIANN 1.x format" — you may want a third "unknown" outcome in the future to avoid misleading the user. Not a blocker.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/utils.R` around lines 29 - 37, Replace the boolean-only check with a three-way detector so callers cannot be misled by non-DIANN files: implement a new function detect_diann_version(preview_df) (or update .is_diann_2plus to delegate) that computes has_numbered_fragments and has_legacy_fragments using the existing patterns and then returns one of "diann_2plus", "diann_1x", or "unknown" (use the existing precedence where legacy wins if both present); update any callers that currently rely on .is_diann_2plus to use detect_diann_version and branch on the explicit "unknown" result instead of assuming DIANN 1.x.R/module-loadpage-server.R (1)
126-140: Mismatch observer may emit a spurious warning during auto-sync.When
.is_diann_2plus()detection flips the checkbox viaupdateCheckboxInput()above, Shiny firesinput$diann_2pluson the next flush. Because thisobserveEventre-evaluatespreview_data()and.is_diann_2plus()at event time, the comparison will normally match — but ifpreview_data()changes between the update and the event (e.g. user swaps files quickly), you can get a transient false-positive "mismatch" warning. Cheaper and more robust: track the last detected value in areactiveValand compare against that instead of recomputing detection here.Also:
isTRUE(input$diann_2plus) != detected_2plusworks, but!identical(isTRUE(...), detected_2plus)reads clearer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/module-loadpage-server.R` around lines 126 - 140, Replace the runtime recomputation of format detection inside the observeEvent with a stored last-detected value: create a reactiveVal (e.g. last_diann_detected <- reactiveVal()) that you set whenever you run .is_diann_2plus(preview_data()) (the same place that calls updateCheckboxInput), and in the observeEvent(input$diann_2plus, ...) compare the checkbox state to last_diann_detected() instead of calling .is_diann_2plus(preview) again; use !identical(isTRUE(input$diann_2plus), last_diann_detected()) for the mismatch test and keep the existing showNotification logic and ignoreInit = TRUE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/module-loadpage-server.R`:
- Around line 108-123: The observer unconditionally calls updateCheckboxInput()
and showNotification() on any reactive invalidation; change it to compute
is_2plus via .is_diann_2plus(preview_data()) but only call
updateCheckboxInput(session, "diann_2plus", ...) and showNotification(...) when
the detected value actually differs from the current input state (use
isolate(input$diann_2plus) or a small reactiveVal like last_diann_2plus to
compare). Keep the same req(...) checks and only perform the update/notification
when is_2plus != current value to avoid spurious toasts and extra reactive churn
from updateCheckboxInput triggering other observers.
In `@R/utils.R`:
- Around line 8-18: The parquet branch in .read_preview currently materializes
the whole file; replace the arrow::read_parquet(filepath) call with
metadata-only access using arrow::ParquetFileReader$create(filepath)$GetSchema()
to obtain column names without reading data, then construct and return an empty
data.frame/data.table with those column names (so downstream .is_diann_2plus()
can inspect names) — update .read_preview to use
ParquetFileReader$create(filepath)$GetSchema() and map the schema names into an
empty table instead of loading the full parquet file.
---
Nitpick comments:
In `@R/module-loadpage-server.R`:
- Around line 126-140: Replace the runtime recomputation of format detection
inside the observeEvent with a stored last-detected value: create a reactiveVal
(e.g. last_diann_detected <- reactiveVal()) that you set whenever you run
.is_diann_2plus(preview_data()) (the same place that calls updateCheckboxInput),
and in the observeEvent(input$diann_2plus, ...) compare the checkbox state to
last_diann_detected() instead of calling .is_diann_2plus(preview) again; use
!identical(isTRUE(input$diann_2plus), last_diann_detected()) for the mismatch
test and keep the existing showNotification logic and ignoreInit = TRUE.
In `@R/utils.R`:
- Around line 29-37: Replace the boolean-only check with a three-way detector so
callers cannot be misled by non-DIANN files: implement a new function
detect_diann_version(preview_df) (or update .is_diann_2plus to delegate) that
computes has_numbered_fragments and has_legacy_fragments using the existing
patterns and then returns one of "diann_2plus", "diann_1x", or "unknown" (use
the existing precedence where legacy wins if both present); update any callers
that currently rely on .is_diann_2plus to use detect_diann_version and branch on
the explicit "unknown" result instead of assuming DIANN 1.x.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ebf1501e-8bde-48fa-962e-4b53b67797d8
📒 Files selected for processing (5)
NAMESPACER/module-loadpage-server.RR/module-loadpage-ui.RR/utils.Rtests/testthat/test-utils.R
💤 Files with no reviewable changes (1)
- NAMESPACE
| importFrom(DT,renderDataTable) | ||
| importFrom(Hmisc,describe) | ||
| importFrom(MSstatsBioNet,annotateProteinInfoFromIndra) | ||
| importFrom(MSstatsBioNet,deleteEdgeFromNetwork) |
There was a problem hiding this comment.
Rebase to devel and install the latest version of MSstatsBioNet from Bioconductor devel / github. Then rerun devtools::document(). This shouldn't be deleted.
2b3f1f9 to
9c0a174
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
R/module-loadpage-server.R (1)
85-111: Forward reference tolast_detected_diann_formatis fragile.Line 95 invokes
last_detected_diann_format(NULL)insideobserve({ ... }), but thereactiveValitself is declared later on line 111. This happens to work because observer bodies execute during the reactive flush (after the module function body finishes), but the ordering makes the code harder to follow and would break if the observe body were ever invoked eagerly during setup. Consider declaringlast_detected_diann_formatabove the preview observe so the lexical order matches the dependency order.♻️ Proposed reorder
+ # Track last detected DIANN format to avoid redundant notifications + last_detected_diann_format <- reactiveVal(NULL) + # Read first 100 rows for preview-based UI features. # Supported: Metamorpheus PTM (modification ID dropdown), DIANN (version auto-detection). # TODO: Extend to other input formats (Spectronaut, MaxQuant) as needed. observe({ should_preview <- (isTRUE(input$filetype == "meta") && isTRUE(input$BIO == "PTM")) || (isTRUE(input$filetype == "diann") && isTRUE(input$BIO != "PTM")) ... }) - - # Track last detected DIANN format to avoid redundant notifications - last_detected_diann_format <- reactiveVal(NULL)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/module-loadpage-server.R` around lines 85 - 111, The observe block reads and reset the reactiveVal last_detected_diann_format before that reactiveVal is declared, which is a fragile forward reference; move the declaration of last_detected_diann_format <- reactiveVal(NULL) so it appears above the observe({...}) that calls last_detected_diann_format(NULL) (i.e., declare the reactiveVal before the preview observe/preview_data logic) to ensure lexical order matches dependency order and avoid eager-initialization breakage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@R/module-loadpage-server.R`:
- Around line 85-111: The observe block reads and reset the reactiveVal
last_detected_diann_format before that reactiveVal is declared, which is a
fragile forward reference; move the declaration of last_detected_diann_format <-
reactiveVal(NULL) so it appears above the observe({...}) that calls
last_detected_diann_format(NULL) (i.e., declare the reactiveVal before the
preview observe/preview_data logic) to ensure lexical order matches dependency
order and avoid eager-initialization breakage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d9a58da5-d32b-4686-8933-29f5b3dcabff
📒 Files selected for processing (4)
R/module-loadpage-server.RR/module-loadpage-ui.RR/utils.Rtests/testthat/test-utils.R
- Auto detect the DIANN dataset file type and automatically toggle the checkbox for better user experience, still retaining the ability to let user manually toggle the checkbox if required - Show a warning if the user selects an invalid state of the checkbox based on the type of file uploaded - Handle application crashing in case if the user uses an invalid configuration while uplaoding the data - Write test cases for all edge cases of the DIANN file version detection
9c0a174 to
cdde69f
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
R/utils.R (1)
8-24: Prior OOM concern addressed; minor simplification possible.The switch to
arrow::open_dataset(...)$schemareads metadata only and avoids materializing large Parquet reports — nicely resolves the earlier feedback.One nit:
setNames(lapply(col_names, ...), col_names)is redundant sincelapplyover a character vector already produces a named list. You can dropsetNamesor, if you just need a 0-row frame with the right column names, build it more directly.♻️ Optional simplification
schema <- arrow::open_dataset(filepath, format = "parquet")$schema col_names <- schema$names - empty_df <- as.data.frame( - setNames(lapply(col_names, function(x) logical(0)), col_names) - ) - empty_df + # 0-row frame preserving column names for downstream detection + as.data.frame(matrix(logical(0), nrow = 0, ncol = length(col_names), + dimnames = list(NULL, col_names)))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/utils.R` around lines 8 - 24, In .read_preview(), simplify the Parquet-empty-dataframe construction by removing the redundant setNames call: build empty_df from lapply(col_names, function(x) logical(0)) (which already yields a named list) or directly create a 0-row data.frame with the correct column names using col_names; keep the same variable names (col_names, empty_df) and return behavior unchanged so the function still returns a 0-row data.frame for parquet files.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@R/utils.R`:
- Around line 8-24: In .read_preview(), simplify the Parquet-empty-dataframe
construction by removing the redundant setNames call: build empty_df from
lapply(col_names, function(x) logical(0)) (which already yields a named list) or
directly create a 0-row data.frame with the correct column names using
col_names; keep the same variable names (col_names, empty_df) and return
behavior unchanged so the function still returns a 0-row data.frame for parquet
files.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 441487d5-b571-4cdd-8b73-689e9d473421
📒 Files selected for processing (5)
NAMESPACER/module-loadpage-server.RR/module-loadpage-ui.RR/utils.Rtests/testthat/test-utils.R
💤 Files with no reviewable changes (1)
- NAMESPACE
🚧 Files skipped from review as they are similar to previous changes (3)
- R/module-loadpage-ui.R
- R/module-loadpage-server.R
- tests/testthat/test-utils.R
Motivation and Context
The DIANN dataset support in MSstatsShiny previously defaulted the DIANN 2.0+ checkbox to an unsafe state, lacked robust auto-detection of DIANN file versions, and could crash or mis-handle uploads when user selection conflicted with file contents. This PR improves UX and robustness by auto-detecting DIANN file format from a file preview, keeping the user able to override the choice, warning on mismatches, and preventing crashes from invalid upload configurations.
Solution Summary
On file preview the app now infers DIANN 1.x vs DIANN 2.0+ using column-name patterns and updates the DIANN 2.0+ checkbox to match detected format (while allowing manual override). A mismatch watcher warns the user if their manual selection conflicts with detection. File-preview and upload flows use a centralized, resilient preview reader and added error handling to avoid application crashes. Unit tests were added to cover detection and preview edge cases, including Parquet schema-only previews.
Detailed Changes
NAMESPACE
UI (R/module-loadpage-ui.R)
Server (R/module-loadpage-server.R)
Utilities (R/utils.R)
Other server/UI behaviors
Unit Tests Added / Modified
Coding Guidelines