Skip to content

Merge some bugfixes for DIANN dataset into Development#193

Merged
swaraj-neu merged 4 commits into
develfrom
bugfix/diann
Apr 24, 2026
Merged

Merge some bugfixes for DIANN dataset into Development#193
swaraj-neu merged 4 commits into
develfrom
bugfix/diann

Conversation

@swaraj-neu
Copy link
Copy Markdown
Contributor

@swaraj-neu swaraj-neu commented Apr 23, 2026

  • Default the DIANN 2.0+ checkbox to unchecked
  • Auto detect the DIANN dataset file type and automatically toggle the checkbox for better user experience, still retaining the ability to let user manually toggle the checkbox if required
  • Show a warning if the user selects an invalid state of the checkbox based on the type of file uploaded
  • Handle application crashing in case if the user uses an invalid configuration while uplaoding the data
  • Write test cases for all edge cases of the DIANN file version detection

Motivation and Context

The DIANN dataset support in MSstatsShiny previously defaulted the DIANN 2.0+ checkbox to an unsafe state, lacked robust auto-detection of DIANN file versions, and could crash or mis-handle uploads when user selection conflicted with file contents. This PR improves UX and robustness by auto-detecting DIANN file format from a file preview, keeping the user able to override the choice, warning on mismatches, and preventing crashes from invalid upload configurations.

Solution Summary

On file preview the app now infers DIANN 1.x vs DIANN 2.0+ using column-name patterns and updates the DIANN 2.0+ checkbox to match detected format (while allowing manual override). A mismatch watcher warns the user if their manual selection conflicts with detection. File-preview and upload flows use a centralized, resilient preview reader and added error handling to avoid application crashes. Unit tests were added to cover detection and preview edge cases, including Parquet schema-only previews.

Detailed Changes

  • NAMESPACE

    • No changes to NAMESPACE in this PR (imports remain as-is).
  • UI (R/module-loadpage-ui.R)

    • DIANN 2.0+ checkbox default changed from TRUE to FALSE so the pre-2.0 intensity input panel is shown by default when DIANN is selected.
  • Server (R/module-loadpage-server.R)

    • Centralized preview reads via .read_preview() for preview-based UI features (applies to DIANN as well as Metamorpheus PTM).
    • preview_data reactive introduced to hold the preview result.
    • Added last_detected_diann_format reactive to avoid redundant notifications.
    • After preview, run .is_diann_2plus(preview) to detect DIANN file version.
    • Auto-update DIANN 2.0+ checkbox via updateCheckboxInput() only when detected state changes; show informative notifications for detected DIANN 2.0+ vs 1.x.
    • observeEvent on user changes to diann_2plus emits a warning if manual selection conflicts with detected format.
    • get_data / upload flow hardened: getData(input) errors are caught, modal spinner removed, a failure notification is shown, and NULL is returned to prevent crashes.
    • Preview failures produce standardized "Could not preview file. Please verify the file format." warning.
  • Utilities (R/utils.R)

    • Added .read_preview(filepath, filename = NULL, nrows = 100):
      • Reads up to nrows for delimited text using data.table::fread.
      • For Parquet (.parquet/.pq), uses Arrow to open dataset schema and returns an empty data.frame with correct column names (schema-only), avoiding full reads.
      • Returns NULL on read errors.
    • Added .is_diann_2plus(preview_df):
      • Detects DIANN 2.0+ by presence of numbered fragment quantity columns matching "^Fr\.[0-9]+\.Quantity$" and absence of legacy fragment columns ("Fragment.Quant.Corrected", "FragmentQuantCorrected").
      • Returns FALSE for NULL or empty inputs or when legacy columns are present.
  • Other server/UI behaviors

    • Metamorpheus PTM preview logic generalized to use .read_preview and standardized warning on preview failure.
    • extract/resolve helper functions present/used for PTM modification ID UI.
    • Various UI renderUI outputs continue to rely on preview_data where applicable.

Unit Tests Added / Modified

  • tests/testthat/test-utils.R
    • New tests for .is_diann_2plus:
      • TRUE for DIANN 2.0+ patterns (Fr.0.Quantity … Fr.11.Quantity).
      • FALSE for DIANN 1.x legacy columns (Fragment.Quant.Corrected, FragmentQuantCorrected).
      • Precedence: legacy fragment columns cause FALSE when both legacy and numbered columns coexist.
      • Defensive cases: NULL preview, empty data.frame, data without fragment columns.
    • New tests for .read_preview:
      • CSV/TSV parsing with nrows limiting.
      • Returns NULL for missing/non-existent files.
      • Tolerant handling for malformed bytes.
      • Behavior when filename is NULL.
      • Parquet (.parquet and .pq) schema-only reads via Arrow (returns empty data.frame with correct columns).
    • Integration tests:
      • End-to-end checks reading temporary CSV/parquet preview and verifying .is_diann_2plus classification for DIANN 1.x vs 2.0+.

Coding Guidelines

  • No coding guideline violations detected:
    • Internal helpers use dot-prefix naming (.read_preview, .is_diann_2plus).
    • Non-exported internals are annotated with @nord where present.
    • Error handling uses tryCatch to return NULL on preview errors for graceful degradation.
    • Shiny reactive idioms (reactiveVal, observe, observeEvent, updateCheckboxInput) are used appropriately.
    • Tests are organized and cover edge cases for detection and preview behavior.

@swaraj-neu swaraj-neu requested a review from tonywu1999 April 23, 2026 02:24
@swaraj-neu swaraj-neu self-assigned this Apr 23, 2026
@swaraj-neu swaraj-neu added the bug Something isn't working label Apr 23, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

Centralized preview reading and DIANN 2.0+ vs 1.x detection added; DIANN checkbox default flipped to FALSE and auto-synced to detected format (with mismatch warnings); preview/read errors now produce user notifications and return NULL; one import removed from NAMESPACE.

Changes

Cohort / File(s) Summary
Namespace Management
NAMESPACE
Removed import of MSstatsQualityMetricsPlot from MSstats.
Preview & DIANN Detection Utilities
R/utils.R
Added .read_preview() to read limited CSV/TSV or parquet schema-only previews and .is_diann_2plus() to detect DIANN 2.0+ by per-fragment column patterns.
Load Page Server Logic
R/module-loadpage-server.R
Switched preview reads to .read_preview(), added DIANN detection flow that updates diann_2plus only on detection changes, emits detection notifications, warns on manual/detected mismatches, and handles getData() errors by removing spinners, notifying user, and returning NULL.
Load Page UI
R/module-loadpage-ui.R
Changed DIANN 2.0+ checkbox default from TRUE to FALSE, altering initial conditional-panel visibility for DIANN inputs.
Tests
tests/testthat/test-utils.R
Added tests for .read_preview() (CSV/TSV nrows limiting, parquet schema dispatch, malformed input tolerance) and .is_diann_2plus() (DIANN 2.0+ vs legacy patterns, precedence, NULL/empty handling), plus integration reads for CSV/parquet.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI as Load Page UI
    participant Server as Load Page Server
    participant Preview as Preview Reader
    participant Detector as DIANN Detector
    participant Notifier as Notification System

    User->>UI: upload/select file
    UI->>Server: request preview read
    Server->>Preview: .read_preview(filepath)
    Preview-->>Server: preview_df or NULL
    Server->>Detector: .is_diann_2plus(preview_df)
    Detector-->>Server: detected_format (TRUE/FALSE/NULL)
    Server->>UI: updateCheckboxInput(diann_2plus) if detection changed
    Server->>Notifier: notify detected format
    Notifier-->>User: show format notification

    Note over Server,UI: compare manual vs detected selection
    Server->>Server: if mismatch -> warn
    Server->>Notifier: show mismatch warning
    Notifier-->>User: warn user

    alt getData() error
        Server->>Server: remove spinner
        Server->>Notifier: notify failure
        Server-->>UI: return NULL
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

Review effort 3/5

Suggested reviewers

  • tonywu1999
  • devonjkohler

Poem

🐰 A hop, a peek, a preview file,
I sniff the columns all the while,
DIANN 2.0 I can detect,
Checkbox synced — no disconnect! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title mentions 'bugfixes for DIANN dataset' which aligns with the PR's main objectives of fixing DIANN-related issues, but it is vague and doesn't convey the specific changes like auto-detection, checkbox defaults, or error handling improvements. Consider a more specific title such as 'Auto-detect DIANN version and improve error handling for dataset uploads' or 'Add DIANN 2.0+ auto-detection with manual override and validation' to better reflect the key improvements.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/diann

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
R/utils.R (1)

29-37: Detection logic LGTM; consider also guarding against non-DIANN files.

The precedence rule (legacy column wins when both patterns somehow coexist) and NULL/empty handling look correct and are well covered by the new tests. One minor thought: a file with neither signature (e.g. an unrelated CSV) correctly returns FALSE, but the server will then silently notify "Detected DIANN 1.x format" — you may want a third "unknown" outcome in the future to avoid misleading the user. Not a blocker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils.R` around lines 29 - 37, Replace the boolean-only check with a
three-way detector so callers cannot be misled by non-DIANN files: implement a
new function detect_diann_version(preview_df) (or update .is_diann_2plus to
delegate) that computes has_numbered_fragments and has_legacy_fragments using
the existing patterns and then returns one of "diann_2plus", "diann_1x", or
"unknown" (use the existing precedence where legacy wins if both present);
update any callers that currently rely on .is_diann_2plus to use
detect_diann_version and branch on the explicit "unknown" result instead of
assuming DIANN 1.x.
R/module-loadpage-server.R (1)

126-140: Mismatch observer may emit a spurious warning during auto-sync.

When .is_diann_2plus() detection flips the checkbox via updateCheckboxInput() above, Shiny fires input$diann_2plus on the next flush. Because this observeEvent re-evaluates preview_data() and .is_diann_2plus() at event time, the comparison will normally match — but if preview_data() changes between the update and the event (e.g. user swaps files quickly), you can get a transient false-positive "mismatch" warning. Cheaper and more robust: track the last detected value in a reactiveVal and compare against that instead of recomputing detection here.

Also: isTRUE(input$diann_2plus) != detected_2plus works, but !identical(isTRUE(...), detected_2plus) reads clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/module-loadpage-server.R` around lines 126 - 140, Replace the runtime
recomputation of format detection inside the observeEvent with a stored
last-detected value: create a reactiveVal (e.g. last_diann_detected <-
reactiveVal()) that you set whenever you run .is_diann_2plus(preview_data())
(the same place that calls updateCheckboxInput), and in the
observeEvent(input$diann_2plus, ...) compare the checkbox state to
last_diann_detected() instead of calling .is_diann_2plus(preview) again; use
!identical(isTRUE(input$diann_2plus), last_diann_detected()) for the mismatch
test and keep the existing showNotification logic and ignoreInit = TRUE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/module-loadpage-server.R`:
- Around line 108-123: The observer unconditionally calls updateCheckboxInput()
and showNotification() on any reactive invalidation; change it to compute
is_2plus via .is_diann_2plus(preview_data()) but only call
updateCheckboxInput(session, "diann_2plus", ...) and showNotification(...) when
the detected value actually differs from the current input state (use
isolate(input$diann_2plus) or a small reactiveVal like last_diann_2plus to
compare). Keep the same req(...) checks and only perform the update/notification
when is_2plus != current value to avoid spurious toasts and extra reactive churn
from updateCheckboxInput triggering other observers.

In `@R/utils.R`:
- Around line 8-18: The parquet branch in .read_preview currently materializes
the whole file; replace the arrow::read_parquet(filepath) call with
metadata-only access using arrow::ParquetFileReader$create(filepath)$GetSchema()
to obtain column names without reading data, then construct and return an empty
data.frame/data.table with those column names (so downstream .is_diann_2plus()
can inspect names) — update .read_preview to use
ParquetFileReader$create(filepath)$GetSchema() and map the schema names into an
empty table instead of loading the full parquet file.

---

Nitpick comments:
In `@R/module-loadpage-server.R`:
- Around line 126-140: Replace the runtime recomputation of format detection
inside the observeEvent with a stored last-detected value: create a reactiveVal
(e.g. last_diann_detected <- reactiveVal()) that you set whenever you run
.is_diann_2plus(preview_data()) (the same place that calls updateCheckboxInput),
and in the observeEvent(input$diann_2plus, ...) compare the checkbox state to
last_diann_detected() instead of calling .is_diann_2plus(preview) again; use
!identical(isTRUE(input$diann_2plus), last_diann_detected()) for the mismatch
test and keep the existing showNotification logic and ignoreInit = TRUE.

In `@R/utils.R`:
- Around line 29-37: Replace the boolean-only check with a three-way detector so
callers cannot be misled by non-DIANN files: implement a new function
detect_diann_version(preview_df) (or update .is_diann_2plus to delegate) that
computes has_numbered_fragments and has_legacy_fragments using the existing
patterns and then returns one of "diann_2plus", "diann_1x", or "unknown" (use
the existing precedence where legacy wins if both present); update any callers
that currently rely on .is_diann_2plus to use detect_diann_version and branch on
the explicit "unknown" result instead of assuming DIANN 1.x.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ebf1501e-8bde-48fa-962e-4b53b67797d8

📥 Commits

Reviewing files that changed from the base of the PR and between 6a5ae5b and 3b0c186.

📒 Files selected for processing (5)
  • NAMESPACE
  • R/module-loadpage-server.R
  • R/module-loadpage-ui.R
  • R/utils.R
  • tests/testthat/test-utils.R
💤 Files with no reviewable changes (1)
  • NAMESPACE

Comment thread R/module-loadpage-server.R
Comment thread R/utils.R
Comment thread NAMESPACE
importFrom(DT,renderDataTable)
importFrom(Hmisc,describe)
importFrom(MSstatsBioNet,annotateProteinInfoFromIndra)
importFrom(MSstatsBioNet,deleteEdgeFromNetwork)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase to devel and install the latest version of MSstatsBioNet from Bioconductor devel / github. Then rerun devtools::document(). This shouldn't be deleted.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
R/module-loadpage-server.R (1)

85-111: Forward reference to last_detected_diann_format is fragile.

Line 95 invokes last_detected_diann_format(NULL) inside observe({ ... }), but the reactiveVal itself is declared later on line 111. This happens to work because observer bodies execute during the reactive flush (after the module function body finishes), but the ordering makes the code harder to follow and would break if the observe body were ever invoked eagerly during setup. Consider declaring last_detected_diann_format above the preview observe so the lexical order matches the dependency order.

♻️ Proposed reorder
+    # Track last detected DIANN format to avoid redundant notifications
+    last_detected_diann_format <- reactiveVal(NULL)
+
     # Read first 100 rows for preview-based UI features.
     # Supported: Metamorpheus PTM (modification ID dropdown), DIANN (version auto-detection).
     # TODO: Extend to other input formats (Spectronaut, MaxQuant) as needed.
     observe({
       should_preview <- (isTRUE(input$filetype == "meta") && isTRUE(input$BIO == "PTM")) ||
                        (isTRUE(input$filetype == "diann") && isTRUE(input$BIO != "PTM"))
       ...
     })
-
-    # Track last detected DIANN format to avoid redundant notifications
-    last_detected_diann_format <- reactiveVal(NULL)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/module-loadpage-server.R` around lines 85 - 111, The observe block reads
and reset the reactiveVal last_detected_diann_format before that reactiveVal is
declared, which is a fragile forward reference; move the declaration of
last_detected_diann_format <- reactiveVal(NULL) so it appears above the
observe({...}) that calls last_detected_diann_format(NULL) (i.e., declare the
reactiveVal before the preview observe/preview_data logic) to ensure lexical
order matches dependency order and avoid eager-initialization breakage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@R/module-loadpage-server.R`:
- Around line 85-111: The observe block reads and reset the reactiveVal
last_detected_diann_format before that reactiveVal is declared, which is a
fragile forward reference; move the declaration of last_detected_diann_format <-
reactiveVal(NULL) so it appears above the observe({...}) that calls
last_detected_diann_format(NULL) (i.e., declare the reactiveVal before the
preview observe/preview_data logic) to ensure lexical order matches dependency
order and avoid eager-initialization breakage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d9a58da5-d32b-4686-8933-29f5b3dcabff

📥 Commits

Reviewing files that changed from the base of the PR and between 3b0c186 and 9c0a174.

📒 Files selected for processing (4)
  • R/module-loadpage-server.R
  • R/module-loadpage-ui.R
  • R/utils.R
  • tests/testthat/test-utils.R

- Auto detect the DIANN dataset file type and automatically toggle the checkbox for better user experience, still retaining the ability to let user manually toggle the checkbox if required
- Show a warning if the user selects an invalid state of the checkbox based on the type of file uploaded
- Handle application crashing in case if the user uses an invalid configuration while uplaoding the data
- Write test cases for all edge cases of the DIANN file version detection
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
R/utils.R (1)

8-24: Prior OOM concern addressed; minor simplification possible.

The switch to arrow::open_dataset(...)$schema reads metadata only and avoids materializing large Parquet reports — nicely resolves the earlier feedback.

One nit: setNames(lapply(col_names, ...), col_names) is redundant since lapply over a character vector already produces a named list. You can drop setNames or, if you just need a 0-row frame with the right column names, build it more directly.

♻️ Optional simplification
       schema <- arrow::open_dataset(filepath, format = "parquet")$schema
       col_names <- schema$names
-      empty_df <- as.data.frame(
-        setNames(lapply(col_names, function(x) logical(0)), col_names)
-      )
-      empty_df
+      # 0-row frame preserving column names for downstream detection
+      as.data.frame(matrix(logical(0), nrow = 0, ncol = length(col_names),
+                           dimnames = list(NULL, col_names)))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils.R` around lines 8 - 24, In .read_preview(), simplify the
Parquet-empty-dataframe construction by removing the redundant setNames call:
build empty_df from lapply(col_names, function(x) logical(0)) (which already
yields a named list) or directly create a 0-row data.frame with the correct
column names using col_names; keep the same variable names (col_names, empty_df)
and return behavior unchanged so the function still returns a 0-row data.frame
for parquet files.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@R/utils.R`:
- Around line 8-24: In .read_preview(), simplify the Parquet-empty-dataframe
construction by removing the redundant setNames call: build empty_df from
lapply(col_names, function(x) logical(0)) (which already yields a named list) or
directly create a 0-row data.frame with the correct column names using
col_names; keep the same variable names (col_names, empty_df) and return
behavior unchanged so the function still returns a 0-row data.frame for parquet
files.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 441487d5-b571-4cdd-8b73-689e9d473421

📥 Commits

Reviewing files that changed from the base of the PR and between 9c0a174 and cdde69f.

📒 Files selected for processing (5)
  • NAMESPACE
  • R/module-loadpage-server.R
  • R/module-loadpage-ui.R
  • R/utils.R
  • tests/testthat/test-utils.R
💤 Files with no reviewable changes (1)
  • NAMESPACE
🚧 Files skipped from review as they are similar to previous changes (3)
  • R/module-loadpage-ui.R
  • R/module-loadpage-server.R
  • tests/testthat/test-utils.R

@swaraj-neu swaraj-neu merged commit 83156ff into devel Apr 24, 2026
1 of 2 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request May 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants