Skip to content

refactor(converters): Migrate converters to MSstatsConvert#91

Merged
tonywu1999 merged 3 commits into
develfrom
refactor-converters
Apr 16, 2026
Merged

refactor(converters): Migrate converters to MSstatsConvert#91
tonywu1999 merged 3 commits into
develfrom
refactor-converters

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

@tonywu1999 tonywu1999 commented Apr 10, 2026

Motivation and Context

This PR refactors the MSstatsTMT package to consolidate converter implementations by delegating to the upstream MSstatsConvert package. Previously, MSstatsTMT maintained local implementations of five converter functions (MaxQtoMSstatsTMTFormat, OpenMStoMSstatsTMTFormat, PDtoMSstatsTMTFormat, SpectroMinetoMSstatsTMTFormat, and PhilosophertoMSstatsTMTFormat) with wrapper logic for logging, preprocessing, and data handling. This refactoring removes the local implementations and establishes MSstatsTMT as a re-exporting interface to the canonical converter implementations in MSstatsConvert, reducing code duplication and maintenance burden while ensuring a single source of truth for data conversion logic.

Changes

Converter Implementation Changes

  • R/converters.R: Replaced local implementations of five converter functions with exported re-exports that alias functions from MSstatsConvert via @importFrom roxygen directives. Removed associated helper functions (.documentFunction, .getPhilosopherInput) and all wrapper logic for logging, preprocessing, and data handling
    • Functions converted to re-exports: MaxQtoMSstatsTMTFormat, OpenMStoMSstatsTMTFormat, PDtoMSstatsTMTFormat, SpectroMinetoMSstatsTMTFormat, PhilosophertoMSstatsTMTFormat
    • File reduced from ~426 lines to ~18 lines
    • Re-exports now directly delegate to MSstatsConvert::<function> implementations

Documentation Refactoring

  • Removed converter documentation files: man/MaxQtoMSstatsTMTFormat.Rd, man/OpenMStoMSstatsTMTFormat.Rd, man/PDtoMSstatsTMTFormat.Rd, man/SpectroMinetoMSstatsTMTFormat.Rd, man/PhilosophertoMSstatsTMTFormat.Rd (now provided by MSstatsConvert package)
  • Removed example dataset documentation files: man/evidence.Rd, man/proteinGroups.Rd, man/raw.pd.Rd, man/raw.mine.Rd, man/annotation.pd.Rd, man/annotation.mq.Rd, man/annotation.mine.Rd, man/raw.om.Rd
  • Added reexports documentation: man/reexports.Rd documents the imported converter functions with cross-references to MSstatsConvert source
  • Updated internal documentation: man/dot-documentFunction.Rd updated to reference R/utils_docs.R source location instead of R/converters.R; removed argument documentation for .documentFunction
  • Removed internal function documentation: man/dot-getPhilosopherInput.Rd

Utility Functions

  • R/utils_docs.R: New placeholder utility file introduced containing internal .documentFunction() for roxygen documentation parameter inheritance. Function implements parameter documentation for fewMeasurements, useUniquePeptide, summaryforMultipleRows, removeProtein_with1Feature, removeProtein_with1Protein, removeOxidationMpeptides, and removeMpeptides, but returns NULL with no operational logic
  • R/utils_MSstatsTMT.R: Removed large block of roxygen documentation examples and dataset descriptions (~224 lines deleted)

Namespace and Dependencies

  • NAMESPACE: Updated to import five converter functions from MSstatsConvert; removed imports for preprocessing utilities (MSstatsBalancedDesign, MSstatsClean, MSstatsImport, MSstatsLogsSettings, MSstatsMakeAnnotation, MSstatsPreprocess)

Data Management

  • data/datalist: Removed entries for example datasets (annotatio.mine, annotation.mq, annotation.pd, evidence, proteinGroups, raw.mine, raw.pd, raw.om, input.pd), retaining only test.pairwise, input.pd, and quant.pd.msstats
  • Example data no longer bundled with MSstatsTMT; examples now load data from MSstatsConvert package resources

Vignette Updates

  • vignettes/MSstatsTMT.Rmd: Updated converter examples to load input data inline via data.table::fread(system.file(..., package="MSstatsConvert")) instead of relying on bundled example datasets
    • Proteome Discoverer example: loads raw.pd and annotation.pd directly via PDtoMSstatsTMTFormat()
    • MaxQuant example: loads evidence, proteinGroups, and annotation.mq directly via MaxQtoMSstatsTMTFormat()
    • SpectroMine example: loads raw.mine and annotation.mine directly via SpectroMinetoMSstatsTMTFormat()
    • OpenMS example: loads raw.om directly via OpenMStoMSstatsTMTFormat()
    • Philosopher example: loads msstats.csv and MSstatsTMT_annotation.csv directly via PhilosophertoMSstatsTMTFormat()
    • Fixed protein identifier in dataProcessPlotsTMT example from P04406 to O75947
    • Corrected head() call reference to quant.msstats$ProteinLevelData

Metadata and Configuration

  • DESCRIPTION: Updated RoxygenNote from 7.3.2 to 7.3.3 (reflects roxygen2 version used for documentation generation)
  • .gitignore: Added duplicate .Rproj pattern entry (resulting in redundant ignore rule with existing *.Rproj entry)
  • MSstatsTMT.Rproj: Removed RStudio project configuration file

Testing

  • tests/testthat/test-converters.R: Entire test file removed (41 lines deleted). Deleted test suites included:

    • PDtoMSstatsTMTFormat error condition tests
    • MaxQtoMSstatsTMTFormat error condition tests
    • SpectroMinetoMSstatsTMTFormat error condition tests
    • PhilosophertoMSstatsTMTFormat error condition tests and functional integration test validating output has 550 rows

    Note: Converter tests have been consolidated in the upstream MSstatsConvert package rather than duplicated in MSstatsTMT. This represents a shift in testing responsibility to the dependency package.

Coding Guidelines

No violations of standard R package coding guidelines are present. The refactoring properly uses roxygen directives (@importFrom, @export) and follows R conventions for re-exporting functions from dependencies. However, two minor issues exist:

  • Redundant .gitignore entry: The addition of .Rproj pattern duplicates the existing *.Rproj pattern without adding new functionality
  • Test consolidation without documentation: The removal of all converter tests from MSstatsTMT without explicit documentation or changelog entry indicating that tests have been moved to MSstatsConvert may obscure test coverage from package users and reviewers

Motivation and Solution

This refactoring consolidates duplicate converter implementations by delegating to the upstream MSstatsConvert package. Five converter functions—MaxQtoMSstatsTMTFormat, OpenMStoMSstatsTMTFormat, PDtoMSstatsTMTFormat, SpectroMinetoMSstatsTMTFormat, and PhilosophertoMSstatsTMTFormat—are converted from local implementations with wrapper logic (logging, validation, preprocessing) to simple re-exports that directly reference their counterparts in MSstatsConvert. This eliminates code duplication, reduces maintenance burden, and centralizes conversion logic in a single package. Supporting infrastructure (documentation, tests, example datasets, and helper functions) are removed or relocated accordingly.

Detailed Changes

Core Converter Functions:

  • R/converters.R reduced from ~426 lines to ~18 lines by replacing function implementations with @importFrom directives and re-export stubs
  • Five converter functions now delegate entirely to MSstatsConvert implementations; local wrapper logic (logging, preprocessing, renaming) removed
  • Internal helper functions .documentFunction() and .getPhilosopherInput() removed from R/converters.R

Namespace and Imports:

  • NAMESPACE updated to import five converter functions from MSstatsConvert and removed imports for MSstatsBalancedDesign, MSstatsClean, MSstatsImport, MSstatsLogsSettings, MSstatsMakeAnnotation, and MSstatsPreprocess

Documentation:

  • Removed man pages for all five converters (MaxQtoMSstatsTMTFormat.Rd, OpenMStoMSstatsTMTFormat.Rd, PDtoMSstatsTMTFormat.Rd, SpectroMinetoMSstatsTMTFormat.Rd, PhilosophertoMSstatsTMTFormat.Rd)
  • Removed man pages for bundled example datasets (evidence.Rd, proteinGroups.Rd, raw.pd.Rd, raw.mine.Rd, raw.om.Rd, annotation.pd.Rd, annotation.mq.Rd, annotation.mine.Rd)
  • Removed man/dot-getPhilosopherInput.Rd (helper function documentation)
  • Updated man/dot-documentFunction.Rd to reference new location R/utils_docs.R and simplified usage signature
  • Added man/reexports.Rd to document imported converters from MSstatsConvert
  • Updated man/input.pd.Rd description to reflect it as example dataset rather than converter output

New/Modified Utility Files:

  • Created R/utils_docs.R containing internal .documentFunction() placeholder for roxygen parameter inheritance (returns NULL, no operational logic)
  • Removed extensive roxygen documentation and examples from R/utils_MSstatsTMT.R (~227 lines removed)

Example Data:

  • data/datalist reduced to retain only test.pairwise, input.pd, and quant.pd.msstats; removed bundled example datasets for Proteome Discoverer, MaxQuant, SpectroMine, and OpenMS

Vignette Updates:

  • vignettes/MSstatsTMT.Rmd refactored to load example inputs from MSstatsConvert via system.file() and data.table::fread() instead of bundled datasets
  • Fixed protein identifier from P04406 to O75947 in dataProcessPlotsTMT example
  • Corrected downstream head() reference to use quant.msstats$ProteinLevelData instead of quant.pd.msstats$ProteinLevelData

Build Configuration:

  • Removed MSstatsTMT.Rproj (RStudio project configuration file)
  • Updated DESCRIPTION RoxygenNote from 7.3.2 to 7.3.3
  • Added redundant .Rproj ignore pattern to .gitignore (duplicate of existing *.Rproj entry)

Unit Tests

  • Deleted tests/testthat/test-converters.R entirely, removing all local test coverage for PDtoMSstatsTMTFormat, MaxQtoMSstatsTMTFormat, SpectroMinetoMSstatsTMTFormat, and PhilosophertoMSstatsTMTFormat
  • Per PR description, converter tests were consolidated upstream in MSstatsConvert but this transition lacks explicit documentation in changelog

Coding Guideline Issues

  • .gitignore now contains a redundant duplicate entry: added .Rproj alongside the pre-existing *.Rproj pattern, resulting in identical ignore behavior declared twice
  • Removal of local converter tests without explicit changelog or documentation indicating tests have been moved upstream to MSstatsConvert

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

This PR migrates converter implementations out of MSstatsTMT and replaces them with re-exports from MSstatsConvert, removes many local example datasets and their roxygen/man pages, relocates a small internal doc helper, updates vignette examples to load data from MSstatsConvert, and makes minor project metadata/gitignore edits.

Changes

Cohort / File(s) Summary
Converter migration & reexports
NAMESPACE, R/converters.R, man/reexports.Rd, man/MaxQtoMSstatsTMTFormat.Rd, man/OpenMStoMSstatsTMTFormat.Rd, man/PDtoMSstatsTMTFormat.Rd, man/PhilosophertoMSstatsTMTFormat.Rd, man/SpectroMinetoMSstatsTMTFormat.Rd
Removed local implementations of multiple converter functions and replaced with MSstatsConvert:: re-exports; updated NAMESPACE importFrom entries; deleted individual converter man pages and added consolidated reexports documentation.
Example datasets & docs removal
R/utils_MSstatsTMT.R, data/datalist, man/annotation.*.Rd, man/evidence.Rd, man/proteinGroups.Rd, man/raw.*.Rd
Deleted roxygen blocks and man pages for several example datasets and shortened data/datalist; removed example dataset documentation and dataset registrations from utils file.
Docs helper relocation
R/utils_docs.R, man/dot-documentFunction.Rd, man/dot-getPhilosopherInput.Rd
Added R/utils_docs.R with an internal .documentFunction() placeholder (returns NULL); moved .documentFunction doc source and removed .getPhilosopherInput documentation.
Tests & vignette updates
tests/testthat/test-converters.R, vignettes/MSstatsTMT.Rmd
Removed converter-focused tests; updated vignette to load example inputs from MSstatsConvert via system.file(...) and adjusted example calls and a protein identifier.
Project/config metadata
.gitignore, DESCRIPTION, MSstatsTMT.Rproj
Added duplicate *.Rproj pattern to .gitignore, bumped RoxygenNote from 7.3.2 to 7.3.3 in DESCRIPTION, and removed the MSstatsTMT.Rproj file.

Sequence Diagram(s)

sequenceDiagram
  participant User as User
  participant MSstatsTMT as MSstatsTMT\n(package)
  participant MSstatsConvert as MSstatsConvert\n(package)
  participant FS as FileSystem

  Note over MSstatsTMT,MSstatsConvert: Old flow (before this PR)
  User->>MSstatsTMT: call MaxQtoMSstatsTMTFormat(input, annotation, ...)
  MSstatsTMT->>FS: read input files / annotation
  MSstatsTMT->>MSstatsTMT: run local converter logic\n(filtering, cleaning, summarizing)
  MSstatsTMT->>FS: write/log output
  MSstatsTMT-->>User: return MSstatsTMT-formatted data

  Note over MSstatsTMT,MSstatsConvert: New flow (after this PR)
  User->>MSstatsTMT: call MaxQtoMSstatsTMTFormat(input, annotation, ...)
  MSstatsTMT->>MSstatsConvert: delegate to MSstatsConvert::MaxQtoMSstatsTMTFormat(...)
  MSstatsConvert->>FS: read input files / annotation
  MSstatsConvert->>MSstatsConvert: run converter logic\n(filtering, cleaning, summarizing)
  MSstatsConvert->>FS: write/log output
  MSstatsConvert-->>MSstatsTMT: return formatted data
  MSstatsTMT-->>User: return MSstatsTMT-formatted data
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Hopping lines of code anew,

Functions moved — a tidy view.
Re-exports snug in their new nest,
Datasets gone to take a rest.
I chew through docs and bound the rest! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed The description is comprehensive and well-structured, covering motivation, detailed changes across multiple file categories, testing implications, and known minor issues. It matches the template structure with all required sections substantially completed.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check ✅ Passed The PR title clearly and concisely describes the main change: migrating converter functions from local implementations to re-exports from the MSstatsConvert package.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor-converters

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
R/utils_MSstatsTMT.R (1)

20-23: ⚠️ Potential issue | 🟡 Minor

Remove or replace these dangling documentation links.

raw.pd and annotation.pd are not documented datasets in this package—they exist only as temporary objects created in the vignette. The \code{\link{raw.pd}} and \code{\link{annotation.pd}} references in the input.pd help topic will produce broken cross-references. Either drop the links and use plain code formatting (e.g., \code{raw.pd}) or verify that equivalent documented objects exist in the MSstatsConvert package and update the links accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils_MSstatsTMT.R` around lines 20 - 23, The roxygen examples in
utils_MSstatsTMT.R reference temporary vignette objects using dangling links
\code{\link{raw.pd}} and \code{\link{annotation.pd}}; update the documentation
for the input.pd/help topic by replacing those broken link forms with plain code
formatting (e.g., \code{raw.pd} and \code{annotation.pd}) or, if equivalents are
documented in MSstatsConvert, change to package-qualified links (e.g.,
\link[MSstatsConvert]{raw.pd}) so the cross-references resolve; make this change
where PDtoMSstatsTMTFormat and input.pd are referenced to remove the broken
links.
🧹 Nitpick comments (1)
.gitignore (1)

9-9: Remove duplicate ignore rule.

*.Rproj is already listed at Line 7, so duplicating it at Line 9 adds noise without changing behavior.

Proposed cleanup
 *.Rproj
 *.DS_Store
-*.Rproj
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore at line 9, Remove the duplicate ignore rule for the pattern
"*.Rproj" in .gitignore: locate the duplicate entry (the second occurrence
matching "*.Rproj") and delete it so the pattern only appears once, leaving the
original "*.Rproj" entry intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@data/datalist`:
- Line 2: The roxygen documentation in R/utils_MSstatsTMT.R still points to
now-removed files raw.pd and annotation.pd causing broken links while
data/datalist still lists input.pd; update the roxygen block(s) in
R/utils_MSstatsTMT.R that mention raw.pd and annotation.pd to instead reference
the new MSstatsConvert-based source files (or remove the link tags altogether),
and ensure any mention of input.pd matches the current data/datalist entry so
the generated docs no longer include dead links.

In `@R/utils_docs.R`:
- Around line 5-24: The roxygen `@param` block documents logging parameters but
the .documentFunction formals still list legacy converter arguments, causing
mismatched usage/arguments in generated Rd files and broken `@inheritParams` for
callers; fix by aligning them: either (A) replace the .documentFunction formals
(fewMeasurements, useUniquePeptide, summaryforMultipleRows,
removeProtein_with1Feature, removeProtein_with1Protein,
removeOxidationMpeptides, removeMpeptides) with the documented logging formals
(use_log_file, append, verbose, log_file_path) and update any internal
references or callers to the new names, or (B) restore explicit `@param` entries
for each legacy formal so documentation matches the actual formals—choose one
approach and make the code+roxygen consistent for .documentFunction and any
functions using `@inheritParams` .documentFunction.

---

Outside diff comments:
In `@R/utils_MSstatsTMT.R`:
- Around line 20-23: The roxygen examples in utils_MSstatsTMT.R reference
temporary vignette objects using dangling links \code{\link{raw.pd}} and
\code{\link{annotation.pd}}; update the documentation for the input.pd/help
topic by replacing those broken link forms with plain code formatting (e.g.,
\code{raw.pd} and \code{annotation.pd}) or, if equivalents are documented in
MSstatsConvert, change to package-qualified links (e.g.,
\link[MSstatsConvert]{raw.pd}) so the cross-references resolve; make this change
where PDtoMSstatsTMTFormat and input.pd are referenced to remove the broken
links.

---

Nitpick comments:
In @.gitignore:
- Line 9: Remove the duplicate ignore rule for the pattern "*.Rproj" in
.gitignore: locate the duplicate entry (the second occurrence matching
"*.Rproj") and delete it so the pattern only appears once, leaving the original
"*.Rproj" entry intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2a9c08f1-a0a5-4491-9106-1ffb79fa7b55

📥 Commits

Reviewing files that changed from the base of the PR and between e46b5e0 and 96ef236.

⛔ Files ignored due to path filters (2)
  • inst/raw_data/Philosopher/MSstatsTMT_annotation.csv is excluded by !**/*.csv
  • inst/raw_data/Philosopher/msstats.csv is excluded by !**/*.csv
📒 Files selected for processing (34)
  • .gitignore
  • DESCRIPTION
  • MSstatsTMT.Rproj
  • NAMESPACE
  • R/converters.R
  • R/utils_MSstatsTMT.R
  • R/utils_docs.R
  • data/annotation.mine.rda
  • data/annotation.mq.rda
  • data/annotation.pd.rda
  • data/datalist
  • data/evidence.rda
  • data/proteinGroups.rda
  • data/raw.mine.rda
  • data/raw.om.rda
  • data/raw.pd.rda
  • man/MaxQtoMSstatsTMTFormat.Rd
  • man/OpenMStoMSstatsTMTFormat.Rd
  • man/PDtoMSstatsTMTFormat.Rd
  • man/PhilosophertoMSstatsTMTFormat.Rd
  • man/SpectroMinetoMSstatsTMTFormat.Rd
  • man/annotation.mine.Rd
  • man/annotation.mq.Rd
  • man/annotation.pd.Rd
  • man/dot-documentFunction.Rd
  • man/dot-getPhilosopherInput.Rd
  • man/evidence.Rd
  • man/proteinGroups.Rd
  • man/raw.mine.Rd
  • man/raw.om.Rd
  • man/raw.pd.Rd
  • man/reexports.Rd
  • tests/testthat/test-converters.R
  • vignettes/MSstatsTMT.Rmd
💤 Files with no reviewable changes (16)
  • MSstatsTMT.Rproj
  • man/MaxQtoMSstatsTMTFormat.Rd
  • man/PDtoMSstatsTMTFormat.Rd
  • man/dot-getPhilosopherInput.Rd
  • man/raw.pd.Rd
  • man/annotation.mine.Rd
  • man/proteinGroups.Rd
  • man/PhilosophertoMSstatsTMTFormat.Rd
  • tests/testthat/test-converters.R
  • man/annotation.mq.Rd
  • man/raw.mine.Rd
  • man/raw.om.Rd
  • man/annotation.pd.Rd
  • man/OpenMStoMSstatsTMTFormat.Rd
  • man/evidence.Rd
  • man/SpectroMinetoMSstatsTMTFormat.Rd

Comment thread data/datalist
Comment thread R/utils_docs.R Outdated
@tonywu1999 tonywu1999 changed the title Refactor converters refactor(converters): Migrate converters to MSstatsConvert Apr 16, 2026
@tonywu1999 tonywu1999 merged commit 1bfcd15 into devel Apr 16, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the refactor-converters branch April 16, 2026 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant