Split GSVA ranks calculations, enabling distribution of computations #267
…). Added corresponding unit tests
… a range of rows in gsvaRowNorm(), or columns in gsvaColRanks() and gsvaColScores(). Added corresponding unit tests
Pull request overview
Splits the GSVA rank calculation into separate gsvaRowNorm(), gsvaColRanks(), and gsvaColScores() methods (deprecating gsvaRanks()/gsvaScores()), each accepting first/last arguments so the row-normalization and column-rank/score stages can be distributed across HPC nodes. Supporting changes refactor the wrapData() generic and data containers to carry GSVA metadata and added assays (e.g., gsvarownr, gsvaranks), expand .check_maxmem()/.check_ondisk()/.check_sparse_load_input_expr() to honor restricted ranges, update HDF5 serialization to round-trip the new metadata, and add a details() method while reworking show(). Documentation, vignettes and unit tests are adjusted accordingly.
Changes:
- Introduce three-step API (gsvaRowNorm/gsvaColRanks/gsvaColScores) with first/last chunking and dropExistingAssays support; mark gsvaRanks/gsvaScores deprecated.
- Refactor wrapData() methods, helper checks, and .pull_param()/.gsvaParam_as_list() to embed/retrieve parameter metadata across containers and to size memory based on restricted ranges.
- Update vignettes, Rd files, tests, and HDF5 serialization helpers to use the new API and to add the details() method.
Reviewed changes
Copilot reviewed 29 out of 38 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| R/gsva.R | Adds new step methods, helpers .pull_param/.gsvaParam_as_list, details(), and chunked first/last plumbing. |
| R/utils.R | Generalizes wrapData methods, adds range/first-last and whdim plumbing in .check_* helpers. |
| R/gsvaRanks_serialization.R | Reworks save/load to operate on GsvaExprData with embedded GSVA metadata. |
| R/GSVA-pkg-deprecated.R | Re-implements deprecated gsvaRanks/gsvaScores via new helpers. |
| R/ssgsea.R, R/plage.R, R/zscore.R | Updates gsva() methods to call the new .check_* and wrapData() signatures; adds details for ssgsea. |
| R/GsvaMethodParam.R, R/AllGenerics.R, R/AllClasses.R | Adds new generics/exports, simplified show(), details() method, updates supported classes. |
| R/GSVA-package.R, NAMESPACE | Adjusts imports/exports for new methods and dropped re-exports. |
| man/*.Rd | Regenerated docs for renamed/new methods and link fixes. |
| vignettes/GSVA_scRNAseq.Rmd, GSVA_proteomics.Rmd, GSVA.bib | Documents three-step API, QC workflow updates, new reference. |
| inst/unitTests/* | Updates tests to new API and adds chunked-range checks. |
| DESCRIPTION, .covrignore | Roxygen config bump and coverage exclusions for deprecated/defunct files. |
Files not reviewed (9)
- man/GSVA-pkg-deprecated.Rd: Language not supported
- man/GsvaExprData-class.Rd: Language not supported
- man/GsvaMethodParam-class.Rd: Language not supported
- man/geneIdsToGeneSetCollection.Rd: Language not supported
- man/geneSets.Rd: Language not supported
- man/gsva.Rd: Language not supported
- man/gsvaAnnotation.Rd: Language not supported
- man/gsvaEnrichment.Rd: Language not supported
- man/gsvaParam-class.Rd: Language not supported
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This PR splits the calculation of GSVA ranks into two steps, one for row normalization and another for column rank calculation. The functions gsvaRanks() and gsvaScores() have been deprecated and replaced by gsvaRowNorm(), gsvaColRanks() and gsvaColScores(). Each of them takes two additional parameters, first and last, which restrict calculations to the range of rows (for gsvaRowNorm()) or columns (for gsvaColRanks() and gsvaColScores()) going from first to last. By default, first=NA and last=NA, indicating that no such restriction applies. Being able to restrict calculations to a range of rows or columns should enable distributing computations in an HPC environment across nodes with non-shared main memory. This PR closes issue #266.
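
To illustrate how the first/last arguments might be used to distribute work, here is a minimal base R sketch that splits an index range into contiguous chunks, one per worker. The helper chunk_ranges() is hypothetical and not part of GSVA, and the call shape shown in the trailing comment is an assumption; only the argument names first and last come from the description above.

```r
# Hypothetical helper (not part of GSVA): split the indices 1..n into
# n_chunks contiguous ranges of nearly equal size, returned as first/last pairs.
chunk_ranges <- function(n, n_chunks) {
  stopifnot(n >= 1, n_chunks >= 1)
  n_chunks <- min(n_chunks, n)  # never create more chunks than indices
  # Map every index to a chunk id in 1..n_chunks
  id <- ceiling(seq_len(n) * n_chunks / n)
  data.frame(first = as.integer(tapply(seq_len(n), id, min)),
             last  = as.integer(tapply(seq_len(n), id, max)))
}

# Example: 10 rows (or columns) distributed over 3 workers
ranges <- chunk_ranges(n = 10, n_chunks = 3)
print(ranges)  # ranges 1-3, 4-6 and 7-10

# On worker i, a call along these lines could then process only its own range
# (assumed call shape; consult the package documentation for the real one):
# rn <- gsvaRowNorm(param, first = ranges$first[i], last = ranges$last[i])
```

For gsvaRowNorm() the ranges would be computed over rows, and for gsvaColRanks() and gsvaColScores() over columns. How the per-range results are combined afterwards is not covered by this sketch.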