Add multiclass classification (OVO, OVA) #52

@eprifti

Context

gpredomics currently supports binary classification only (2 classes: 0 vs 1). Many clinical and biological problems involve multiple classes (e.g., disease subtypes, treatment response categories, multiple conditions).

Proposed approaches

One-vs-All (OVA / OVR)

  • Train K binary classifiers, each separating one class from all others
  • At prediction time, assign the class with the highest score/confidence
  • Pros: simple, only K models needed, each model is a standard gpredomics binary model (fully interpretable)
  • Cons: each sub-problem is imbalanced (one class vs. all others); the K models are trained independently, so their scores are not calibrated against each other; some samples may be claimed by several classifiers, or by none
  • Implementation: can be orchestrated externally (run gpredomics K times with relabeled y), or integrated into the engine for convenience
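
As a rough illustration of the external-orchestration option, here is a minimal Python sketch of the two OVA steps: relabeling y once per class, then picking the class with the highest score. The `Scorer` type, the function names, and the toy scorers are assumptions made for this example; they stand in for trained binary models and are not part of gpredomics.

```python
from typing import Callable, Dict, Hashable, List, Sequence

# Hypothetical type: a scorer maps one sample to a real-valued score, where
# higher means "more likely positive". It stands in for a trained binary
# model and is NOT a gpredomics API.
Scorer = Callable[[Sequence[float]], float]


def relabel_one_vs_all(y: Sequence[Hashable], positive: Hashable) -> List[int]:
    """Return binary labels: 1 for `positive`, 0 for every other class."""
    return [1 if label == positive else 0 for label in y]


def predict_ova(scorers: Dict[Hashable, Scorer], sample: Sequence[float]) -> Hashable:
    """Assign the class whose binary model gives the highest score."""
    return max(scorers, key=lambda cls: scorers[cls](sample))


# One relabeled target vector per class; each would feed one binary run.
y = ["healthy", "subtype_a", "subtype_b", "subtype_a"]
binary_targets = {cls: relabel_one_vs_all(y, cls) for cls in sorted(set(y))}

# Toy scorers (made up for the example) standing in for K trained models.
toy_scorers: Dict[Hashable, Scorer] = {
    "healthy": lambda x: -x[0],
    "subtype_a": lambda x: x[0] - x[1],
    "subtype_b": lambda x: x[1],
}
predicted = predict_ova(toy_scorers, [0.2, 0.9])  # "subtype_b" for this sample
```

Note that taking the maximum over raw scores assumes the K scores are comparable, which is exactly the calibration concern listed under Cons.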

One-vs-One (OVO)

  • Train K×(K-1)/2 binary classifiers, one for each pair of classes
  • At prediction time, each classifier votes; assign the class with the most votes
  • Pros: each pairwise classifier is trained only on samples from its two classes, so sub-problems are usually better balanced than in OVA; often better separation
  • Cons: quadratic number of models; voting ties possible; harder to interpret the ensemble
  • Implementation: more complex orchestration; needs a voting/aggregation layer
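
To make the voting/aggregation layer concrete, here is a minimal Python sketch, assuming each pairwise model simply returns one of its two classes. The `PairModel` type, the function names, and the tie-breaking rule (first class in the supplied order) are assumptions for illustration, not gpredomics behavior.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical type: a pairwise model returns one of the two classes it was
# trained on. It stands in for a trained binary model, NOT a gpredomics API.
PairModel = Callable[[Sequence[float]], Hashable]
Pair = Tuple[Hashable, Hashable]


def class_pairs(classes: Sequence[Hashable]) -> List[Pair]:
    """Enumerate the K*(K-1)/2 unordered class pairs."""
    return list(combinations(classes, 2))


def subset_for_pair(X: Sequence, y: Sequence[Hashable], pair: Pair):
    """Keep only the samples whose label is one of the two classes in `pair`."""
    kept = [(x, label) for x, label in zip(X, y) if label in pair]
    return [x for x, _ in kept], [label for _, label in kept]


def predict_ovo(models: Dict[Pair, PairModel], sample: Sequence[float],
                classes: Sequence[Hashable]) -> Hashable:
    """Majority vote over all pairwise models.

    Ties are broken by class order in `classes` (an arbitrary choice made
    for this sketch; a real implementation might use scores instead).
    """
    votes = Counter(model(sample) for model in models.values())
    best = max(votes.values())
    tied = [cls for cls in classes if votes[cls] == best]
    return tied[0]


classes = ["a", "b", "c"]
pairs = class_pairs(classes)  # [("a", "b"), ("a", "c"), ("b", "c")]

# Toy pairwise models (made up for the example) that ignore the sample.
toy_models: Dict[Pair, PairModel] = {
    ("a", "b"): lambda x: "b",
    ("a", "c"): lambda x: "c",
    ("b", "c"): lambda x: "b",
}
winner = predict_ovo(toy_models, [0.0], classes)  # "b" wins with 2 votes
```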

Comparison

Approach | # Models | Balance    | Interpretability                     | Complexity
OVA      | K        | Imbalanced | High (each model is standalone)      | Low
OVO      | K(K-1)/2 | Balanced   | Medium (ensemble of pairwise models) | Medium
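
To give a feel for how the model counts diverge as K grows, a small Python sketch (the function name `n_models` is made up for this example):

```python
from math import comb


def n_models(k: int) -> dict:
    """Number of binary models each strategy needs for k classes."""
    return {"ova": k, "ovo": comb(k, 2)}  # comb(k, 2) == k * (k - 1) // 2


for k in (3, 5, 10):
    print(k, n_models(k))
```

For K = 3 both strategies need 3 models; at K = 10 the gap is 10 vs. 45.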

Design considerations

  • Jury integration: The existing voting/jury system could potentially be reused for combining OVO classifiers
  • Feature importance: need to decide how to aggregate feature importance across the multiple binary models
  • Cross-validation: CV folds should preserve the proportions of all classes (stratified k-fold)
  • param.yaml: Need a new parameter selecting the multiclass strategy (e.g., multiclass: ova or multiclass: ovo)
  • Output format: Results should show per-class metrics (sensitivity, specificity, etc.) and a confusion matrix
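
Two of the points above, stratified folds and per-class metrics, can be sketched in plain Python as follows. All function names are hypothetical, and the fold assignment is one simple way to stratify (shuffle within each class, then deal samples round-robin), not necessarily how gpredomics would do it.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_folds(y: Sequence[Hashable], n_folds: int, seed: int = 0) -> List[int]:
    """Return one fold index per sample, spreading each class over the folds."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [0] * len(y)
    offset = 0  # continue the round-robin across classes to even out fold sizes
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            folds[i] = (offset + position) % n_folds
        offset += len(indices)
    return folds


def confusion_matrix(y_true, y_pred, classes) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix, classes) -> Dict[Hashable, Dict[str, float]]:
    """One-vs-rest sensitivity and specificity for each class."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, cls in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(row[i] for row in matrix) - tp
        tn = total - tp - fn - fp
        metrics[cls] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return metrics


classes = ["a", "b", "c"]
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
cm = confusion_matrix(y_true, y_pred, classes)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
report = per_class_metrics(cm, classes)
folds = stratified_folds(["a"] * 6 + ["b"] * 3, n_folds=3)
```

In this toy example class "a" has sensitivity 0.5 and specificity 0.75, and every fold receives two "a" samples and one "b" sample.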

Related work

  • predomicsmc — existing multiclass extension via Predomics (R implementation)
  • scikit-learn's OneVsRestClassifier / OneVsOneClassifier as design reference

Suggested implementation path

  1. Start with OVA — simpler, each sub-model is a standard gpredomics run
  2. Add OVO as an option later
  3. Consider whether multiclass should be a core engine feature or an orchestration layer (wrapper script / predomicsapp-web feature)

Labels: enhancement
