
Add Optuna-based hyperparameter optimization in gpredomicspy #77

@eprifti

Context

gpredomics has many tunable parameters (k_penalty, population_size, cooling_rate, etc.) that significantly affect model quality. Tuning them by hand is tedious and rarely reaches the best configuration. Optuna automates the search with efficient Bayesian-style optimization (its default sampler is TPE).

Design

Implement in gpredomicspy (Python layer) as gpredomicspy.optimize():

```python
import gpredomicspy as gp

# Define search space
search_space = {
    "algo": ["ga", "beam", "sa", "ils", "lasso"],
    "k_penalty": (1e-5, 0.01, "log"),
    "language": ["ter", "bin,ter", "bin,ter,ratio"],
    "data_type": ["prev", "raw", "raw,prev"],
    "population_size": (500, 10000),
    "cooling_rate": (0.99, 0.9999),
    "feature_minimal_prevalence_pct": (5, 30),
    "feature_maximal_adj_pvalue": (0.01, 0.1),
}

results = gp.optimize(
    base_param="param.yaml",
    search_space=search_space,
    n_trials=100,
    metric="test_auc",        # or "fit", "spearman", etc.
    direction="maximize",
    cv=True,                  # use CV for robust evaluation
    n_jobs=4,                 # parallel trials
)

print(results.best_params)
print(results.best_value)
results.plot_importance()     # which params matter most
```
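
One way the search_space format above could be translated into Optuna suggestions is sketched below. This is only an illustration, not the planned implementation: the helper name suggest_from_space is hypothetical, and the rules it applies (a list means categorical, a tuple of two ints means an integer range, anything else a float range, an optional trailing "log" flag) are assumptions read off the example. The `trial` argument is expected to be an optuna.trial.Trial; only its suggest_categorical, suggest_int and suggest_float methods are used.

```python
from typing import Any, Dict, Mapping


def suggest_from_space(trial: Any, search_space: Mapping[str, Any]) -> Dict[str, Any]:
    """Draw one parameter set from `search_space` (hypothetical helper).

    `trial` is expected to behave like an optuna.trial.Trial.
    """
    params: Dict[str, Any] = {}
    for name, spec in search_space.items():
        if isinstance(spec, list):
            # A list is treated as a categorical choice, e.g. "algo".
            params[name] = trial.suggest_categorical(name, spec)
            continue

        # A tuple is (low, high) with an optional trailing "log" flag.
        low, high, *flags = spec
        log = "log" in flags

        if isinstance(low, int) and isinstance(high, int):
            # Assumption: two integer bounds mean an integer parameter.
            params[name] = trial.suggest_int(name, low, high, log=log)
        else:
            params[name] = trial.suggest_float(name, low, high, log=log)
    return params
```

Under these assumed rules, (5, 30) would be sampled as an integer; writing (5.0, 30.0) would make it a float range. Inside an Optuna objective the call would be `params = suggest_from_space(trial, search_space)`, after which the parameters would be merged into the base configuration and a gpredomics run evaluated.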

Parameters worth optimizing

| Category          | Parameter            | Type        | Range                    |
|-------------------|----------------------|-------------|--------------------------|
| Regularization    | k_penalty            | log-float   | [1e-5, 0.1]              |
| Regularization    | fr_penalty           | float       | [0, 1]                   |
| Regularization    | bias_penalty         | float       | [0, 1]                   |
| Algorithm         | algo                 | categorical | ga/beam/sa/ils/lasso/aco |
| Data              | language             | categorical | ter/bin/ratio combos     |
| Data              | data_type            | categorical | raw/prev/log combos      |
| Feature selection | prevalence_pct       | float       | [5, 50]                  |
| Feature selection | max_adj_pvalue       | float       | [0.01, 1.0]              |
| GA                | population_size      | int         | [500, 10000]             |
| GA                | max_epochs           | int         | [50, 500]                |
| GA                | mutated_children_pct | float       | [50, 95]                 |
| SA                | cooling_rate         | float       | [0.99, 0.9999]           |
| SA                | max_iterations       | int         | [1000, 50000]            |
| ACO               | n_ants               | int         | [50, 1000]               |
| ACO               | alpha/beta           | float       | [0.5, 5.0]               |
| ACO               | rho                  | float       | [0.01, 0.5]              |
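
Because the GA, SA and ACO rows only matter when the corresponding algorithm is selected, the search space is naturally conditional, which Optuna's define-by-run API supports. The sketch below is a hedged illustration using the ranges from the table; the function name suggest_conditional is hypothetical, and treating "alpha/beta" as two separate parameters named alpha and beta is an assumption. As before, `trial` is expected to be an optuna.trial.Trial.

```python
from typing import Any, Dict


def suggest_conditional(trial: Any) -> Dict[str, Any]:
    """Suggest shared parameters, then only those of the chosen algorithm."""
    params: Dict[str, Any] = {
        "algo": trial.suggest_categorical(
            "algo", ["ga", "beam", "sa", "ils", "lasso", "aco"]
        ),
        "k_penalty": trial.suggest_float("k_penalty", 1e-5, 0.1, log=True),
    }

    algo = params["algo"]
    if algo == "ga":
        params["population_size"] = trial.suggest_int("population_size", 500, 10000)
        params["max_epochs"] = trial.suggest_int("max_epochs", 50, 500)
        params["mutated_children_pct"] = trial.suggest_float(
            "mutated_children_pct", 50.0, 95.0
        )
    elif algo == "sa":
        params["cooling_rate"] = trial.suggest_float("cooling_rate", 0.99, 0.9999)
        params["max_iterations"] = trial.suggest_int("max_iterations", 1000, 50000)
    elif algo == "aco":
        params["n_ants"] = trial.suggest_int("n_ants", 50, 1000)
        # Assumption: "alpha/beta" in the table means two parameters.
        params["alpha"] = trial.suggest_float("alpha", 0.5, 5.0)
        params["beta"] = trial.suggest_float("beta", 0.5, 5.0)
        params["rho"] = trial.suggest_float("rho", 0.01, 0.5)
    # beam, ils and lasso have no algorithm-specific rows in the table.
    return params
```

Sampling only the relevant branch keeps trials from spending budget on parameters the selected algorithm ignores.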

Web app integration

Add a "Tune" button in ParametersTab that:

  1. Opens a modal with parameter search space configuration
  2. Runs Optuna in the background (via worker.py)
  3. Shows convergence plot + parameter importance
  4. Offers an "Apply best" button that sets the best parameters found
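
The convergence plot in step 3 typically shows the best objective value found so far after each trial. A minimal sketch of that series is given below; the helper name best_so_far is hypothetical and the code uses only the standard library, so it makes no claim about how worker.py would actually pass data to the front end.

```python
from itertools import accumulate
from typing import Iterable, List


def best_so_far(values: Iterable[float], direction: str = "maximize") -> List[float]:
    """Return the running best objective value after each completed trial."""
    if direction not in ("maximize", "minimize"):
        raise ValueError("direction must be 'maximize' or 'minimize'")
    pick = max if direction == "maximize" else min
    # accumulate() applies pick(best_so_far, next_value) step by step.
    return list(accumulate(values, pick))


# Example with made-up AUC values from four trials.
curve = best_so_far([0.70, 0.68, 0.75, 0.74])
print(curve)
```

With Optuna, the input values could be collected from a study as `[t.value for t in study.trials if t.value is not None]`, and the parameter importances for the same modal could come from `optuna.importance.get_param_importances(study)`.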

Dependencies

  • optuna (pip install optuna)
  • optuna-dashboard (optional, for web visualization)
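
If optuna stays an optional dependency, gpredomicspy.optimize() could import it lazily and fail with an actionable message. The sketch below shows one common pattern; the helper name require_module is hypothetical and not part of gpredomicspy.

```python
import importlib
from types import ModuleType
from typing import Optional


def require_module(name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import `name` on demand, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        hint = pip_name or name
        raise ImportError(
            f"'{name}' is required for this feature; "
            f"install it with: pip install {hint}"
        ) from exc
```

A call such as `optuna = require_module("optuna")` at the top of the optimize function would keep the rest of the package importable when Optuna is not installed.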
