Native Rust hyperparameter optimization — Optuna-style without reloading data

## Context

Issue #77 proposed running Optuna from Python (gpredomicspy). However, Python-level optimization reloads the data from disk for every trial, which is wasteful when the dataset is large (e.g., the wetlab 1981×918 matrix).

A native Rust implementation would:
1. Load data once
2. Run hundreds of parameter trials in-memory
3. Use the same feature selection cache across trials
4. Cut per-trial overhead by orders of magnitude compared with spawning a Python subprocess for each trial
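
Points 1 to 3 could look roughly like the sketch below. It assumes feature selection depends only on `feature_minimal_prevalence_pct`, so the selected feature indices can be memoized under that key. `Data`, `select_features`, and `FeatureCache` are hypothetical stand-ins, not existing gpredomics types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the loaded dataset: one row of values per feature.
pub struct Data {
    pub features: Vec<Vec<f64>>,
}

/// Indices of features that are non-zero in at least `min_prevalence_pct`
/// percent of samples. Stand-in for the real, more expensive selection.
fn select_features(data: &Data, min_prevalence_pct: u32) -> Vec<usize> {
    data.features
        .iter()
        .enumerate()
        .filter(|(_, row)| {
            let present = row.iter().filter(|v| **v != 0.0).count();
            // Integer form of present / len >= pct / 100 (no float rounding)
            present * 100 >= min_prevalence_pct as usize * row.len()
        })
        .map(|(i, _)| i)
        .collect()
}

/// Cache keyed by the only parameter assumed to affect feature selection.
#[derive(Default)]
pub struct FeatureCache {
    selected: HashMap<u32, Vec<usize>>,
    /// Number of times the selection actually had to be computed.
    pub misses: usize,
}

impl FeatureCache {
    pub fn get(&mut self, data: &Data, min_prevalence_pct: u32) -> &[usize] {
        if !self.selected.contains_key(&min_prevalence_pct) {
            self.misses += 1;
            let computed = select_features(data, min_prevalence_pct);
            self.selected.insert(min_prevalence_pct, computed);
        }
        &self.selected[&min_prevalence_pct]
    }
}
```

With this shape, two trials that differ only in `algo` or `k_penalty` would share one cached selection. If other parameters turn out to influence feature selection, they would need to be part of the key.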

## Design

### Core: `optimize()` function in lib.rs

```rust
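pub fn optimize(
    data: &Data,
    base_param: &Param,
    search_space: &SearchSpace,
    n_trials: usize,
    metric: OptMetric,        // TestAUC, TestSpearman, CVMeanAUC, etc.
    mut sampler: Sampler,     // TPE (default), Random, Grid
) -> OptResult {
    // Data is loaded ONCE by the caller and shared across all trials
    let mut history = History::new();  // sketch: hypothetical record of (trial, param, value)
    for trial in 0..n_trials {
        // Sketch: sampled values override the matching fields of base_param
        let param = sampler.suggest(base_param, search_space, &history);
        let value = run_trial(data, &param, metric);  // no disk I/O
        history.push(trial, param, value);
    }
    // Sketch: best() returns the parameters and value of the best trial for `metric`
    let (best_params, best_value) = history.best();
    OptResult { best_params, best_value, history }
}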
```

### Sampler options

1. **Random** — uniform random sampling (baseline)
2. **TPE (Tree-structured Parzen Estimator)** — Optuna's default, Bayesian
3. **Grid** — exhaustive grid search for small spaces
4. **CMA-ES** — covariance matrix adaptation for continuous params
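
To make the TPE option concrete, here is a heavily simplified one-dimensional sketch of its selection step: split past trials into a "good" and a "bad" group, fit a kernel density estimate to each, and pick the candidate with the highest ratio l(x) / g(x). The fixed bandwidth, the caller-supplied candidate list, and the names `kde` and `tpe_pick` are assumptions for illustration. Optuna's actual TPE samples candidates from l(x), adapts bandwidths, and mixes in a prior.

```rust
/// Gaussian kernel density estimate at `x` from `points`, fixed bandwidth.
fn kde(x: f64, points: &[f64], bandwidth: f64) -> f64 {
    if points.is_empty() {
        return 1e-12; // tiny flat density when a group has no observations
    }
    let norm = 1.0 / (bandwidth * (2.0 * std::f64::consts::PI).sqrt());
    let sum: f64 = points
        .iter()
        .map(|p| (-0.5 * ((x - p) / bandwidth).powi(2)).exp())
        .sum();
    norm * sum / points.len() as f64
}

/// Pick the candidate maximizing l(x) / g(x).
/// `history` holds (parameter value, metric) pairs; higher metric is better.
/// l is fitted on the best `gamma` fraction of trials, g on the rest.
pub fn tpe_pick(
    history: &[(f64, f64)],
    candidates: &[f64],
    gamma: f64,
    bandwidth: f64,
) -> Option<f64> {
    let mut sorted: Vec<(f64, f64)> = history.to_vec();
    // Sort by metric, best first
    sorted.sort_by(|a, b| b.1.total_cmp(&a.1));
    let n_good = ((sorted.len() as f64 * gamma).ceil() as usize).min(sorted.len());
    let good: Vec<f64> = sorted[..n_good].iter().map(|t| t.0).collect();
    let bad: Vec<f64> = sorted[n_good..].iter().map(|t| t.0).collect();

    // Density ratio used as the acquisition score (guarded against division by ~0)
    let score = |x: f64| kde(x, &good, bandwidth) / kde(x, &bad, bandwidth).max(1e-12);
    candidates
        .iter()
        .copied()
        .max_by(|a, b| score(*a).total_cmp(&score(*b)))
}
```

Without a prior, the ratio grows quickly in regions far from any "bad" observation, so this sketch tends to favor candidates at the edge of the good region. A real port would need to handle that, as well as categorical and integer dimensions.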

### Search space definition (in param.yaml)

```yaml
optimize:
  n_trials: 100
  metric: test_auc           # or cv_mean_auc, spearman, etc.
  sampler: tpe
  search_space:
    algo: [ga, beam, sa, ils, lasso]
    k_penalty: {log_uniform: [1e-5, 0.01]}
    language: [ter, "bin,ter", "bin,ter,ratio"]
    data_type: [prev, raw, "raw,prev"]
    population_size: {int: [500, 10000]}
    cooling_rate: {uniform: [0.99, 0.9999]}
    feature_minimal_prevalence_pct: {int: [5, 30]}
```
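
On the Rust side, each `search_space` entry could map to one variant of an enum with a `sample` method, which is all the Random sampler needs. The sketch below is an assumption about how that might look: `Dim`, `Value`, and the bundled `SplitMix64` generator are illustrative names, integer ranges are treated as inclusive, and a real implementation would likely use an RNG crate instead.

```rust
/// Tiny deterministic generator (SplitMix64) so the sketch has no dependencies.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Uniform float in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// One searchable parameter, mirroring the YAML forms above.
pub enum Dim {
    Categorical(Vec<String>),           // algo: [ga, beam, ...]
    Uniform { low: f64, high: f64 },    // {uniform: [low, high]}
    LogUniform { low: f64, high: f64 }, // {log_uniform: [low, high]}
    Int { low: i64, high: i64 },        // {int: [low, high]}, inclusive by assumption
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Float(f64),
    Int(i64),
}

impl Dim {
    pub fn sample(&self, rng: &mut SplitMix64) -> Value {
        match self {
            Dim::Categorical(choices) => {
                // Modulo introduces a negligible bias for small choice counts
                let i = (rng.next_u64() % choices.len() as u64) as usize;
                Value::Str(choices[i].clone())
            }
            Dim::Uniform { low, high } => Value::Float(low + rng.next_f64() * (high - low)),
            Dim::LogUniform { low, high } => {
                // Sample uniformly in log space, then map back
                let (ln_lo, ln_hi) = (low.ln(), high.ln());
                let v = (ln_lo + rng.next_f64() * (ln_hi - ln_lo)).exp();
                Value::Float(v.clamp(*low, *high))
            }
            Dim::Int { low, high } => {
                let span = (high - low + 1) as u64;
                Value::Int(low + (rng.next_u64() % span) as i64)
            }
        }
    }
}
```

Sampling `k_penalty` in log space means values between 1e-5 and 1e-4 are drawn as often as values between 1e-3 and 1e-2, which is usually what is wanted for a penalty spanning several orders of magnitude.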

### Key advantages over Python Optuna

| | Python Optuna (#77) | Native Rust |
|---|---|---|
| Data loading | Once per trial (subprocess) | **Once total** |
| Feature selection | Recomputed per trial | **Cached** |
| Overhead per trial | ~2s (process spawn + data I/O) | **~0ms** |
| 100 trials on Qin2014 | ~200s + algo time | **~algo time only** |
| Parallelism | Python GIL limited | Full rayon parallelism |
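
The "once total" and parallelism rows rest on the same idea: trials only borrow the in-memory data, so several can run at once without copying or reloading it. The sketch below uses scoped threads from the standard library as a dependency-free stand-in for rayon. `Data`, `run_trial`, and `run_trials_parallel` are hypothetical, and the trial body is a placeholder rather than a real model fit.

```rust
use std::thread;

/// Hypothetical stand-in for the in-memory dataset.
pub struct Data {
    pub values: Vec<f64>,
}

/// Placeholder trial: scores one candidate `k_penalty` against the shared data.
fn run_trial(data: &Data, k_penalty: f64) -> f64 {
    let mean = data.values.iter().sum::<f64>() / data.values.len() as f64;
    mean - k_penalty
}

/// Evaluate all candidates on up to `n_workers` threads.
/// `data` is borrowed by every thread; nothing is cloned or read from disk.
pub fn run_trials_parallel(data: &Data, candidates: &[f64], n_workers: usize) -> Vec<f64> {
    let workers = n_workers.max(1);
    // Ceiling division, at least 1 so that `chunks` never receives 0
    let chunk_len = ((candidates.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker first, then join them in order
        let handles: Vec<_> = candidates
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|&k| run_trial(data, k))
                        .collect::<Vec<f64>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("trial thread panicked"))
            .collect()
    })
}
```

Results come back in candidate order because the handles are joined in spawn order. Batch-parallel evaluation like this fits Random and Grid directly. A sequential sampler such as TPE would need to suggest a batch of parameters at a time to benefit.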

### Implementation phases

1. **Random sampler + grid** — simplest, proves the architecture
2. **TPE sampler** — port the core algorithm (kernel density estimation)
3. **Pruning** — early stopping of unpromising trials (median pruner)
4. **CLI integration** — `gpredomics --optimize param.yaml`
5. **Web app** — "Tune" button that calls optimize() via gpredomicspy
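
For phase 3, the median rule can be sketched in a few lines: stop a running trial if its intermediate metric at a given step is below the median of what completed trials reported at that same step. This is a simplified version that compares the current value only (Optuna's median pruner compares the trial's best value so far), assumes higher is better, and uses the hypothetical name `should_prune`.

```rust
/// Decide whether a running trial should be stopped at `step`.
///
/// `completed` holds, for each finished trial, its intermediate metric values
/// indexed by step. No pruning happens until at least `n_startup_trials`
/// finished trials have reported a value at this step.
pub fn should_prune(
    completed: &[Vec<f64>],
    step: usize,
    current: f64,
    n_startup_trials: usize,
) -> bool {
    // Values reported by earlier trials at the same step
    let mut at_step: Vec<f64> = completed
        .iter()
        .filter_map(|trial| trial.get(step).copied())
        .collect();
    if at_step.len() < n_startup_trials.max(1) {
        return false;
    }
    at_step.sort_by(|a, b| a.total_cmp(b));
    let mid = at_step.len() / 2;
    let median = if at_step.len() % 2 == 1 {
        at_step[mid]
    } else {
        0.5 * (at_step[mid - 1] + at_step[mid])
    };
    // Higher is better, so anything below the median is pruned
    current < median
}
```

What counts as a "step" would differ per algorithm, for example a generation for `ga` or a cooling iteration for `sa`, so each algorithm would need to expose intermediate values for this to work.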

### References

- Akiba et al. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. KDD.
- Bergstra et al. (2011). Algorithms for Hyper-Parameter Optimization. NeurIPS. Introduces the Tree-structured Parzen Estimator (TPE).
