
imspy-predictors: NCE calibration + native fine-tuning #395

Merged
theGreatHerrLebert merged 5 commits into main from fix/nce-calibration
May 18, 2026
@theGreatHerrLebert
Owner

Merges the fix/imspy-predictors-im-refit work plus the NCE calibration fix.

NCE calibration (3302d070)

  • New calibrate_nce(): sweeps an absolute NCE over high-confidence target PSMs and returns the value maximizing mean spectral angle — one NCE per run. The model conditions on a per-run NCE scalar (fine-tuned on collision_energy_aligned_normed), so calibration is an absolute sweep, not an offset on the observed collision energy.
  • predict_intensities_prosit uses calibrate_nce; sets collision_energy_calibrated to the absolute best NCE.
  • get_collision_energy_calibration_factor kept as a deprecated compat wrapper; fixes the bug where it calibrated on the unmodified sequence while the real prediction used sequence_modified.
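The absolute sweep described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `sweep_nce`, `spectral_angle`, and `toy_predict` are hypothetical names, the predictor is a toy stand-in for the real intensity model, and the NCE grid is an arbitrary choice.

```python
import math

def spectral_angle(observed, predicted):
    """Normalized spectral angle in [0, 1]; 1 means identical direction."""
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0
    cos = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cos) / math.pi

def sweep_nce(psms, predict, nce_grid):
    """Return (best_nce, best_mean_angle) over an absolute NCE grid.

    psms: iterable of (sequence_modified, charge, observed_intensities).
    predict: hypothetical callable (sequence, charge, nce) -> intensities.
    """
    psms = list(psms)
    if not psms:
        raise ValueError("need at least one high-confidence PSM to calibrate")
    best_nce, best_score = None, -math.inf
    for nce in nce_grid:
        # Every PSM in the run is predicted with the same absolute NCE.
        angles = [spectral_angle(obs, predict(seq, charge, nce))
                  for seq, charge, obs in psms]
        score = sum(angles) / len(angles)
        if score > best_score:
            best_nce, best_score = nce, score
    return best_nce, best_score

# Toy predictor (assumption): the second fragment intensity scales with NCE.
def toy_predict(sequence, charge, nce):
    return [1.0, nce / 30.0]

psms = [("PEPTIDEK", 2, [1.0, 1.0]), ("ELVISK", 2, [1.0, 1.0])]
best_nce, best_score = sweep_nce(psms, toy_predict, range(20, 41, 5))
print(best_nce)
```

The sweep never reads the observed collision energy, so the result is one absolute NCE per run and not an offset.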

Fine-tuning fixes (9981465c, 86857fba, f7ac8805, e9c6c18e)

  • Native intensity fine-tuning; per-epoch fine-tune history; out-of-range charge filtering in IM simulate/fine-tune; fine-tuning output fixes.

simulate_ion_mobilities builds a charge one-hot with num_classes=4
(charges 1..4). Inputs outside that range trigger F.one_hot's index
assertion — silent on CPU, hard CUDA crash on GPU
(ScatterGatherKernel idx_dim < index_size).

Filter such rows before the one_hot, predict on the valid subset, and
NaN-pad invalid positions in the output array. Emit a RuntimeWarning
so callers see how many were skipped. Charges of 5+ leak through sage
matching at ~0.15% on HeLa data even with precursor_charge=[2,4],
which is enough to take down a GPU run.
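
A rough sketch of the filter-then-pad pattern described above, written in plain Python so it runs without torch. The function name, the `predict_valid` callable, and the toy predictor are assumptions for illustration only; the real code operates on tensors and arrays.

```python
import math
import warnings

def predict_with_charge_filter(charges, predict_valid, min_charge=1, max_charge=4):
    """Predict only for in-range charges; NaN-pad the rest.

    predict_valid is a hypothetical callable that takes the list of valid
    charges and returns one prediction per entry.
    """
    valid_idx = [i for i, z in enumerate(charges)
                 if min_charge <= z <= max_charge]
    n_skipped = len(charges) - len(valid_idx)
    if n_skipped:
        warnings.warn(
            f"skipped {n_skipped} of {len(charges)} rows with charge outside "
            f"[{min_charge}, {max_charge}]",
            RuntimeWarning,
            stacklevel=2,
        )
    # Output keeps the input length so callers can align it with their rows.
    out = [math.nan] * len(charges)
    if valid_idx:
        preds = predict_valid([charges[i] for i in valid_idx])
        for i, p in zip(valid_idx, preds):
            out[i] = p
    return out

# Toy usage: charges 5 and 0 fall outside the one-hot domain.
charges = [2, 5, 3, 0]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = predict_with_charge_filter(charges, lambda zs: [0.5 * z for z in zs])
```

Keeping the output the same length as the input means the NaN rows stay aligned with the original PSM table.
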

Three small additions to the fine-tune training loops in imspy_predictors:

- rt/predictors.py: also accumulate train_loss across train batches,
  store {epochs, train_loss, val_loss} on self._finetune_history.
- ccs/predictors.py: same for CCS/IM fine_tune_model. Also drop
  training rows whose charge is outside the model's [1, 4] one-hot
  domain before constructing the charge tensor; same root cause as the
  earlier simulate_ion_mobilities filter (CUDA assertion on charge=5+
  PSMs that leak through sage matching).
- intensity/predictors.py: same train_loss accumulation and history
  capture for the native intensity fine-tune loop.

The history dict is the shape sagepy-rescore's report.py expects so it
can render per-head loss curves and improvement-vs-epoch-0 panels for
the sagepy-rescore HTML report.
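
For illustration, the per-epoch history capture could look roughly like this. The key names come from the commit message; storing each as a per-epoch list, and the `train_step` / `val_step` callables, are assumptions made for this sketch and not the actual training loop.

```python
def run_epochs(train_batches, val_batches, train_step, val_step, n_epochs):
    """Run a toy fine-tune loop and return a {epochs, train_loss, val_loss} dict.

    train_step and val_step are hypothetical callables that take one batch
    and return its scalar loss.
    """
    history = {"epochs": [], "train_loss": [], "val_loss": []}
    for epoch in range(n_epochs):
        # Accumulate the training loss across all train batches of this epoch.
        train_total = 0.0
        for batch in train_batches:
            train_total += train_step(batch)
        val_total = 0.0
        for batch in val_batches:
            val_total += val_step(batch)
        history["epochs"].append(epoch)
        history["train_loss"].append(train_total / len(train_batches))
        history["val_loss"].append(val_total / len(val_batches))
    return history

# Toy "model": a single scale factor that halves after every train step.
state = {"scale": 1.0}

def toy_train_step(batch):
    loss = state["scale"] * batch
    state["scale"] *= 0.5
    return loss

def toy_val_step(batch):
    return state["scale"] * batch

history = run_epochs([1.0, 1.0], [2.0], toy_train_step, toy_val_step, n_epochs=2)
```
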

Add calibrate_nce(): sweeps absolute NCE over high-confidence target PSMs
and returns the value maximizing mean spectral angle -- one NCE per run.
The model conditions on a per-run NCE scalar (fine-tuned on
collision_energy_aligned_normed, domain ~7-43), so calibration must be an
absolute sweep, not an offset added to the observed collision energy.

- predict_intensities_prosit: use calibrate_nce; set collision_energy_calibrated
  to the absolute best NCE instead of collision_energy + offset.
- get_collision_energy_calibration_factor: kept as a deprecated compat wrapper
  over calibrate_nce. Fixes the bug where it calibrated on the unmodified
  sequence while the real prediction used sequence_modified.
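
The deprecated-wrapper pattern mentioned above usually looks something like the sketch below. Both function names are hypothetical stand-ins, and what the real wrapper returns is not stated here; this version simply forwards the new function's result.

```python
import warnings

def calibrate_stub(psms):
    """Hypothetical stand-in for the new calibration entry point."""
    return 30.0

def legacy_calibration(psms):
    """Deprecated compat wrapper: warn, then delegate to the new function."""
    warnings.warn(
        "legacy_calibration is deprecated; use calibrate_stub instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not the wrapper
    )
    return calibrate_stub(psms)

# Existing callers keep working but see a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = legacy_calibration([])
```
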
@theGreatHerrLebert theGreatHerrLebert merged commit f4c06ea into main May 18, 2026
2 checks passed