imspy-predictors: NCE calibration + native fine-tuning #395
Merged
simulate_ion_mobilities builds a charge one-hot with num_classes=4 (charges 1..4). Inputs outside that range trigger F.one_hot's index assertion — silent on CPU, hard CUDA crash on GPU (ScatterGatherKernel idx_dim < index_size). Filter such rows before the one_hot, predict on the valid subset, and NaN-pad invalid positions in the output array. Emit a RuntimeWarning so callers see how many were skipped. Charges of 5+ leak through sage matching at ~0.15% on HeLa data even with precursor_charge=[2,4], which is enough to take down a GPU run.
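The filter-then-pad pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: plain Python lists stand in for the tensors and arrays, and `predict_with_charge_filter` and `toy_model` are hypothetical names introduced here.

```python
import math
import warnings

# One-hot domain stated in the commit message (charges 1..4).
MIN_CHARGE, MAX_CHARGE = 1, 4


def predict_with_charge_filter(charges, predict_fn):
    """Predict only on rows whose charge is in range; NaN-pad the rest.

    `predict_fn` is a hypothetical stand-in for the real model call and
    receives only the in-range charges.
    """
    valid_idx = [i for i, z in enumerate(charges)
                 if MIN_CHARGE <= z <= MAX_CHARGE]
    n_invalid = len(charges) - len(valid_idx)
    if n_invalid:
        # Tell the caller how many rows were skipped.
        warnings.warn(
            f"{n_invalid} of {len(charges)} rows have charge outside "
            f"[{MIN_CHARGE}, {MAX_CHARGE}]; returning NaN for them",
            RuntimeWarning,
            stacklevel=2,
        )
    # Output keeps the input length so row alignment is preserved.
    out = [math.nan] * len(charges)
    if valid_idx:
        preds = predict_fn([charges[i] for i in valid_idx])
        for i, p in zip(valid_idx, preds):
            out[i] = p
    return out


def toy_model(valid_charges):
    # Hypothetical predictor: any function of the charge would do here.
    return [1.0 / z for z in valid_charges]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = predict_with_charge_filter([2, 5, 3, 0, 4], toy_model)
```

Keeping the output the same length as the input means downstream code can join predictions back onto the PSM table by position and treat NaN as "no prediction".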
Three small additions to the fine-tune training loops in
imspy_predictors:
- rt/predictors.py: also accumulate train_loss across train batches,
store {epochs, train_loss, val_loss} on self._finetune_history.
- ccs/predictors.py: same for CCS/IM fine_tune_model. Also drop
training rows whose charge is outside the model's [1, 4] one-hot
domain before constructing the charge tensor; same root cause as the
earlier simulate_ion_mobilities filter (CUDA assertion on charge=5+
PSMs that leak through sage matching).
- intensity/predictors.py: same train_loss accumulation and history
capture for the native intensity fine-tune loop.
The history dict is the shape sagepy-rescore's report.py expects so it
can render per-head loss curves and improvement-vs-epoch-0 panels for
the sagepy-rescore HTML report.
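As a rough sketch of the loss bookkeeping described above: the loop below averages the training loss over batches each epoch and records it next to the validation loss. The keys `epochs`, `train_loss` and `val_loss` come from the commit message; storing them as per-epoch lists is an assumption, and the one-parameter model and the name `fine_tune_sketch` are purely illustrative.

```python
def mse(w, xs, ys):
    """Mean squared error of the toy model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fine_tune_sketch(train_batches, val_batches, epochs=5, lr=0.05):
    """Toy fine-tune loop that records per-epoch train and val loss."""
    w = 0.0
    # Assumed layout: one list entry per epoch for each key.
    history = {"epochs": [], "train_loss": [], "val_loss": []}
    for epoch in range(epochs):
        train_total = 0.0
        for xs, ys in train_batches:
            loss = mse(w, xs, ys)
            # Gradient of the MSE with respect to w.
            grad = 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            w -= lr * grad
            train_total += loss  # accumulate across train batches
        val_total = sum(mse(w, xs, ys) for xs, ys in val_batches)
        history["epochs"].append(epoch)
        history["train_loss"].append(train_total / len(train_batches))
        history["val_loss"].append(val_total / len(val_batches))
    return w, history


train = [([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.0])]
val = [([1.5], [3.0])]
w, history = fine_tune_sketch(train, val)
```

With both curves stored per epoch, a report can plot them directly and compute improvement relative to the first entry.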
Add calibrate_nce(): sweeps absolute NCE over high-confidence target PSMs
and returns the value maximizing mean spectral angle (one NCE per run).
The model conditions on a per-run NCE scalar (fine-tuned on
collision_energy_aligned_normed, domain ~7-43), so calibration must be an
absolute sweep, not an offset added to the observed collision energy.
- predict_intensities_prosit: use calibrate_nce; set
collision_energy_calibrated to the absolute best NCE instead of
collision_energy + offset.
- get_collision_energy_calibration_factor: kept as a deprecated compat
wrapper over calibrate_nce. Fixes the bug where it calibrated on the
unmodified sequence while the real prediction used sequence_modified.
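An absolute NCE sweep of this kind might look like the sketch below. It is not the `calibrate_nce` implementation from the PR: the function name `calibrate_nce_sketch`, the `predict_fn(sequence_modified, nce)` signature, the integer grid over 7-43 and the toy predictor are all assumptions made for illustration. The spectral angle uses the common normalized form `1 - 2*arccos(cos)/pi`.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral angle in [0, 1]; 1 means identical direction."""
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cos) / math.pi


def calibrate_nce_sketch(psms, predict_fn, nce_grid):
    """Return (best_nce, best_mean_angle) over an absolute NCE grid.

    `psms` is a list of (sequence_modified, observed_intensities) pairs.
    The same modified sequence is used here as in the final prediction.
    """
    best_nce, best_score = None, -math.inf
    for nce in nce_grid:
        angles = [spectral_angle(obs, predict_fn(seq, nce))
                  for seq, obs in psms]
        score = sum(angles) / len(angles)
        if score > best_score:
            best_nce, best_score = nce, score
    return best_nce, best_score


def toy_predict(sequence_modified, nce):
    # Hypothetical predictor whose output direction depends on the NCE.
    t = nce / 43.0
    return [t, 1.0 - t, 0.5]


# Observed spectra simulated at NCE 28, so the sweep should recover 28.
psms = [(seq, toy_predict(seq, 28)) for seq in ("PEPTIDEK", "AC[UNIMOD:4]DEK")]
best_nce, best_score = calibrate_nce_sketch(psms, toy_predict, range(7, 44))
```

Because the sweep returns an absolute value rather than an offset, the result can be written straight into the calibrated collision energy column for every PSM in the run.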
Merges the fix/imspy-predictors-im-refit work plus the NCE calibration fix.

NCE calibration (3302d070)
- calibrate_nce(): sweeps an absolute NCE over high-confidence target PSMs and returns the value maximizing mean spectral angle (one NCE per run). The model conditions on a per-run NCE scalar (fine-tuned on collision_energy_aligned_normed), so calibration is an absolute sweep, not an offset on the observed collision energy.
- predict_intensities_prosit uses calibrate_nce; sets collision_energy_calibrated to the absolute best NCE.
- get_collision_energy_calibration_factor kept as a deprecated compat wrapper; fixes the bug where it calibrated on the unmodified sequence while the real prediction used sequence_modified.

Fine-tuning fixes (9981465c, 86857fba, f7ac8805, e9c6c18e)