Merge feature/predictor-finetune: native Chronologer RT adapter + per-task fine-tuning #397
Merged
…tims_read_pasef_msms_for_frame_v2

Adds Rust wrappers around two previously-unexposed Bruker SDK functions:

* tims_extract_centroided_spectrum_for_frame_v2(handle, frame_id, scan_lo, scan_hi, callback, user_data) returns Bruker's built-in centroided peak list (m/z, intensity) for one (frame, scan range) tile. This is the function DiaTracer uses to start from peaks rather than raw events.
* tims_read_pasef_msms_for_frame_v2 is the DDA-PASEF per-frame fragment reader. It is included for future DDA work and is not used by the DIA dump.

Signatures were recovered from gtluu/pyTDFSDK init_tdf_sdk.py plus the MSMS_SPECTRUM_FUNCTOR / MSMS_SPECTRUM_FUNCTION callback typedefs from ctypes_data_structures.py. Callback marshalling goes through a process-level Mutex + Option<Vec<...>> trampoline, since Rust closures don't compose with libloading symbols + C function pointers.

Adds rustdf/examples/dump_bruker_centroids.rs, a one-shot CLI that:

* reads dia_ms_ms_windows + dia_ms_ms_info from analysis.tdf,
* extracts a centroided spectrum per (MS2 frame, DIA quad scan range),
* writes a TSV with the peak arrays.

Smoke test on 10 MS2 frames of O240206:

* 26 extract calls
* 838 peaks per call (mean)
* 324 calls/s → the full-file estimate (29k calls) lands at ~90 seconds, vs ~50 minutes for the current event-clustering pseudo-assembly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1.0 of the calibrated-library pipeline: take q ≤ 0.01 PSMs from a
spectrum-centric DIA-PASEF rescore, fine-tune the pretrained RT predictor
(UnifiedPeptideModel via DeepChromatographyApex) to predict OBSERVED RT
in minutes for THIS run, measure RT MAE on a peptide-level holdout
(20% of unique sequences, seeded).
Smoke test on real O240206 (26,855 anchor PSMs, 21,484 train / 5,371
holdout, RTX 5090, ~30s wall):
baseline (linearly projected pretrained): 1.535 min MAE
fine-tuned: 0.425 min MAE
delta: -1.11 min (-72%)
Held-out peptides only — sequences the predictor never saw during
fine-tuning. Shows the calibration hypothesis works: q ≤ 0.01 PSMs
from a single run are usable training data for a per-run RT model
that generalises within the same instrument run.
Implementation notes:
- Joins rescored_canonical.csv (observed `rt`) with rescored_canonical.tdc.csv
(q_value + decoy) on (spec_idx, match_idx). Aggregates per peptide
via mean observed RT.
- Bypasses the legacy DeepChromatographyApex.fine_tune_model() — that
method calls bare `self.model(tokens)` which on UnifiedPeptideModel
returns a dict, breaking l1_loss. The script's custom_finetune_loop()
calls model.predict_rt(tokens) which extracts the 'rt' tensor.
- Supervision target is observed `rt` in minutes (not
retention_time_projected, which is sage's projection into ITS
predictor space — different scale than imspy_predictors' RT model).
- Pre-fine-tune baseline projects pretrained predictions linearly onto
observed RT via least-squares so MAE measures prediction quality
rather than the scale offset between the two model output spaces.
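A plain-Python sketch of the two data-handling steps above:

* the (spec_idx, match_idx) join with q-value/decoy filtering and the per-peptide mean RT;
* the least-squares projection used for the pretrained baseline.

Assumptions:

* Rows are dicts carrying the column names given above, plus a hypothetical "peptide" column.
* The real script's column names and any use of pandas are not shown here.
* This sketch fits and scores the projection on the same arrays for brevity.

```python
from collections import defaultdict
from statistics import mean


def anchor_rt_by_peptide(rescored_rows, tdc_rows, q_max=0.01):
    """Join observed RT with TDC labels on (spec_idx, match_idx); mean RT per peptide.

    Rows are dicts. "peptide" is a hypothetical column name; the other keys
    follow the names used in the implementation notes.
    """
    tdc = {(r["spec_idx"], r["match_idx"]): r for r in tdc_rows}
    rts = defaultdict(list)
    for r in rescored_rows:
        t = tdc.get((r["spec_idx"], r["match_idx"]))
        # Keep only target PSMs at or below the q-value threshold.
        if t is None or t["decoy"] or t["q_value"] > q_max:
            continue
        rts[r["peptide"]].append(r["rt"])
    # One training unit per peptide: mean observed RT in minutes.
    return {pep: mean(vals) for pep, vals in rts.items()}


def fit_linear_projection(pred, obs):
    """Ordinary least squares for obs ≈ slope * pred + intercept."""
    n = len(pred)
    mx, my = sum(pred) / n, sum(obs) / n
    sxx = sum((x - mx) ** 2 for x in pred)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pred, obs))
    slope = sxy / sxx
    return slope, my - slope * mx


def projected_mae(pred, obs):
    """MAE after mapping pretrained predictions onto the observed RT scale."""
    slope, intercept = fit_linear_projection(pred, obs)
    return sum(abs(slope * x + intercept - y) for x, y in zip(pred, obs)) / len(obs)
```

Because the projection absorbs any affine offset between the two output spaces, a pretrained model whose outputs are only shifted and scaled relative to observed RT scores near zero MAE here.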
Output: rt_finetuned.pt (state_dict + metrics + args + pretrained
state_dict for diff) + metrics.json.
Next: Phase 1.1 expands to multi-task (RT + CCS + intensity) on
UnifiedPeptideModel, single state_dict per run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joint fine-tune of RT + CCS heads on UnifiedPeptideModel using one
DIA-PASEF run's q ≤ 0.01 PSMs. Same encoder is shared so a single
training pass calibrates both heads against this run's anchor set.
Smoke test on real O240206 (26,855 anchor peptides, 21,484 train /
5,371 holdout, RTX 5090, ~30 s wall, 30 epochs):
                 baseline   fine-tuned   Δ
RT (min)         1.535      0.468        -69%    ✓
CCS (Ų)          31.22      30.14        -3.5%   marginal
1/K0*            0.0655     0.0708       +8%     (worse)
*predicted CCS converted via Mason-Schamp (ccs_to_one_over_k0).
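The Mason-Schamp conversion in the footnote is a closed-form relation. A minimal sketch follows.

Assumptions:

* The drift-gas mass (N2) and the temperature (305 K) are assumptions. They are not values taken from this PR or from imspy_core's ccs_to_one_over_k0.
* The function names are illustrative.
* The constant is the Mason-Schamp prefactor expressed in Å², V·s/cm², Da and K.

```python
import math

# Assumed conditions (not stated in this PR): N2 drift gas, 305.0 K.
N2_MASS_DA = 28.013
TEMP_K = 305.0
# (3e / 16 N0) * sqrt(2*pi / (k_B * Da)), rescaled so CCS is in Å^2,
# 1/K0 in V·s/cm^2, reduced mass in Da and temperature in K.
SUMMARY_CONSTANT = 18509.8632163405


def reduced_mass(mz, charge, gas_mass=N2_MASS_DA):
    """Ion/gas reduced mass in Da, approximating the ion mass as m/z * z."""
    ion_mass = mz * charge
    return ion_mass * gas_mass / (ion_mass + gas_mass)


def one_over_k0_to_ccs(one_over_k0, mz, charge, temp_k=TEMP_K):
    """Mason-Schamp: CCS (Å^2) from reduced inverse mobility 1/K0 (V·s/cm^2)."""
    return SUMMARY_CONSTANT * charge * one_over_k0 / math.sqrt(reduced_mass(mz, charge) * temp_k)


def ccs_to_one_over_k0(ccs, mz, charge, temp_k=TEMP_K):
    """Inverse of one_over_k0_to_ccs under the same assumed conditions."""
    return ccs * math.sqrt(reduced_mass(mz, charge) * temp_k) / (SUMMARY_CONSTANT * charge)
```

Because the two directions share every constant, the CCS → 1/K0 round trip is exact. Any error in the reported 1/K0 therefore comes from the predicted CCS itself, not from the conversion.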
Diagnosis: RT calibrates cleanly, matching the Phase 1.0 standalone
result. The CCS head's physics-informed SquareRootProjectionLayer
encodes a strong inductive bias for CCS in Ų; fine-tune on observed
CCS makes ~3% headway in 30 epochs, but predicted-CCS → 1/K0 round-trip
gets slightly worse than just linearly projecting the pretrained CCS
onto observed-CCS. Net read: linear projection of pretrained CCS is
already a strong baseline for library use; meaningful CCS fine-tune
would need either much longer training (~100s of epochs) or a
1/K0-native head that drops the SquareRootProjection physics layer.
Implementation notes:
- Side-loads observed 1/K0 from <stem>.pseudo.bin's env_apex_scan +
TimsDataset.scan_to_inverse_mobility (one-frame LUT). The rescore
CSV's ims/predicted_ims columns are zero — sage isn't passed
inverse_ion_mobility from build_query in rescore_canonical.py.
Right long-term fix is upstream (1-line plumbing in
rescore_canonical.py); side-load avoids re-running rescore.
- Supervises CCS in Ų (head's native scale); converts predicted
CCS → 1/K0 via imspy_core.chemistry.mobility.ccs_to_one_over_k0_par
for library-relevant reporting.
- Saves both pretrained + fine-tuned state_dicts plus the linear
projection params (rt_projection, ccs_projection) to the .pt.
Library-time inference: load checkpoint, predict, optionally
apply the projection (esp. for CCS where the fine-tune is marginal).
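The one-frame LUT side-load works as follows: convert every scan index once, then index it per anchor. It can be sketched as below.

Assumptions:

* The scan → 1/K0 mapping is the same for every frame in the run, which is what a one-frame LUT implies.
* The interpolation step is an extra assumption, in case env_apex_scan is fractional.
* Both helper names, and the callable's signature, are hypothetical.

```python
def build_scan_lut(scan_to_inverse_mobility, n_scans):
    """Evaluate a per-frame scan -> 1/K0 conversion once for every scan index.

    `scan_to_inverse_mobility` is any callable taking a list of scan numbers;
    it stands in for the dataset's conversion (hypothetical signature).
    """
    return list(scan_to_inverse_mobility(list(range(n_scans))))


def lookup_inverse_mobility(lut, apex_scan):
    """Map an apex scan (possibly fractional) to 1/K0 by linear interpolation."""
    if apex_scan <= 0:
        return lut[0]
    if apex_scan >= len(lut) - 1:
        return lut[-1]
    lo = int(apex_scan)
    frac = apex_scan - lo
    return lut[lo] * (1.0 - frac) + lut[lo + 1] * frac
```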
Phase 1.1 result: RT calibration is publication-grade. CCS + linear
projection is sufficient for library generation; chasing the extra
3-5% from a real CCS fine-tune is not worth the architecture work
now. Pivoting to Phase 1.2 (fragment intensity) — that's where the
biggest peptide-centric search lift comes from.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lti-task fix
Three new scripts for the per-run calibration pipeline (Phase 1.1' / 1.2),
plus a data-prep bug fix on the existing multi-task script.
NEW finetune_dia_pasef_ccs.py (Phase 1.1' standalone CCS diagnostic)
- Side-loads observed 1/K0 from .pseudo.bin's env_apex_scan +
TimsDataset.scan_to_inverse_mobility (one-frame LUT)
- Converts 1/K0 → CCS Ų via Mason-Schamp (one_over_k0_to_ccs_par)
— supervises the CCS head in its native output scale, then converts
predicted CCS → 1/K0 at inference for library use.
- Aggregates per-(peptide, charge) — CCS depends on charge, peptides
observed at z=2 and z=3 carry distinct CCS values; mixing them
via per-sequence aggregation gives meaningless supervision.
- Smoke test on real O240206 (26,676 anchor pairs, 21,341 train /
5,335 hold, RTX 5090, 45 epochs, ~52s wall):
CCS Ų baseline+proj 13.24 → fine-tuned 6.85 −48.3%
1/K0 baseline+proj 0.0269 → fine-tuned 0.0139 −48.4%
z=2: 8.65 → 4.41 Ų MAE (publication-grade per-instrument calibration)
z=3: 24.34 → 12.74 Ų MAE
- SquareRootProjectionLayer slopes/intercepts moved meaningfully
(z=2 slope 12.96→10.58, z=3 slope 15.61→16.17) — confirms the
physics-prior is trainable when targets are correct.
NEW finetune_dia_pasef_intensity.py (Phase 1.2 — fragment intensity)
- Reads the per-PSM rescored_canonical.fragments.parquet (sage's
annotate_matches dump, persisted by patched rescore_canonical.py)
- Encodes each PSM's fragments → canonical Prosit 174-dim layout
(29 positions × {y,b} × {+1,+2,+3}) via intensity_target_encoder
- Fine-tunes UnifiedPeptideModel intensity head w/ masked_spectral_distance
- PSM-level training units (NOT aggregated) — fragmentation depends
on charge + CE; sequence-level aggregation would conflate them
- Holdout by sequence (no leak), eval via spectral angle similarity
- Smoke test on real O240206 (26,676 PSMs, 20 epochs, ~26s wall):
Spectral angle baseline 0.3797 → fine-tuned 0.6199 +63%
median 0.3736 → 0.6482
- End-to-end re-rescore with fine-tuned weights (set INTENSITY_WEIGHTS_PATH
env var on rescore_canonical.py): 26,676 → 27,773 peptides @ 1% FDR
(+1,097, +4.1%), decoy fraction 0.0099 unchanged.
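The holdout metric (spectral angle) and the masked_spectral_distance loss are two views of the same quantity. A minimal pure-Python sketch follows.

Assumptions:

* It uses the Prosit convention of -1.0 for masked target slots.
* The function names mirror the text for readability, but this is not the imspy implementation.

```python
import math


def _l2_normalize(values, eps=1e-12):
    norm = math.sqrt(sum(v * v for v in values))
    return [v / max(norm, eps) for v in values]


def masked_spectral_distance(y_true, y_pred):
    """Prosit-style 2*arccos(cos)/pi, ignoring slots where y_true == -1.0.

    0.0 means identical direction; 1.0 means orthogonal (for non-negative vectors).
    """
    kept = [(t, p) for t, p in zip(y_true, y_pred) if t != -1.0]
    t_norm = _l2_normalize([t for t, _ in kept])
    p_norm = _l2_normalize([p for _, p in kept])
    cos = sum(a * b for a, b in zip(t_norm, p_norm))
    cos = max(-1.0, min(1.0, cos))  # guard acos against rounding just outside [-1, 1]
    return 2.0 * math.acos(cos) / math.pi


def spectral_angle(y_true, y_pred):
    """Similarity reported as 1 - distance: 1.0 = same shape, 0.0 = no overlap."""
    return 1.0 - masked_spectral_distance(y_true, y_pred)
```

Both quantities depend only on the direction of the vectors, so the per-spectrum intensity scale does not affect them. Masked slots contribute nothing, whatever the model predicts there.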
NEW intensity_target_encoder.py
- Library + CLI sanity-check entry point. Encoder follows the mapping
verified by physics on real data (see history):
sagepy observed_fragments_map() key = (ion_int, frag_charge, ordinal)
ion_int 0 → b, ion_int 1 → y
The user explicitly flagged this class of bug as the "ugly hotspot"
where a silent swap survives undetected through the loss curve.
- Sanity checks (s1/s2/s3) all PASSED on the real parquet:
s1: round-trip encode→decode matches input set on 5 PSMs
s2: distribution of fill rate per (ord, ion, charge) follows
biology (peak at ord 3-5, tail to ord 25, 0% at ord 29)
s3: spot check on short/mid/long peptides — max emitted ordinal
equals peptide L-1 (no fragments past peptide length)
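The documented key mapping, together with the spirit of checks s1 and s3, fits in a few lines.

Assumptions:

* Keys arrive as plain (ion_int, frag_charge, ordinal) tuples, as described above.
* The helper names are hypothetical.
* s2 (the fill-rate histogram) is left out.

```python
# ion_int -> ion letter, as stated in the mapping above.
ION_INT_TO_LETTER = {0: "b", 1: "y"}
LETTER_TO_ION_INT = {letter: i for i, letter in ION_INT_TO_LETTER.items()}


def decode_key(key):
    """(ion_int, frag_charge, ordinal) -> (letter, frag_charge, ordinal)."""
    ion_int, frag_charge, ordinal = key
    return ION_INT_TO_LETTER[ion_int], frag_charge, ordinal


def encode_key(letter, frag_charge, ordinal):
    """Inverse of decode_key."""
    return LETTER_TO_ION_INT[letter], frag_charge, ordinal


def ordinals_within_length(peptide_length, keys):
    """s3-style check: no key's ordinal exceeds peptide_length - 1."""
    return all(ordinal <= peptide_length - 1 for _, _, ordinal in keys)
```

A round trip alone cannot catch a consistent b/y swap, because encode and decode would both be wrong in the same way. That is why the physics-based checks (s2, s3) matter alongside s1.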
MODIFIED finetune_dia_pasef_multi.py — bug fix + reframe as DIAGNOSTIC
- Per-(peptide, charge) aggregation instead of per-sequence (CCS bug)
- Holdout split by unique sequence (not row index) to avoid same
peptide leaking across train/test at different charges
- Docstring rewritten to flag the ARCHITECTURAL LIMITATION discovered
here: off-the-shelf checkpoints (rt/, ccs/, intensity/) are each
pre-trained standalone, with their own encoder weights paired
with their own head. Loading from rt/best_model.pt with
tasks=['rt','ccs'] gets a fresh-init CCS head (encoder + RT head
only loaded). Linear projection then fits a random head's output
to observed CCS for a ~30 Ų baseline — vs the 13 Ų baseline the
standalone CCS script gets from ccs/best_model.pt where encoder +
CCS head are correctly paired.
- Production calibration path is the 3 per-task scripts; multi-task
remains for future research (joint-pretrained encoder).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase A (joint_init_baseline.py): does merging the 3 pretrained heads
into one UnifiedPeptideModel regress per-task baselines? Empirical test
across 3 base-encoder choices on real O240206 holdout:
base        RT MAE Δ%   CCS Ų Δ%   Intensity SA Δ%
intensity   +133.1%     -1.9%      -0.1%
rt          +1.9%       +2.0%      -0.2%    ← winner
ccs         +114.5%     +0.0%      -0.0%
RT is the picky one — only the rt-paired encoder gives non-degraded RT
baseline. CCS (anchored by SquareRootProjectionLayer physics prior) and
intensity (anchored by charge + collision-energy inputs) are nearly
encoder-insensitive. Joint init from rt/best_model.pt + swapped CCS &
intensity heads → all three baselines within 2% of standalone. Joint IS
viable.
Phase B (finetune_dia_pasef_joint.py): 5-flavor fine-tune ablation on
the joint init. None match per-task on all three metrics:
flavor                                RT min   CCS Ų   1/K0     SA
Per-task reference                    0.43     6.85    0.0139   0.620
B1 heads-only encoder-frozen          0.971    11.20   0.0233   0.5448
B2 joint static weights               0.427    7.54    0.0153   0.5360
B3 split LR (heads/encoder 100×)      0.537    10.12   0.0207   0.5515
B4 uncertainty weighting (Kendall)    0.496    6.83    0.0138   0.5784
B5 sequential intensity → joint       0.582    9.10    0.0185   0.5610
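B4's "uncertainty weighting (Kendall)" refers to learning one log-variance per task. One common simplified form of that objective is sketched below in plain Python.

Whether B4 uses exactly this parameterisation is not stated here, so treat it as an assumption. An alternative form, for example, puts a 1/(2σ²) factor on the regression terms.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Kendall-style homoscedastic weighting: sum_i exp(-s_i) * L_i + s_i.

    s_i = log(sigma_i^2) is a learnable per-task scalar; a larger s_i
    down-weights task i, and the + s_i term stops every s_i from running to +inf.
    """
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))


def optimal_log_vars(task_losses):
    """For fixed losses, d/ds [exp(-s) L + s] = 0 gives s* = log(L)."""
    return [math.log(L) for L in task_losses]
```

At the optimum, each task's effective weight is exp(-s*) = 1/L_i. This scales tasks with very different loss magnitudes (minutes, Ų, spectral distance) onto a comparable footing without hand-tuned static weights.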
B4 is the best joint flavor — matches CCS exactly (6.83 vs 6.85) and
1/K0 (0.0138 vs 0.0139), but loses RT by 15% and SA by 7%. SA is the
metric most directly tied to library-free search performance (intensity
features drive sage's discriminator), so a 7% gap projects to ~30-50%
loss of the +1,097 peptide gain we measured from intensity fine-tune.
Why intensity always loses in joint: 174-dim convolutional decoder is
the most encoder-hungry head; joint distributes encoder capacity across
three tasks, per-task gets the encoder fully devoted to fragments.
Decision: stay with the three per-task scripts as production. The extra
deployment cost of per-task models over a joint model (3 checkpoints, 3
forward passes at library-gen time) is small vs the SA hit. B4 stays as
the best diagnostic flavor for
future revisits (e.g., when calibration data is scarce, when more
tasks are added, or when a unified pre-trained encoder lands).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ``Chronologer`` as a drop-in RT predictor alongside
``DeepChromatographyApex``. Roughly 3× tighter median residual
on timsTOF anchor PSMs (~8 s vs ~25 s on real-o240206) when
fine-tuned on q ≤ 0.01 spec-centric hits.
Implementation reproduces the upstream Searle Lab residual-CNN
architecture (Apache-2.0; Pino LK et al., U. Wisconsin-Madison)
inline — no upstream package dependency. We avoid vendoring
weights; ``Chronologer.from_checkpoint`` loads a base ``.pt``
that the user supplies. The full attribution + citation
block is at the top of chronologer.py.
API mirrors DeepChromatographyApex where it can:
* ``Chronologer.from_base(checkpoint_path, scale_init=0.79, bias_init=0.69)``
* ``predictor.predict(seqs, batch_size=4096) → np.ndarray`` (minutes)
* ``predictor.fine_tune(df, epochs=50, lr=1e-4, patience=8, val_frac=0.2)``
* ``predictor.fit_kde_correction(n_bins=2000)`` — optional KDE post-correction
that prefers upstream ``chronologer.chronologer_utils.kde_alignment.KDE_align``
when the import is available, falls back to a local KDE otherwise.
* ``predictor.fine_tune_psms(psm_collection, q_threshold=0.01)`` —
the API sagepy-rescore calls.
* ``predictor.save_checkpoint(path)`` — saves model state_dict + KDE
params so library generation can load via build_library's
--chrono-checkpoint flag.
Tokenizer accepts both ``[UNIMOD:n]`` and ``(UniMod:n)`` mod
brackets via ``unimod_to_chronologer`` (also exported), aligning
with the DiaNN probe canonicalization rule from
project_diann_probe_stage0_2026_05_09.
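The source does not spell out what unimod_to_chronologer does beyond accepting both bracket styles. As a hypothetical illustration of that first step, the two forms can be reduced to one before tokenizing.

```python
import re

# "(UniMod:35)" in any letter case; group 1 captures the accession number.
_PAREN_UNIMOD = re.compile(r"\(unimod:(\d+)\)", re.IGNORECASE)


def normalize_unimod_brackets(sequence):
    """Rewrite every (UniMod:n) annotation to [UNIMOD:n]; leaves [UNIMOD:n] untouched."""
    return _PAREN_UNIMOD.sub(lambda m: f"[UNIMOD:{m.group(1)}]", sequence)
```

With both spellings mapped to one canonical form, a single token table can cover them.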
Brings ``DeepPeptideIntensityPredictor`` to API parity with the RT and IM heads. sagepy-rescore.predict_and_finetune was already calling fine_tune_psms on intensity. Without this commit, that call falls through to a NotImplementedError on the rustims main tree.

Three pieces:

* ``observed_fragments_to_intensity_target(sequence, charge, fragments)`` builds a 174-vec Prosit target from a sagepy ``Fragments`` view of the observed ions.
  * Layout is ordinal-major ([y1+1, y1+2, y1+3, b1+1, b1+2, b1+3, y2+1, ...]) to match the model's native output.
  * Impossible slots (frag > seq_len-1, frag_charge > prec_charge) are marked -1.0 so masked_spectral_distance ignores them.
  * Valid but unmatched slots stay 0.0, so the model learns the presence/absence pattern, not just the matched magnitudes.
  * Intensities are normalized by the per-spectrum max so the loss is scale-invariant.
* ``_ion_to_text`` is a small shim around sagepy's IonType enum. It works whether the binding emits ``"b"``, ``"y"``, ``"IonType(B)"``, or the str repr.
* ``DeepPeptideIntensityPredictor.fine_tune_model(data, batch_size=64, epochs=50, learning_rate=1e-4, patience=5)`` matches the training loop from scripts/finetune_timstof.py: AdamW + GradScaler + grad-clip 1.0 + masked_spectral_distance.
  * ``DeepPeptideIntensityPredictor.fine_tune_psms(psm_collection, q_threshold=0.01, ...)`` filters to rank-1 q≤threshold targets and dispatches to fine_tune_model.
  * State is held on ``self._finetune_history`` so the sagepy-rescore HTML report can plot it.

Validated on real-o240206 fc3-imcons0_5 (412k spectra, 1.76M sage PSMs). Paired with the xgboost mokapot model, full FT pushes mokapot peptide yield from 30,947 (baseline FT) to 33,027 (+14.4% vs the rescore_canonical baseline of 28,857).
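A minimal pure-Python sketch of the target construction described above.

Assumptions:

* It uses the ordinal-major slot formula from this commit.
* Observed ions are represented as hypothetical (ion, frag_charge, ordinal, intensity) tuples rather than a sagepy Fragments view.
* If the same slot is matched twice, the larger value is kept.
* ion_to_text and build_intensity_target are illustrative names, not the imspy functions.

```python
N_ORDINALS = 29
N_SLOTS = N_ORDINALS * 2 * 3  # 29 ordinals x {y, b} x charges {1, 2, 3} = 174


def ion_to_text(ion):
    """Reduce 'b', 'y', 'IonType(B)', 'IonType.Y', ... to 'b' or 'y' (hypothetical shim)."""
    letters = [c for c in str(ion) if c.isalpha()]
    last = letters[-1].lower() if letters else ""
    if last not in ("b", "y"):
        raise ValueError(f"unsupported ion type: {ion!r}")
    return last


def ordinal_major_slot(ion, frag_charge, ordinal):
    """slot = (ordinal-1)*6 + (charge-1 if y else 3+charge-1)."""
    offset = (frag_charge - 1) if ion == "y" else 3 + (frag_charge - 1)
    return (ordinal - 1) * 6 + offset


def build_intensity_target(sequence_length, precursor_charge, fragments):
    """fragments: iterable of (ion, frag_charge, ordinal, intensity) tuples."""
    target = [0.0] * N_SLOTS
    # Mark impossible slots with -1.0 so a masked loss skips them.
    for ordinal in range(1, N_ORDINALS + 1):
        for frag_charge in (1, 2, 3):
            if ordinal > sequence_length - 1 or frag_charge > precursor_charge:
                for ion in ("y", "b"):
                    target[ordinal_major_slot(ion, frag_charge, ordinal)] = -1.0
    for ion, frag_charge, ordinal, intensity in fragments:
        ion = ion_to_text(ion)
        if not (1 <= ordinal <= N_ORDINALS and 1 <= frag_charge <= 3):
            continue  # outside the 174-slot layout
        slot = ordinal_major_slot(ion, frag_charge, ordinal)
        if target[slot] == -1.0:
            continue  # ion reported in an impossible slot: keep it masked
        target[slot] = max(target[slot], intensity)
    # Per-spectrum max normalisation; -1.0 masks and 0.0 "unmatched" stay as-is.
    max_int = max(target)
    if max_int > 0:
        target = [v / max_int if v > 0 else v for v in target]
    return target
```

For a 7-residue precursor at charge 2, only ordinals 1-6 at fragment charges 1-2 remain valid. That leaves 24 learnable slots, and the other 150 are masked.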
DeepPeptideIonMobilityApex.fine_tune_model now records
per-epoch {epochs, train_loss, val_loss} on
self._finetune_history. sagepy-rescore's report reads this
to plot per-head convergence curves alongside RT and
intensity, so all three predictor heads now expose the
same telemetry channel.
Other small tweaks:
* divide accumulated train_loss by num batches before
recording (previously the per-epoch print/value was an
un-normalized sum that grew with batch count)
* drop the print-every-10 cadence to every-5 to match the
RT and intensity heads — keeps log-parser regexes aligned
across all three heads in generate_report_post_hoc.py
Critical layout bug in observed_fragments_to_intensity_target,
the new label builder used by fine_tune_psms.
The function was writing observed fragment intensities at
ordinal-major slots:
slot = (ordinal-1)*6 + (charge-1 if y else 3+charge-1)
so e.g. b1+1 went to slot 3, y2+1 went to slot 6, y1+2 went to
slot 1. But the canonical Prosit/imspy 174-vec layout (from
both imspy_simulation.utility.flatten_prosit_array and Rust's
rustdf::sim::utility::reshape_prosit_array) is charge-major,
y-before-b inside each charge block:
[0:29] y+1
[29:58] b+1
[58:87] y+2
[87:116] b+2
[116:145] y+3
[145:174] b+3
so the correct slot is
slot = (charge-1)*58 + (0 if y else 29) + (ordinal-1)
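The two index formulas above can be checked side by side. Both are bijections onto 0..173, which is why training against either one still drives the loss down.

The commit that follows this one in the PR reverts this change. This sketch therefore only illustrates the arithmetic, not which layout is canonical. The function names are hypothetical.

```python
def ordinal_major_slot(ion, charge, ordinal):
    # (ordinal-1)*6 + (charge-1 if y else 3+charge-1)
    return (ordinal - 1) * 6 + ((charge - 1) if ion == "y" else 3 + (charge - 1))


def charge_major_slot(ion, charge, ordinal):
    # (charge-1)*58 + (0 if y else 29) + (ordinal-1)
    return (charge - 1) * 58 + (0 if ion == "y" else 29) + (ordinal - 1)


def all_keys():
    """Every (ion, charge, ordinal) in the 174-slot space."""
    return [(ion, c, o) for c in (1, 2, 3) for ion in ("y", "b") for o in range(1, 30)]


def ordinal_major_to_charge_major(vec):
    """Move each value from its ordinal-major slot to its charge-major slot."""
    out = [0.0] * 174
    for key in all_keys():
        out[charge_major_slot(*key)] = vec[ordinal_major_slot(*key)]
    return out
```

Because the remap is a pure permutation, vectors written under one layout and read under the other keep their values and change only their positions. That matches the "consistent scrambling" described below.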
Effect: FT trained the model on labels at the wrong slots.
Loss still decreased because the scrambling was consistent
across labels, so mokapot's per-PSM cosine features (which
compare the FT'd-and-scrambled prediction against the same-
scrambled observed) still went up — this is why v3 peptide
yield (33,027) lifted +14.4% over rescore_canonical despite
the bug. Mokapot features are scramble-invariant within a
PSM.
But the LIBRARY uses the canonical (correct) decoder to map
the model's 174-vec to per-fragment intensities. So the
library's predicted per-fragment intensities are gibberish.
Smoking gun (3-way audit on v3 anchors, n=588):
extracted ↔ sage_observed: median spearman +0.71 ✓
extracted ↔ predicted: median spearman -0.12 ✗
sage_obs ↔ predicted: median spearman -0.21 ✗
Pep-centric extraction is sound (matches sage). Predictor
is mis-calibrated *for downstream library use* because of
the scrambled FT.
Fix is the slot formula. Mask handling rewritten to mark both kinds
of impossible slot (ordinal > seq_len-1, and frag_charge >
precursor_charge) in the charge-major layout.
After this commit:
* v3 intensity_finetuned.pt is invalid; must re-FT.
* Any library built with that ckpt has wrong per-fragment
intensities; rebuild required.
* Spec-centric peptide yield numbers from v3 stack still
stand (mokapot features are scramble-invariant), but any
library-assisted pipeline run on the v3 library is
untrusted until rebuild.
…e-major)" This reverts commit 8843d1d.
All three predictor heads (Chronologer RT, intensity, IM) split train/val by random PSM index. Each peptide appears in many spectra (107k PSMs ↔ ~33k unique modseqs in our test set), so a PSM-level split puts the same modseq in both folds.

The predictors are deterministic per input:

* RT: seq→time
* intensity: (seq, charge)→174vec
* IM: (seq, charge)→1/K0

So val loss collapses to the instrument noise floor (~10 s RT, ~3% intensity, ~0.01 1/K0). That's NOT generalization; it's memorization being measured.

Fix: group-aware split.

* RT (chronologer.py): group key = sequence_modified
* Intensity (predictors.py): group key = (sequence_modified, charge)
* IM (ccs/predictors.py): group key = (sequence_modified, charge)

All PSMs of a given group go to the same fold via:

    uniq, inv = np.unique(group_keys, return_inverse=True)
    perm_groups = rng_np.permutation(n_groups)
    val_groups = set(perm_groups[:n_val_groups])
    mask_val = [g in val_groups for g in inv]

Spec-centric mokapot yields are unaffected. The rescore features compare the model's prediction with sage's matched fragments per PSM, so a memorized fit and a generalized fit give identical features.

The LIBRARY built with these predictors is a different matter. It covers the full FASTA digest (mostly unseen sequences), so library quality depends on true generalization, which we have been measuring incorrectly. Today's Chronologer FT showed val_L1 = 10 s, which we now expect to be 2-4× looser on a held-out peptide set. The model is still good, just not as good as the curves suggested.

After commit: rebuild the editable venv install and restart the full recovery pipeline with the fixed split.
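The split snippet above leaves rng_np, n_groups, and n_val_groups implicit. Below is a self-contained NumPy version of the same idea.

Assumptions:

* The random generator is a seeded default_rng.
* The validation fraction is like the val_frac=0.2 default listed for Chronologer.fine_tune.
* Composite keys are joined into strings. If group_keys were a list of (sequence, charge) tuples, np.unique would flatten it and deduplicate individual elements rather than pairs.
* The function name is hypothetical.

```python
import numpy as np


def group_aware_split(group_keys, val_frac=0.2, seed=0):
    """Boolean val mask where every row sharing a group key lands in the same fold."""
    # np.unique flattens 2-D input, so collapse tuple keys to one string per row.
    keys = np.asarray(["|".join(map(str, k)) if isinstance(k, tuple) else str(k)
                       for k in group_keys])
    uniq, inv = np.unique(keys, return_inverse=True)
    n_groups = len(uniq)
    n_val_groups = max(1, int(round(val_frac * n_groups)))
    rng = np.random.default_rng(seed)
    perm_groups = rng.permutation(n_groups)
    # Vectorised version of "g in val_groups for g in inv".
    is_val_group = np.zeros(n_groups, dtype=bool)
    is_val_group[perm_groups[:n_val_groups]] = True
    return is_val_group[inv]
```

Pass sequence_modified strings for the RT head and (sequence_modified, charge) tuples for the intensity and IM heads.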
Brings the native Chronologer RT adapter (imspy_predictors/rt/chronologer.py) and the per-task RT/CCS/intensity fine-tune work onto main, alongside the calibrate_nce NCE calibration already on main. intensity/predictors.py auto-merged (calibrate_nce preserved). The only conflict was in ccs/predictors.py -- the IM fine-tune verbose-logging cadence; took the feature-branch version (epoch % 5, 'im' label) which matches the RT/intensity report cadence. Fine-tune history tracking is present on both sides and merged cleanly.
Brings feature/predictor-finetune onto main so the native Chronologer RT adapter (imspy_predictors/rt/chronologer.py) and the per-task RT/CCS/intensity fine-tune work land alongside the calibrate_nce NCE calibration already on main.

Why:
feature/predictor-finetune branched from 488a8305 (before PR #395), so no branch currently has both Chronologer and calibrate_nce. sagepy-rescore pins imspy-predictors at a branch == main, so it cannot reach Chronologer until this lands.

Conflict resolution:
intensity/predictors.py auto-merged cleanly -- calibrate_nce is preserved. The only conflict was a one-hunk cosmetic difference in ccs/predictors.py (IM fine-tune verbose-logging cadence). Took the feature-branch version (epoch % 5, im label), which matches the RT/intensity report cadence. Fine-tune history tracking is present on both sides and merged cleanly.

After merge: repin the sagepy-rescore pyproject to main and reinstall.