Native Bruker timsTOF .d input (timstof feature) #45
Add an optional, off-by-default `timstof` cargo feature that reads native Bruker timsTOF .d (DDA-PASEF) data via the pure-Rust timsrust crate (the same reader Sage uses) and produces the engine's model::Spectrum, so the search path is format-agnostic. A .d is a directory (TDF SQLite + binary blob) read natively with no vendor runtime and nothing to bundle. TimsTofReader opens a .d by path, iterates the centroided MS2 spectra by index (timsrust SpectrumReader::new + len/get), and maps each to a Spectrum: 1-based scan number, precursor m/z/charge/intensity, RT in seconds, isolation center and width converted to symmetric offsets, and ascending-sorted peaks. Precursor-less or non-positive-m/z spectra are skipped, mirroring the mzML/Thermo readers. The dep and all .d code sit behind cfg(feature = "timstof"); default builds read mzML/MGF only and never pull in timsrust.
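The per-spectrum mapping described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `RawMs2`, `RawPrecursor`, and this `Spectrum` are hypothetical stand-ins (they are neither timsrust types nor the engine's `model::Spectrum`), and two details are assumptions: each symmetric offset is taken as half the isolation width, and peaks are sorted by m/z.

```rust
/// Hypothetical stand-in for a centroided MS2 spectrum handed over by a reader.
#[derive(Debug, Clone)]
struct RawMs2 {
    index: usize, // 0-based position in the reader
    rt_seconds: f64,
    precursor: Option<RawPrecursor>,
    isolation_center: f64,
    isolation_width: f64,
    peaks: Vec<(f64, f32)>, // (m/z, intensity), in arbitrary order
}

/// Hypothetical precursor record.
#[derive(Debug, Clone)]
struct RawPrecursor {
    mz: f64,
    charge: Option<u8>,
    intensity: f64,
}

/// Hypothetical stand-in for the engine's spectrum type.
#[derive(Debug, Clone, PartialEq)]
struct Spectrum {
    scan: u32, // 1-based
    rt_seconds: f64,
    precursor_mz: f64,
    precursor_charge: Option<u8>,
    precursor_intensity: f64,
    isolation_center: f64,
    isolation_lower_offset: f64,
    isolation_upper_offset: f64,
    peaks: Vec<(f64, f32)>,
}

/// Maps one raw spectrum; returns `None` for spectra that should be skipped.
fn map_spectrum(raw: RawMs2) -> Option<Spectrum> {
    // Skip precursor-less spectra.
    let precursor = raw.precursor?;
    // Skip non-positive precursor m/z (this form also rejects NaN).
    if !(precursor.mz > 0.0) {
        return None;
    }
    // Assumption: symmetric offsets are half the isolation width on each side.
    let half_width = raw.isolation_width / 2.0;
    // Assumption: "ascending-sorted" means ascending by m/z.
    let mut peaks = raw.peaks;
    peaks.sort_by(|a, b| a.0.total_cmp(&b.0));
    Some(Spectrum {
        scan: raw.index as u32 + 1, // 0-based index -> 1-based scan number
        rt_seconds: raw.rt_seconds,
        precursor_mz: precursor.mz,
        precursor_charge: precursor.charge,
        precursor_intensity: precursor.intensity,
        isolation_center: raw.isolation_center,
        isolation_lower_offset: half_width,
        isolation_upper_offset: half_width,
        peaks,
    })
}
```

Returning `Option` keeps the skip rule in one place, so a caller can apply it with `filter_map` regardless of which reader produced the raw spectrum.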
Detect a .d input by path extension (works on the .d directory) and add is_d alongside is_mzml; is_mgf now means 'neither mzML nor .d'. In the non-chimeric streaming reader, route .d to TimsTofReader (feature-gated), with a clear error when the binary is built without --features timstof. Chimeric and precursor-calibration on .d are out of scope: the DDA reader exposes no MS1 stream, so --chimeric on .d degrades gracefully to a normal search (like MGF), and --precursor-cal is skipped with a warning. The TSV writer uses its scan-based (non-MGF) path for .d since the reader emits real scan numbers.
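A minimal sketch of the extension-based detection and the built-without-the-feature error, as of this commit (mzML, .d, MGF). The names `InputKind`, `has_ext`, `detect`, and `check_d_support` are hypothetical and not taken from the binary; case-insensitive matching is also an assumption.

```rust
use std::path::Path;

/// Hypothetical classification of a `--spectrum` input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputKind {
    MzMl,
    D,
    Mgf,
}

/// Extension check on the final path component. This works for a `.d`
/// directory because `Path::extension` never touches the filesystem.
/// Assumption: matching is ASCII case-insensitive.
fn has_ext(path: &Path, ext: &str) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .is_some_and(|e| e.eq_ignore_ascii_case(ext))
}

/// MGF is the fallback: "neither mzML nor .d".
fn detect(path: &Path) -> InputKind {
    let is_mzml = has_ext(path, "mzML");
    let is_d = has_ext(path, "d");
    if is_mzml {
        InputKind::MzMl
    } else if is_d {
        InputKind::D
    } else {
        InputKind::Mgf
    }
}

/// Fails early, with an actionable message, when a `.d` is given to a binary
/// compiled without the `timstof` feature.
fn check_d_support(path: &Path) -> Result<(), String> {
    if detect(path) == InputKind::D && !cfg!(feature = "timstof") {
        return Err(format!(
            "{} is a Bruker .d input, but this binary was built without --features timstof",
            path.display()
        ));
    }
    Ok(())
}
```

`cfg!` is used here only so the sketch compiles in either configuration; gating whole code paths with `#[cfg(feature = "timstof")]` is the alternative when the gated branch references types that exist only with the feature enabled.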
Add the design doc (approach, why timsrust, scope/out-of-scope, PXD072598 benchmark dataset, the not-built-locally note), a feature-gated integration test that opens a real .d only when MSGF_TEST_D points at one (no-op otherwise so CI stays green), and a README 'Reading Bruker timsTOF .d files' section plus the --spectrum flag update.
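The environment-gated test pattern can be sketched as below. This is an illustration of the idea rather than the test in this PR: `test_d_from` and `run_gated` are hypothetical helpers, and the `open_and_count` closure stands in for opening the reader and counting spectra.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Returns the `.d` path to test against, or `None` when the test should be
/// a no-op (variable unset or empty).
fn test_d_from(var: Option<OsString>) -> Option<PathBuf> {
    let path = PathBuf::from(var?);
    if path.as_os_str().is_empty() {
        return None;
    }
    Some(path)
}

/// Shape of the gated test body: do nothing without a path, otherwise run the
/// caller-supplied check and return its result.
fn run_gated<F>(var: Option<OsString>, open_and_count: F) -> Option<usize>
where
    F: FnOnce(&Path) -> usize,
{
    let path = test_d_from(var)?;
    Some(open_and_count(&path))
}
```

In an actual `#[test]`, the first argument would be `std::env::var_os("MSGF_TEST_D")`; taking the value as a parameter keeps the gating logic checkable without touching the process environment.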
Integrate the timsTOF .d input alongside the now-merged Thermo .raw feature. Resolved the shared dispatch/feature files: both 'thermo' and 'timstof' cargo features coexist; the binary's format dispatch chains is_mzml / is_raw / is_d / MGF, is_mgf excludes all native formats, --chimeric covers mzML+.raw (.d degrades like MGF), --precursor-cal skips .raw and .d, and the README lists all four input formats. Default + --features timstof builds green.
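The post-merge format chain and the per-format flag behaviour can be summarised in a small sketch. `InputFormat` and its methods are hypothetical names; only the rules stated above are encoded (how `--precursor-cal` behaves for mzML and MGF is not stated here, so the sketch says nothing about it).

```rust
/// Hypothetical enum for the four input formats after the merge.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputFormat {
    MzMl,
    ThermoRaw,
    BrukerD,
    Mgf,
}

impl InputFormat {
    /// Chain is_mzml / is_raw / is_d / MGF: MGF is whatever is left over.
    fn classify(is_mzml: bool, is_raw: bool, is_d: bool) -> Self {
        if is_mzml {
            Self::MzMl
        } else if is_raw {
            Self::ThermoRaw
        } else if is_d {
            Self::BrukerD
        } else {
            Self::Mgf
        }
    }

    /// `--chimeric` covers mzML and .raw; .d degrades like MGF.
    fn supports_chimeric(self) -> bool {
        matches!(self, Self::MzMl | Self::ThermoRaw)
    }

    /// `--precursor-cal` is skipped for .raw and .d.
    fn skips_precursor_cal(self) -> bool {
        matches!(self, Self::ThermoRaw | Self::BrukerD)
    }
}

/// A requested chimeric search silently becomes a normal search when the
/// format cannot support it.
fn effective_chimeric(format: InputFormat, requested: bool) -> bool {
    requested && format.supports_chimeric()
}
```

Keeping these rules on one enum means a new format only has to be added in one place for the dispatch and the flag handling to stay consistent.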
Native
| metric | value |
|---|---|
| native .d read (timsrust) | exit=0 (read + searched the full .d) |
| PIN rows | 542,645 |
| target / decoy PSMs | 279,945 / 262,700 |
| @1% FDR (Percolator 3.7.1) | 4,002 |
| panics / real errors | none |
The reader works end-to-end on production DDA-PASEF data — the scope of this PR.
Caveat (follow-up, not a reader defect): the yield is modest and the target:decoy ratio is ~1.07 (weak discrimination) because the scoring config was not timsTOF-tuned — searched with the HCD_QExactive model, a human+yeast FASTA, and no ion-mobility features. Proper timsTOF tuning (a TOF-appropriate fragment tolerance, clean human FASTA, mobility as a Percolator feature) is the obvious next step. The timstof feature is off-by-default, so merging adds the capability with no risk to the default mzML/MGF/.raw path.
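As a quick consistency check on the table, the target and decoy counts sum to the PIN row count, and their ratio is about 1.066, which rounds to the ~1.07 quoted above. A tiny sketch of that arithmetic (the function name is illustrative only):

```rust
/// Ratio of target PSMs to decoy PSMs.
fn target_decoy_ratio(targets: u64, decoys: u64) -> f64 {
    targets as f64 / decoys as f64
}

/// True when target + decoy PSMs account for every PIN row.
fn counts_match_pin_rows(targets: u64, decoys: u64, pin_rows: u64) -> bool {
    targets + decoys == pin_rows
}
```

A ratio close to 1 means targets barely outnumber decoys before rescoring, which is consistent with the weak-discrimination caveat.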
## Native Bruker timsTOF `.d` input

Adds an optional, off-by-default `timstof` cargo feature that reads native Bruker timsTOF `.d` (DDA-PASEF) data via the pure-Rust `timsrust` crate (v0.4.2, the same reader Sage uses), producing the engine's `model::Spectrum` exactly like the mzML/MGF/Thermo readers. `msgf-rust --spectrum sample.d ...` then works with no other flags (auto-detected by the `.d` path).

A `.d` is a directory (TDF SQLite `analysis.tdf` + binary `analysis.tdf_bin`). `timsrust` reads it natively, with no vendor runtime and nothing to bundle, so this is simpler than the Thermo `.raw` feature (no release-packaging work).

### What's in here

- `crates/input/Cargo.toml`: optional `timsrust = { version = "0.4.2", optional = true }` + `[features] timstof = ["dep:timsrust"]`.
- `crates/msgf-rust/Cargo.toml`: `[features] timstof = ["input/timstof"]` (non-default).
- `crates/input/src/timstof.rs`: `TimsTofReader` opens a `.d`, iterates the centroided MS2 spectra (`SpectrumReader::new` + `len`/`get`), maps each to a `Spectrum` (1-based scan, precursor m/z/charge/intensity, RT in seconds, isolation center+width → symmetric offsets, ascending-sorted peaks), and skips precursor-less / non-positive-m/z spectra. Implements `Iterator<Item = Result<Spectrum, _>>` so the binary's `send_chunks` consumes it like the other readers.
- `crates/input/src/lib.rs`: `#[cfg(feature = "timstof")]` module + re-exports.
- `crates/msgf-rust/src/bin/msgf-rust.rs`: detect `.d` by extension; add `is_d`; dispatch to `TimsTofReader` in the non-chimeric streaming path; clear error when built without `--features timstof`; `is_mgf` now excludes `.d`.
- Feature-gated integration test (a no-op unless `MSGF_TEST_D` points at a `.d`).
- `docs/design/2026-06-01-timstof-d-input.md` + a README section.

### Scope

- Out of scope: chimeric search on `.d` (no MS1 stream from the DDA reader → `--chimeric` degrades gracefully to a normal search, like MGF) and `.d` precursor calibration (skipped + warned).

### Benchmark dataset (documented, not downloaded)

PXD072598: HeLa, DDA-PASEF, timsTOF Pro 2 (FragPipe/MSFragger reference results). Smallest `.d`: `HeLa_IAA_F51_1.d.zip` (1.11 GB). Human HeLa → human UniProt FASTA. `.d` files are 1–3.5 GB; none is on the dev machine.

### Build / validation

- Default (no `timstof`) build stays green: `cargo check -p msgf-rust` passes; all default tests pass; lockfile change is additive-only (no shared dep up/downgraded).
- `timstof` feature also built + tested locally on rustc 1.87; the MSRV concern from the Thermo PR did not materialize for `timsrust` 0.4.2: `cargo build -p msgf-rust --features timstof`, `cargo test -p input --features timstof` (5 unit tests pass), and `cargo clippy --features timstof` are all green.
- Not yet validated: a live `.d` read end-to-end (the integration test is a no-op without a real `.d`; none is available on this machine). Run on a machine with a `.d`: `MSGF_TEST_D=/path/HeLa_IAA_F51_1.d cargo test -p input --features timstof --test timstof_d_loads`, then a full search against a human FASTA cross-checked vs the mzML-converted equivalent.

Draft until a live `.d` read is validated.
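For reference, the `len`/`get` to `Iterator<Item = Result<_, _>>` adaptation that the reader description mentions can be sketched generically. `IndexedSource`, `IndexIter`, and `VecSource` are hypothetical names for illustration; they are not the timsrust API or the types in `timstof.rs`.

```rust
/// Hypothetical index-addressable source, mirroring a len/get style reader.
trait IndexedSource {
    type Item;
    type Error;
    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Result<Self::Item, Self::Error>;
}

/// Adapts an indexed source into a fallible iterator.
struct IndexIter<S: IndexedSource> {
    source: S,
    cursor: usize,
}

impl<S: IndexedSource> Iterator for IndexIter<S> {
    type Item = Result<S::Item, S::Error>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.cursor >= self.source.len() {
            return None;
        }
        let index = self.cursor;
        self.cursor += 1;
        // A failed read is yielded as `Err`; iteration continues afterwards.
        Some(self.source.get(index))
    }
}

/// Toy in-memory source: a stored 0 simulates an unreadable spectrum.
struct VecSource(Vec<u32>);

impl IndexedSource for VecSource {
    type Item = u32;
    type Error = String;

    fn len(&self) -> usize {
        self.0.len()
    }

    fn get(&self, index: usize) -> Result<u32, String> {
        match self.0.get(index) {
            Some(0) => Err(format!("spectrum {index} could not be read")),
            Some(v) => Ok(*v),
            None => Err(format!("index {index} out of range")),
        }
    }
}
```

Yielding `Result` items lets a consumer decide per spectrum whether a read error aborts the run or is logged and skipped; a skip rule for precursor-less spectra can be layered on top with `filter_map`.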