
Native Bruker timsTOF .d input (timstof feature) #45

Merged
ypriverol merged 4 commits into dev from feat/timstof-d-input on Jun 1, 2026

@ypriverol
Member

Native Bruker timsTOF .d input

Adds an optional, off-by-default timstof cargo feature that reads native
Bruker timsTOF .d (DDA-PASEF) data via the pure-Rust
timsrust crate (v0.4.2 — the same reader
Sage uses), producing the engine's model::Spectrum exactly like the
mzML/MGF/Thermo readers. With the feature enabled, msgf-rust --spectrum sample.d ... works with no
other flags (the format is auto-detected from the .d path).

A .d is a directory (TDF SQLite analysis.tdf + binary analysis.tdf_bin).
timsrust reads it natively — no vendor runtime, nothing to bundle — so
this is simpler than the Thermo .raw feature (no release-packaging work).
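Because a .d is a directory rather than a single file, a cheap structural check can catch a wrong path before any reader is opened. The sketch below is a hypothetical helper (`looks_like_tdf_dir` is not part of msgf-rust or timsrust) that only checks for the two files named above; it does not parse the SQLite metadata or the binary blob.

```rust
use std::path::Path;

/// Hypothetical helper (not part of msgf-rust or timsrust): returns true if `path`
/// is a directory containing the two TDF files a Bruker .d is expected to hold.
/// It only checks that the files exist; it does not open or validate them.
pub fn looks_like_tdf_dir(path: &Path) -> bool {
    path.is_dir()
        && path.join("analysis.tdf").is_file() // SQLite metadata
        && path.join("analysis.tdf_bin").is_file() // binary frame data
}
```

A real reader still has to handle a directory that passes this check but holds corrupt or non-DDA data; the check is only a fast, friendly early error.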

What's in here

  • crates/input/Cargo.toml: optional dependency timsrust = { version = "0.4.2", optional = true } plus [features] timstof = ["dep:timsrust"].
  • crates/msgf-rust/Cargo.toml: [features] timstof = ["input/timstof"] (non-default).
  • crates/input/src/timstof.rs: TimsTofReader — opens a .d, iterates the centroided MS2 spectra (SpectrumReader::new + len/get), maps each to a Spectrum (1-based scan, precursor m/z/charge/intensity, RT in seconds, isolation center+width → symmetric offsets, ascending-sorted peaks), and skips precursor-less / non-positive-m/z spectra. Implements Iterator<Item = Result<Spectrum, _>> so the binary's send_chunks consumes it like the other readers.
  • crates/input/src/lib.rs: #[cfg(feature = "timstof")] module + re-exports.
  • crates/msgf-rust/src/bin/msgf-rust.rs: detect .d by extension; add is_d; dispatch to TimsTofReader in the non-chimeric streaming path; clear error when built without --features timstof; is_mgf now excludes .d.
  • Tests: 5 pure-Rust unit tests (convert/skip/charge/isolation/peak-sort) + a feature-gated integration test (skipped unless MSGF_TEST_D points at a .d).
  • docs/design/2026-06-01-timstof-d-input.md + a README section.
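The per-spectrum mapping described in the timstof.rs bullet can be sketched as a pure function; keeping it pure is one way to make convert/skip/charge/isolation/peak-sort unit tests runnable without a real .d. This is a minimal sketch under stated assumptions: `RawMs2`, `RawPrecursor`, and this `Spectrum` are hypothetical stand-ins, not timsrust's types or the engine's `model::Spectrum`, and the field names are illustrative.

```rust
/// Hypothetical stand-in for one centroided DDA-PASEF MS2 record as a reader might
/// expose it. NOT timsrust's type; field names are illustrative assumptions.
#[derive(Debug, Clone)]
pub struct RawMs2 {
    pub index: usize,                    // 0-based position in the reader
    pub precursor: Option<RawPrecursor>, // None for precursor-less spectra
    pub isolation_mz: f64,               // isolation window center (m/z)
    pub isolation_width: f64,            // full isolation window width (m/z)
    pub mz_values: Vec<f64>,
    pub intensities: Vec<f64>,
}

#[derive(Debug, Clone)]
pub struct RawPrecursor {
    pub mz: f64,
    pub charge: Option<u32>,
    pub intensity: Option<f64>,
    pub rt_seconds: f64,
}

/// Hypothetical, simplified stand-in for the engine's `model::Spectrum`.
#[derive(Debug, Clone, PartialEq)]
pub struct Spectrum {
    pub scan: usize, // 1-based
    pub precursor_mz: f64,
    pub charge: Option<u32>,
    pub precursor_intensity: Option<f64>,
    pub rt_seconds: f64,
    pub isolation_center: f64,
    pub isolation_lower_offset: f64,
    pub isolation_upper_offset: f64,
    pub peaks: Vec<(f64, f64)>, // (m/z, intensity), ascending m/z
}

/// Convert one raw record, or return None for spectra the search cannot use.
pub fn convert(raw: &RawMs2) -> Option<Spectrum> {
    // Skip spectra without a precursor.
    let prec = raw.precursor.as_ref()?;
    // Skip non-positive (or NaN) precursor m/z.
    if prec.mz.is_nan() || prec.mz <= 0.0 {
        return None;
    }
    // Center + full width -> symmetric offsets of width/2 on each side.
    let half = raw.isolation_width / 2.0;
    // Pair m/z with intensity (zip stops at the shorter array), then sort by m/z.
    let mut peaks: Vec<(f64, f64)> = raw
        .mz_values
        .iter()
        .copied()
        .zip(raw.intensities.iter().copied())
        .collect();
    peaks.sort_by(|a, b| a.0.total_cmp(&b.0));
    Some(Spectrum {
        scan: raw.index + 1, // 0-based index -> 1-based scan number
        precursor_mz: prec.mz,
        charge: prec.charge,
        precursor_intensity: prec.intensity,
        rt_seconds: prec.rt_seconds,
        isolation_center: raw.isolation_mz,
        isolation_lower_offset: half,
        isolation_upper_offset: half,
        peaks,
    })
}
```

Returning `Option` keeps the skip rule in one place, so whatever iterates the reader can simply drop `None` results.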

Scope

  • In: DDA-PASEF, MS2 only, the non-chimeric search path.
  • Out (this PR): ion mobility (carried but unused; a future Percolator-feature idea), chimeric on .d (no MS1 stream from the DDA reader → --chimeric degrades gracefully to a normal search, like MGF), .d precursor calibration (skipped + warned).

Benchmark dataset (documented, not downloaded)

PXD072598 — HeLa, DDA-PASEF, timsTOF Pro 2 (FragPipe/MSFragger reference results). Smallest .d: HeLa_IAA_F51_1.d.zip (1.11 GB). Human HeLa → human UniProt FASTA. .d files are 1–3.5 GB; none is on the dev machine.

Build / validation

  • Default (non-timstof) build stays green: cargo check -p msgf-rust passes; all default tests pass; lockfile change is additive-only (no shared dep up/downgraded).
  • The timstof feature also built + tested locally on rustc 1.87 — the MSRV concern from the Thermo PR did not materialize for timsrust 0.4.2: cargo build -p msgf-rust --features timstof, cargo test -p input --features timstof (5 unit tests pass), and cargo clippy --features timstof are all green.
  • Still needs VM validation: a live .d read end-to-end (the integration test is a no-op without a real .d; none is available on this machine). Run on a machine with a .d: MSGF_TEST_D=/path/HeLa_IAA_F51_1.d cargo test -p input --features timstof --test timstof_d_loads, then a full search against a human FASTA cross-checked vs the mzML-converted equivalent.
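The MSGF_TEST_D gate can be expressed with a small helper so that an unset or empty variable turns the test into a no-op while a set-but-wrong path still fails loudly. This is a sketch: `test_d_path` and the test function name are hypothetical, and the part that actually opens the .d (which needs `--features timstof` and timsrust) is left as a comment.

```rust
use std::path::PathBuf;

/// Hypothetical helper: the .d path to test against, or None to skip.
/// An unset or empty MSGF_TEST_D means "skip", which keeps CI green without a .d.
fn test_d_path(var: Option<String>) -> Option<PathBuf> {
    var.filter(|s| !s.is_empty()).map(PathBuf::from)
}

#[test]
fn reads_real_d_when_configured() {
    let Some(path) = test_d_path(std::env::var("MSGF_TEST_D").ok()) else {
        eprintln!("MSGF_TEST_D not set; skipping");
        return;
    };
    // A set-but-wrong path should fail loudly rather than silently skip.
    assert!(path.is_dir(), "MSGF_TEST_D={} is not a directory", path.display());
    // With `--features timstof`, the real test would open `path` with the reader
    // and assert that at least one MS2 spectrum converts (omitted: needs timsrust).
}
```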

Draft until a live .d read is validated.

ypriverol added 3 commits June 1, 2026 15:47
Add an optional, off-by-default `timstof` cargo feature that reads native
Bruker timsTOF .d (DDA-PASEF) data via the pure-Rust timsrust crate (the same
reader Sage uses) and produces the engine's model::Spectrum, so the search path
is format-agnostic. A .d is a directory (TDF SQLite + binary blob) read natively
with no vendor runtime and nothing to bundle.

TimsTofReader opens a .d by path, iterates the centroided MS2 spectra by index
(timsrust SpectrumReader::new + len/get), and maps each to a Spectrum: 1-based
scan number, precursor m/z/charge/intensity, RT in seconds, isolation
center+width to symmetric offsets, and ascending-sorted peaks. Precursor-less or
non-positive-m/z spectra are skipped, mirroring the mzML/Thermo readers. The dep
and all .d code sit behind cfg(feature = "timstof"); default builds read mzML/MGF
only and never pull in timsrust.
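The index-based iteration (`SpectrumReader::new` + `len`/`get`) maps naturally onto a small iterator adapter. The sketch below assumes a hypothetical `IndexedSource` trait in place of timsrust's reader (it is not timsrust's API) and a caller-supplied conversion closure: records the closure rejects are skipped, and read errors are yielded as `Err` so the consumer decides whether to abort.

```rust
/// Hypothetical stand-in for an indexed reader exposing `len`/`get`.
/// This is NOT timsrust's API; it only mirrors the access pattern.
pub trait IndexedSource {
    type Raw;
    type Error;
    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Result<Self::Raw, Self::Error>;
}

/// Walks indices 0..len(), converts each record with `convert`, skips records the
/// closure rejects (None), and yields read errors as Err without stopping itself.
pub struct IndexedIter<S, F> {
    source: S,
    next: usize,
    convert: F,
}

impl<S, F> IndexedIter<S, F> {
    pub fn new(source: S, convert: F) -> Self {
        Self { source, next: 0, convert }
    }
}

impl<S, T, F> Iterator for IndexedIter<S, F>
where
    S: IndexedSource,
    F: FnMut(usize, S::Raw) -> Option<T>,
{
    type Item = Result<T, S::Error>;

    fn next(&mut self) -> Option<Self::Item> {
        while self.next < self.source.len() {
            let i = self.next;
            self.next += 1;
            match self.source.get(i) {
                Ok(raw) => {
                    if let Some(item) = (self.convert)(i, raw) {
                        return Some(Ok(item));
                    }
                    // Rejected record (e.g. no precursor): keep scanning.
                }
                Err(e) => return Some(Err(e)),
            }
        }
        None
    }
}
```

Because the adapter implements `Iterator<Item = Result<T, E>>`, it can be consumed the same way as any other reader that yields `Result`s.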
Detect a .d input by path extension (works on the .d directory) and add is_d
alongside is_mzml; is_mgf now means 'neither mzML nor .d'. In the non-chimeric
streaming reader, route .d to TimsTofReader (feature-gated), with a clear error
when the binary is built without --features timstof.

Chimeric and precursor-calibration on .d are out of scope: the DDA reader
exposes no MS1 stream, so --chimeric on .d degrades gracefully to a normal
search (like MGF), and --precursor-cal is skipped with a warning. The TSV writer
uses its scan-based (non-MGF) path for .d since the reader emits real scan
numbers.
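Extension-based detection is purely lexical: `Path::extension` never touches the filesystem, so it behaves the same for a .d directory as for a file, and a trailing slash does not change the result. A minimal sketch of the predicates, including the .raw check that arrives with the Thermo merge; the case-insensitive comparison and the `InputFormat` enum are assumptions, not necessarily what msgf-rust does.

```rust
use std::path::Path;

/// Assumed enum for illustration; MGF is the fallback when nothing else matches.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InputFormat {
    MzMl,
    Raw,
    D,
    Mgf,
}

/// Case-insensitive extension test (an assumption of this sketch).
fn has_ext(path: &Path, ext: &str) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .is_some_and(|e| e.eq_ignore_ascii_case(ext))
}

pub fn is_mzml(p: &Path) -> bool { has_ext(p, "mzml") }
pub fn is_raw(p: &Path) -> bool { has_ext(p, "raw") }
pub fn is_d(p: &Path) -> bool { has_ext(p, "d") }

/// "MGF" means none of the other formats matched.
pub fn is_mgf(p: &Path) -> bool { !is_mzml(p) && !is_raw(p) && !is_d(p) }

/// Dispatch order mirrors the chain described in the PR: mzML, .raw, .d, then MGF.
pub fn detect(p: &Path) -> InputFormat {
    if is_mzml(p) {
        InputFormat::MzMl
    } else if is_raw(p) {
        InputFormat::Raw
    } else if is_d(p) {
        InputFormat::D
    } else {
        InputFormat::Mgf
    }
}
```

Under this sketch a zipped download such as HeLa_IAA_F51_1.d.zip has extension `zip` and falls through to MGF, so the archive would need to be extracted first.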
Add the design doc (approach, why timsrust, scope/out-of-scope, PXD072598
benchmark dataset, the not-built-locally note), a feature-gated integration test
that opens a real .d only when MSGF_TEST_D points at one (no-op otherwise so CI
stays green), and a README 'Reading Bruker timsTOF .d files' section plus the
--spectrum flag update.

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Integrate the timsTOF .d input alongside the now-merged Thermo .raw feature.
Resolved the shared dispatch/feature files: both 'thermo' and 'timstof' cargo
features coexist; the binary's format dispatch chains is_mzml / is_raw / is_d /
MGF, is_mgf excludes all native formats, --chimeric covers mzML+.raw (.d degrades
like MGF), --precursor-cal skips .raw and .d, and the README lists all four input
formats. Default + --features timstof builds green.
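The option handling after the merge can be summarized as a small policy function. This is a hypothetical sketch (`SearchPlan`, `plan`, and the warning text are illustrative, not msgf-rust's code): --chimeric stays on only for formats with an MS1 stream (mzML and .raw) and otherwise falls back to a normal search, and --precursor-cal is skipped for .raw and .d. The PR mentions the warning for .d; emitting one for .raw too is an assumption here.

```rust
/// Input formats after the Thermo merge (assumed enum, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InputFormat {
    MzMl,
    Raw,
    D,
    Mgf,
}

/// Hypothetical result of resolving user options against the input format.
#[derive(Debug, PartialEq, Eq)]
pub struct SearchPlan {
    pub chimeric: bool,
    pub precursor_cal: bool,
    pub warnings: Vec<String>,
}

pub fn plan(format: InputFormat, want_chimeric: bool, want_precursor_cal: bool) -> SearchPlan {
    let mut warnings = Vec::new();
    // --chimeric needs an MS1 stream: mzML and .raw have one; the DDA .d reader
    // and MGF do not, so the request degrades to a normal search.
    let has_ms1 = matches!(format, InputFormat::MzMl | InputFormat::Raw);
    let chimeric = want_chimeric && has_ms1;
    // --precursor-cal is skipped for .raw and .d, with a warning.
    let cal_skipped = matches!(format, InputFormat::Raw | InputFormat::D);
    if want_precursor_cal && cal_skipped {
        warnings.push(format!("--precursor-cal skipped for {format:?} input"));
    }
    SearchPlan {
        chimeric,
        precursor_cal: want_precursor_cal && !cal_skipped,
        warnings,
    }
}
```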
@ypriverol ypriverol marked this pull request as ready for review June 1, 2026 15:02
@qodo-code-review

Qodo reviews are paused for this user.

@ypriverol
Member Author

Native .d read — benchmark validated

Ran the live end-to-end benchmark on PXD072598 (HeLa_IAA_F51_1.d, DDA-PASEF, timsTOF Pro 2, 1.4 GB unzipped) on a Linux host (rustc 1.95):

| Metric | Value |
|---|---|
| Native .d read (timsrust) | exit=0; read and searched the full .d |
| PIN rows | 542,645 |
| Target / decoy PSMs | 279,945 / 262,700 |
| @1% FDR (Percolator 3.7.1) | 4,002 |
| Panics / real errors | None |

The reader works end-to-end on production DDA-PASEF data — the scope of this PR.

Caveat (follow-up, not a reader defect): the yield is modest and the target:decoy ratio is ~1.07 (weak discrimination) because the scoring config was not timsTOF-tuned — searched with the HCD_QExactive model, a human+yeast FASTA, and no ion-mobility features. Proper timsTOF tuning (a TOF-appropriate fragment tolerance, clean human FASTA, mobility as a Percolator feature) is the obvious next step. The timstof feature is off-by-default, so merging adds the capability with no risk to the default mzML/MGF/.raw path.
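A quick arithmetic check on the benchmark table: the PIN rows are exactly the target plus decoy PSMs, and their ratio reproduces the ~1.07 quoted in the caveat.

```latex
\begin{aligned}
279{,}945 + 262{,}700 &= 542{,}645 \quad \text{(PIN rows)} \\
\frac{279{,}945}{262{,}700} &\approx 1.066 \approx 1.07 \quad \text{(target:decoy)}
\end{aligned}
```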

@ypriverol ypriverol merged commit 37971fa into dev Jun 1, 2026
5 checks passed
@ypriverol ypriverol deleted the feat/timstof-d-input branch June 3, 2026 17:05