From 22269be975dddbc725d285e3982f7f08501d9898 Mon Sep 17 00:00:00 2001 From: theGreatHerrLebert Date: Fri, 13 Feb 2026 12:28:23 +0100 Subject: [PATCH] updating DOCs for sim and eval execution --- README.md | 15 +++- packages/imspy-predictors/README.md | 22 +++++- packages/imspy-simulation/README.md | 36 ++++++++- packages/imspy-simulation/SIMULATOR_README.md | 79 +++++++++++++++++-- .../resources/docs/documentation.md | 75 +++++++++++++++++- .../timsim/integration/VALIDATION_README.md | 58 +++++++++++--- 6 files changed, 257 insertions(+), 28 deletions(-) diff --git a/README.md b/README.md index 18615cc6..6c0de91a 100644 --- a/README.md +++ b/README.md @@ -257,9 +257,18 @@ pip install -e ./imspy-vis ### Docker -Pre-built images available for reproducible environments: -- [AMD64](https://github.com/MatteoLacki/rustims_docker/raw/refs/heads/main/release.zip) -- [ARM64](https://github.com/MatteoLacki/rustims_docker/raw/refs/heads/main/release_arm64.zip) +Build from the included Dockerfile (CUDA 12.4, Python 3.12): + +```bash +# Build the image +docker build -t rustims . 
+ +# Verify GPU support +docker run --rm --gpus all rustims python -c "import torch; print(torch.cuda.is_available())" + +# Run a simulation +docker run --rm --gpus all -v /data:/workspace rustims timsim /workspace/config.toml +``` ## Documentation diff --git a/packages/imspy-predictors/README.md b/packages/imspy-predictors/README.md index 65fb6483..62ea6a44 100644 --- a/packages/imspy-predictors/README.md +++ b/packages/imspy-predictors/README.md @@ -20,7 +20,24 @@ pip install imspy-predictors[koina] - **Retention Time Prediction**: GRU-based retention time predictors - **Fragment Intensity Prediction**: Prosit 2023 timsTOF intensity predictor - **Charge State Prediction**: Binomial and deep learning charge state distribution models -- **Koina Integration**: Access remote prediction models via Koina servers (optional) +- **Koina Integration**: Access remote prediction models via [Koina](https://koina.wilhelmlab.org) servers (optional) + +### Available Koina Remote Models + +When using `pip install imspy-predictors[koina]`, remote models can be configured in TimSim's `[models]` TOML section: + +| Task | Model Name | +|------|-----------| +| **RT** | `Deeplc_hela_hf`, `Chronologer_RT`, `AlphaPeptDeep_rt_generic`, `Prosit_2019_irt` | +| **CCS** | `AlphaPeptDeep_ccs_generic`, `IM2Deep` | +| **Intensity** | `prosit`, `alphapeptdeep`, `ms2pip` | + +```toml +[models] +rt_model = "AlphaPeptDeep_rt_generic" +ccs_model = "" # "" = local PyTorch model +intensity_model = "prosit" +``` ## Quick Start @@ -53,8 +70,7 @@ intensity_model = Prosit2023TimsTofWrapper() ## Dependencies - **imspy-core**: Core data structures (required) -- **TensorFlow**: Deep learning framework (required) -- **dlomix**: Deep learning for omics (required) +- **PyTorch**: Deep learning framework (required) - **koinapy**: Koina API client (optional, for remote models) ## Optional Dependencies diff --git a/packages/imspy-simulation/README.md b/packages/imspy-simulation/README.md index f6ce8efa..db374579 
100644 --- a/packages/imspy-simulation/README.md +++ b/packages/imspy-simulation/README.md @@ -14,11 +14,19 @@ For search integration (validation workflows): pip install imspy-simulation[search] ``` +For KOINA remote model support (optional): + +```bash +pip install imspy-predictors[koina] +``` + ## Features - **Frame Builders**: DIA and DDA frame simulation with annotation support - **TimSim**: Complete simulation pipeline for synthetic timsTOF data +- **Prediction Models**: Local PyTorch models with optional KOINA remote model support (see [Prediction Models](#prediction-models)) - **Validation**: Tools for validating simulated data against search results +- **Integration Testing (EVAL)**: Automated validation against DiaNN, FragPipe, and Sage (see [Integration Testing](#integration-testing)) - **Isotope Simulation**: Accurate isotope distribution generation - **TDF Writing**: Write simulated data to Bruker TDF format @@ -49,9 +57,35 @@ frames = frame_builder.build_frames([1, 2, 3]) ### timsim Full simulation pipeline: ```bash -timsim --config config.toml --output /path/to/output +timsim config.toml +timsim config.toml --save-path output.d --reference-path reference.d --fasta-path proteome.fasta +``` + +## Prediction Models + +TimSim uses deep learning models for retention time, ion mobility (CCS), and fragment intensity prediction. By default, local PyTorch models are used. Optionally, remote models can be accessed via [KOINA](https://koina.wilhelmlab.org) servers: + +```toml +[models] +rt_model = "" # "" = local (default), or e.g. "Deeplc_hela_hf" +ccs_model = "" # "" = local (default), or e.g. "AlphaPeptDeep_ccs_generic" +intensity_model = "" # "" = local (default), or e.g. "prosit", "alphapeptdeep" ``` +Requires `pip install imspy-predictors[koina]` for remote models. Falls back to local models if KOINA is unreachable. See [SIMULATOR_README.md](SIMULATOR_README.md) for the full list of available models. 
+ +## Integration Testing + +The EVAL pipeline validates simulated datasets against production proteomics search engines: + +```bash +python -m imspy_simulation.timsim.integration.sim --env env.toml --list +python -m imspy_simulation.timsim.integration.sim --env env.toml --test IT-DIA-HELA +python -m imspy_simulation.timsim.integration.eval --env env.toml --test IT-DIA-HELA +``` + +See the [Validation README](src/imspy_simulation/timsim/integration/VALIDATION_README.md) for setup, available tests, and configuration details. + ## Submodules - **builders/**: Frame builder implementations (DIA, DDA) diff --git a/packages/imspy-simulation/SIMULATOR_README.md b/packages/imspy-simulation/SIMULATOR_README.md index 62a4d64d..7bcd2f54 100644 --- a/packages/imspy-simulation/SIMULATOR_README.md +++ b/packages/imspy-simulation/SIMULATOR_README.md @@ -7,7 +7,23 @@ A high-fidelity proteomics simulation engine for Bruker timsTOF instruments. Gen ### 1. Installation ```bash -# Activate your Python environment +# From PyPI (recommended) +pip install imspy-simulation + +# With KOINA remote model support (optional) +pip install imspy-predictors[koina] +``` + +**Docker** (includes all dependencies + GPU support): +```bash +docker build -t rustims . +docker run --rm --gpus all -v /data:/workspace rustims timsim /workspace/config.toml +``` + +
<details> +<summary>From source</summary> + +```bash source /path/to/your/env/bin/activate # Install the Rust backend (requires maturin) @@ -16,9 +32,10 @@ maturin develop --release # Install Python packages pip install -e /path/to/rustims/packages/imspy-core -pip install -e /path/to/rustims/packages/imspy-simulation pip install -e /path/to/rustims/packages/imspy-predictors +pip install -e /path/to/rustims/packages/imspy-simulation ``` +</details>
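After either install route, a quick check that the packages are importable can save a failed simulation run. The sketch below is an illustration only: apart from `imspy_simulation`, which appears in the documented `python -m` commands, the import names are assumptions derived from the package names.

```python
from importlib.util import find_spec

# Import names are assumptions based on the distribution names; only
# `imspy_simulation` appears verbatim in the documented commands.
REQUIRED = ("imspy_core", "imspy_predictors", "imspy_simulation")
OPTIONAL = ("koinapy",)  # only needed for KOINA remote models


def missing_modules(names):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED) or "none")
    print("missing optional:", missing_modules(OPTIONAL) or "none")
```

`find_spec` locates a module without importing it, so the check stays fast even when the deep learning dependencies are installed.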
### 2. Create a Configuration File @@ -71,15 +88,17 @@ You need a real timsTOF `.d` file as a template. The simulator will populate it ### Simulation Pipeline ``` -FASTA → Digestion → RT Prediction → IM Prediction → +FASTA → Digestion → Model Selection → RT Prediction → IM Prediction → + (local/KOINA) Fragment Intensity Prediction → Frame Assembly → .d File ``` 1. **Digestion**: In-silico tryptic digest of proteins -2. **RT Prediction**: Deep learning model predicts retention times -3. **IM Prediction**: CCS/mobility prediction for each peptide ion -4. **Intensity Prediction**: Fragment ion intensities (PROSPECT model) -5. **Frame Assembly**: Signals placed into timsTOF frame structure +2. **Model Selection**: Choose local PyTorch or KOINA remote models (see [Prediction Model Selection](#prediction-model-selection-koina)) +3. **RT Prediction**: Deep learning model predicts retention times +4. **IM Prediction**: CCS/mobility prediction for each peptide ion +5. **Intensity Prediction**: Fragment ion intensities (local or KOINA models) +6. **Frame Assembly**: Signals placed into timsTOF frame structure ### Output Files @@ -121,6 +140,35 @@ output_dir/ | `missed_cleavages` | `2` | Allowed missed cleavages | | `min_len` / `max_len` | `7` / `30` | Peptide length range | +### Prediction Model Selection (KOINA) + +TimSim supports both local PyTorch models and remote [KOINA](https://koina.wilhelmlab.org) models for predictions. 
Configure via the `[models]` section: + +```toml +[models] +rt_model = "" # "" or "local" = local PyTorch (default) +ccs_model = "" # "" or "local" = local PyTorch (default) +intensity_model = "" # "" or "local" = local PyTorch (default) +``` + +**Available remote models:** + +| Task | Model Name | Notes | +|------|-----------|-------| +| **RT** | `"Deeplc_hela_hf"` | DeepLC HeLa model | +| | `"Chronologer_RT"` | Chronologer RT predictor | +| | `"AlphaPeptDeep_rt_generic"` | AlphaPeptDeep generic RT | +| | `"Prosit_2019_irt"` | Prosit indexed RT | +| **CCS** | `"AlphaPeptDeep_ccs_generic"` | AlphaPeptDeep generic CCS | +| | `"IM2Deep"` | IM2Deep predictor | +| **Intensity** | `"prosit"` | Prosit 2023 timsTOF (max 30 AA, limited mods) | +| | `"alphapeptdeep"` | AlphaPeptDeep generic (supports phospho) | +| | `"ms2pip"` | ms2pip timsTOF 2024 | + +**Prerequisites**: `pip install imspy-predictors[koina]` + +If a KOINA server is unreachable, the simulator automatically falls back to local models. + ### Advanced Features #### Partial Fragmentation (Unfragmented Precursors) @@ -259,6 +307,23 @@ binomial_charge_model = true charge_state_one_probability = 0.15 ``` +## Integration Testing (EVAL Pipeline) + +Validate simulated datasets against production proteomics search engines (DiaNN, FragPipe, Sage): + +```bash +# List available integration tests +python -m imspy_simulation.timsim.integration.sim --env env.toml --list + +# Run a simulation +python -m imspy_simulation.timsim.integration.sim --env env.toml --test IT-DIA-HELA + +# Analyze and validate against ground truth +python -m imspy_simulation.timsim.integration.eval --env env.toml --test IT-DIA-HELA +``` + +Third-party analysis tools (DiaNN, FragPipe, Sage) must be installed separately — they are not bundled due to licensing. See the full [Validation README](src/imspy_simulation/timsim/integration/VALIDATION_README.md) for setup instructions and available test scenarios. 
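The simulate-then-evaluate sequence above can also be scripted. The following is a hedged sketch, not part of the package: `pipeline_commands` is a hypothetical helper that only assembles the two documented command lines for one test ID, leaving execution to the caller.

```python
import shlex
import sys

INTEGRATION_PKG = "imspy_simulation.timsim.integration"


def pipeline_commands(env_file: str, test_id: str) -> list[list[str]]:
    """Return the argv lists for the sim stage followed by the eval stage."""
    return [
        [sys.executable, "-m", f"{INTEGRATION_PKG}.{stage}",
         "--env", env_file, "--test", test_id]
        for stage in ("sim", "eval")
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; nothing is executed here.
    for cmd in pipeline_commands("env.toml", "IT-DIA-HELA"):
        print(shlex.join(cmd))
```

To run the stages in order, each list can be passed to `subprocess.run(cmd, check=True)`, which stops at the first failing stage.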
+ ## Troubleshooting ### "Bruker SDK not found" diff --git a/packages/imspy-simulation/src/imspy_simulation/resources/docs/documentation.md b/packages/imspy-simulation/src/imspy_simulation/resources/docs/documentation.md index 30464472..f1bacd89 100644 --- a/packages/imspy-simulation/src/imspy_simulation/resources/docs/documentation.md +++ b/packages/imspy-simulation/src/imspy_simulation/resources/docs/documentation.md @@ -81,10 +81,11 @@ output/ 8. [Property Variation Settings](#property-variation-settings) 9. [DDA Settings](#dda-settings) 10. [Charge State Probabilities](#charge-state-probabilities) -11. [Quad Transmission Settings](#quad-transmission-settings) -12. [Video Settings](#video-settings) -13. [Performance Settings](#performance-settings) -14. [Console and Execution](#console-and-execution) +11. [Prediction Model Settings](#prediction-model-settings) +12. [Quad Transmission Settings](#quad-transmission-settings) +13. [Video Settings](#video-settings) +14. [Performance Settings](#performance-settings) +15. [Console and Execution](#console-and-execution) --- @@ -300,6 +301,67 @@ charge_state_one_probability = 0.0 --- +## Prediction Model Settings + +TimSim uses deep learning models for retention time (RT), collisional cross section (CCS), and fragment intensity prediction. By default, local PyTorch models ship with the package. Optionally, remote models can be used via [KOINA](https://koina.wilhelmlab.org) servers. 
+ +**Prerequisite for remote models**: `pip install imspy-predictors[koina]` + +### `[models]` Section + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `rt_model` | `""` | Retention time prediction model | +| `ccs_model` | `""` | CCS / ion mobility prediction model | +| `intensity_model` | `""` | Fragment intensity prediction model | + +### Available Models + +**Retention Time (`rt_model`)**: + +| Value | Description | +|-------|-------------| +| `""` or `"local"` | Local PyTorch model (default) | +| `"Deeplc_hela_hf"` | DeepLC HeLa model (KOINA) | +| `"Chronologer_RT"` | Chronologer RT predictor (KOINA) | +| `"AlphaPeptDeep_rt_generic"` | AlphaPeptDeep generic RT (KOINA) | +| `"Prosit_2019_irt"` | Prosit indexed RT (KOINA) | + +**CCS / Ion Mobility (`ccs_model`)**: + +| Value | Description | +|-------|-------------| +| `""` or `"local"` | Local PyTorch model (default) | +| `"AlphaPeptDeep_ccs_generic"` | AlphaPeptDeep generic CCS (KOINA) | +| `"IM2Deep"` | IM2Deep predictor (KOINA) | + +**Fragment Intensity (`intensity_model`)**: + +| Value | Description | +|-------|-------------| +| `""` or `"local"` | Local PyTorch PROSPECT fine-tuned model (default) | +| `"prosit"` | Prosit 2023 timsTOF (KOINA) — max 30 AA, limited modifications | +| `"alphapeptdeep"` | AlphaPeptDeep generic (KOINA) — supports all modifications including phospho | +| `"ms2pip"` | ms2pip timsTOF 2024 (KOINA) | + +### Notes + +- If a KOINA server is unreachable, the simulator automatically falls back to local models. +- For phosphorylated peptides, use `"alphapeptdeep"` as the intensity model — Prosit does not support phosphorylation modifications. +- Prosit intensity models are limited to peptides ≤ 30 amino acids with standard modifications. +- AlphaPeptDeep supports all UNIMOD modifications and has no peptide length restriction. 
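The notes above amount to a simple decision rule, sketched here in Python. This is an illustration under stated assumptions, not TimSim code: `suggest_intensity_model` is a hypothetical helper, peptides are assumed to carry bracketed UNIMOD tags such as `[UNIMOD:21]` (phosphorylation), and only the length and phospho constraints are checked, not the full set of modifications Prosit accepts.

```python
import re

PHOSPHO_TAG = "[UNIMOD:21]"  # UNIMOD accession for phosphorylation
PROSIT_MAX_LENGTH = 30       # documented Prosit limit, in amino acids


def bare_length(peptide: str) -> int:
    """Count residues after removing bracketed modification tags."""
    stripped = re.sub(r"\[[^\]]*\]", "", peptide)
    return sum(ch.isalpha() for ch in stripped)


def suggest_intensity_model(peptides: list[str]) -> str:
    """Suggest 'prosit' unless any peptide is phosphorylated or too long."""
    for peptide in peptides:
        if PHOSPHO_TAG in peptide or bare_length(peptide) > PROSIT_MAX_LENGTH:
            return "alphapeptdeep"
    return "prosit"


if __name__ == "__main__":
    print(suggest_intensity_model(["PEPTIDEK", "PEPS[UNIMOD:21]TIDEK"]))
```

The returned string is one of the documented `intensity_model` values, so it can be written straight into the `[models]` section.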
+ +### Configuration Example + +```toml +[models] +rt_model = "AlphaPeptDeep_rt_generic" +ccs_model = "AlphaPeptDeep_ccs_generic" +intensity_model = "prosit" +``` + +--- + ## Quad Transmission Settings Advanced settings for quadrupole-dependent isotope transmission and **partial fragmentation** (precursor survival). @@ -431,6 +493,11 @@ isotope_k = 8 isotope_min_intensity = 1 isotope_centroid = true +[models] +rt_model = "" +ccs_model = "" +intensity_model = "" + [noise] mz_noise_precursor = true precursor_noise_ppm = 6.5 diff --git a/packages/imspy-simulation/src/imspy_simulation/timsim/integration/VALIDATION_README.md b/packages/imspy-simulation/src/imspy_simulation/timsim/integration/VALIDATION_README.md index 2d0b04a7..bf783b14 100644 --- a/packages/imspy-simulation/src/imspy_simulation/timsim/integration/VALIDATION_README.md +++ b/packages/imspy-simulation/src/imspy_simulation/timsim/integration/VALIDATION_README.md @@ -16,17 +16,27 @@ Automated validation framework for TIMSIM simulations. Generate synthetic datase ### 1. Setup Environment ```bash -# Create/activate Python environment -source /path/to/your/env/bin/activate +# From PyPI (recommended) +pip install imspy-simulation -# Install required packages -pip install -e /path/to/rustims/packages/imspy-simulation +# With KOINA remote model support (optional) +pip install imspy-predictors[koina] +``` + +
<details> +<summary>From source</summary> + +```bash +source /path/to/your/env/bin/activate pip install -e /path/to/rustims/packages/imspy-core +pip install -e /path/to/rustims/packages/imspy-predictors +pip install -e /path/to/rustims/packages/imspy-simulation # Rebuild Rust backend if needed cd /path/to/rustims/imspy_connector maturin develop --release ``` +</details>
### 2. Configure Environment File @@ -42,6 +52,11 @@ fragpipe_workflow_dia = "/path/to/workflows/DIA_SpecLib_Quant_diaPASEF.workflow" fragpipe_workflow_dda = "/path/to/workflows/LFQ-noMBR.workflow" sage_path = "/path/to/sage" +# Optional: additional workflow files for phospho tests +# fragpipe_workflow_dia_phospho = "/path/to/workflows/DIA_Phospho.workflow" +# fragpipe_workflow_dda_phospho = "/path/to/workflows/DDA_Phospho.workflow" +# fragpipe_python = "/path/to/python" # Python used by FragPipe (if different) + # Output and reference data [paths] output_base = "/path/to/output" @@ -54,9 +69,32 @@ fasta_hela_decoys = "/path/to/hela-decoys.fasta" [performance] num_threads = -1 use_gpu = true + +# Optional: tool-specific thread and timeout overrides (timeouts in seconds) +# diann_threads = 8 # Override thread count for DiaNN +# diann_timeout = 7200 # DiaNN timeout (default: 2h) +# fragpipe_timeout = 7200 # FragPipe timeout (default: 2h) ``` -### 3. Run Tests +### 3. Install Third-Party Analysis Tools + +DiaNN, FragPipe, and Sage are **not bundled** with imspy-simulation due to licensing restrictions. You must install them separately and configure their paths in `env.toml`. + +**DiaNN**: +- Download the Linux binary from [https://github.com/vdemichev/DiaNN](https://github.com/vdemichev/DiaNN) +- Make it executable: `chmod +x diann-linux` +- Set `diann_path` in `env.toml` + +**FragPipe**: +- Download a release from [https://github.com/Nesvilab/FragPipe](https://github.com/Nesvilab/FragPipe) +- Requires a Java runtime (verify with `java -version`) +- Set `fragpipe_path`, `fragpipe_tools`, and workflow paths in `env.toml` + +**Sage** (optional, DDA only): +- Open source; download a binary from [https://github.com/lazear/sage](https://github.com/lazear/sage) or build from source +- Set `sage_path` in `env.toml` (optional; auto-discovered if on `$PATH`) + +### 4. 
Run Tests ```bash # List available tests @@ -354,11 +392,11 @@ imspy_simulation/timsim/integration/ ### Analysis Tools -| Tool | Version | Purpose | -|------|---------|---------| -| DiaNN | 1.8+ | DIA/DDA analysis | -| FragPipe | 21+ | DIA/DDA analysis | -| Sage | 0.14+ | DDA analysis (optional) | +| Tool | Version | Purpose | Download | +|------|---------|---------|----------| +| DiaNN | 1.8+ | DIA/DDA analysis | [github.com/vdemichev/DiaNN](https://github.com/vdemichev/DiaNN) | +| FragPipe | 21+ | DIA/DDA analysis | [github.com/Nesvilab/FragPipe](https://github.com/Nesvilab/FragPipe) | +| Sage | 0.14+ | DDA analysis (optional) | [github.com/lazear/sage](https://github.com/lazear/sage) | ### Python Packages