Add peak calling for FIRE into ft #94

Draft
mrvollger wants to merge 72 commits into main from peak_calling

mrvollger commented Nov 6, 2025

Summary

This branch adds two major pieces of functionality and ships a co-located Snakemake workflow for model training.

New subcommands

  • ft call-peaks — end-to-end peak caller implemented in Rust (replaces the previous Python/bash pipeline). Builds a per-chromosome FDR table, identifies local maxima, snaps peak boundaries to the median positions of underlying FIRE elements, merges adjacent peaks, and emits a BED with the expected fire-peak columns (including percent-accessible calls). Code under src/subcommands/call_peaks/ (fdr, peaks, mod — ~1450 lines).
  • ft mock-fire — generates synthetic FIRE-tagged output for peak-calling tests and reproducers. See src/subcommands/mock_fire.rs.
  • ft benchmark (hidden) — profiles the fiberseq iterator to help tune chunk sizes and parallelism. See src/subcommands/benchmark.rs.
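
To make two of the call-peaks steps concrete, here is a minimal sketch of finding local maxima in a score track and snapping a peak's boundaries to the median start and end of the FIRE elements that overlap its summit. The function names, the integer score type, and the "overlaps the summit" rule are assumptions for illustration; they are not taken from src/subcommands/call_peaks/.

```rust
/// Return the leftmost index of every local maximum (plateaus count once)
/// whose score is at least `min_score`. Hypothetical helper, not the ft API.
fn local_maxima(scores: &[u32], min_score: u32) -> Vec<usize> {
    let n = scores.len();
    let mut out = Vec::new();
    let mut i = 0;
    while i < n {
        // Extend j to the last position of a run of equal scores.
        let mut j = i;
        while j + 1 < n && scores[j + 1] == scores[i] {
            j += 1;
        }
        let rises_in = i == 0 || scores[i - 1] < scores[i];
        let falls_out = j + 1 == n || scores[j + 1] < scores[i];
        if rises_in && falls_out && scores[i] >= min_score {
            out.push(i);
        }
        i = j + 1;
    }
    out
}

/// Median of a list of positions (mean of the two middle values when even).
fn median(mut v: Vec<u64>) -> Option<u64> {
    if v.is_empty() {
        return None;
    }
    v.sort_unstable();
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 1 {
        v[mid]
    } else {
        (v[mid - 1] + v[mid]) / 2
    })
}

/// Snap a peak to the median start/end of the half-open (start, end)
/// elements that contain `summit`. Returns None if nothing overlaps.
fn snap_to_median(summit: u64, elements: &[(u64, u64)]) -> Option<(u64, u64)> {
    let hits: Vec<(u64, u64)> = elements
        .iter()
        .copied()
        .filter(|&(s, e)| s <= summit && summit < e)
        .collect();
    let start = median(hits.iter().map(|h| h.0).collect())?;
    let end = median(hits.iter().map(|h| h.1).collect())?;
    Some((start, end))
}
```

Using the median rather than the minimum start and maximum end keeps a single unusually long element from stretching the reported peak.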

Extensions and fixes

  • ft pileup: major refactor to share code with peak calling. Now accepts multiple regions on the CLI or a BED file of regions. Adds default --frac-fibers filtering. See src/subcommands/pileup.rs and src/cli/pileup_opts.rs.
  • Multi-fetch iterator: fiber.rs switched from collecting regions to multi-fetching, reducing peak memory on wide region sets.
  • Per-chromosome FDR construction: builds the FDR table incrementally per chromosome instead of all at once.
  • Better error messages and cleaner logging in the new peak-calling path.
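
The per-chromosome idea can be sketched as a streaming group-by: consume observations that arrive grouped by chromosome, build one table, hand it off, and reuse the buffers for the next chromosome. Everything here is an assumption for illustration, including the `Obs` type and the empirical FDR estimate (null count divided by observed count at each threshold); the PR does not describe how its FDR values are computed.

```rust
/// One scored position; `is_null` marks a score from a shuffled (null) track.
/// Hypothetical type, not part of ft.
struct Obs {
    chrom: String,
    score: u32,
    is_null: bool,
}

/// Rows of (score threshold, estimated FDR), ascending by threshold.
type FdrTable = Vec<(u32, f64)>;

/// Assumed estimator: FDR(t) = min(1, #null >= t / #real >= t).
/// Quadratic in the number of scores; fine for a sketch only.
fn fdr_table(real: &[u32], null: &[u32]) -> FdrTable {
    let mut thresholds = real.to_vec();
    thresholds.sort_unstable();
    thresholds.dedup();
    thresholds
        .iter()
        .map(|&t| {
            // n_real >= 1 because every threshold comes from `real`.
            let n_real = real.iter().filter(|&&s| s >= t).count() as f64;
            let n_null = null.iter().filter(|&&s| s >= t).count() as f64;
            (t, (n_null / n_real).min(1.0))
        })
        .collect()
}

/// Stream observations that are already grouped by chromosome and emit one
/// table per chromosome, holding only one chromosome's scores at a time.
fn fdr_tables_by_chrom<I, F>(obs: I, mut emit: F)
where
    I: IntoIterator<Item = Obs>,
    F: FnMut(&str, FdrTable),
{
    let mut current: Option<String> = None;
    let mut real: Vec<u32> = Vec::new();
    let mut null: Vec<u32> = Vec::new();
    for o in obs {
        if current.as_deref() != Some(o.chrom.as_str()) {
            // Chromosome changed: flush the finished one and reset buffers.
            if let Some(c) = current.take() {
                emit(&c, fdr_table(&real, &null));
                real.clear();
                null.clear();
            }
            current = Some(o.chrom.clone());
        }
        if o.is_null {
            null.push(o.score);
        } else {
            real.push(o.score);
        }
    }
    if let Some(c) = current {
        emit(&c, fdr_table(&real, &null));
    }
}
```

Because each table is emitted before the next chromosome is read, peak memory scales with the largest chromosome rather than the whole genome, at the cost of requiring chromosome-grouped input.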

Train-FIRE workflow

Adds Train-FIRE/ — a self-contained Snakemake pipeline for training FIRE models (positives/negatives construction, feature extraction, XGBoost grid search, mokapot semi-supervised FDR, track-hub generation). Pixi-managed, with a .tests/ minimal smoke config.

Test plan

Mitchell R. Vollger and others added 30 commits October 29, 2025 15:14
…ng, also need to turn that into a real peak file
…s and incorporating the underlying FIRE elements so I can find the natural peak start and end
Add new `mock-fire` CLI subcommand that generates mock BAM files with
FIRE elements from a BED file. Each interval becomes a FIRE element,
with the 4th column grouping intervals into the same mock read.
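
A minimal sketch of the grouping described above, assuming a tab-separated BED input: each row becomes one interval, and rows that share a 4th-column value are collected under the same mock read name. The `Interval` type and `group_by_read` function are hypothetical and only illustrate the input convention; they do not write BAM records and are not the code in src/subcommands/mock_fire.rs.

```rust
use std::collections::BTreeMap;

/// One BED interval that would become a mock FIRE element (hypothetical type).
#[derive(Debug, PartialEq)]
struct Interval {
    chrom: String,
    start: u64,
    end: u64,
}

/// Group BED rows by their 4th column, which names the mock read.
fn group_by_read(bed: &str) -> Result<BTreeMap<String, Vec<Interval>>, String> {
    let mut reads: BTreeMap<String, Vec<Interval>> = BTreeMap::new();
    for (n, line) in bed.lines().enumerate() {
        let line = line.trim();
        // Skip blank lines and comment lines.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let f: Vec<&str> = line.split('\t').collect();
        if f.len() < 4 {
            return Err(format!("line {}: expected at least 4 columns", n + 1));
        }
        let start: u64 = f[1]
            .parse()
            .map_err(|e| format!("line {}: bad start: {}", n + 1, e))?;
        let end: u64 = f[2]
            .parse()
            .map_err(|e| format!("line {}: bad end: {}", n + 1, e))?;
        reads.entry(f[3].to_string()).or_default().push(Interval {
            chrom: f[0].to_string(),
            start,
            end,
        });
    }
    Ok(reads)
}
```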

Bug fixes:
- Fix score calculation overflow when quality=255 by capping at 253
  and limiting max score to 100
- Fix peak merging to select representative peak by highest score
  instead of lowest FDR, ensuring the peak_max reflects the position
  with most FIRE coverage
- Add hidden --min-fire-coverage parameter to call-peaks for testing
  with low-coverage data
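
The peak-merging fix can be illustrated with a small sketch: when adjacent peaks are merged, the merged peak inherits its summit, score, and FDR from the member with the highest score, not from the member with the lowest FDR. The `Peak` struct, its field names, and the `max_gap` parameter are assumptions for illustration and do not mirror the types in the call-peaks code.

```rust
/// Hypothetical peak record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Peak {
    start: u64,
    end: u64,
    summit: u64,
    score: u32,
    fdr: f64,
}

/// Merge peaks whose start lies within `max_gap` bases of the previous
/// merged peak's end. The representative summit/score/FDR come from the
/// highest-scoring member of each merged group.
fn merge_adjacent(mut peaks: Vec<Peak>, max_gap: u64) -> Vec<Peak> {
    peaks.sort_by_key(|p| p.start);
    let mut merged: Vec<Peak> = Vec::new();
    for p in peaks {
        if let Some(last) = merged.last_mut() {
            if p.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(p.end);
                // Representative chosen by score, regardless of FDR.
                if p.score > last.score {
                    last.summit = p.summit;
                    last.score = p.score;
                    last.fdr = p.fdr;
                }
                continue;
            }
        }
        merged.push(p);
    }
    merged
}
```

In this sketch, a member with a slightly worse FDR but a higher score still supplies the summit, which matches the stated goal that peak_max reflect the position with the most FIRE coverage.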

New files:
- src/cli/mock_fire_opts.rs
- src/subcommands/mock_fire.rs