
BREAKING CHANGE: v0.0.5 -> chaintools antirepeat x117 speedup impl #4

Merged: alejandrogzi merged 1 commit into master from v0.0.5 on Jun 8, 2026

alejandrogzi commented Jun 8, 2026

Full refactor of the antirepeat tool. On a ~200 MB chain file with soft-masked .2bit
reference and query genomes, the tool previously took ~12 h; it now completes in minutes,
with byte-identical output.

Root cause

antirepeat used a lazy .2bit access path: for every chain scoring below --no-check-score
it re-read the sequence from disk, and on each fetch the twobit reader scanned the
chromosome's soft-mask/N block list from the start (a linear skip_while, no binary
search). On repeat-masked genomes, with hundreds of thousands of mask blocks per chromosome,
each of the millions of per-chain fetches re-scanned a large prefix of that list, so a cost
that grows with chromosome length dominated the run. Parallelism was also off by default.
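To make the cost difference concrete, here is a minimal Rust sketch, assuming mask blocks are stored as a sorted, non-overlapping list of half-open ranges. The `MaskBlock` type and both function names are hypothetical, not the twobit reader's API. The linear version mirrors the `skip_while` scan described above; because block ends increase monotonically, `slice::partition_point` finds the same starting block in O(log k) instead of O(k).

```rust
/// A soft-mask or N block as a half-open range [start, end).
/// Hypothetical type for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MaskBlock {
    pub start: u64,
    pub end: u64,
}

/// Old-style lookup: skip every block that ends at or before `pos`, starting
/// from the head of the list. O(k) in the number of preceding blocks, and the
/// scan is repeated on every fetch.
pub fn blocks_from_linear(blocks: &[MaskBlock], pos: u64) -> impl Iterator<Item = &MaskBlock> + '_ {
    blocks.iter().skip_while(move |b| b.end <= pos)
}

/// Binary-search alternative: sorted, non-overlapping blocks have increasing
/// `end`, so `partition_point` returns the same split point in O(log k).
pub fn blocks_from_binary(blocks: &[MaskBlock], pos: u64) -> &[MaskBlock] {
    let first = blocks.partition_point(|b| b.end <= pos);
    &blocks[first..]
}
```

Either lookup removes the repeated prefix scan per fetch; preloading goes further by paying any scan only once per chromosome.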

Performance

  • Preload sequences into memory once. .2bit reference/query are now fully decoded
    into memory at startup (the soft-mask/N scan is paid once per chromosome instead of once
    per chain), so every per-chain access is an in-memory lookup. This is the single biggest
    win and turns the ~12 h run into minutes.
  • Load only referenced chromosomes. A cheap header-only pre-scan of the chain file
    loads just the sequences it references, bounding peak memory on fragmented assemblies
    (stdin input falls back to loading everything).
  • Parallel by default. The parallel feature is now enabled in the cli build, so chains
    are filtered across all cores out of the box, and the startup error previously raised by
    --threads in builds without the feature is gone.
  • Zero per-chain allocation. The filter now borrows chromosome slices directly instead
    of copying each chain's span, and minus-strand queries are reverse-complemented on the fly
    during the walk rather than copied and reversed. In benchmarks on large-span, repeat-driven
    chains this gave a further ~9.6× speedup and ~2× lower peak memory.
  • Fused filters. The degeneracy and repeat-mask filters now share a single pass over a
    chain's aligned blocks.
  • I/O tuning. Larger input read buffer (1 MB) and larger parallel batches.
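The preload idea can be sketched as applying the N and soft-mask block lists to a decoded chromosome exactly once, after which every per-chain access is a plain slice. This assumes the 2-bit packed bases have already been decoded to uppercase ACGT; `materialize` is a hypothetical helper, not chaintools' actual decoder.

```rust
/// Apply N runs and soft-mask runs to a decoded chromosome exactly once.
/// `bases` is assumed to hold uppercase A/C/G/T decoded from the 2-bit packing;
/// both block lists are half-open (start, end) byte ranges.
/// Hypothetical helper, not chaintools' decoder.
pub fn materialize(
    mut bases: Vec<u8>,
    n_blocks: &[(usize, usize)],
    mask_blocks: &[(usize, usize)],
) -> Vec<u8> {
    // N runs first: the 2-bit packing cannot store N, so overwrite those positions.
    for &(start, end) in n_blocks {
        bases[start..end].fill(b'N');
    }
    // Then soft-masking: lowercase in place (a masked N becomes 'n').
    for &(start, end) in mask_blocks {
        bases[start..end].make_ascii_lowercase();
    }
    bases
}
```

For example, `materialize(b"ACGTACGTAC".to_vec(), &[(2, 4)], &[(3, 6)])` yields `ACNnacGTAC` in this sketch, and a chain's span is then just `&seq[start..end]`.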
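A header-only pre-scan needs just the `chain` lines. In the UCSC chain format the header is `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, so the target and query names are fields 2 and 7 (0-based). The hypothetical helper below collects both name sets from any `BufRead`; wrapping a file in `BufReader::with_capacity(1 << 20, file)` matches the 1 MB read buffer mentioned above. Because stdin cannot be rewound after a pre-scan, that path loads every sequence instead, as the PR notes.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

/// Collect the target and query chromosome names referenced by a chain file,
/// reading header lines only. Header layout (UCSC chain format):
/// chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
/// Hypothetical helper for illustration.
pub fn referenced_chroms<R: BufRead>(reader: R) -> io::Result<(HashSet<String>, HashSet<String>)> {
    let mut targets = HashSet::new();
    let mut queries = HashSet::new();
    for line in reader.lines() {
        let line = line?;
        // Alignment-block lines are plain numbers; only headers name sequences.
        if !line.starts_with("chain") {
            continue;
        }
        let fields: Vec<&str> = line.split_whitespace().collect();
        match (fields.get(2), fields.get(7)) {
            (Some(t), Some(q)) => {
                targets.insert(t.to_string());
                queries.insert(q.to_string());
            }
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    format!("malformed chain header: {line}"),
                ));
            }
        }
    }
    Ok((targets, queries))
}
```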
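Byte-identical output at any thread count requires parallel filtering to preserve input order. One way to get that with only the standard library is to split the chains into contiguous batches, filter each batch on a scoped thread, and concatenate the results in batch order. This is a hypothetical sketch (`parallel_filter` is not a chaintools function); the crate's `parallel` feature may use a different mechanism, such as a work-stealing pool.

```rust
use std::thread;

/// Evaluate `keep` on every item across `threads` scoped threads and return the
/// pass/fail flags in input order, so the result is the same for any thread count.
/// Hypothetical helper; chaintools' `parallel` feature may work differently.
pub fn parallel_filter<T, F>(items: &[T], threads: usize, keep: F) -> Vec<bool>
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    let threads = threads.max(1);
    // Contiguous batches; `max(1)` avoids `chunks(0)`, which panics.
    let batch = ((items.len() + threads - 1) / threads).max(1);
    let keep = &keep;
    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(batch)
            .map(|chunk| s.spawn(move || chunk.iter().map(|x| keep(x)).collect::<Vec<bool>>()))
            .collect();
        // Joining in spawn order concatenates batches in their original order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Larger batches, as in the I/O bullet, amortize per-thread overhead; ordering is unaffected because batches are joined in sequence.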
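Reverse-complementing a minus-strand query on the fly amounts to walking a borrowed slice backwards and complementing each byte. In the chain format, minus-strand query coordinates count from the end of the chromosome, so a minus-strand range `[qs, qe)` maps to the plus-strand range `[qSize - qe, qSize - qs)`. The helper names below are hypothetical, and mapping every non-ACGT byte to itself is a simplifying assumption; case is preserved so soft-masking survives the flip.

```rust
/// Complement one base, preserving case so soft-masking survives the flip.
/// Mapping every other byte (N, n, IUPAC codes) to itself is a simplifying assumption.
pub fn complement(b: u8) -> u8 {
    match b {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        b'a' => b't',
        b'c' => b'g',
        b'g' => b'c',
        b't' => b'a',
        other => other,
    }
}

/// Lazily yield the minus-strand bases for query range [q_start, q_end) without
/// copying. Requires q_start <= q_end <= chrom.len(). Minus-strand coordinates
/// count from the chromosome end, so the range maps to plus-strand
/// [len - q_end, len - q_start), walked backwards. Hypothetical helper name.
pub fn minus_strand_bases(chrom: &[u8], q_start: usize, q_end: usize) -> impl Iterator<Item = u8> + '_ {
    let len = chrom.len();
    chrom[len - q_end..len - q_start].iter().rev().map(|&b| complement(b))
}
```

The iterator yields exactly what slicing a fully reverse-complemented copy would, but allocates nothing.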
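A fused pass can gather what both filters need in one walk over the aligned blocks: a count of repeat bases (lowercase or N) for the repeat-mask filter and per-base counts for the degeneracy filter. The sketch below uses hypothetical names, collects only these counters, and scans only the target side; the actual thresholds and query-side handling in antirepeat are not shown and not assumed here.

```rust
/// Per-chain statistics gathered in a single pass over the aligned blocks.
/// Field names and counting rules are illustrative, not chaintools' own.
#[derive(Debug, Default, PartialEq)]
pub struct ChainStats {
    pub aligned: usize,          // total aligned target bases
    pub repeat: usize,           // soft-masked (lowercase) or N bases
    pub base_counts: [usize; 4], // A, C, G, T counts, case-insensitive
}

/// Walk every aligned block once, updating the repeat counter and the
/// base-composition counters together instead of in two separate loops.
/// `blocks` holds (target_start, length) pairs.
pub fn fused_scan(target: &[u8], blocks: &[(usize, usize)]) -> ChainStats {
    let mut st = ChainStats::default();
    for &(start, len) in blocks {
        for &b in &target[start..start + len] {
            st.aligned += 1;
            // Repeat-mask input.
            if b.is_ascii_lowercase() || b == b'N' {
                st.repeat += 1;
            }
            // Degeneracy input.
            match b.to_ascii_uppercase() {
                b'A' => st.base_counts[0] += 1,
                b'C' => st.base_counts[1] += 1,
                b'G' => st.base_counts[2] += 1,
                b'T' => st.base_counts[3] += 1,
                _ => {}
            }
        }
    }
    st
}
```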

Changed

  • AntiRepeatEngine::chain_passes no longer takes a SequenceCache; sequence access is now
    through the new SequenceResolver::chromosome() borrowing accessor.
  • The score tool also benefits from the in-memory .2bit preload, as it shares the
    sequence resolver.
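A borrowing accessor lets the filter slice a preloaded chromosome instead of copying a chain's span. The sketch below is a hypothetical stand-in: the real `SequenceResolver::chromosome()` signature is not given in this PR, so the `Option<&[u8]>` return type and the `target_span` helper are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a resolver holding fully preloaded chromosomes.
pub struct SequenceResolver {
    chroms: HashMap<String, Vec<u8>>,
}

impl SequenceResolver {
    pub fn new(chroms: HashMap<String, Vec<u8>>) -> Self {
        Self { chroms }
    }

    /// Borrow a whole chromosome; callers slice it themselves, so no per-chain copy.
    pub fn chromosome(&self, name: &str) -> Option<&[u8]> {
        self.chroms.get(name).map(|v| v.as_slice())
    }
}

/// Example consumer: borrow a chain's target span, or None if the chromosome
/// is unknown or the range is out of bounds.
pub fn target_span<'a>(r: &'a SequenceResolver, chrom: &str, start: usize, end: usize) -> Option<&'a [u8]> {
    r.chromosome(chrom)?.get(start..end)
}
```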

Notes

  • Output is verified byte-identical to v0.0.4 across plus/minus strands, soft-mask and N
    content, gzip input/output, and any thread count, via a randomized differential test and
    full old-vs-new output diffing.
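A randomized differential test feeds the same random inputs to a reference implementation and a candidate and stops at the first disagreement. The sketch below is a generic, dependency-free version of that idea, using a small xorshift64 generator and random sequences over `ACGTacgtNn` so soft-mask and N content are exercised. All names are hypothetical; this is not the test suite shipped with this PR.

```rust
/// Minimal xorshift64 PRNG so the sketch needs no external crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        Self(seed.max(1)) // xorshift state must be non-zero
    }
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    pub fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Random sequence of length 0..=max_len with soft-masked and N bases.
pub fn random_seq(rng: &mut XorShift64, max_len: usize) -> Vec<u8> {
    const ALPHABET: &[u8] = b"ACGTacgtNn";
    let len = rng.below(max_len as u64 + 1) as usize;
    (0..len).map(|_| ALPHABET[rng.below(ALPHABET.len() as u64) as usize]).collect()
}

/// Run `cases` random inputs through both implementations and return the
/// first input on which they disagree, or None if they always agree.
pub fn differential<O, R, C>(seed: u64, cases: usize, max_len: usize, reference: R, candidate: C) -> Option<Vec<u8>>
where
    O: PartialEq,
    R: Fn(&[u8]) -> O,
    C: Fn(&[u8]) -> O,
{
    let mut rng = XorShift64::new(seed);
    for _ in 0..cases {
        let seq = random_seq(&mut rng, max_len);
        if reference(&seq) != candidate(&seq) {
            return Some(seq);
        }
    }
    None
}
```

A fixed seed makes any failing input reproducible; in practice the reference would be the v0.0.4 code path and the candidate the new one.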

alejandrogzi merged commit 584f405 into master on Jun 8, 2026
2 checks passed
alejandrogzi deleted the v0.0.5 branch on June 8, 2026 at 15:04