BREAKING CHANGE: v0.0.5 -> chaintools antirepeat x117 speedup impl by alejandrogzi · Pull Request #4 · alejandrogzi/chaintools

alejandrogzi · 2026-06-08T15:04:17Z

Full refactor of the antirepeat tool. On a ~200 MB chain file with soft-masked .2bit
reference/query sequences, it previously took ~12 h. It now completes in minutes, with
byte-identical output.

Root cause

antirepeat used a lazy .2bit access path. For every chain scoring below --no-check-score,
it re-read the sequence from disk, and on each fetch the twobit reader scanned the
chromosome's soft-mask/N block list from the start (a linear skip_while rather than a
binary search). Repeat-masked genomes have hundreds of thousands of mask blocks per
chromosome, so each of the millions of per-chain fetches re-scanned a large prefix of that
list. This per-fetch cost grows with chromosome length, and it dominated the run.
Parallelism was also off by default.
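The difference between the two lookup strategies can be illustrated with a minimal Rust sketch over one chromosome's mask-block list. It assumes the blocks are sorted and non-overlapping, so their end coordinates increase. The `Block` type and the function names are hypothetical and are not the twobit reader's API. The linear version mirrors the skip_while behaviour described above and costs O(n) in the number of preceding blocks per fetch. The binary-search version uses the standard library's `slice::partition_point` and costs O(log n).

```rust
/// Hypothetical soft-mask or N block, half-open [start, end) in chromosome coordinates.
/// Assumes blocks are sorted by start and do not overlap, so `end` is increasing.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Block {
    pub start: u32,
    pub end: u32,
}

/// Linear scan from the start of the list (equivalent to a `skip_while`):
/// counts every block that ends at or before `from`. O(n) per lookup.
pub fn first_candidate_linear(blocks: &[Block], from: u32) -> usize {
    blocks.iter().take_while(|b| b.end <= from).count()
}

/// Binary search for the same index. This is valid because `b.end <= from` is true
/// for a prefix of the sorted list and false afterwards. O(log n) per lookup.
pub fn first_candidate_binary(blocks: &[Block], from: u32) -> usize {
    blocks.partition_point(|b| b.end <= from)
}

/// Blocks overlapping the query window [from, to), located by binary search.
pub fn overlapping<'a>(
    blocks: &'a [Block],
    from: u32,
    to: u32,
) -> impl Iterator<Item = &'a Block> + 'a {
    let first = first_candidate_binary(blocks, from);
    // Later blocks start even further right, so stop at the first one past `to`.
    blocks[first..].iter().take_while(move |b| b.start < to)
}
```

Note that the PR fixes the problem by preloading, which pays the scan once per chromosome, rather than by switching the reader to binary search. The sketch only shows why repeating the linear scan on every fetch was so expensive.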

Performance

- Preload sequences into memory once. The .2bit reference and query are now fully decoded
  into memory at startup, so the soft-mask/N scan is paid once per chromosome instead of
  once per chain, and every per-chain access is an in-memory lookup. This is the single
  biggest win and turns the ~12 h run into minutes.
- Load only referenced chromosomes. A cheap header-only pre-scan of the chain file finds
  which sequences it references, and only those are loaded. This bounds peak memory on
  fragmented assemblies. Stdin input cannot be read twice, so it falls back to loading
  everything.
- Parallel by default. The parallel feature is now part of the cli build, so chains are
  filtered across all cores out of the box. The previous --threads startup error (raised
  when built without the feature) is gone.
- Zero per-chain allocation. The filter now borrows chromosome slices directly instead of
  copying each chain's span. Minus-strand queries are reverse-complemented on the fly
  during the walk rather than copied and reversed. On large-span, repeat-driven chains this
  gives a further ~9.6× speedup and ~2× lower peak memory in benchmarks.
- Fused filters. The degeneracy and repeat-mask filters now share a single pass over a
  chain's aligned blocks.
- I/O tuning. Larger input read buffer (1 MB) and larger parallel batches.
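The header-only pre-scan from "Load only referenced chromosomes" can be sketched in Rust, assuming plain-text input. The sketch relies only on the UCSC chain format. In that format each header line reads `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, and the alignment-data lines that follow are bare numbers. The function names are hypothetical, and gzip input (which the tool supports) is not handled. The 1 MB buffer mirrors the read-buffer size mentioned under I/O tuning.

```rust
use std::collections::HashSet;
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: open a plain-text chain file with a 1 MB read buffer.
pub fn open_chain(path: &str) -> io::Result<BufReader<File>> {
    Ok(BufReader::with_capacity(1 << 20, File::open(path)?))
}

/// Collect the target (tName) and query (qName) chromosome names referenced by
/// chain header lines. Alignment-data lines are skipped without being parsed.
pub fn referenced_chroms<R: BufRead>(
    reader: R,
) -> io::Result<(HashSet<String>, HashSet<String>)> {
    let mut targets = HashSet::new();
    let mut queries = HashSet::new();
    for line in reader.lines() {
        let line = line?;
        let mut fields = line.split_whitespace();
        if fields.next() != Some("chain") {
            continue; // data line ("size dt dq" or the final "size") or a blank line
        }
        // Fields after "chain": score(0) tName(1) tSize tStrand tStart tEnd qName(6) ...
        let rest: Vec<&str> = fields.collect();
        if rest.len() >= 7 {
            targets.insert(rest[1].to_string());
            queries.insert(rest[6].to_string());
        }
    }
    Ok((targets, queries))
}
```

Each returned set would then select which .2bit records to decode for the reference and the query.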

Changed

- AntiRepeatEngine::chain_passes no longer takes a SequenceCache. Sequences are now
  accessed through the new borrowing accessor SequenceResolver::chromosome().
- The score tool also benefits from the in-memory .2bit preload, since it shares the
  sequence resolver.
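A borrowing accessor combined with on-the-fly minus-strand access can be sketched in Rust. `Resolver`, `query_bases` and `complement` are hypothetical names. The real SequenceResolver::chromosome() signature is not shown in this PR, so returning `Option<&[u8]>` is an assumption. The sketch also assumes sequences are stored decoded, with soft-masked bases in lowercase, so complementing preserves case and the repeat-mask filter still sees masked bases. In the chain format, minus-strand coordinates are given on the reverse-complemented sequence, so position p maps to forward index len - 1 - p.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a preloaded sequence store: chromosome name ->
/// fully decoded bases, with soft-masked regions stored as lowercase.
pub struct Resolver {
    chroms: HashMap<String, Vec<u8>>,
}

impl Resolver {
    pub fn new(chroms: HashMap<String, Vec<u8>>) -> Self {
        Resolver { chroms }
    }

    /// Borrowing accessor: hands out a slice into the preloaded sequence
    /// instead of copying the chain's span.
    pub fn chromosome(&self, name: &str) -> Option<&[u8]> {
        self.chroms.get(name).map(|seq| seq.as_slice())
    }
}

/// Complement one base, preserving case (soft-mask) and leaving N/n untouched.
pub fn complement(b: u8) -> u8 {
    match b {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        b'a' => b't',
        b'c' => b'g',
        b'g' => b'c',
        b't' => b'a',
        other => other,
    }
}

/// Query bases for strand coordinates [start, end), read from the borrowed
/// forward-strand slice. On the minus strand, position p maps to forward index
/// len - 1 - p and is complemented as it is read. Nothing is copied.
pub fn query_bases<'a>(
    fwd: &'a [u8],
    minus: bool,
    start: usize,
    end: usize,
) -> impl Iterator<Item = u8> + 'a {
    let len = fwd.len();
    (start..end).map(move |p| if minus { complement(fwd[len - 1 - p]) } else { fwd[p] })
}
```

No per-chain buffer is allocated here. The iterator holds only the borrowed slice and a few integers.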

Notes

Output was verified to be byte-identical to v0.0.4 across plus/minus strands, soft-mask
and N content, gzip input/output, and any thread count. Verification used a randomized
differential test and full old-vs-new output diffs.
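Randomized differential testing can be illustrated with a small self-contained Rust sketch. It checks a copy-then-reverse strategy against on-the-fly reverse complementing over random sequences and spans. This is not the project's test. The function names and the xorshift generator (used to avoid external crates) are assumptions.

```rust
/// Complement one base, preserving case and leaving N/n untouched.
fn complement(b: u8) -> u8 {
    match b {
        b'A' => b'T', b'C' => b'G', b'G' => b'C', b'T' => b'A',
        b'a' => b't', b'c' => b'g', b'g' => b'c', b't' => b'a',
        other => other,
    }
}

/// "Old" style: copy the span, then reverse and complement the copy.
/// Minus-strand [start, end) covers forward [len - end, len - start).
fn revcomp_copy(fwd: &[u8], start: usize, end: usize) -> Vec<u8> {
    let len = fwd.len();
    let mut v = fwd[len - end..len - start].to_vec();
    v.reverse();
    for b in v.iter_mut() {
        *b = complement(*b);
    }
    v
}

/// "New" style: no intermediate buffer, index straight into the borrowed slice.
fn revcomp_on_the_fly(fwd: &[u8], start: usize, end: usize) -> impl Iterator<Item = u8> + '_ {
    let len = fwd.len();
    (start..end).map(move |p| complement(fwd[len - 1 - p]))
}

/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in [0, n); `n` must be non-zero.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Compare both strategies on `rounds` random sequences and spans.
/// Panics on the first mismatch; returns the number of cases checked.
fn differential_test(rounds: usize, seed: u64) -> usize {
    const ALPHABET: &[u8] = b"ACGTacgtNn";
    let mut rng = XorShift64(seed);
    let mut checked = 0;
    for _ in 0..rounds {
        let len = 1 + rng.below(200) as usize;
        let seq: Vec<u8> = (0..len)
            .map(|_| ALPHABET[rng.below(ALPHABET.len() as u64) as usize])
            .collect();
        let start = rng.below(len as u64) as usize;
        let end = start + rng.below((len - start) as u64 + 1) as usize;
        let old = revcomp_copy(&seq, start, end);
        let new: Vec<u8> = revcomp_on_the_fly(&seq, start, end).collect();
        assert_eq!(old, new, "mismatch for len={len} span=[{start},{end})");
        checked += 1;
    }
    checked
}
```

The same pattern of a reference implementation, a new implementation, random inputs and a direct comparison extends to whole filter decisions per chain.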

Commit 5da2bcb: BREAKING CHANGE: v0.0.5 -> chaintools antirepeat x117 speedup impl

alejandrogzi merged commit 584f405 into master Jun 8, 2026
2 checks passed

alejandrogzi deleted the v0.0.5 branch June 8, 2026 15:04
