Hi there,
Thanks for developing this awesome tool! I was wondering if you could give me some suggestions on the following issues? Is there any strategy to use for this task?
Input Data:
16 species in the same family; all genomes soft-masked with RepeatMasker (33–59% masked)
Genome Size Range: 755 MB – 1.58 GB, Average: ~1.21 GB, Total: ~19.3 GB
Software:
Cactus v2 (commit 00699c2), native install (no Docker/Singularity)
HPC:
SLURM cluster, --batchSystem single_machine, 1 node, 64 CPUs, ~1 TB RAM
Problem:
cactus_consolidated is OOM-killed during the abPOA BAR phase at one internal node, Anc01. This node is not the root; it is a mid-level ancestor covering a subset of the species. Jobs run sequentially (one at a time, not in parallel), so this is a single-job memory issue. Reducing --consCores from 60 to 48 slightly reduced peak memory usage.
Questions:
- Are there recommended config XML parameters to reduce abPOA BAR memory for closely-related species at this genome size (e.g. partialOrderAlignmentWindow, partialOrderAlignmentMaskFilter)?
- Would switching partialOrderAlignment="0" (cPecan instead of abPOA) substantially reduce memory for this use case?
- Is there any way to split or chunk the cactus_consolidated step for a single node, or is the monolithic design fundamental?
- Is the table calibrated for vertebrate genomes, and if so, are there recommended values for invertebrates with ~1.2 GB genomes?
- I have seen that Cactus progressive has successfully aligned >50 mammalian genomes. Do you happen to know what strategies were used, or did those runs simply have a lot more memory available on their HPC?
Thank you very much!
Best,
Ruiqi