MGE memory-related errors - 'Aborted (core dumped)' & 'Segmentation fault (core dumped)' #14

@SchistoDan

Description

Hi,

Firstly, thanks for developing a brilliant and easy-to-use tool! Apologies in advance for the information dump.

I've been developing a Snakemake pipeline (initially based on your example) to process genome skims from thousands of museum specimens. Currently we're running the pipeline on a benchmarking dataset of 570 samples. I've set up MGE to take a specific protein reference for each sample and to run with multiple ('r' and 's') parameter combinations, which can result in MGE producing 3,420 consensus (and associated) files.
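To make the scale concrete, the sample × parameter-combination expansion works out like this (a minimal sketch; the 'r' and 's' values below are illustrative placeholders, not the actual settings used):

```python
# Minimal sketch of the sample x parameter-combination expansion;
# the (r, s) values here are hypothetical, not the real settings.
from itertools import product

samples = [f"sample_{i:03d}" for i in range(570)]
param_combos = list(product([1, 2, 3], [50, 100]))  # 6 hypothetical (r, s) pairs

consensus_files = [
    f"{sample}_r{r}_s{s}_consensus.fasta"
    for sample, (r, s) in product(samples, param_combos)
]
assert len(consensus_files) == 3420  # 570 samples x 6 combinations
```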

As these are museum specimens, we were also interested in screening for possible 'contaminant' reads, and since MGE can run on a multi-fasta with reads mapping to the 'closest' reference, we've been providing MGE with a multi-fasta for each sample that contains the sample-specific reference plus 14 common contaminant references (fungi, human, etc.). When running MGE on the 570 samples with even one parameter combination, this can result in 8,550 MGE runs/jobs (570 × 15)!

When running this version of the pipeline, the MGE step consistently crashes with 'Aborted' and 'Segmentation fault' written to the Snakemake log (e.g. line 4,014 for the segmentation fault in MGE-standard_r1s50_contam.txt). The crash happens at different points in the run (it isn't always the first MGE job). I can resume the Snakemake run post-crash using '--rerun-incomplete', but the same sample consistently causes the crash within each run, whereas in an identical run (in a different directory) a different sample causes the crash, implying it's likely not a sample-specific issue.

The input files into MGE (concatenated and trimmed PE fastq files) vary between ~50 MB and 15 GB (most are 1-3 GB), but it's not always the MGE jobs using the larger files that crash. I'm running the pipeline on an HPC node with 192 CPUs and 2 TB RAM available. I've tried requesting more or fewer CPUs, and over 1 TB of RAM, but it doesn't seem to affect whether the run crashes. Our system administrator thinks it's not an out-of-memory issue on our end.

When looking at the MGE logs, some of the jobs that report 'Aborted' or 'Segmentation fault' also output 'munmap_chunk(): invalid pointer', 'corrupted size vs. prev_size while consolidating' or 'realloc(): invalid old size', which I believe are glibc heap-corruption errors from the underlying C/C++ code, but I'm not familiar with C languages.

Interestingly, when the MGE step crashes on a particular sample, all of the alignment and consensus files for the 'contaminant' references are produced for that sample (or at least aren't deleted), but those for the target reference are not.

Any advice on how to overcome these issues would be greatly appreciated as we intend to scale up our analysis further (>10,000 samples). Just let me know if you'd like any further information or files from me!

Many thanks,
Dan
