
Jasmine Merging of Samples called by Manta and Smoove followed by multi-sample merging. #61

Description

I am currently trying to call a cohort of a bit over 1000 samples using Manta and Smoove as my SV-callers. I wish to know what the best approach would be to merge my structural variant calls using Jasmine.

When using SURVIVOR, the merging is relatively straightforward: first merge the different caller outputs per sample (each variant carries a genotype from each caller, showing whether that caller detected it and, if so, which genotype it assigned), then merge these caller-merged VCF files across samples. The result is a VCF file with one genotype per sample (usually the "best" genotype from the callers in the previous VCF files, the "best" being preferably 1/1, followed by 0/1).
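To make the "best genotype" rule concrete, here is a minimal Python sketch of how such a selection could work. It is not SURVIVOR's actual implementation. The only ordering stated above is that 1/1 is preferred over 0/1; ranking 0/0 above ./. is an assumption made for illustration, and `GT_RANK`, `normalize_gt`, and `best_genotype` are hypothetical names.

```python
# Assumed ranking: only "1/1 before 0/1" comes from the description above;
# the positions of 0/0 and ./. are an illustrative assumption.
GT_RANK = {"1/1": 3, "0/1": 2, "0/0": 1, "./.": 0}


def normalize_gt(gt):
    """Treat phased and unphased calls alike and sort alleles (1|0 -> 0/1)."""
    alleles = gt.replace("|", "/").split("/")
    return "/".join(sorted(alleles))


def best_genotype(genotypes):
    """Return the highest-ranked genotype among the per-caller calls.

    Genotypes not listed in GT_RANK are ranked like a missing call.
    """
    normalized = (normalize_gt(g) for g in genotypes)
    return max(normalized, key=lambda g: GT_RANK.get(g, 0))


if __name__ == "__main__":
    # One sample, one variant, genotypes reported by two callers.
    print(best_genotype(["0/1", "1/1"]))
    print(best_genotype(["./.", "0/1"]))
```

With this ranking, a sample called 0/1 by one caller and 1/1 by the other ends up as 1/1, and a call missed by one caller falls back to whatever the other caller reported.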

In Jasmine, I've been trying to replicate this approach by first merging the caller outputs per sample with --allow_intrasample, so that overlapping calls from different callers can be merged, and without --output_genotypes, because enabling that option causes problems later on, as I explain below.

Following the intra-sample merging, I merge across samples with --allow_intrasample disabled and --output_genotypes enabled. The result looks similar to what I would expect: one entry per sample, where each sample's genotype is taken from the single genotype present in its intra-sample merged VCF file. I disabled --output_genotypes in the intra-sample merge because whenever I enabled it and proceeded to the inter-sample merge, each sample was duplicated in the final VCF file as separate samples (for example 0_Sample1, 1_Sample1, 0_Sample2, 1_Sample2), which muddies my data.

I also tried merging all the VCF files (every caller for every sample) at once and got a similar result. I understand this was recommended in #16, but I want my final merged VCF file to contain each sample only once, with the best genotype selected for that sample from among the caller outputs. This is to make downstream analysis easier, as duplicated sample names may complicate things down the line.
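As an illustration of the output I am after, the following Python sketch collapses duplicated sample columns such as 0_Sample1 and 1_Sample1 into a single Sample1 column, keeping the highest-ranked genotype. It is a post-processing idea, not a Jasmine feature. It assumes that the duplicated columns differ only by a numeric prefix followed by an underscore, that GT is the first FORMAT key, and that 0/0 outranks ./. (only "1/1 before 0/1" is stated above). The names `GT_RANK`, `rank`, and `collapse_samples` are hypothetical.

```python
import re

# Assumed ranking: only "1/1 before 0/1" is given; the rest is illustrative.
GT_RANK = {"1/1": 3, "0/1": 2, "0/0": 1, "./.": 0}


def rank(sample_field):
    """Rank a per-sample field (e.g. '0/1:35') by its leading GT value."""
    gt = sample_field.split(":")[0].replace("|", "/")
    gt = "/".join(sorted(gt.split("/")))
    return GT_RANK.get(gt, 0)


def collapse_samples(header_line, record_line):
    """Collapse columns like 0_Sample1 and 1_Sample1 into one Sample1 column."""
    header = header_line.rstrip("\n").split("\t")
    record = record_line.rstrip("\n").split("\t")
    names, fields = header[9:], record[9:]  # first 9 VCF columns are fixed
    best = {}  # sample name -> best field seen so far (keeps first-seen order)
    for name, field in zip(names, fields):
        # Strip the assumed numeric prefix; a real sample name that itself
        # starts with digits and an underscore would be altered by this.
        sample = re.sub(r"^\d+_", "", name)
        if sample not in best or rank(field) > rank(best[sample]):
            best[sample] = field
    new_header = "\t".join(header[:9] + list(best))
    new_record = "\t".join(record[:9] + list(best.values()))
    return new_header, new_record


if __name__ == "__main__":
    fixed = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    header = fixed + "\t0_Sample1\t1_Sample1\t0_Sample2\t1_Sample2"
    record = "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT\t0/1\t1/1\t./.\t0/1"
    for line in collapse_samples(header, record):
        print(line)
```

In the example record, Sample1 would keep 1/1 and Sample2 would keep 0/1. The whole sample field of the winning column is carried over, so any other FORMAT values stay consistent with the chosen genotype.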

So I'm not quite sure what the best way to go about this is. I'd like to use Jasmine rather than SURVIVOR because it preserves the variant information present in the original caller VCF files (such as the alternate allele/sequence given by Manta) and can also preserve 0/0 calls, unlike SURVIVOR.
