Assembly does not improve #172

@gubrins

Hi,

I am working with two closely related species, and I have HiFi and Hi-C data for both. I ran exactly the same workflow on each. For species 1, SALSA improves the assembly. For species 2, however, the N50 after SALSA is the same as it was before scaffolding.
During scaffolding I also get this message: `WARNING: Not enough Hi-C reads for scaffolding`. What does this mean?
This is the gfastats summary after scaffolding:

`+++Summary+++:

scaffolds: 356

Total scaffold length: 1502913456
Average scaffold length: 4221667.01
Scaffold N50: 67491308
Scaffold auN: 81379285.48
Scaffold L50: 7
Largest scaffold: 203202437

contigs: 403

Total contig length: 1502889956
Average contig length: 3729255.47
Contig N50: 67491308
Contig auN: 81095181.14
Contig L50: 7
Largest contig: 203202437

gaps: 47

Total gap length: 23500
Average gap length: 500.00
Gap N50: 500
Gap auN: 500.00
Gap L50: 24
Largest gap: 500
Base composition (ACGT): 448804358, 302773097, 302741034, 448571467
GC content %: 40.29

soft-masked bases: 0

paths: 356

`

As you can see, the scaffold N50 and the contig N50 are identical: 67491308.
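For reference, the summary also reports 403 contigs in 356 scaffolds with 47 gaps, so some joins were made even though the N50 did not move. A minimal sketch of how N50 is computed shows why that can happen: joining sequences that are shorter than the N50 sequence can leave the N50 unchanged. The function names below are illustrative and are not part of gfastats or SALSA; `read_fai_lengths` assumes a standard samtools `.fai` index with the sequence length in the second tab-separated column.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def read_fai_lengths(path):
    """Read sequence lengths from a .fai index (column 2 is the length)."""
    with open(path) as handle:
        return [int(line.split("\t")[1]) for line in handle if line.strip()]


# Toy example: joining the two smallest sequences (4 and 2) into one
# sequence of length 6 reduces the sequence count but not the N50.
before = [10, 8, 6, 4, 2]
after = [10, 8, 6, 6]
print(n50(before), n50(after))
```

Running `n50(read_fai_lengths(...))` on the index of the input assembly and on an index of the scaffolded FASTA would give the same before/after comparison as the gfastats numbers.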

I am also including the log output, in case it helps:

bedfile loaded
Starting Iteration 1
bedfile started
bedfile loaded
Loading Hi-C links 
Hybrid scaffold graph loaded, nodes = 806 edges = 450
Hi-C implied edges = 0
Starting Iteration 2
bedfile started
bedfile loaded
Starting Iteration 2
WARNING: Not enough Hi-C reads for scaffolding
Loading Hi-C links 
Hybrid scaffold graph loaded, nodes = 688 edges = 350
Hi-C implied edges = 0
python2 /home/panthera/bin/RE_sites.py -a scafolding_omanensis/assembly.cleaned.fasta -e GANTC > scafolding_omanensis/re_counts_iteration_1
python2 /home/panthera/bin/make_links.py -b scafolding_omanensis/alignment_iteration_1.bed -d scafolding_omanensis -i 1 -x abc
python2 /home/panthera/bin/fast_scaled_scores.py -d scafolding_omanensis -i 1
sort -k 5 -gr scafolding_omanensis/contig_links_scaled_iteration_1 > scafolding_omanensis/contig_links_scaled_sorted_iteration_1
python2 /home/panthera/bin/layout_unitigs.py -x abc -l scafolding_omanensis/contig_links_scaled_sorted_iteration_1 -c 1000 -i 1 -d scafolding_omanensis
/home/panthera/bin/break_contigs -a scafolding_omanensis/alignment_iteration_2.bed -b scafolding_omanensis/breakpoints_iteration_2.txt -l scafolding_omanensis/scaffold_length_iteration_2 -i 2 -s 100   > scafolding_omanensis/misasm_iteration_2.report
python2 /home/panthera/bin/refactor_breaks.py -d scafolding_omanensis -i 2
python2 /home/panthera/bin/make_links.py -b scafolding_omanensis/alignment_iteration_2.bed -d scafolding_omanensis -i 2
python2 /home/panthera/bin/layout_unitigs.py -x abc -l scafolding_omanensis/contig_links_scaled_sorted_iteration_2 -c 1000 -i 2 -d scafolding_omanensis
/home/panthera/bin/break_contigs -a scafolding_omanensis/alignment_iteration_3.bed -b scafolding_omanensis/breakpoints_iteration_3.txt -l scafolding_omanensis/scaffold_length_iteration_3 -i 3 -s 100  > scafolding_omanensis/misasm_iteration_3.report
python2 /home/panthera/bin/refactor_breaks.py -d scafolding_omanensis -i 3 > scafolding_omanensis/misasm_3.log

This is the command I used to scaffold the assembly:
run_pipeline.py --assembly purged.fa --length purged.fa.fai --bed combined.bed --enzyme GANTC --output scaffolded
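Since the log reports `Hi-C implied edges = 0`, one possible check on the BED file passed to this command is to count how many Hi-C read pairs have mates on two different contigs. The sketch below is only an illustration, not part of SALSA. It assumes the BED file was produced with `bedtools bamtobed`, so that column 4 holds the read name with a `/1` or `/2` mate suffix, and it keeps only the first alignment seen for each mate. It holds all read names in memory, so a subsample may be needed for a large file.

```python
import sys


def count_pairs(bed_lines):
    """Count Hi-C pairs by whether both mates map to the same contig.

    Assumes BED lines with the read name in column 4, ending in /1 or /2.
    """
    mates = {}  # read name -> {mate number: contig of first alignment seen}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        contig, name = fields[0], fields[3]
        read, sep, mate = name.rpartition("/")
        if not sep or mate not in ("1", "2"):
            continue  # name lacks the assumed /1 or /2 suffix
        mates.setdefault(read, {}).setdefault(mate, contig)

    counts = {"intra": 0, "inter": 0, "unpaired": 0}
    for placed in mates.values():
        if len(placed) < 2:
            counts["unpaired"] += 1  # only one mate has an alignment
        elif placed["1"] == placed["2"]:
            counts["intra"] += 1     # both mates on the same contig
        else:
            counts["inter"] += 1     # mates link two different contigs
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_pairs.py combined.bed
    with open(sys.argv[1]) as handle:
        print(count_pairs(handle))
```

Only the `inter` pairs can support joins between contigs, so a very small `inter` count relative to `intra` would be consistent with the warning in the log.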

Any help would be appreciated!

Thanks in advance!
