Low sum in ragoo.fasta
Hello,
I am attempting to run RaGOO using a long-read assembly as the ‘reference’. After running RaGOO with the following command:
ragoo.py -t 4 -b -C ${assembly} ${ref}
my output ragoo.fasta file seems to be missing a lot of bases. The original assembly is ~2.7 Gb, but the output FASTA file has only ~736 Mb.
Any idea what is happening to the missing sequences, or is this expected behaviour? The chimera.broken.fa file is the correct size, so it seems that sequence is being lost somewhere after that stage.
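For anyone wanting to reproduce the size comparison, here is a minimal sketch in plain Python that counts records and total bases in a FASTA file. It is not part of RaGOO; the function name fasta_total_bases is made up for illustration, and the file paths are whatever you pass on the command line.

```python
import sys


def fasta_total_bases(path):
    """Return (number of records, total bases) for a plain-text FASTA file."""
    n_records = 0
    n_bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                n_records += 1  # header line starts a new record
            else:
                n_bases += len(line)  # sequence line
    return n_records, n_bases


if __name__ == "__main__":
    # Hypothetical usage: python count_bases.py assembly.fa ragoo.fasta
    for fasta_path in sys.argv[1:]:
        records, bases = fasta_total_bases(fasta_path)
        print(f"{fasta_path}\t{records} records\t{bases} bases")
```

Running it on the input assembly, chimera.broken.fa and ragoo.fasta should show at which stage the total drops.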
Thanks! Lauren
Issue Analytics
- State:
- Created 4 years ago
- Comments:16 (9 by maintainers)
Top GitHub Comments
Hi there,
After testing the code with your data, I believe I understand the problem.
When -C is invoked, a separate file is written to the intermediate output directory for each unplaced contig. Since your assembly had roughly 1M unplaced contigs, I assume this became a problem for your file system, leading to the truncated ragoo.fasta file.
Indeed, if one does not use -C, ragoo.fasta contains the expected amount of sequence.
In future versions of RaGOO, the intermediate output will be restricted to exactly 2 files regardless of the -C option. I believe that should solve the “low sum” problem.
Additionally, it is true that RaGOO was not designed for more fragmented assemblies of larger genomes. To address this, future versions of RaGOO will allow the user to lower the minimum alignment length, allowing more contigs to be placed.
I will test out your data again when these features are implemented.
Thanks
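To illustrate the idea of keeping intermediate output to a fixed number of files rather than one file per contig, here is a rough sketch. It is not RaGOO's actual code; the function name write_contigs_single_file, the output path and the 60-column line width are all assumptions made for the example.

```python
def write_contigs_single_file(contigs, out_path, width=60):
    """Write all (name, sequence) pairs into ONE FASTA file.

    Using a single output file keeps the number of files constant,
    no matter how many contigs there are.
    """
    with open(out_path, "w") as out:
        for name, seq in contigs:
            out.write(f">{name}\n")
            # wrap the sequence at a fixed line width
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
    return out_path
```

With roughly 1M unplaced contigs, this produces one file instead of about a million small ones, which is much easier on most file systems.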
My ‘reference’ is another human assembly produced with a different assembler. The original ragoo.fasta file has ~727 Mbp in it, vs ~2.4 Gbp in the file made by manually concatenating the sequences.