Use of int8 for variant_contig results in integer overflow with fragmented reference genomes
Many (non-human) reference genomes contain thousands of contigs that have not been assembled into full chromosomes.
Currently the variant_contig array is hard-coded as int8 (line), which results in integer overflow and makes it impossible to join variants to their contig.
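A minimal sketch of the failure mode, using plain NumPy rather than the library's own code. The contig count of 1000 and the variable names are illustrative assumptions, not taken from the issue:

```python
import numpy as np

# Hypothetical reference with 1000 contigs, indexed 0..999.
n_contigs = 1000
contig_index = np.arange(n_contigs)

# int8 can only hold -128..127, so casting wraps around silently.
as_int8 = contig_index.astype(np.int8)

# Contig 128 wraps to -128 and contig 256 collides with contig 0,
# so a variant can no longer be mapped back to a unique contig.
wrapped_128 = int(as_int8[128])
wrapped_256 = int(as_int8[256])
n_distinct = len(np.unique(as_int8))
```

Only 256 distinct values survive the cast, so any reference with more than 128 contigs yields negative or colliding indices.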
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:10 (1 by maintainers)
Top GitHub Comments
We should probably make it an option to specify the dtype for variant_contig - even int16 will overflow sometimes. There are lots of VCFs out there with huge numbers of contigs.

Although, I guess this is the sort of thing we should be able to query the IO library for ("how many contigs are there" should be efficiently computable on any indexed VCF), so we should be able to automatically detect the minimal dtype. Even then though, I suppose people might want to manually specify the dtype, for their own reasons.
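The automatic detection suggested above could look roughly like the sketch below. The function name smallest_signed_dtype is hypothetical and is not the library's API. The sketch assumes the contig count has already been obtained from the VCF header, and it assumes a signed type is wanted so that negative values remain available as sentinels:

```python
import numpy as np

def smallest_signed_dtype(n_contigs: int) -> np.dtype:
    """Return the narrowest signed integer dtype that can index n_contigs contigs."""
    if n_contigs < 0:
        raise ValueError("n_contigs must be non-negative")
    # Contigs are indexed from 0, so the largest index is n_contigs - 1.
    max_index = max(n_contigs - 1, 0)
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        if max_index <= np.iinfo(candidate).max:
            return np.dtype(candidate)
    raise OverflowError("contig count does not fit in int64")
```

With this, 128 contigs still fit in int8, 129 need int16, and 40,000 need int32. A user-supplied dtype could simply take precedence over the detected one.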
Nice!
I’ve created a fix in #667. Hopefully we can get that merged soon for you to use.