Incompatibility with masked reference genomes
call-macs2 fails if a masked reference (where Ns are used to indicate repetitive regions) is used to build a custom genome reference database.
run_shell_cmd: PID=70151, CMD=bedtools intersect -a ${SAMPLE}.trim.merged.nodup.tn5.tagAlign.tmp1 -b ${SAMPLE}.trim.merged.nodup.tn5.pval0.01.300K.bfilt.narrowPeak.tmp2 -wa -u | wc -l
Traceback (most recent call last):
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_macs2_atac.py", line 210, in <module>
    main()
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_macs2_atac.py", line 200, in main
    frip_qc = frip( args.ta, bfilt_npeak, args.out_dir)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_frip.py", line 54, in frip
    write_txt(frip_qc, str(float(val1)/float(val2)))
ValueError: could not convert string to float: ***** WARNING: File rat_liver7_S12_L001_R1_001.trim.merged.nodup.tn5.tagAlign.tmp1 has inconsistent naming convention for record:
AABR07024382.1 100568 100639 N 1000 +
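The traceback shows that the string handed to float() starts with a bedtools warning instead of a read count. Here is a minimal sketch of a more defensive parse, assuming the count is the last all-digit line of the captured output. The helper name parse_count is hypothetical and is not part of encode_frip.py, and the sample text is only modelled on the error message above.

```python
import re


def parse_count(output: str) -> int:
    """Return the last line of `output` that is a bare non-negative integer.

    Hypothetical helper, not part of the pipeline. It assumes the number
    printed by `wc -l` is the last all-digit line and that any other line
    (for example a bedtools warning) can be skipped.
    """
    for line in reversed(output.splitlines()):
        token = line.strip()
        if re.fullmatch(r"[0-9]+", token):
            return int(token)
    raise ValueError("no integer count found in output: %r" % output[:80])


# Illustrative captured output: warning text, the offending record, then a count.
captured = (
    "***** WARNING: File sample.tagAlign.tmp1 has inconsistent naming "
    "convention for record:\n"
    "AABR07024382.1\t100568\t100639\tN\t1000\t+\n"
    "\n"
    "12345\n"
)
count = parse_count(captured)
```

This only keeps the FRiP step from crashing on the extra text. It does not address why bedtools emits the warning in the first place.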
If it would be impractical to include compatibility with masked references, it would be helpful to specify that the pipeline is incompatible with masked references in the documentation. I understand that this is the function of the blacklist input, but blacklisted regions can be more difficult to define for less popular model organisms.
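For organisms without a curated blacklist, one possible workaround is to turn the N runs of the masked FASTA into BED intervals and run the pipeline on the unmasked reference. The sketch below assumes hard masking with N (or n) and writes 0-based, half-open coordinates. The function names are hypothetical, and whether the resulting file is suitable as the pipeline's blacklist input is an assumption that would need checking.

```python
import io
import re
from typing import Iterator, TextIO, Tuple

N_RUN = re.compile(r"[Nn]+")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) per FASTA record; name is the first header word."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs_to_bed(handle: TextIO, min_len: int = 1) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) for each run of N/n at least `min_len` long.

    Hypothetical helper. Each sequence is held in memory whole, which is
    fine for a sketch but may be heavy for very large chromosomes.
    """
    for name, seq in read_fasta(handle):
        for match in N_RUN.finditer(seq):
            if match.end() - match.start() >= min_len:
                yield name, match.start(), match.end()


# Tiny in-memory FASTA for illustration; a real run would open the masked file.
demo = io.StringIO(">chrA demo\nACGTNNNN\nNNACGT\n>chrB\nnnACGTN\n")
bed_rows = list(n_runs_to_bed(demo))
bed_text = "".join("%s\t%d\t%d\n" % row for row in bed_rows)
```

Assembly gaps are also stored as N, so the intervals will include gaps as well as masked repeats. Raising min_len filters out very short runs.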
Issue Analytics
- State:
- Created 5 years ago
- Comments: 9 (2 by maintainers)
Top GitHub Comments
This is actually a duplicate issue of #48 and will be fixed in the next release.
Interesting. This is a bug/unintended behavior so we’d like to get to the bottom of it. I’ll ask @leepc12 to get in touch with you so we can get to the bottom of it and fix the issue.
On Sat, Nov 3, 2018, 12:05 PM nicolerg <notifications@github.com> wrote: