Incompatibility with masked reference genomes

The call-macs2 task fails when a custom genome reference database is built from a masked reference (one in which Ns are used to indicate repetitive regions).

run_shell_cmd: PID=70151, CMD=bedtools intersect -a ${SAMPLE}.trim.merged.nodup.tn5.tagAlign.tmp1 -b ${SAMPLE}.trim.merged.nodup.tn5.pval0.01.300K.bfilt.narrowPeak.tmp2 -wa -u | wc -l

Traceback (most recent call last):
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_macs2_atac.py", line 210, in <module>
    main()
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_macs2_atac.py", line 200, in main
    frip_qc = frip( args.ta, bfilt_npeak, args.out_dir)
  File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_frip.py", line 54, in frip
    write_txt(frip_qc, str(float(val1)/float(val2)))
ValueError: could not convert string to float: ***** WARNING: File rat_liver7_S12_L001_R1_001.trim.merged.nodup.tn5.tagAlign.tmp1 has inconsistent naming convention for record:
AABR07024382.1	100568	100639	N	1000	+

If it would be impractical to include compatibility with masked references, it would be helpful to specify that the pipeline is incompatible with masked references in the documentation. I understand that this is the function of the blacklist input, but blacklisted regions can be more difficult to define for less popular model organisms.
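
The traceback above shows `float()` receiving a string that begins with a bedtools warning rather than a bare number, so the captured output of the `bedtools intersect ... | wc -l` command evidently contained more than the count. Below is a minimal sketch of a more defensive parse. It is not the pipeline's actual fix: `parse_count` and `frip_ratio` are hypothetical helper names, and the sketch assumes the count is the last non-empty line of the captured output.

```python
def parse_count(output):
    """Return the integer count from captured `... | wc -l` output.

    Assumption: the count is the last non-empty line, and any lines
    before it are tool warnings that can be ignored.
    """
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("no output to parse")
    last = lines[-1]
    if not last.isdigit():
        # Fail with a message that shows what was actually captured.
        raise ValueError("expected a line count, got: %r" % last)
    return int(last)


def frip_ratio(reads_in_peaks_output, total_reads_output):
    """Fraction of reads in peaks, computed from two captured outputs."""
    in_peaks = parse_count(reads_in_peaks_output)
    total = parse_count(total_reads_output)
    if total == 0:
        raise ValueError("total read count is zero")
    return float(in_peaks) / total
```

If the warning really does precede the count, this returns the ratio instead of raising. If the count is missing altogether, the error message reports the offending line.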

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

1 reaction
leepc12 commented, Nov 7, 2018

This is actually a duplicate issue of #48 and will be fixed in the next release.

1 reaction
akundaje commented, Nov 4, 2018

Interesting. This is a bug or unintended behavior, so we’d like to get to the bottom of it. I’ll ask @leepc12 to get in touch with you so we can fix the issue.

On Sat, Nov 3, 2018, 12:05 PM nicolerg <notifications@github.com> wrote:

Yes, I have been able to run the pipeline with the UCSC soft-masked version of the rat genome (including contigs). This run was with a hard-masked version from Ensembl to see what the results looked like in comparison since there are some questions about the quality of the rat assembly, but it is not entirely necessary to use the hard-masked version. In fact it may be too limiting.
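
For readers unfamiliar with the distinction drawn here: a soft-masked FASTA marks repeats with lowercase bases, while a hard-masked one replaces them with Ns. A rough way to see which kind of file you have is to count both. This is a sketch only; `masking_summary` is a hypothetical helper, and because Ns can also mark assembly gaps, the N count is only an upper bound on hard-masked repeat content.

```python
def masking_summary(fasta_lines):
    """Count total, hard-masked (N/n) and soft-masked (other lowercase) bases.

    `fasta_lines` is any iterable of FASTA text lines, e.g. an open file.
    """
    counts = {"total": 0, "hard_masked": 0, "soft_masked": 0}
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        counts["total"] += len(line)
        counts["hard_masked"] += line.count("N") + line.count("n")
        counts["soft_masked"] += sum(
            1 for base in line if base.islower() and base != "n"
        )
    return counts
```

Passing an open file handle for the reference FASTA (for example `with open("genome.fa") as fh: masking_summary(fh)`, where the path is a placeholder) gives the three counts in a single pass.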
