Incompatibility issues with custom reference genome
MACS2 peak calling fails if the FASTA file used to build a custom genome database does not follow the chr[\dXY] naming convention. For example, I am using the Ensembl masked version of the rat genome (rn6, release 94), found at ftp://ftp.ensembl.org/pub/release-94/fasta/rattus_norvegicus/dna/, which does not prepend 'chr' to chromosome names. The following call produces the error:
[2018-10-31 18:06:20,887 ERROR] Unknown exception caught. Killing process group 72093...
Traceback (most recent call last):
File "/users/nicolerg/anaconda2/envs/encode-atac-seq-pipeline/bin/encode_common.py", line 224, in run_shell_cmd
p.returncode, cmd)
CalledProcessError: Command 'cat /mnt/lab_data/montgomery/nicolerg/motrpac/atac/pipeline-output/cromwell-executions/atac/03a28d2a-f364-4ce1-bfd5-f10488cf42a9/call-macs2/shard-0/inputs/-78707573/rn6_masked.chrom.sizes | grep -P 'chr[\dXY]+[ \t]' > 20180725_2_Gastroc_002_powder_S1_L001_R1_001.trim.merged.nodup.tn5.pval0.01.300K.bfilt.chrsz.tmp' returned non-zero exit status 1
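The failure in the traceback can be reproduced outside the pipeline. The sketch below applies the same pattern as the `grep -P` call to two sets of chrom.sizes lines. The entries and their sizes are hypothetical placeholders, not real rn6 values, and `filter_chrom_sizes` is an illustrative helper, not a pipeline function. When no line matches, `grep` exits with status 1, which is what surfaces as the `CalledProcessError` above.

```python
import re

# Same pattern as the grep -P call in the traceback.
CHR_PATTERN = re.compile(r"chr[\dXY]+[ \t]")


def filter_chrom_sizes(lines):
    """Keep only the lines that the grep call in the traceback would keep."""
    return [line for line in lines if CHR_PATTERN.search(line)]


# Hypothetical chrom.sizes entries; sizes are placeholders, not real rn6 values.
ensembl_style = ["1\t1000", "X\t800", "MT\t16"]         # no 'chr' prefix
ucsc_style = ["chr1\t1000", "chrX\t800", "chrM\t16"]    # 'chr' prefix

ensembl_kept = filter_chrom_sizes(ensembl_style)
ucsc_kept = filter_chrom_sizes(ucsc_style)

# Nothing survives for Ensembl-style names, so grep would exit with status 1.
print(ensembl_kept)
# With the prefix, chr1 and chrX survive; chrM is dropped by this pattern.
print(ucsc_kept)
```

Under these assumptions, every Ensembl-style name is filtered out, which leaves the `.chrsz.tmp` file empty and makes the shell command fail.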
OS/Platform and dependencies
- Platform: Ubuntu 16.04.4
- Cromwell: cromwell-34
- Conda version: conda 4.5.11
Issue Analytics
- State:
- Created 5 years ago
- Comments:8 (3 by maintainers)
Top GitHub Comments
Sure, working on it and will be fixed in the next release.
Sure, I am working on parameterizing `"atac.mito_chr_name" : "any_mito_chr_name"` in an input JSON. This will be fixed in the next release. BTW, `regex-filter-reads` is sort of a wrapper variable for `--regex-grep-v-ta` in `bam2ta`.
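As a rough illustration of the parameter described in the comment above, the sketch below writes an input JSON fragment containing `atac.mito_chr_name`. This assumes the key is released under exactly that name, which the thread does not confirm. The value "MT" is only a placeholder for whatever the mitochondrial sequence is called in your FASTA.

```python
import json

# Assumption: the key name follows the maintainer's comment and may differ
# in the released pipeline. "MT" is a placeholder; use the mitochondrial
# sequence name that appears in your own FASTA headers.
inputs = {"atac.mito_chr_name": "MT"}

# Serialize as it would appear inside an input JSON file.
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```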