Peak calling parameter chip.fdr_thresh never changes from default 0.01
Describe the bug
I’m running the transcription factor ChIP-seq pipeline. One sample out of 16 failed the pipeline. I’m trying to determine whether the sample is poor quality or whether I can recover some peaks by loosening the FDR threshold from the default of 0.01 used in the original run. The pipeline failed with the following error:
Exception: File is empty (20200622_Chip_H1_S8_L001_R1_001.merged.nodup.pr2_x_20200622_Chip_G1_S7_L001_R1_001.merged.nodup.300K.regionPeak.gz). Help: No peaks found. FDR threshold (fdr_thresh in your input JSON) might be too stringent or poor quality sample?
I went back into my input JSON, added a chip.fdr_thresh of 0.05, and re-ran the pipeline. I received exactly the same results. As a sanity check I re-ran again with an FDR of 0.2 and again received the same results.
For the other samples that made it through the pipeline successfully, the HTML output at FDR 0.05 and 0.2 never changes: the number of peaks and everything else remain the same as what was originally called at the default FDR level. Along the same lines, the number of raw peaks called (capped at 300000) is reported as “at an fdr of 0.01” in the HTML in every case, even when I explicitly changed the FDR parameter in the input JSON.
I then grepped metadata.json for fdr_thresh and confirmed that the FDR threshold was the value I passed in the input JSON, yet the results are always at an FDR of 0.01 regardless of what the input JSON and metadata.json contain. The output from grepping metadata.json for “fdr_thresh” is in the troubleshooting section below.
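One way to check the "number of peaks never changes" observation outside the HTML report is to count the lines of the regionPeak.gz file from each run directly. The following is a minimal sketch, assuming one peak per line in each gzipped file; the function names and the FDR labels used as dictionary keys are hypothetical and not part of the pipeline.

```python
import gzip


def count_peaks(peak_file):
    """Count non-empty lines (assumed: one peak per line) in a gzipped peak file."""
    with gzip.open(peak_file, "rt") as handle:
        return sum(1 for line in handle if line.strip())


def compare_runs(peak_files_by_fdr):
    """Return (counts, unchanged) for a mapping of FDR label -> peak file path.

    `unchanged` is True when every run produced the same number of peaks,
    which is the symptom described in this issue.
    """
    counts = {label: count_peaks(path) for label, path in peak_files_by_fdr.items()}
    unchanged = len(set(counts.values())) <= 1
    return counts, unchanged
```

Calling compare_runs with the regionPeak.gz paths of the 0.01, 0.05 and 0.2 runs shows at a glance whether the threshold had any effect on the raw peak set.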
OS/Platform
- OS/Platform: Ubuntu 16.04
- Conda version: conda 4.8.3
- Pipeline version: [e.g. v1.6.0]
- Caper version: [e.g. v1.2.0]
Caper configuration file
Paste contents of ~/.caper/default.conf.
backend=local
# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
# file: use md5sum hash (slow).
# path: use path.
# path+modtime: use path and modification time.
local-hash-strat=path+modtime
# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper store all localized data files
# on this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=/home/ubuntu/20200730_KRISTINA_CHIPSEQ/tmp-caper-cache
cromwell=/home/ubuntu/.caper/cromwell_jar/cromwell-52.jar
womtool=/home/ubuntu/.caper/womtool_jar/womtool-52.jar
Input JSON file
Paste contents of your input JSON file.
{
"chip.title" : "Kristina CHIPSeq (paired-end) H1 vs G1",
"chip.description" : "Chip H1 vs Chip G1 as control",
"chip.pipeline_type" : "tf",
"chip.aligner" : "bowtie2",
"chip.align_only" : false,
"chip.true_rep_only" : false,
"chip.genome_tsv" : "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/software/chip-seq-pipeline2/mm10.tsv",
"chip.paired_end" : true,
"chip.ctl_paired_end" : true,
"chip.always_use_pooled_ctl" : true,
"chip.fdr_thresh" : 0.05,
"chip.fastqs_rep1_R1" : [ "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L001_R1_001.fastq.gz", "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L002_R1_001.fastq.gz" ],
"chip.fastqs_rep1_R2" : [ "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L001_R2_001.fastq.gz", "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L002_R2_001.fastq.gz" ],
"chip.ctl_fastqs_rep1_R1" : [ "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L001_R1_001.fastq.gz", "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L002_R1_001.fastq.gz" ],
"chip.ctl_fastqs_rep1_R2" : [ "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L001_R2_001.fastq.gz", "/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L002_R2_001.fastq.gz" ]
}
Troubleshooting result
If you ran caper run without a Caper server, Caper automatically runs a troubleshooter for failed workflows. Find the troubleshooting result at the bottom of Caper’s screen log.
If you ran caper submit with a running Caper server, first find your workflow ID (1st column) with caper list and run caper debug [WORKFLOW_ID].
Paste troubleshooting result.
Since the pipeline failing isn’t exactly the problem right now, below are contents of metadata.json confirming changed fdr_thresh value.
"inputs": "{\n \"chip.title\" : \"Kristina CHIPSeq (paired-end) H1 vs G1\",\n \"chip.description\" : \"Chip H1 vs Chip G1 as control\",\n\n \"chip.pipeline_type\" : \"tf\",\n \"chip.aligner\" : \"bowtie2\",\n \"chip.align_only\" : false,\n \"chip.true_rep_only\" : false,\n\n \"chip.genome_tsv\" : \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/software/chip-seq-pipeline2/mm10.tsv\",\n\n \"chip.paired_end\" : true,\n \"chip.ctl_paired_end\" : true,\n\n \"chip.always_use_pooled_ctl\" : true,\n \"chip.fdr_thresh\" : 0.05,\n\n \"chip.fastqs_rep1_R1\" : [ \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L001_R1_001.fastq.gz\", \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L002_R1_001.fastq.gz\" ],\n \"chip.fastqs_rep1_R2\" : [ \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L001_R2_001.fastq.gz\", \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_H1/20200622_Chip_H1_S8_L002_R2_001.fastq.gz\" ],\n \n \"chip.ctl_fastqs_rep1_R1\" : [ \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L001_R1_001.fastq.gz\", \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L002_R1_001.fastq.gz\" ],\n \"chip.ctl_fastqs_rep1_R2\" : [ \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L001_R2_001.fastq.gz\", \"/home/ubuntu/20200730_KRISTINA_CHIPSEQ/FastQs/Young-ChIP-Seq/Chip_G1/20200622_Chip_G1_S7_L002_R2_001.fastq.gz\" ]\n \n}\n",
"Float fdr_thresh": "B14399CBAAC6DA4B5B733B483106383F",
"fdr_thresh": 0.05,
"Float fdr_thresh": "B14399CBAAC6DA4B5B733B483106383F",
"fdr_thresh": 0.05,
"Float fdr_thresh": "B14399CBAAC6DA4B5B733B483106383F",
"fdr_thresh": 0.05,
"Float fdr_thresh": "B14399CBAAC6DA4B5B733B483106383F",
"fdr_thresh": 0.05,
"fdr_thresh": 0.05,
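The grep above shows the values but not which task each one belongs to. As a rough alternative, the sketch below walks the parsed metadata.json and reports the location of every exact fdr_thresh key. It assumes nothing about the layout of the file beyond it being valid JSON; the helper names are hypothetical.

```python
import json


def find_key(node, key, path=""):
    """Yield (path, value) for every exact occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}/{name}"
            if name == key:
                yield child, value
            # Keep descending: the key may also appear deeper in the tree.
            yield from find_key(value, key, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key(item, key, f"{path}[{index}]")


def load_and_find(metadata_path, key="fdr_thresh"):
    """Parse a metadata.json file and list where `key` is set and to what value."""
    with open(metadata_path) as handle:
        return list(find_key(json.load(handle), key))
```

Because the match is exact, hash entries such as "Float fdr_thresh" are skipped and only the actual task inputs are listed.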
Issue Analytics
- Created 3 years ago
- Comments:12 (4 by maintainers)
Top GitHub Comments
@alexadowdell: Currently there is no way to unset numpeaks_threshold (run_spp.R -npeak=) in the pipeline. I will fix this in the next release so that if chip.fdr_thresh is defined in an input JSON, then chip.cap_num_peak_spp is not passed to run_spp.R.
The run_spp.R script can take either parameter or both.
Jin - you may need to adjust the way num_peaks is being set to allow for direct use of the FDR parameter. If the user specifies FDR explicitly in the JSON then num_peaks should not be used when calling run_spp.R
-Anshul.
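The behaviour proposed in these comments can be sketched as follows. This is only an illustration of the selection logic, not the pipeline's actual code: the function name and argument names are hypothetical, the 300000 default is the cap mentioned in the report, and the -c=, -i= and -fdr= flags are assumed from run_spp.R's usage text (only -npeak= is named in this thread).

```python
def build_spp_args(chip_ta, ctl_ta, fdr_thresh=None, cap_num_peak=300000):
    """Build a run_spp.R command line, preferring an explicit FDR over the peak cap.

    fdr_thresh=None means the user did not set chip.fdr_thresh in the input JSON.
    """
    args = ["Rscript", "run_spp.R", f"-c={chip_ta}", f"-i={ctl_ta}"]
    if fdr_thresh is not None:
        # User asked for an FDR threshold: do not pass -npeak at all,
        # so the peak cap cannot override the requested threshold.
        args.append(f"-fdr={fdr_thresh}")
    else:
        # No explicit FDR: fall back to capping the number of peaks.
        args.append(f"-npeak={cap_num_peak}")
    return args
```

With this shape, a run with fdr_thresh=0.05 never carries -npeak, which is the condition Anshul describes; a run without an explicit FDR keeps the existing capped behaviour.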