Bug in CollectMultipleMetrics:CollectRnaSeqMetrics: No refflat input
Bug Report
Affected tool(s)
CollectMultipleMetrics
Affected version(s)
- Latest public release version [version?]
- Latest development/master branch as of [date of test?]
Description
CollectRnaSeqMetrics requires a refFlat file of reference gene annotations to run. It was added as a metric in Picard’s CollectMultipleMetrics (CMM), but CMM does not accept a refFlat file, so the metric does not work as is. The change itself would be minor, but I think SinglePassSamProgram does not take inputs other than input/output/reference, so adding refFlat may require a change to SinglePassSamProgram as well.
Steps to reproduce
java -jar picard.jar CollectMultipleMetrics VALIDATION_STRINGENCY=SILENT METRIC_ACCUMULATION_LEVEL=ALL_READS INPUT=test.bam OUTPUT=output_filename FILE_EXTENSION=".txt" REFERENCE_SEQUENCE=Homo_sapiens_assembly19.fasta ASSUME_SORTED=true PROGRAM=null PROGRAM=RnaSeqMetrics
11:31:45.913 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/seq/mint/jishuxu/softwares/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Nov 22 11:31:45 EST 2017] CollectMultipleMetrics INPUT=test.bam ASSUME_SORTED=true OUTPUT=output_filename METRIC_ACCUMULATION_LEVEL=[ALL_READS] FILE_EXTENSION=.txt PROGRAM=[RnaSeqMetrics] VALIDATION_STRINGENCY=SILENT REFERENCE_SEQUENCE=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta STOP_AFTER=0 INCLUDE_UNPAIRED=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 22 11:31:45 EST 2017] Executing as user on Linux 2.6.32-696.6.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Deflater: Intel; Inflater: Intel; Picard version: 2.15.0-4-gb0b9f78-SNAPSHOT
[Wed Nov 22 11:31:48 EST 2017] picard.analysis.CollectMultipleMetrics done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.NullPointerException
at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:615)
at picard.util.BasicInputParser.filesToInputStreams(BasicInputParser.java:172)
at picard.util.BasicInputParser.<init>(BasicInputParser.java:78)
at picard.util.TabbedInputParser.<init>(TabbedInputParser.java:51)
at picard.util.TabbedTextFileWithHeaderParser.<init>(TabbedTextFileWithHeaderParser.java:125)
at picard.annotation.RefFlatReader.load(RefFlatReader.java:73)
at picard.annotation.RefFlatReader.load(RefFlatReader.java:66)
at picard.annotation.GeneAnnotationReader.loadRefFlat(GeneAnnotationReader.java:37)
at picard.analysis.CollectRnaSeqMetrics.setup(CollectRnaSeqMetrics.java:161)
at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:129)
at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.java:426)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
Expected behavior
CollectRnaSeqMetrics should be able to run as a member of CollectMultipleMetrics.
Actual behavior
CollectRnaSeqMetrics cannot run; it fails with the NullPointerException shown in the log above.
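The stack trace shows the exception surfacing in IOUtil.openFileForReading, reached from CollectRnaSeqMetrics.setup via RefFlatReader.load, which is consistent with the refFlat file never being set when the tool is launched through CMM. As a rough illustration of how a setup step could fail with a readable message instead of a bare NullPointerException, here is a minimal sketch. The class and method names are hypothetical stand-ins, not Picard's actual code.

```java
import java.io.File;

// Hypothetical stand-in for a collector that needs a refFlat file; not Picard's actual class.
final class RnaSeqSetupSketch {
    /** Returns an error message if the annotation file is unusable, or null if it looks fine. */
    static String validateRefFlat(final File refFlat) {
        if (refFlat == null) {
            return "REF_FLAT was not provided; CollectRnaSeqMetrics needs a refFlat gene annotation file.";
        }
        if (!refFlat.canRead()) {
            return "Cannot read REF_FLAT file: " + refFlat.getPath();
        }
        return null;
    }

    public static void main(final String[] args) {
        // When launched from a wrapper that never sets the argument, the field stays null.
        final File refFlat = null;
        final String error = validateRefFlat(refFlat);
        System.out.println(error == null ? "OK" : "ERROR: " + error);
    }
}
```

A check like this would not make the metric work inside CMM, but it would turn the failure into an actionable message until the argument can be passed through.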
Issue Analytics
- State:
- Created 6 years ago
- Comments:8 (7 by maintainers)
Top GitHub Comments
I would actually call this issue resolved. If someone wants to open a separate ticket for injecting tool-specific arguments in CMM, that’s fine, but it should not be riding on this issue.
Hm - I suppose that could be done, but it would require either translating the extra args to a command line string and delegating to the existing parser, or else a lot of redundancy with the existing parser for things like creating/populating enums, lists, error handling, etc.
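The first option above, turning extra arguments into a command line string per program and handing it to the existing parser, could look roughly like the sketch below. The "Program::KEY=VALUE" convention and the class name are assumptions made for illustration only; the grouping step is the only part shown, and the hand-off to the real parser is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: the "Program::KEY=VALUE" convention is an assumption for illustration.
final class ExtraArgsSketch {
    /** Groups "Program::KEY=VALUE" strings into one argv list per program name. */
    static Map<String, List<String>> groupByProgram(final List<String> extraArgs) {
        final Map<String, List<String>> byProgram = new LinkedHashMap<>();
        for (final String extra : extraArgs) {
            final int sep = extra.indexOf("::");
            // Reject entries with no program name or no argument after the separator.
            if (sep <= 0 || sep + 2 >= extra.length()) {
                throw new IllegalArgumentException("Expected Program::KEY=VALUE but got: " + extra);
            }
            final String program = extra.substring(0, sep);
            final String argument = extra.substring(sep + 2);
            byProgram.computeIfAbsent(program, k -> new ArrayList<>()).add(argument);
        }
        return byProgram;
    }

    public static void main(final String[] args) {
        final Map<String, List<String>> grouped = groupByProgram(Arrays.asList(
                "RnaSeqMetrics::REF_FLAT=refFlat.txt",
                "RnaSeqMetrics::STRAND_SPECIFICITY=NONE"));
        // Each list could then be handed to that program's existing command-line parser.
        grouped.forEach((program, argv) -> System.out.println(program + " -> " + argv));
    }
}
```

Delegating to the existing parser keeps enum, list, and error handling in one place, which is the redundancy concern raised in the comment.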
Another option would be to invest in refactoring the metrics collectors so they’re re-useable components, rather than specialized CommandLinePrograms. This has already been done in GATK so it could use the same identical component, i.e., InsertSizeMetricsCollector, from CollectInsertSizeMetrics, CollectInsertSizeMetricsSpark, CollectMultipleMetrics and CollectMultipleMetricsSpark. The args for each metrics collector live in an argument class that can be hosted in multiple tools, and annotated with @ArgumentCollection. The actual collector code lives in a component class that implements a known collector interface, with a type parameter for the corresponding argument collection class. The tools become nothing more than shells that host the collector(s) and argument collection(s).
I think the general pattern could be leveraged to help solve this problem. You still need a bridge to propagate command line args for CMM, but it eliminates the proliferation of collector-specific arguments in the interface.
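The collector-plus-argument-class pattern described above can be sketched in a few lines of plain Java. Every name here is hypothetical and chosen for illustration; the sketch leaves out annotations such as @ArgumentCollection and any real read processing, and it is not GATK's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// All names below are hypothetical; they mirror the pattern described, not GATK's actual classes.

/** Arguments for one collector, kept in their own class so several tools can host them. */
final class InsertSizeArgsSketch {
    int histogramWidth = 100;
}

/** A collector is a plain component parameterized by its argument class. */
interface CollectorSketch<A> {
    void initialize(A args);
    void accept(int insertSize);
    String finish();
}

final class InsertSizeCollectorSketch implements CollectorSketch<InsertSizeArgsSketch> {
    private int width;
    private long sum;
    private int count;

    @Override public void initialize(final InsertSizeArgsSketch args) { this.width = args.histogramWidth; }

    @Override public void accept(final int insertSize) {
        // Ignore values beyond the configured histogram width.
        if (insertSize <= width) { sum += insertSize; count++; }
    }

    @Override public String finish() {
        return "n=" + count + " mean=" + (count == 0 ? 0.0 : (double) sum / count);
    }
}

/** A "tool" is just a shell hosting collectors; a multi-collector tool hosts several. */
final class MultiToolSketch {
    private final List<CollectorSketch<?>> collectors = new ArrayList<>();

    <A> void host(final CollectorSketch<A> collector, final A args) {
        collector.initialize(args);
        collectors.add(collector);
    }

    List<String> run(final int[] insertSizes) {
        final List<String> reports = new ArrayList<>();
        for (final CollectorSketch<?> c : collectors) {
            for (final int size : insertSizes) { c.accept(size); }
            reports.add(c.finish());
        }
        return reports;
    }
}

final class CollectorPatternDemo {
    public static void main(final String[] args) {
        final InsertSizeArgsSketch isArgs = new InsertSizeArgsSketch();
        isArgs.histogramWidth = 500;
        final MultiToolSketch tool = new MultiToolSketch();
        tool.host(new InsertSizeCollectorSketch(), isArgs);
        System.out.println(tool.run(new int[] {200, 300, 900}));
    }
}
```

Because the argument class travels with the collector, a multi-collector tool only needs a way to populate each argument object, rather than redeclaring every collector-specific option itself.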
Much of the refactored code was removed when we abandoned the GATK/Picard source consolidation, and the Spark aspects of it required more machinery than Picard would need, but see for example InsertSizeMetricsArgumentCollection, InsertSizeMetricsCollector, and CollectInsertSizeMetricsSpark. More detail is also here.