Bug in CollectMultipleMetrics:CollectRnaSeqMetrics: No refflat input

Bug Report

Affected tool(s)

CollectMultipleMetrics

Affected version(s)

  • Latest public release version [version?]
  • Latest development/master branch as of [date of test?]

Description

CollectRnaSeqMetrics requires a refFlat file of reference gene annotations to run. It was added as a metric program in Picard’s CollectMultipleMetrics (CMM), but CMM does not accept a refFlat file, so RnaSeqMetrics cannot work there as is. The change itself would be minor, but I think SinglePassSamProgram instances do not take inputs other than input/output/reference, so adding refFlat may require a change to SinglePassSamProgram as well.

Steps to reproduce

java -jar picard.jar CollectMultipleMetrics VALIDATION_STRINGENCY=SILENT METRIC_ACCUMULATION_LEVEL=ALL_READS INPUT=test.bam OUTPUT=output_filename FILE_EXTENSION=".txt" REFERENCE_SEQUENCE=Homo_sapiens_assembly19.fasta ASSUME_SORTED=true PROGRAM=null PROGRAM=RnaSeqMetrics
11:31:45.913 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/seq/mint/jishuxu/softwares/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Nov 22 11:31:45 EST 2017] CollectMultipleMetrics INPUT=test.bam  ASSUME_SORTED=true OUTPUT=output_filename METRIC_ACCUMULATION_LEVEL=[ALL_READS] FILE_EXTENSION=.txt PROGRAM=[RnaSeqMetrics] VALIDATION_STRINGENCY=SILENT REFERENCE_SEQUENCE=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta    STOP_AFTER=0 INCLUDE_UNPAIRED=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Nov 22 11:31:45 EST 2017] Executing as user on Linux 2.6.32-696.6.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Deflater: Intel; Inflater: Intel; Picard version: 2.15.0-4-gb0b9f78-SNAPSHOT
[Wed Nov 22 11:31:48 EST 2017] picard.analysis.CollectMultipleMetrics done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.NullPointerException
    at htsjdk.samtools.util.IOUtil.openFileForReading(IOUtil.java:615)
    at picard.util.BasicInputParser.filesToInputStreams(BasicInputParser.java:172)
    at picard.util.BasicInputParser.<init>(BasicInputParser.java:78)
    at picard.util.TabbedInputParser.<init>(TabbedInputParser.java:51)
    at picard.util.TabbedTextFileWithHeaderParser.<init>(TabbedTextFileWithHeaderParser.java:125)
    at picard.annotation.RefFlatReader.load(RefFlatReader.java:73)
    at picard.annotation.RefFlatReader.load(RefFlatReader.java:66)
    at picard.annotation.GeneAnnotationReader.loadRefFlat(GeneAnnotationReader.java:37)
    at picard.analysis.CollectRnaSeqMetrics.setup(CollectRnaSeqMetrics.java:161)
    at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:129)
    at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.java:426)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

Expected behavior

CollectRnaSeqMetrics should be able to run as a member of CollectMultipleMetrics

Actual behavior

CollectRnaSeqMetrics cannot run; see the error message above.
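
The stack trace above is consistent with the refFlat file being null by the time RefFlatReader tries to open it, which is why the failure surfaces as a bare NullPointerException rather than a message about a missing argument. As a rough illustration only, a fail-fast check along the following lines would make the cause visible earlier. The class and method names here (RefFlatGuard, requireRefFlat) are hypothetical and are not part of Picard or htsjdk.

```java
import java.io.File;

// Hypothetical helper, not Picard code: reject a missing refFlat file
// before any parser tries to open it.
final class RefFlatGuard {
    private RefFlatGuard() {}

    /** Returns the file unchanged, or throws with a readable message if it was never supplied. */
    static File requireRefFlat(File refFlat) {
        if (refFlat == null) {
            throw new IllegalArgumentException(
                "REF_FLAT is required by RnaSeqMetrics but was not provided");
        }
        return refFlat;
    }
}
```

A check like this would not make RnaSeqMetrics usable from CMM; it would only replace the NullPointerException with an error that names the missing input.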

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

yfarjoun commented, Apr 22, 2019

I would actually call this issue resolved… if someone wants to open a separate ticket for injecting tool-specific arguments in CMM, that’s fine… but it should not be riding on this issue.

cmnbroad commented, Apr 16, 2019

Hm - I suppose that could be done, but it would require either translating the extra args to a command line string and delegating to the existing parser, or else a lot of redundancy with the existing parser for things like creating/populating enums, lists, error handling, etc.

Another option would be to invest in refactoring the metrics collectors so they’re reusable components, rather than specialized CommandLinePrograms. This has already been done in GATK, so it can use the same component, e.g., InsertSizeMetricsCollector, from CollectInsertSizeMetrics, CollectInsertSizeMetricsSpark, CollectMultipleMetrics and CollectMultipleMetricsSpark. The args for each metrics collector live in an argument class that can be hosted in multiple tools, and annotated with @ArgumentCollection. The actual collector code lives in a component class that implements a known collector interface, with a type parameter for the corresponding argument collection class. The tools become nothing more than shells that host the collector(s) and argument collection(s).

I think the general pattern could be leveraged to help solve this problem. You still need a bridge to propagate command line args for CMM, but it eliminates the proliferation of collector-specific arguments in the interface.

Much of the refactored code was removed when we abandoned the GATK/Picard source consolidation, and the Spark aspects of it required more machinery than Picard would need, but see for example InsertSizeMetricsArgumentCollection, InsertSizeMetricsCollector, and CollectInsertSizeMetricsSpark. More detail is also here.
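
To make the shape of that pattern concrete, here is a minimal, self-contained sketch. It is not the GATK or Picard code: every name in it (MetricsArgs, MetricsCollector, RnaSeqArgs, RnaSeqCollector, MultiCollectorShell) is hypothetical, reads are stood in for by plain strings, and the command-line parsing and @ArgumentCollection wiring are left out. It only shows the three pieces described above: an argument class per collector, a collector interface with a type parameter for that argument class, and a tool that is just a shell hosting collectors.

```java
import java.util.ArrayList;
import java.util.List;

// All names below are hypothetical; this is not the GATK or Picard API.

/** Marker for a bundle of arguments owned by one collector. */
interface MetricsArgs {}

/** Collector interface, parameterized by the type of its argument bundle. */
interface MetricsCollector<A extends MetricsArgs> {
    void initialize(A args);
    void acceptRecord(String read); // a String stands in for a real read record
    String finish();
}

/** Arguments for the RNA-seq collector, including the extra refFlat input. */
final class RnaSeqArgs implements MetricsArgs {
    String refFlatPath;
}

/** The collector logic lives here, independent of any command-line tool. */
final class RnaSeqCollector implements MetricsCollector<RnaSeqArgs> {
    private String refFlatPath;
    private int records;

    @Override
    public void initialize(RnaSeqArgs args) {
        if (args.refFlatPath == null) {
            throw new IllegalArgumentException("refFlatPath is required");
        }
        refFlatPath = args.refFlatPath;
    }

    @Override
    public void acceptRecord(String read) {
        records++;
    }

    @Override
    public String finish() {
        return "rnaseq refFlat=" + refFlatPath + " records=" + records;
    }
}

/** A "shell" tool: it hosts collectors and owns the single pass over the reads. */
final class MultiCollectorShell {
    private final List<MetricsCollector<?>> collectors = new ArrayList<>();

    /** Pairs a collector with its own argument bundle, checked at compile time. */
    <A extends MetricsArgs> void host(MetricsCollector<A> collector, A args) {
        collector.initialize(args);
        collectors.add(collector);
    }

    List<String> run(List<String> reads) {
        for (String read : reads) {
            for (MetricsCollector<?> c : collectors) {
                c.acceptRecord(read);
            }
        }
        List<String> results = new ArrayList<>();
        for (MetricsCollector<?> c : collectors) {
            results.add(c.finish());
        }
        return results;
    }
}
```

In this sketch the refFlat path travels inside RnaSeqArgs, so the hosting tool never needs a collector-specific field of its own. The bridge that maps CMM command-line arguments onto each argument bundle would still have to be written.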
