CollectRnaSeqMetrics uses platform unit instead of read group ID

Bug Report

Affected tool(s)

CollectRnaSeqMetrics

Affected version(s)

  • 2.10.1
  • 2.12.1

Description

When running this tool and collecting metrics at READ_GROUP and ALL_READS levels, the metrics file has the PU string in the READ_GROUP column. For example, when I have this @RG header in my bam:

@RG	ID:110429_UNC14-SN744_0106_AB066TABXX_1	SM:TCGA-CJ-4638-01A-02R-1325-07	LB:unknown	PL:illumina	PU:AB066TABXX.1	CN:UNC-LCCC

The last three columns of the metrics section are:

TCGA-CJ-4638-01A-02R-1325-07	unknown	AB066TABXX.1

and the histogram header is:

## HISTOGRAM	java.lang.Integer
normalized_position	All_Reads.normalized_coverage	AB066TABXX.1.normalized_coverage

I tested on the most recent version with a different bam, and still had the same issue. Is this expected?

Steps to reproduce

You should be able to reproduce this on any RNA-seq BAM with an appropriate @RG header when requesting METRIC_ACCUMULATION_LEVEL=READ_GROUP.
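
For concreteness, a reproduction run might be assembled roughly as follows. This is only a sketch: the file names (`sample.bam`, `refFlat.txt`, `sample.rna_metrics.txt`), the jar path, and the helper `build_picard_command` are hypothetical, and `STRAND_SPECIFICITY=NONE` is an assumption, since the report does not say which strand setting was used.

```python
def build_picard_command(bam, ref_flat, out, picard_jar="picard.jar"):
    """Build a CollectRnaSeqMetrics command line (legacy KEY=VALUE syntax).

    All paths are placeholders; STRAND_SPECIFICITY=NONE is an assumption,
    not something stated in the bug report.
    """
    return [
        "java", "-jar", picard_jar, "CollectRnaSeqMetrics",
        f"INPUT={bam}",
        f"REF_FLAT={ref_flat}",
        f"OUTPUT={out}",
        "STRAND_SPECIFICITY=NONE",
        # Request both accumulation levels mentioned in the report.
        "METRIC_ACCUMULATION_LEVEL=ALL_READS",
        "METRIC_ACCUMULATION_LEVEL=READ_GROUP",
    ]


cmd = build_picard_command("sample.bam", "refFlat.txt", "sample.rna_metrics.txt")
# Print the command for inspection; pass `cmd` to subprocess.run to execute it.
print(" ".join(cmd))
```

With both accumulation levels requested, the READ_GROUP column of the resulting metrics file is where the PU value shows up.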

Expected behavior

I expect the ID string from the @RG header to be used in the READ_GROUP column of the metrics file instead of the PU (why not call it PLATFORM_UNIT instead if that’s what you expect to happen?).

Actual behavior

The PU string is extracted and placed in the READ_GROUP column of the metrics file instead of the ID string.
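
If the ID is needed downstream, one possible workaround is to build a PU-to-ID lookup from the @RG header lines and translate the READ_GROUP column after the fact. The sketch below assumes the header lines have already been extracted as tab-separated text (for example from `samtools view -H`) and that PU values are unique within the file; the helper names `parse_rg_line` and `pu_to_id` are illustrative only.

```python
def parse_rg_line(line):
    """Parse one @RG header line into a dict of tag -> value."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    # Each remaining field is TAG:VALUE; split on the first colon only,
    # because values may themselves contain colons.
    return dict(field.split(":", 1) for field in fields[1:])


def pu_to_id(header_lines):
    """Map each platform unit (PU) to its read group ID.

    Assumes PU values are unique; read groups without a PU tag are skipped.
    """
    mapping = {}
    for line in header_lines:
        if line.startswith("@RG\t"):
            rg = parse_rg_line(line)
            if "PU" in rg:
                mapping[rg["PU"]] = rg["ID"]
    return mapping


# The @RG line from the report above.
header = [
    "@RG\tID:110429_UNC14-SN744_0106_AB066TABXX_1"
    "\tSM:TCGA-CJ-4638-01A-02R-1325-07\tLB:unknown"
    "\tPL:illumina\tPU:AB066TABXX.1\tCN:UNC-LCCC"
]
lookup = pu_to_id(header)
print(lookup["AB066TABXX.1"])
```

The value found in the READ_GROUP column (`AB066TABXX.1` in this report) can then be looked up to recover the original ID.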

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

kmhernan commented, Sep 20, 2017 (1 reaction)

Interesting, since PU isn’t even part of the official SAM spec. In addition, the Broad GATK forums explicitly say this about read groups, as seen here: https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

This tag identifies which read group each read belongs to, so each read group's ID must be unique. It is referenced both in the read group definition line in the file header (starting with @RG) and in the RG:Z tag for each read record. Note that some Picard tools have the ability to modify IDs when merging SAM files in order to avoid collisions. In Illumina data, read group IDs are composed using the flowcell + lane name and number, making them a globally unique identifier across all sequencing data in the world.

I guess if you’re taking only a Broad world view, then following the PU guidelines of flowcell.lane.barcode does make them universally unique, but again that’s only the Broad world view.

kmhernan commented, Sep 20, 2017

Ah yes, I looked at the same PDF, but it looked like the table ended rather than overflowing onto the next page. However, I do have control over the ID, since I do not use Picard for merging, and I think managing IDs should be the responsibility of the user. But obviously that isn’t the Picard way of merging. Thanks for the clarification.
