CollectRnaSeqMetrics uses platform unit instead of read group ID
Bug Report
Affected tool(s)
CollectRnaSeqMetrics
Affected version(s)
- 2.10.1
- 2.12.1
Description
When running this tool and collecting metrics at READ_GROUP and ALL_READS levels, the metrics file has the PU string in the READ_GROUP column. For example, when I have this @RG header in my bam:
@RG ID:110429_UNC14-SN744_0106_AB066TABXX_1 SM:TCGA-CJ-4638-01A-02R-1325-07 LB:unknown PL:illumina PU:AB066TABXX.1 CN:UNC-LCCC
The last three columns of the metrics section are:
TCGA-CJ-4638-01A-02R-1325-07 unknown AB066TABXX.1
and the histogram header is:
## HISTOGRAM java.lang.Integer
normalized_position All_Reads.normalized_coverage AB066TABXX.1.normalized_coverage
I tested on the most recent version with a different bam, and still had the same issue. Is this expected?
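To make the mismatch concrete, here is a minimal Python sketch that splits the @RG line from this report into its tags and checks which tag carries the value that showed up in the READ_GROUP column. It assumes whitespace-separated fields as pasted here (a real SAM header is tab-separated), and parse_rg_line is a hypothetical helper, not part of Picard.

```python
def parse_rg_line(line):
    """Split an @RG header line into a {tag: value} dict."""
    fields = line.split()
    if not fields or fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    # Each remaining field looks like TAG:value; split on the first colon only.
    return dict(field.split(":", 1) for field in fields[1:])


rg_line = (
    "@RG ID:110429_UNC14-SN744_0106_AB066TABXX_1 "
    "SM:TCGA-CJ-4638-01A-02R-1325-07 LB:unknown PL:illumina "
    "PU:AB066TABXX.1 CN:UNC-LCCC"
)
rg = parse_rg_line(rg_line)

# Value seen in the READ_GROUP column of the metrics file in this report.
reported = "AB066TABXX.1"
matching_tags = [tag for tag, value in rg.items() if value == reported]
print(matching_tags)  # ['PU']
```

For this header the reported value matches the PU tag only, not the ID tag.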
Steps to reproduce
You should be able to reproduce this on any RNAseq bam with an appropriate @RG header when requesting METRIC_ACCUMULATION_LEVEL=READ_GROUP.
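As a rough illustration of such a run, the Python sketch below only assembles the command line and prints it. The jar path and file names are placeholders, STRAND_SPECIFICITY=NONE is an assumption that should be changed to match the library prep, and the KEY=VALUE argument style is the legacy syntax used by the Picard 2.x versions listed above. build_command is a hypothetical helper.

```python
import shlex


def build_command(picard_jar, bam, ref_flat, output):
    """Assemble a CollectRnaSeqMetrics command line as an argument list."""
    return [
        "java", "-jar", picard_jar, "CollectRnaSeqMetrics",
        f"I={bam}",
        f"O={output}",
        f"REF_FLAT={ref_flat}",
        "STRAND_SPECIFICITY=NONE",  # assumption: unstranded library
        # Request both accumulation levels mentioned in the report.
        "METRIC_ACCUMULATION_LEVEL=ALL_READS",
        "METRIC_ACCUMULATION_LEVEL=READ_GROUP",
    ]


# All four paths below are hypothetical placeholders.
cmd = build_command("picard.jar", "sample.bam", "refFlat.txt", "rnaseq_metrics.txt")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run once the placeholder paths point at real files.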
Expected behavior
I expect the ID string from the @RG header to be used in the READ_GROUP column of the metrics file instead of the PU (why not call it PLATFORM_UNIT instead if that’s what you expect to happen?).
Actual behavior
The PU string is extracted and placed in the READ_GROUP column of the metrics file instead of the ID string.
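Until the behavior changes, one possible workaround is to relabel the READ_GROUP column after the fact. The Python sketch below is only an outline under stated assumptions: the metrics table is tab-separated and follows a line starting with "## METRICS CLASS", the example table is trimmed to four columns with a made-up PF_BASES value, and each PU maps to exactly one ID. read_metrics_rows and relabel_read_groups are hypothetical helpers, not Picard functions.

```python
def read_metrics_rows(text):
    """Return the metrics table of a Picard-style metrics file as dicts."""
    lines = text.splitlines()
    # The table header is the line right after the "## METRICS CLASS" marker.
    start = next(
        i for i, line in enumerate(lines) if line.startswith("## METRICS CLASS")
    ) + 1
    header = lines[start].split("\t")
    rows = []
    for line in lines[start + 1:]:
        if not line.strip():
            break  # a blank line ends the metrics table
        values = line.split("\t")
        values += [""] * (len(header) - len(values))  # pad short rows
        rows.append(dict(zip(header, values)))
    return rows


def relabel_read_groups(rows, pu_to_id):
    """Replace PU values in the READ_GROUP column with the matching @RG ID."""
    for row in rows:
        pu = row.get("READ_GROUP", "")
        row["READ_GROUP"] = pu_to_id.get(pu, pu)
    return rows


# Trimmed, synthetic example; the PF_BASES value is a placeholder.
example = (
    "## METRICS CLASS\tpicard.analysis.RnaSeqMetrics\n"
    "PF_BASES\tSAMPLE\tLIBRARY\tREAD_GROUP\n"
    "1000\t\t\t\n"
    "1000\tTCGA-CJ-4638-01A-02R-1325-07\tunknown\tAB066TABXX.1\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Integer\n"
)
# PU -> ID mapping taken from the @RG header in this report.
pu_to_id = {"AB066TABXX.1": "110429_UNC14-SN744_0106_AB066TABXX_1"}

rows = relabel_read_groups(read_metrics_rows(example), pu_to_id)
for row in rows:
    print(row["READ_GROUP"] or "(all reads)")
```

The mapping is only safe when no two read groups share a PU; if they do, the ID cannot be recovered from the metrics file alone.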
Top GitHub Comments
Interesting, since PU isn’t even part of the official SAM spec. In addition, the Broad GATK forums explicitly say this about read groups, as seen here: https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups
I guess if you’re taking only a Broad world view, then following the PU guidelines of flowcell.lane.barcode does make them universally unique, but again that’s only the Broad world view.
Ah yes, I looked at the same PDF, but it looked like the table ended instead of overflowing to the next page. However, I do indeed have control over the ID as I do not use picard for merging, and think that should be the responsibility of the user to manage. But obviously that isn’t the picard way of merging. Thanks for the clarification.