Add option to output gzipped FASTQ in SamToFastq
Hi,
we are routinely extracting FASTQs from large (> 100 GB) BAM files, and it saves a lot of temporary disk space having the extracted data as a .fastq.gz rather than getting a .fastq first and then compressing it later.
From what I can tell there isn’t a way to get a different sort of output from SamToFastq. Searching through forums it looks like the way people do it is to output to /dev/stdout and pipe into gzip. However, you can only do this for either read 1 or read 2, but not both:
For read 1:
java -jar picard.jar SamToFastq INPUT=a.bam FASTQ=/dev/stdout SECOND_END_FASTQ=/dev/null QUIET=true | gzip --stdout > a.1.fastq.gz
For read 2:
java -jar picard.jar SamToFastq INPUT=a.bam FASTQ=/dev/null SECOND_END_FASTQ=/dev/stdout QUIET=true | gzip --stdout > a.2.fastq.gz
Could you add a way to write both files as .fastq.gz, without having to read the BAM file twice and discard half the data on each run?
Issue Analytics
- Created 6 years ago
- Comments:7 (2 by maintainers)
Top GitHub Comments
Thanks, I tried it and indeed it does automatically gzip with the correct file extension.
Maybe you could add that to the online documentation explicitly somewhere? I had searched the page for gz or gzip and it was only mentioned explicitly in a few places. As well, IlluminaBasecallsToFastq has a COMPRESS_OUTPUTS option, which is why I didn’t think it would gzip automatically just by the naming convention.

Perhaps this is not well documented enough, but pretty much any Picard program that generates output files that are not SAM/BAM files has the ability to gzip compress them. All you have to do is name the files with a .gz extension and it happens automatically.
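
For anyone landing here later, the advice above can be sketched roughly as follows. This is only an illustration built from the commands in the original post: it assumes picard.jar sits in the current directory and reuses the a.bam example name, and the variable names are made up for the sketch. If either file is missing, it just prints the command it would have run.

```shell
#!/bin/sh
# Sketch: extract both mates as gzipped FASTQ in a single pass over the BAM.
# Assumptions: picard.jar is in the current directory; a.bam is the example
# input from the original post. Variable names are illustrative only.
bam="a.bam"
prefix="${bam%.bam}"          # strip the .bam suffix, giving "a"
fq1="${prefix}.1.fastq.gz"    # per the thread, the .gz extension triggers compression
fq2="${prefix}.2.fastq.gz"

if [ -f picard.jar ] && [ -f "$bam" ]; then
    java -jar picard.jar SamToFastq INPUT="$bam" FASTQ="$fq1" SECOND_END_FASTQ="$fq2"
    # Confirm that both outputs are valid gzip streams.
    gzip -t "$fq1" "$fq2" && echo "both outputs are valid gzip files"
else
    # Dry run: show the command without executing it.
    echo "would run: java -jar picard.jar SamToFastq INPUT=$bam FASTQ=$fq1 SECOND_END_FASTQ=$fq2"
fi
```

The gzip -t check is optional; it is there only to confirm that the outputs were actually compressed and not just named with a .gz suffix.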