Add option to output gzipped FASTQ in SamToFastq

Hi,

We routinely extract FASTQs from large (> 100 GB) BAM files, and writing the extracted data directly as .fastq.gz saves a lot of temporary disk space compared with producing a .fastq first and compressing it afterwards.

From what I can tell, SamToFastq has no option for a different kind of output. Judging from forum posts, the usual workaround is to write to /dev/stdout and pipe into gzip. However, this works for only one of read 1 or read 2 at a time, not both:

For read 1: java -jar picard.jar SamToFastq INPUT=a.bam FASTQ=/dev/stdout SECOND_END_FASTQ=/dev/null QUIET=true | gzip --stdout > a.1.fastq.gz

For read 2: java -jar picard.jar SamToFastq INPUT=a.bam FASTQ=/dev/null SECOND_END_FASTQ=/dev/stdout QUIET=true | gzip --stdout > a.2.fastq.gz

Could you add a way to write both files as .fastq.gz, without having to read the BAM file twice and discard half the data on each run?

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
pplettner commented, Jul 18, 2017

Thanks, I tried it and indeed it does automatically gzip with the correct file extension.

Maybe you could add that to the online documentation explicitly somewhere? I had searched the page for gz or gzip and it was only mentioned explicitly in a few places.

As well, IlluminaBasecallsToFastq has a COMPRESS_OUTPUTS option, which is why I didn’t think it would gzip automatically just by the naming convention.

1 reaction
tfenne commented, Jul 18, 2017

Perhaps this is not well documented enough, but pretty much any Picard program that generates output files that are not SAM/BAM files has the ability to gzip compress them. All you have to do is name the files with a .gz extension and it happens automatically.
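
As a minimal sketch of what this comment describes, the command below names both outputs with a .gz extension so that a single pass over the BAM yields two compressed files. It assumes picard.jar sits in the current directory and reuses a.bam from the issue; the output file names and the `bam` and `prefix` variables are illustrative, not part of Picard.

```shell
#!/bin/sh
# Sketch: extract both ends as gzipped FASTQ in one pass over the BAM.
# Assumptions: picard.jar is in the current directory; a.bam is the
# paired-end input from the issue; output names are illustrative.
bam=a.bam
prefix=${bam%.bam}   # strip the .bam suffix, giving "a"

# Build the argument list. According to the comment above, Picard
# compresses these outputs because the file names end in .gz.
set -- java -jar picard.jar SamToFastq \
    "INPUT=$bam" \
    "FASTQ=$prefix.1.fastq.gz" \
    "SECOND_END_FASTQ=$prefix.2.fastq.gz"

# Show the command, and run it only if the jar and the input exist.
echo "$*"
if [ -f picard.jar ] && [ -f "$bam" ]; then
    "$@"
fi
```

Afterwards, `gzip -t a.1.fastq.gz a.2.fastq.gz` can be used to confirm that both files are valid gzip streams.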
