Documentation: speeding up demultiplex of big runs (S4 chips)
Documentation request
Tool(s) involved
ExtractIlluminaBarcodes/IlluminaBasecallsToSam
Description
Hi,
Has anyone successfully demultiplexed S4 chips using IlluminaBasecallsToSam? The process seems to be extremely slow, to the point of being unusable for production runs.
Is there any way of speeding this up?
Runtimes for reference:
- ExtractIlluminaBarcodes: 22 min (70 threads, local FS, 1 lane)
- IlluminaBasecallsToSam: 17 h+, cancelled (70 threads, local FS, 1 lane)
- bcl2fastq: 7 h 45 min (slower network FS, 70 threads, all lanes)
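To put these runtimes on a common footing, here is a rough per-lane comparison. It is only a back-of-the-envelope sketch: it assumes the S4 flow cell has 4 lanes of roughly equal yield, treats the cancelled 17 h run as a lower bound, and ignores that bcl2fastq ran on a different filesystem. The variable names are made up for illustration.

```shell
# Assumption: 4 lanes on the S4 flow cell, each with a similar amount of data.
lanes=4

# Runtimes from the post, converted to minutes.
bcl2fastq_all_lanes_min=$((7 * 60 + 45))   # 7 h 45 min for all lanes
basecalls_to_sam_lane_min=$((17 * 60))     # 17 h+ for one lane (cancelled, so a lower bound)

# bcl2fastq time attributable to a single lane, in hours.
per_lane_h=$(LC_ALL=C awk -v t="$bcl2fastq_all_lanes_min" -v n="$lanes" \
  'BEGIN { printf "%.1f", t / n / 60 }')

# Minimum slowdown of IlluminaBasecallsToSam relative to bcl2fastq, per lane.
min_ratio=$(LC_ALL=C awk -v a="$basecalls_to_sam_lane_min" -v t="$bcl2fastq_all_lanes_min" -v n="$lanes" \
  'BEGIN { printf "%.1f", a / (t / n) }')

echo "bcl2fastq: about ${per_lane_h} h per lane"
echo "IlluminaBasecallsToSam: at least ${min_ratio}x slower per lane"
```

Under those assumptions, bcl2fastq needs about 1.9 h per lane, so the cancelled IlluminaBasecallsToSam run was already at least 8.8 times slower for the same amount of data.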
Commands used
picard -Xmx430080m ExtractIlluminaBarcodes \
--BARCODE_FILE NVQ_Run314.barcodes.L1.tsv \
--MAX_MISMATCHES 0 \
--METRICS_FILE "barcode_metrics_L1.txt" \
--OUTPUT_DIR . \
--BASECALLS_DIR /projects/demultiplex/211111_A00785_0314_AHKVVTDSX2_bcl/Data/Intensities/BaseCalls \
--COMPRESSION_LEVEL 5 \
--LANE 1 \
--MAX_RECORDS_IN_RAM 7000000 \
--NUM_PROCESSORS 70 \
--READ_STRUCTURE 151T8B8B151T \
--TMP_DIR /tmp \
--TMP_DIR $PWD
picard -Xmx430080m IlluminaBasecallsToSam \
--BARCODES_DIR . \
--IGNORE_UNEXPECTED_BARCODES true \
--INCLUDE_NON_PF_READS false \
--LIBRARY_PARAMS NVQ_Run314.library_params.L1.tsv \
--READ_GROUP_ID "211111_A00785_0314_AHKVVTDSX2.1" \
--RUN_BARCODE 211111_A00785_0314_AHKVVTDSX2 \
--SEQUENCING_CENTER CMGG \
--SORT true \
--BASECALLS_DIR /projects/demultiplex/211111_A00785_0314_AHKVVTDSX2_bcl/Data/Intensities/BaseCalls \
--COMPRESSION_LEVEL 5 \
--LANE 1 \
--MAX_RECORDS_IN_RAM 7000000 \
--NUM_PROCESSORS 70 \
--READ_STRUCTURE 151T8B8B151T \
--TMP_DIR /tmp \
--TMP_DIR $PWD
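For orientation, the `--READ_STRUCTURE 151T8B8B151T` value used in both commands can be broken down into cycles per segment type. The snippet below is only a sketch: it assumes Picard's read-structure notation of a count followed by an operator (`T` for template, `B` for sample barcode, `S` for skip, `M` for molecular barcode), and the variable names are made up for illustration.

```shell
# Read structure taken from the commands above.
read_structure="151T8B8B151T"

template_cycles=0
barcode_cycles=0
other_cycles=0

# Split into <count><operator> segments, e.g. 151T, 8B, 8B, 151T.
for seg in $(printf '%s\n' "$read_structure" | grep -oE '[0-9]+[TBSM]'); do
  n=${seg%[TBSM]}   # strip the trailing operator letter, leaving the count
  case "$seg" in
    *T) template_cycles=$((template_cycles + n)) ;;
    *B) barcode_cycles=$((barcode_cycles + n)) ;;
    *)  other_cycles=$((other_cycles + n)) ;;   # skipped bases or molecular barcodes
  esac
done

total_cycles=$((template_cycles + barcode_cycles + other_cycles))
echo "template=${template_cycles} barcode=${barcode_cycles} total=${total_cycles}"
```

For this run that gives 302 template cycles and 16 barcode cycles per cluster, 318 cycles in total.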
All help is more than welcome. Thanks, Matthias
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
Hi John,
So you’re telling me you’ve been able to demultiplex a full S4 chip (300 cycles, PE) in about 1 hour? That’s very impressive, and I’m interested in how you managed it. As far as I can see, the demux is CPU-bound, so our filesystem shouldn’t be an issue (CIFS network share for the BCL data, CephFS for local storage).
We’re not doing much out of the ordinary:
I’m very curious how you were able to improve on this. Looking forward to your response!
Matthias
A little late here, but 7 hours for bcl2fastq2 with 70 cores on an S4 flowcell? That is really very long (as are the other steps). I’ve been poking around with bcl2fastq and can get execution times of ~1 hour on NFS filesystems, but it is highly dependent on the command-line arguments. If this still interests you, I’d be curious to see your bcl2fastq command.