
Documentation: speeding up demultiplexing of big runs (S4 chips)

Documentation request

Tool(s) involved

ExtractIlluminaBarcodes/IlluminaBasecallsToSam

Description

Hi,

Has anyone demultiplexed S4 chips using IlluminaBasecallsToSam? The process seems extremely slow, to the point of being unusable for production runs. Is there any way to speed it up?

Runtimes for reference:

  • ExtractIlluminaBarcodes: 22 min (70 threads, local FS, 1 lane)
  • IlluminaBasecallsToSam: 17+ h, cancelled (70 threads, local FS, 1 lane)
  • bcl2fastq: 7 h 45 min (slower network FS, 70 threads, all lanes)
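Because the Picard timings cover one lane while the bcl2fastq timing covers the whole flowcell, it helps to put them on a per-lane footing. The sketch below assumes an S4 flowcell has 4 lanes and that the all-lanes bcl2fastq time splits evenly across them. The tools also ran on different filesystems, so treat the result as a rough lower bound, not a benchmark.

```shell
#!/usr/bin/env bash
# Normalise the reported wall-clock times to minutes per lane.
# Assumptions: 4 lanes on an S4 flowcell; the all-lanes bcl2fastq time
# splits evenly across them; filesystems differed, so this is only rough.

summarise_runtimes() {
  local lanes=4
  local extract_min=22                     # ExtractIlluminaBarcodes, lane 1
  local basecalls_min=$(( 17 * 60 ))       # IlluminaBasecallsToSam, lane 1 (cancelled: lower bound)
  local bcl2fastq_min=$(( 7 * 60 + 45 ))   # bcl2fastq, all lanes
  awk -v lanes="$lanes" -v ext="$extract_min" -v bc="$basecalls_min" \
      -v b2f="$bcl2fastq_min" 'BEGIN {
    per_lane = b2f / lanes
    printf "bcl2fastq per lane: %.2f min\n", per_lane
    printf "Picard per lane: >= %d min\n", ext + bc
    printf "Picard vs bcl2fastq: >= %.1fx slower\n", (ext + bc) / per_lane
  }'
}

summarise_runtimes
```

Run as-is, it reports 116.25 min per lane for bcl2fastq, at least 1042 min per lane for the two Picard steps, and a slowdown of at least 9.0x.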

Commands used

picard -Xmx430080m ExtractIlluminaBarcodes \
--BARCODE_FILE NVQ_Run314.barcodes.L1.tsv \
--MAX_MISMATCHES 0 \
--METRICS_FILE "barcode_metrics_L1.txt" \
--OUTPUT_DIR . \
--BASECALLS_DIR /projects/demultiplex/211111_A00785_0314_AHKVVTDSX2_bcl/Data/Intensities/BaseCalls \
--COMPRESSION_LEVEL 5 \
--LANE 1 \
--MAX_RECORDS_IN_RAM 7000000 \
--NUM_PROCESSORS 70 \
--READ_STRUCTURE 151T8B8B151T \
--TMP_DIR /tmp \
--TMP_DIR $PWD

picard -Xmx430080m IlluminaBasecallsToSam \
--BARCODES_DIR . \
--IGNORE_UNEXPECTED_BARCODES true \
--INCLUDE_NON_PF_READS false \
--LIBRARY_PARAMS NVQ_Run314.library_params.L1.tsv \
--READ_GROUP_ID "211111_A00785_0314_AHKVVTDSX2.1" \
--RUN_BARCODE 211111_A00785_0314_AHKVVTDSX2 \
--SEQUENCING_CENTER CMGG \
--SORT true \
--BASECALLS_DIR /projects/demultiplex/211111_A00785_0314_AHKVVTDSX2_bcl/Data/Intensities/BaseCalls \
--COMPRESSION_LEVEL 5 \
--LANE 1 \
--MAX_RECORDS_IN_RAM 7000000 \
--NUM_PROCESSORS 70 \
--READ_STRUCTURE 151T8B8B151T \
--TMP_DIR /tmp \
--TMP_DIR $PWD
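The Picard commands above process a single lane (`--LANE 1`), so one option worth measuring is running all lanes concurrently with a share of the threads and heap each. This is a dry-run sketch that only prints the commands. The 4-lane count, the even resource split, and the per-lane file names (`NVQ_Run314.barcodes.L<n>.tsv`, `NVQ_Run314.library_params.L<n>.tsv`, `barcode_metrics_L<n>.txt`) are hypothetical, and it assumes the per-lane outputs written to `.` do not overwrite each other. Whether it helps depends on where IlluminaBasecallsToSam spends its time, which the issue does not establish.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a script that runs ExtractIlluminaBarcodes, then
# IlluminaBasecallsToSam, for every lane at the same time.
# Assumptions (hypothetical): 4 lanes; per-lane barcode/library-params files
# named like the L1 ones; threads and heap split evenly between lanes.
set -euo pipefail

RUN=211111_A00785_0314_AHKVVTDSX2
BASECALLS=/projects/demultiplex/${RUN}_bcl/Data/Intensities/BaseCalls
LANES=4
TOTAL_THREADS=70
TOTAL_HEAP_MB=430080

threads=$(( TOTAL_THREADS / LANES ))   # 17 processors per lane
heap_mb=$(( TOTAL_HEAP_MB / LANES ))   # 107520 MB heap per lane

# Options shared by both tools, copied from the original commands.
# MAX_RECORDS_IN_RAM is unchanged; with a quarter of the heap it may need lowering.
common_opts() {
  echo "--BASECALLS_DIR $BASECALLS --COMPRESSION_LEVEL 5 --LANE $1" \
       "--MAX_RECORDS_IN_RAM 7000000 --NUM_PROCESSORS $threads" \
       "--READ_STRUCTURE 151T8B8B151T --TMP_DIR /tmp --TMP_DIR \$PWD"
}

extract_cmd() {
  local lane=$1
  echo "picard -Xmx${heap_mb}m ExtractIlluminaBarcodes" \
       "--BARCODE_FILE NVQ_Run314.barcodes.L${lane}.tsv --MAX_MISMATCHES 0" \
       "--METRICS_FILE barcode_metrics_L${lane}.txt --OUTPUT_DIR . $(common_opts "$lane")"
}

basecalls_cmd() {
  local lane=$1
  echo "picard -Xmx${heap_mb}m IlluminaBasecallsToSam" \
       "--BARCODES_DIR . --IGNORE_UNEXPECTED_BARCODES true --INCLUDE_NON_PF_READS false" \
       "--LIBRARY_PARAMS NVQ_Run314.library_params.L${lane}.tsv" \
       "--READ_GROUP_ID ${RUN}.${lane} --RUN_BARCODE $RUN --SEQUENCING_CENTER CMGG" \
       "--SORT true $(common_opts "$lane")"
}

# Each lane runs its two steps in sequence; the lanes run in the background.
for (( lane = 1; lane <= LANES; lane++ )); do
  echo "( $(extract_cmd "$lane") && $(basecalls_cmd "$lane") ) &"
done
echo "wait"
```

Redirecting the output to a file and reviewing it before running it with `bash` keeps the sketch side-effect free until the resource split has been checked.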

All help is more than welcome.

Thanks,
Matthias

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

matthdsm commented, Mar 16, 2022

Hi John,

So you’re telling me you’ve been able to demultiplex a full S4 chip (300 cycles, PE) in about 1 hour? That’s very impressive, and I’m interested in how you did it. As far as I can see, the demux is CPU-bound, so our filesystem shouldn’t be an issue (a CIFS network share for the BCL data, CephFS for local storage).

We’re not doing much out of the ordinary:

bcl2fastq \
    --runfolder-dir cifs-drive \
    --output-dir cephFS/ \
    --interop-dir cephFS/InterOp \
    --sample-sheet cephFS/SampleSheet.csv \
    --loading-threads 70 \
    --processing-threads 70 \
    --writing-threads 70 \
    --barcode-mismatches 0 \
    --no-lane-splitting
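The command above asks for 70 loading, 70 processing and 70 writing threads, i.e. 210 threads, while the machine is described as having 70 cores. A dry-run sketch of a more conservative split follows. It assumes bcl2fastq2's documented defaults of 4 loading and 4 writing threads are a reasonable starting point and gives the remaining cores to processing; that split is an assumption to benchmark, not an Illumina recommendation. The bcl2fastq2 guide also states that writing threads must not exceed the number of samples, so the function caps them. The sample count of 96 in the example call is hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a bcl2fastq command whose thread counts add up to the
# core count instead of 70 loading + 70 processing + 70 writing threads.
# Assumptions: 4 loading / 4 writing threads (bcl2fastq2 defaults) as a
# starting point; the remaining cores go to processing (one possible split).
set -euo pipefail

bcl2fastq_cmd() {
  local cores=$1 samples=$2
  local loading=4 writing=4
  # Writing threads must not exceed the number of samples.
  if (( samples < writing )); then writing=$samples; fi
  local processing=$(( cores - loading - writing ))
  if (( processing < 1 )); then processing=1; fi
  echo "bcl2fastq --runfolder-dir cifs-drive --output-dir cephFS/" \
       "--interop-dir cephFS/InterOp --sample-sheet cephFS/SampleSheet.csv" \
       "--loading-threads $loading --processing-threads $processing" \
       "--writing-threads $writing --barcode-mismatches 0 --no-lane-splitting"
}

# 70 cores, 96 samples (hypothetical sample count)
bcl2fastq_cmd 70 96
```

Timing this against the original settings on the same run is the only way to tell whether the thread split, rather than the network filesystem, is the bottleneck.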

I’m very curious how you were able to improve on this. Looking forward to your response!

Matthias

iamh2o commented, Mar 16, 2022

A little late here, but 7 hours for bcl2fastq2 with 70 cores on an S4 flowcell? That is really very long (as are the other steps). I’ve been poking around with bcl2fastq and can get execution times of ~1 hr on NFS filesystems, but it is highly dependent on the command-line args. If this still interests you, I’d be curious to see your bcl2fastq command.
