
Q: Is there a way to control the threads used by pigz?

See original GitHub issue

I’d like to have control over the number of threads used by pigz without modifying the source code. It seems that all cores are used for (de)compression even if I specify --cores?

That’s not a very “social” approach when several users are sharing a single server … 😉
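For context, pigz itself accepts a `-p`/`--processes` option that limits how many compression threads it spawns. The sketch below is a hypothetical wrapper, not Cutadapt code, showing how an external pigz call could be capped:

```python
import subprocess

def pigz_compress(path: str, threads: int = 2) -> None:
    """Compress `path` with pigz (producing `path`.gz), capping its worker threads.

    Assumes pigz is installed and on PATH; -p/--processes limits the
    number of compression threads pigz will spawn.
    """
    subprocess.run(["pigz", "-p", str(threads), path], check=True)

# e.g. keep pigz to two threads on a shared server
pigz_compress("reads.fastq", threads=2)
```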

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 32 (15 by maintainers)

Top GitHub Comments

5 reactions
marcelm commented, Nov 4, 2019

Perhaps I should clarify that – contrary to what I wrote in one of the earliest comments in this issue – I acknowledge that there is a problem, and that I am working towards solving it.

However, Cutadapt isn’t my main job, so I need to proceed at my own pace. Fortunately, I can spend some of my working hours on personal (bioinformatics) projects and have used a lot of that time for Cutadapt. I’m motivated to make Cutadapt work for others, even if I personally don’t benefit from it – but it needs to remain fun. As long as I can do the things I want, it’s fine, but when the discussion moves into territory where I get the impression that demands are being made, the fun stops. To be sure, the tone in this thread has been civil and reasonable, but the sheer amount of text is not helping.

Let me figure this out, one step at a time. Currently, I’m trying to make --cores=1 use exactly one core by doing all compression and decompression in-process, without calling an external process at all. With a single core, Cutadapt doesn’t spawn any worker or I/O processes anyway, so this should be relatively easy. Perhaps I can get this done this week or next.

@wookietreiber Thanks for your insights. I don’t have the time to reply at the moment, so this may need to wait.

One correction: Cutadapt no longer lets pigz use all available cores for compression; this has been limited to four for a while now. And decompression has recently been limited to one external process.
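To illustrate the in-process approach described in the comment above, here is a minimal sketch using Python’s built-in gzip module; the helper name is hypothetical and not Cutadapt’s actual API:

```python
import gzip

def open_maybe_gzipped(path: str, mode: str = "rt"):
    """Open a possibly gzip-compressed file entirely in-process.

    With --cores=1 there are no worker or I/O processes, so doing the
    (de)compression with the built-in gzip module keeps everything in
    one process and avoids spawning pigz at all.
    """
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)
```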

3 reactions
marcelm commented, Nov 8, 2019

I’ve just pushed a commit that makes Cutadapt no longer use subprocesses for gzip compression when --cores=1 is used (or when --cores is not specified). Input files are still read through a pigz process (using one thread) because its gzip decompression is more efficient. Total CPU usage is exactly 100%, though (it appears that the two processes never run at the same time).
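As a rough illustration of what such a single-threaded pigz reader subprocess can look like (illustrative names, not Cutadapt’s internals):

```python
import subprocess

def open_via_pigz(path: str):
    """Decompress `path` through an external pigz process.

    -c writes to stdout, -d decompresses, and -p 1 requests a single
    (de)compression thread; the caller reads plain bytes from the pipe.
    """
    proc = subprocess.Popen(
        ["pigz", "-c", "-d", "-p", "1", path],
        stdout=subprocess.PIPE,
    )
    return proc.stdout
```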

I think I may remove this reader subprocess as well because gzip decompression is just 2.5% of the total time (when reading a `.fastq.gz`, removing one adapter, and writing to a `.fastq.gz`).

In case anyone is wondering why this was ever done with subprocesses: gzip compression and decompression using Python’s built-in gzip module used to be very slow, so using an external gzip process was a workaround to get good speed (pigz came later). Nowadays, they are equivalent, so we can go back to the builtin.

I’ll start looking into the multi-core case, as time permits.

Read more comments on GitHub >

Top Results From Across the Web

pigz(1): compress/expand files - Linux man page - Die.net
Pigz compresses using threads to make use of multiple processors and cores. The input is broken up into 128 KB chunks with each...
Read more >
How to Compress Files Faster with Pigz Tool in Linux
Pigz can archive larger files significantly quicker than gzip since it compresses using threads to make use of multiple CPUs and cores.
Read more >
Homework 3. Multithreaded gzip compression filter
The pigz program can be used as a filter that reads programs from standard ... number of threads to control the compression threads...
Read more >
