
Q: Is there a way to control the threads used by pigz?

See original GitHub issue

I’d like to have control over the number of threads used by pigz without modifying the source code. It seems that all cores are used for (de)compression even if I specify --cores?

That’s not a very “social” approach when several users are sharing a single server … 😉
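For context, pigz itself accepts a `-p`/`--processes` option that limits how many compression threads it spawns. The sketch below is a hypothetical wrapper, not Cutadapt code, showing how an external pigz call could be capped:

```python
import subprocess

def pigz_compress(path: str, threads: int = 2) -> None:
    """Compress `path` with pigz (producing `path`.gz), capping its worker threads.

    Assumes pigz is installed and on PATH; -p/--processes limits the
    number of compression threads pigz will spawn.
    """
    subprocess.run(["pigz", "-p", str(threads), path], check=True)

# e.g. keep pigz to two threads on a shared server
pigz_compress("reads.fastq", threads=2)
```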

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 32 (15 by maintainers)

Top GitHub Comments

5 reactions
marcelm commented, Nov 4, 2019

Perhaps I should clarify that – contrary to what I wrote in one of the earliest comments in this issue – I acknowledge that there is a problem, and that I am working towards solving it.

However, Cutadapt isn’t my main job, so I need to proceed at my own pace. Fortunately, I can spend some of my working hours on personal (bioinformatics) projects and have used a lot of that time for Cutadapt. I’m motivated to make Cutadapt work for others, even if I personally don’t benefit from it – but it needs to remain fun. As long as I can do the things I want, it’s fine, but when the discussion moves into territory where I get the impression that demands are being made, the fun stops. To be sure, the tone in this thread has been civil and reasonable, but the sheer amount of text is not helping.

Let me figure this out, one step at a time. Currently, I’m trying to make --cores=1 use exactly one core by doing all compression and decompression in-process, without calling an external process at all. With a single core, Cutadapt doesn’t spawn any worker or I/O processes anyway, so this should be relatively easy. Perhaps I can get this done this week or next.

@wookietreiber Thanks for your insights. I don’t have the time to reply at the moment, so this may need to wait.

One correction: Cutadapt no longer lets pigz use all available cores for compression; this has been limited to four for a while now. And decompression has recently been limited to one external process.
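To illustrate the in-process approach described in the comment above, here is a minimal sketch using Python’s built-in gzip module; the helper name is hypothetical and not Cutadapt’s actual API:

```python
import gzip

def open_maybe_gzipped(path: str, mode: str = "rt"):
    """Open a possibly gzip-compressed file entirely in-process.

    With --cores=1 there are no worker or I/O processes, so doing the
    (de)compression with the built-in gzip module keeps everything in
    one process and avoids spawning pigz at all.
    """
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)
```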

3 reactions
marcelm commented, Nov 8, 2019

I’ve just pushed a commit that makes Cutadapt no longer use subprocesses for gzip compression when --cores=1 is used (or when --cores is not specified). Input files are still read through a pigz process (using one thread) because its gzip decompression is more efficient. Total CPU usage is exactly 100%, though (it appears that the two processes never run at the same time).
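As a rough illustration of what such a single-threaded pigz reader subprocess can look like (illustrative names, not Cutadapt’s internals):

```python
import subprocess

def open_via_pigz(path: str):
    """Decompress `path` through an external pigz process.

    -c writes to stdout, -d decompresses, and -p 1 requests a single
    (de)compression thread; the caller reads plain bytes from the pipe.
    """
    proc = subprocess.Popen(
        ["pigz", "-c", "-d", "-p", "1", path],
        stdout=subprocess.PIPE,
    )
    return proc.stdout
```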

I think I may remove this reader subprocess as well because gzip decompression is just 2.5% of the total time (when reading a `.fastq.gz`, removing one adapter, and writing to a `.fastq.gz`).

In case anyone is wondering why this was ever done with subprocesses: gzip compression and decompression using Python’s built-in gzip module used to be very slow, so using an external gzip process was a workaround to get good speed (pigz came later). Nowadays, they are equivalent, so we can go back to the builtin.

I’ll start looking into the multi-core case, as time permits.

Read more comments on GitHub >

Top Results From Across the Web

pigz(1): compress/expand files - Linux man page - Die.net
Pigz compresses using threads to make use of multiple processors and cores. The input is broken up into 128 KB chunks with each...
Read more >
How to Compress Files Faster with Pigz Tool in Linux
Pigz can archive larger files significantly quicker than gzip since it compresses using threads to make use of multiple CPUs and cores.
Read more >
Homework 3. Multithreaded gzip compression filter
The pigz program can be used as a filter that reads programs from standard ... number of threads to control the compression threads...
Read more >
