Revert to limiting the number of threads when not explicitly provided

The latest numexpr patch release (2.6.8 -> 2.6.9) included a change whose impact is much bigger than what the changelog describes:

more robust handling of the thread-setting environment variables

There used to be an upper bound (8) on the number of threads, which is gone now. A process that previously used 8 threads can now use up to <machine CPU count> threads! This is even more dramatic when using multiprocessing: if you previously had 8 processes each using 8 threads (64 threads in total, which was perfectly fine on a 64-core machine), with 2.6.9 you now get 8 * 64 = 512 threads.
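
For illustration, a minimal sketch of how a user can reimpose the old per-process cap today: set NUMEXPR_NUM_THREADS before numexpr is imported, and/or call numexpr.set_num_threads() in each worker. Both knobs are numexpr's own; the pool of 8 processes and the cap of 8 threads simply mirror the numbers above, and the computation is made up.

    import os

    # Cap numexpr before it is imported, so the variable is honoured at import time.
    os.environ.setdefault('NUMEXPR_NUM_THREADS', '8')

    import multiprocessing as mp

    import numexpr as ne
    import numpy as np


    def worker(seed):
        # Belt and braces: also cap explicitly inside each worker process.
        ne.set_num_threads(8)
        a = np.random.default_rng(seed).random(1_000_000)
        return float(ne.evaluate('sum(a ** 2)'))


    if __name__ == '__main__':
        # 8 worker processes x 8 numexpr threads = 64 threads, as before 2.6.9.
        with mp.Pool(processes=8) as pool:
            print(pool.map(worker, range(8)))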

Considering that numexpr is, 99% of the time, an indirect dependency (pulled in by pandas, numpy, etc.), and that those packages do not pin it at the patch level, such high-impact changes can break production (this happened to us today, for this exact reason).

I don’t think it is reasonable to convince other projects to pin numexpr at the patch level, so numexpr should really avoid changes with this kind of impact without bumping at least the minor version.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 2
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

1 reaction
robbmcleod commented, Jan 11, 2019

Probably 2.6.5 should have been a minor version bump, since that fixed the trouble with the massive 4096-thread thread pool. I guess in my mind I have had NumExpr 2.6 in maintenance mode.

When using a “hybrid MPI”-style mixture of multiprocessing and multithreading in a cluster environment, not controlling the number of threads per process is a logical bug. Cluster scheduling systems don’t force core affinity (that I have ever seen), so it’s generally an honour system that you use the number of cores you requested from the queuing system. In past versions, if you launched 16 processes on a 64-physical-core node, you would likely just get away with it, using 16 * 8 = 128 threads in total thanks to hyperthreading. With the unbounded use of cores, your cluster administrators are going to notice in their logs that you have massively oversubscribed a node and thus haven’t configured your job correctly.

Apparently the previous default maximum was covering up this error. So we could revert to capping auto-detection at 8 cores. Or we could disable auto-detection of cores completely and default to 1 core unless the appropriate environment variables are set or the developer calls numexpr.set_num_threads(). That would absolutely force people to actually read the documentation and understand how to configure the software, without breaking any systems.

Alternatively, if we do revert to the 8-thread cap, I would prefer to add a warning when the machine has more than 8 virtual cores, so people notice that they haven’t configured it.
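
A rough sketch of what that cap-plus-warning behaviour could look like (purely illustrative, not the actual NumExpr code; the cap of 8 and the warning text are assumptions):

    import os
    import warnings

    MAX_DEFAULT_THREADS = 8  # illustrative cap, mirroring the old behaviour


    def default_nthreads(detected_cores):
        """Pick a thread count when the user has not explicitly configured one."""
        if 'NUMEXPR_NUM_THREADS' in os.environ:
            # Explicit configuration: honour it without capping.
            return int(os.environ['NUMEXPR_NUM_THREADS'])
        if detected_cores > MAX_DEFAULT_THREADS:
            warnings.warn(
                f'Detected {detected_cores} virtual cores but defaulting to '
                f'{MAX_DEFAULT_THREADS} threads; set NUMEXPR_NUM_THREADS or call '
                f'numexpr.set_num_threads() to use more.'
            )
            return MAX_DEFAULT_THREADS
        return detected_cores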

It would also probably be a good idea to add a brief tutorial to our documentation on how to set up NumExpr to work in a cluster environment.
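
For example, such a tutorial might show a pattern along these lines, assuming a Slurm cluster (SLURM_CPUS_PER_TASK is Slurm's per-task core count; nothing here is taken from NumExpr's existing docs):

    import os

    # Assumption: a Slurm cluster, where SLURM_CPUS_PER_TASK holds the number of
    # cores granted to each task. Default to 1 if the scheduler did not set it.
    cpus = os.environ.get('SLURM_CPUS_PER_TASK', '1')

    # Both variables must be set before numexpr (or any OpenMP code) is imported.
    os.environ.setdefault('NUMEXPR_NUM_THREADS', cpus)
    os.environ.setdefault('OMP_NUM_THREADS', cpus)

    import numexpr as ne

    ne.set_num_threads(int(cpus))  # make the cap explicit at runtime as well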

1 reaction
gdementen commented, Jan 10, 2019

Disclaimer: I am not affected by this issue (I don’t have access to those nice servers).

I just looked at the code out of curiosity, and I wonder whether the more sensible option wouldn’t be to cap at 8 (or whatever number) only when the value comes from detect_number_of_cores(), and not when it is set via OMP_NUM_THREADS. Up to 2.6.8 it was capped in both cases; in 2.6.9 it is never capped.

FWIW, the old code was:

    # (excerpt; relies on `import os` and numexpr’s detect_number_of_cores() helper)
    try:
        nthreads = int(os.environ['NUMEXPR_NUM_THREADS'])
    except KeyError:
        nthreads = int(os.environ.get('OMP_NUM_THREADS', detect_number_of_cores()))
        # Check that we don't activate too many threads at the same time.
        # 8 seems a sensible value.
        if nthreads > 8:
            nthreads = 8
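
A minimal sketch of that suggestion (illustrative only, not a proposed patch; 8 is just the old default cap, and detect_number_of_cores() is numexpr's existing helper in numexpr.utils):

    import os

    from numexpr.utils import detect_number_of_cores

    try:
        # Explicit numexpr setting: never capped.
        nthreads = int(os.environ['NUMEXPR_NUM_THREADS'])
    except KeyError:
        if 'OMP_NUM_THREADS' in os.environ:
            # Explicit OpenMP setting: trust the user, do not cap.
            nthreads = int(os.environ['OMP_NUM_THREADS'])
        else:
            # Auto-detected value: cap it at the old conservative default.
            nthreads = min(detect_number_of_cores(), 8)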