Revert to limiting the number of threads when not explicitly provided
The latest numexpr patch release (2.6.8 -> 2.6.9) included a change with a much bigger impact than what the changelog describes:
more robust handling of the thread-setting environment variables
There was an upper bound on the number of threads used before (8), which is no longer applied. A process that previously used 8 threads can now use up to <machine CPU count> threads! This is even more dramatic when using multiprocessing: if you previously had 8 processes each using 8 threads (64 threads in total, which was perfectly fine on a 64-core machine), with 2.6.9 the same setup now spawns 8*64 = 512 threads.
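For anyone hit by this, the per-process thread count can still be pinned explicitly. A minimal sketch using numexpr’s documented knobs (the NUMEXPR_NUM_THREADS environment variable and numexpr.set_num_threads()):

```python
import os

# Pin the thread count before numexpr initializes its pool; the
# variable is read when the module is imported.
os.environ["NUMEXPR_NUM_THREADS"] = "8"

import numexpr

# Alternatively, resize the pool after import (returns the previous
# setting).
numexpr.set_num_threads(8)
```

In a multiprocessing setup, doing this at the top of each worker’s entry module keeps the total at processes x threads, as before.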
Considering that numexpr is, 99% of the time, an indirect dependency (of pandas, NumPy, etc., which do not pin it to a patch level), such high-impact changes are potentially production-breaking (this happened to us today, for exactly this reason).
I don’t think convincing other projects to pin numexpr down to the patch level is reasonable, so numexpr should really avoid high-impact changes without bumping at least the minor version.

Probably 2.6.5 should have been a minor version bump, since it fixed the trouble with the massive 4096-thread thread pool. I guess in my mind I have had NumExpr 2.6 in maintenance mode.
When using a “hybrid MPI”-style mixture of multiprocessing and multithreading in a cluster environment, not controlling the number of threads per process is a logical bug. Cluster scheduling systems don’t force core affinity (that I have ever seen), so it’s generally an honour system that you use the number of cores you requested from the queuing system. In past versions, if you launched 16 processes on a 64-physical-core node, you would likely just get away with using 128 threads total thanks to hyperthreading. With the unbounded use of cores, your cluster administrators are going to notice in their logs that you have massively oversubscribed a node and thus haven’t configured your job correctly.
Apparently the previous default maximum was covering up this error. So we could revert to capping auto-detection at 8 cores. Or we could disable auto-detection of cores completely and use a default of 1 thread unless the appropriate environment variables are set, or the developer calls numexpr.set_num_threads(). That would absolutely force people to actually read the documentation and understand how to configure the software, without breaking any systems. Alternatively, if we do revert to 8 threads, I would prefer to add a warning when more than 8 virtual cores are present, so people notice that they haven’t configured it.
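A minimal sketch of the “1 thread by default” option, assuming a hypothetical init helper (_default_nthreads is an illustrative name, not numexpr’s actual code):

```python
import os

def _default_nthreads():
    # Hypothetical sketch: no auto-detection of cores. Use the
    # environment variables when present, otherwise default to one
    # thread and leave it to the developer to call set_num_threads().
    for var in ("NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS"):
        if var in os.environ:
            return int(os.environ[var])
    return 1
```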
It would also probably be a good idea to add to our documentation a brief tutorial on how to set up NumExpr to work in a cluster environment.
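Such a tutorial could boil down to a few lines. A sketch assuming a SLURM cluster, where SLURM_CPUS_PER_TASK holds the per-task core allocation granted by the scheduler:

```python
import os

# Size the numexpr pool to the cores the scheduler actually granted,
# falling back to a single thread outside the scheduler.
os.environ.setdefault("NUMEXPR_NUM_THREADS",
                      os.environ.get("SLURM_CPUS_PER_TASK", "1"))

import numexpr  # reads NUMEXPR_NUM_THREADS at import time
```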
Disclaimer: I am not affected by this issue (I don’t have access to those nice servers).
I just looked at the code out of curiosity, and I wonder if the more sensible option wouldn’t be to cap at 8 (or whatever number) only in the case where the value was set via detect_number_of_cores(), and not in the case where it was set via OMP_NUM_THREADS. It was capped in both cases until 2.6.8, and in 2.6.9 it is never capped.
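A sketch of that option (illustrative names; an explicit setting wins and is never capped):

```python
import os

from numexpr import detect_number_of_cores

def _default_nthreads(cap=8):
    # Cap only the auto-detected value; an explicit setting from the
    # environment is used as-is, however large.
    if "NUMEXPR_NUM_THREADS" in os.environ:
        return int(os.environ["NUMEXPR_NUM_THREADS"])
    if "OMP_NUM_THREADS" in os.environ:
        return int(os.environ["OMP_NUM_THREADS"])
    return min(detect_number_of_cores(), cap)
```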
FWIW, the old code was:
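A reconstruction from the behaviour described in this thread (the cap of 8 applied both to the auto-detected count and to the environment variables); treat it as a sketch rather than the verbatim 2.6.8 source:

```python
import os

def detect_number_of_threads():
    # Honour the explicit environment variables first, falling back
    # to auto-detection of the core count.
    try:
        nthreads = int(os.environ["NUMEXPR_NUM_THREADS"])
    except KeyError:
        nthreads = int(os.environ.get("OMP_NUM_THREADS",
                                      detect_number_of_cores()))
    # Cap at 8 threads no matter how the value was obtained; this is
    # the bound that 2.6.9 removed.
    if nthreads > 8:
        nthreads = 8
    return nthreads
```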