Numexpr is using large amounts of memory on import

Hello, I’ve discovered that numexpr uses approximately 1 GB of memory per thread as soon as the library is imported.

Here is some sample code that demonstrates the issue:

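import os
import subprocess

my_pid = os.getpid()

# Import numexpr before measuring so that its start-up cost is included.
try:
    import numexpr as ne
except ImportError:
    ne = None

# VmPeak is the peak virtual memory size of this process (Linux only).
p = subprocess.Popen("grep VmPeak /proc/" + str(my_pid) + "/status", stdout=subprocess.PIPE, shell=True)
(output, err) = p.communicate()  # communicate() also waits for grep to exit
# Print the thread limit next to the VmPeak value; rstrip() drops the trailing newline.
print(os.environ.get('NUMEXPR_MAX_THREADS', 'unset'), output[8:20].decode().rstrip())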

And an example of the output, where test2.py is the code given above:

for i in {4..56..4}; do export NUMEXPR_MAX_THREADS=$i; python test2.py ; done
4  4080524 kB
8  8080540 kB
12 12080556 kB
16 16080572 kB
20 20080588 kB
24 24080604 kB
28 28080620 kB
32 32080636 kB
36 36080652 kB
40 40080668 kB
44 44080684 kB
48 48080700 kB
52 52080716 kB
56 56080732 kB
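
The figures above grow linearly with the thread count. A quick sketch that recomputes the per-thread increment from the reported numbers; the variable names are illustrative, and the data is copied from the output above:

```python
# VmPeak values (kB) copied from the output above, keyed by NUMEXPR_MAX_THREADS.
vm_peak_kb = {
    4: 4080524, 8: 8080540, 12: 12080556, 16: 16080572,
    20: 20080588, 24: 24080604, 28: 28080620, 32: 32080636,
    36: 36080652, 40: 40080668, 44: 44080684, 48: 48080700,
    52: 52080716, 56: 56080732,
}

threads = sorted(vm_peak_kb)
lo, hi = threads[0], threads[-1]

# Slope between the first and last measurement: extra kB per additional thread.
per_thread_kb = (vm_peak_kb[hi] - vm_peak_kb[lo]) // (hi - lo)

# Extrapolated VmPeak at zero threads.
baseline_kb = vm_peak_kb[lo] - lo * per_thread_kb

# Check that every row lies exactly on the same line.
is_linear = all(vm_peak_kb[n] == baseline_kb + n * per_thread_kb for n in threads)

print(per_thread_kb, baseline_kb, is_linear)  # 1000004 80508 True
```

That works out to 1000004 kB (roughly 1 GB) of peak virtual memory per thread, on top of a baseline of about 80 MB.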

This occurs both in the 2.6.9 release and in the latest GitHub build, 2.6.10.dev0.

Is this expected? It seems like a lot of memory to use just for importing. In case it’s helpful for debugging: I’ve tried this code on Linux machines with between 8 and 128 cores. The operating systems tested are Ubuntu 14.04, 16.04 and 18.04, and Mint 19.
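
One thing that may be worth checking: VmPeak in /proc/<pid>/status is the peak virtual memory size, which can be much larger than the memory actually resident in RAM (reported as VmRSS). A minimal sketch, assuming Linux, that reads both fields without spawning grep; the helper name read_vm_fields is hypothetical:

```python
import os


def read_vm_fields(pid="self", fields=("VmPeak", "VmRSS")):
    """Return the requested /proc/<pid>/status fields in kB (Linux only)."""
    values = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            name, _, rest = line.partition(":")
            if name in fields:
                # Lines look like "VmPeak:\t 4080524 kB".
                values[name] = int(rest.split()[0])
    return values


if __name__ == "__main__":
    try:
        import numexpr  # optional: included in the measurement if installed
    except ImportError:
        numexpr = None
    print(os.environ.get("NUMEXPR_MAX_THREADS", "unset"), read_vm_fields())
```

Comparing the two numbers would show whether the growth is in reserved address space or in memory that is really in use.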

(edit) This is on a fresh python / conda install, by the way. Here’s the output of conda list:

# Name                    Version                   Build  Channel
bzip2                     1.0.8                h516909a_0    conda-forge
ca-certificates           2019.6.16            hecc5488_0    conda-forge
certifi                   2019.6.16                py37_1    conda-forge
libblas                   3.8.0               11_openblas    conda-forge
libcblas                  3.8.0               11_openblas    conda-forge
libffi                    3.2.1             he1b5a44_1006    conda-forge
libgcc-ng                 9.1.0                hdf63c60_0    anaconda
libgfortran-ng            7.3.0                hdf63c60_0    anaconda
liblapack                 3.8.0               11_openblas    conda-forge
libopenblas               0.3.6                h6e990d7_6    conda-forge
libstdcxx-ng              9.1.0                hdf63c60_0    anaconda
ncurses                   6.1               hf484d3e_1002    conda-forge
numexpr                   2.6.10.dev0              pypi_0    pypi
numpy                     1.17.0           py37h95a1406_0    conda-forge
openssl                   1.1.1c               h516909a_0    conda-forge
pip                       19.2.1                   py37_0    conda-forge
python                    3.7.3                h33d41f4_1    conda-forge
readline                  8.0                  hf8c457e_0    conda-forge
setuptools                41.0.1                   py37_0    conda-forge
sqlite                    3.29.0               hcee41ef_0    conda-forge
tk                        8.6.9             hed695b0_1002    conda-forge
wheel                     0.33.4                   py37_0    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
zlib                      1.2.11            h516909a_1005    conda-forge

simonrp84 commented, Aug 6, 2019

Here are some more comprehensive tests.

Importing only numpy:
Constant: OPENBLAS_NUM_THREADS=1
Variable: OMP_NUM_THREADS=4...56
Result: Fixed memory usage on import, 87700 kB.

Importing only numpy:
Variable: OPENBLAS_NUM_THREADS=4...56
Constant: OMP_NUM_THREADS=1
Result: Variable memory usage on import, 3186032 kB for 4 BLAS threads, 56890180 kB for 56 BLAS threads.

Importing only numexpr:
Constant: OPENBLAS_NUM_THREADS=1
Variable: OMP_NUM_THREADS=4...56
Result: Variable memory usage on import, 4092092 kB for one OMP thread, 56092300 kB for 56 OMP threads.

Importing only numexpr:
Variable: OPENBLAS_NUM_THREADS=4...56
Constant: OMP_NUM_THREADS=1
Result: Variable memory usage on import, 3190392 kB for one BLAS thread, 56894552 kB for 56 BLAS threads.

In all cases NUMEXPR_MAX_THREADS=1.
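
A test matrix like the one above can be scripted. The following is only a sketch, assuming Linux and an interpreter that has the module under test installed; measure_import and sweep are hypothetical helper names, not part of numexpr or numpy:

```python
import os
import subprocess
import sys

# Child script: import the named module, then print this process's VmPeak in kB.
CHILD = """
import importlib, sys
importlib.import_module(sys.argv[1])
with open("/proc/self/status") as status:
    for line in status:
        if line.startswith("VmPeak:"):
            print(line.split()[1])
"""


def measure_import(module, **env_overrides):
    """Import `module` in a fresh interpreter and return its VmPeak in kB."""
    env = dict(os.environ)
    env.update({key: str(value) for key, value in env_overrides.items()})
    result = subprocess.run(
        [sys.executable, "-c", CHILD, module],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def sweep(module, variable, values, **constant):
    """Vary one environment variable while holding the others constant."""
    return {v: measure_import(module, **{variable: v}, **constant) for v in values}


# Example call (requires numpy to be installed):
# sweep("numpy", "OPENBLAS_NUM_THREADS", range(4, 57, 4),
#       OMP_NUM_THREADS=1, NUMEXPR_MAX_THREADS=1)
```

Running each measurement in a fresh interpreter keeps one import from affecting the next, since peak memory never decreases within a process.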

Hope that helps.

pnuu commented, Aug 6, 2019

And setting all of the env variables gives the same memory usage as export OMP_NUM_THREADS=1 alone.
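
For anyone who would rather apply this from Python than from the shell, a hedged sketch follows. It assumes that the thread limits are read when the libraries are first loaded, so they have to be in the environment before the import; choosing "1" for all three variables is just an illustration:

```python
import os

# Assumption: the limits must be in the environment before the libraries load.
# setdefault() keeps any value that was already exported in the shell.
for name in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "NUMEXPR_MAX_THREADS"):
    os.environ.setdefault(name, "1")

try:
    import numexpr as ne
except ImportError:
    ne = None  # numexpr is optional in this sketch
```

Setting the variables after numpy or numexpr has already been imported elsewhere in the process would likely have no effect on the thread pools that already exist.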
