Numba appears ~10X slower running in docker versus native on OSX
TL;DR: Running the same Numba code in Docker appears almost 10X slower than running it natively on OSX, per `timeit` benchmarks.
- Numba on OSX (native): 0.678 s runtime
- Numba in Docker: 4.900 s runtime
(N = 10000 `timeit` iterations)
- I have tried using the latest released version of Numba (the most recent release is listed in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
- I have included a self-contained code sample to reproduce the problem, i.e. it can be run as `python bug.py`.
Hi! I really enjoy Numba, and appreciate all the work that goes into it. I wanted to report a strange phenomenon that I thought might be noteworthy. I’m doing numerical simulation using multinomial distributions; I perform millions of samplings, so calculation of a logpmf has become a bottleneck. I tried rewriting the hot code paths with Numba to see what gains were possible. However, I noticed that the same code runs much slower inside a Docker image than natively on OSX. The code eventually needs to run in a Docker container because it will be deployed to a k8s cluster. Perhaps I’m not writing the code in the most optimal way, but it seems strange that running in Docker could cause such a dramatic change in performance. I might expect a few percentage points of overhead from virtualisation, but not this much. A reproducible example is attached below.
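For context, the quantity being optimised is the standard multinomial log-PMF, `gammaln(n+1) + sum(xlogy(x, p) - gammaln(x+1))`, which SciPy already exposes as `scipy.stats.multinomial.logpmf`. A minimal pure-SciPy baseline (using an illustrative valid probability vector, not the benchmark's values) might look like:

```python
# Pure-SciPy baseline for the multinomial log-PMF, matching the formula
# gammaln(n+1) + sum(xlogy(x, p) - gammaln(x+1)) used in the Numba version.
import numpy as np
from scipy.stats import multinomial
from scipy.special import gammaln, xlogy

n = 10
p = np.full(5, 0.2)            # probabilities must sum to 1
x = np.array([2, 2, 2, 2, 2])  # counts must sum to n

manual = gammaln(n + 1) + np.sum(xlogy(x, p) - gammaln(x + 1))
print(multinomial.logpmf(x, n, p), manual)  # the two values should agree
```

Timing this baseline alongside the Numba version gives a useful reference point for whether the JIT rewrite is paying off at all in each environment.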
Potential docker considerations:
- Using an intel mac, not arm.
- Docker VM has 10CPUs available vs 12 for the OSX.
- Docker VM has 1GB of swap available.
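One way to sanity-check what the container actually sees (the CPU count visible to Python can differ from the host's, and on Linux cgroup/affinity limits may restrict it further) is a quick probe like this sketch:

```python
# Probe the CPU resources visible to the current Python process.
# Inside a Linux container, sched_getaffinity reflects affinity/cgroup
# limits and can be lower than os.cpu_count(); the call does not exist
# on macOS, hence the hasattr guard.
import os

print("os.cpu_count():", os.cpu_count())
if hasattr(os, "sched_getaffinity"):
    print("usable CPUs:", len(os.sched_getaffinity(0)))
```

Running this both natively and inside the container confirms how many threads Numba's parallel backend can realistically use in each environment.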
```python
import ctypes
import timeit

import numba
import numpy
from numba import extending
from numpy.typing import NDArray

# Resolve the Cython implementations of gammaln and xlogy from
# scipy.special so they can be called from nopython-mode Numba code.
_PTR = ctypes.POINTER
_dble = ctypes.c_double
_ptr_dble = _PTR(_dble)

gammaln_functype = ctypes.CFUNCTYPE(_dble, _dble)
cython_gammaln = gammaln_functype(
    extending.get_cython_function_address("scipy.special.cython_special", "gammaln")
)

xlogy_functype = ctypes.CFUNCTYPE(_dble, _dble, _dble)
cython_xlogy = xlogy_functype(
    extending.get_cython_function_address(
        "scipy.special.cython_special", "__pyx_fuse_1xlogy"
    )
)

@numba.vectorize([numba.float64(numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_gammaln(x):
    return cython_gammaln(x)

@numba.vectorize([numba.float64(numba.float64, numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_xlogy(x, y):
    return cython_xlogy(x, y)

@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_logpmf(x: NDArray[numpy.int_], n: int, p: NDArray[numpy.float64]) -> float:
    """Calculate the multinomial log-PMF using vectorised scipy special functions."""
    difference = numpy.sum(numba_xlogy(x, p) - numba_gammaln(x + 1))
    return cython_gammaln(n + 1) + difference

#
# Benchmark
#
N_TESTS = 10000
n = 10
p = numpy.array([0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.83333333])
x = numpy.array([1, 1, 1, 1, 6])

# Warm up: trigger JIT compilation outside the timed region
numba_logpmf(x, n, p)

print(timeit.timeit(lambda: numba_logpmf(x, n, p), number=N_TESTS))
```
Issue Analytics
- Created: a year ago
- Comments: 16 (5 by maintainers)
I am glad you managed to resolve this issue. I have no idea what package could be missing that would incur such a performance penalty.
Meanwhile, I don’t see any reason why compiled functions would be “forgotten” in Docker. Aren’t they stored in
__pycache__
? I ran 4 test attempts on the same dataset (just hitting F5 on an API endpoint that uses timezonefinder under the hood), so any cache, if it was created, should be in place.