Numba appears ~10X slower running in docker versus native on OSX
TL;DR: Running the same Numba code in Docker appears almost 10X slower than running it natively on OSX, per `timeit` benchmarks.
- Numba on OSX (native): 0.678 s runtime
- Numba in Docker: 4.900 s runtime
(N = 10000 `timeit` iterations)
- I have tried using the latest released version of Numba (the most recent release is listed in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
- I have included a self-contained code sample to reproduce the problem, i.e. it can be run as `python bug.py`.
Hi! I really enjoy Numba, and appreciate all the work that goes into it. I wanted to report a strange phenomenon that I thought might be noteworthy. I’m doing numerical simulation using multinomial distributions; I perform millions of samplings, so calculation of a logpmf has become a bottleneck. I tried rewriting the hot code paths with Numba to see what gains were possible. However, I noticed that the same code runs much slower inside a Docker image than natively on OSX. The code eventually needs to run in a Docker container because it will be deployed to a k8s cluster. Perhaps I’m not writing the code in the most optimal way, but it seems strange that running in Docker could cause such a dramatic change in performance. I might expect a few percentage points of overhead from virtualisation, but not this much. A reproducible example is attached below.
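For context, the quantity being optimised is the standard multinomial log-PMF, `gammaln(n+1) + sum(xlogy(x, p) - gammaln(x+1))`, which SciPy already exposes as `scipy.stats.multinomial.logpmf`. A minimal pure-SciPy baseline (using an illustrative valid probability vector, not the benchmark's values) might look like:

```python
# Pure-SciPy baseline for the multinomial log-PMF, matching the formula
# gammaln(n+1) + sum(xlogy(x, p) - gammaln(x+1)) used in the Numba version.
import numpy as np
from scipy.stats import multinomial
from scipy.special import gammaln, xlogy

n = 10
p = np.full(5, 0.2)            # probabilities must sum to 1
x = np.array([2, 2, 2, 2, 2])  # counts must sum to n

manual = gammaln(n + 1) + np.sum(xlogy(x, p) - gammaln(x + 1))
print(multinomial.logpmf(x, n, p), manual)  # the two values should agree
```

Timing this baseline alongside the Numba version gives a useful reference point for whether the JIT rewrite is paying off at all in each environment.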
Potential docker considerations:
- Using an intel mac, not arm.
- Docker VM has 10CPUs available vs 12 for the OSX.
- Docker VM has 1GB of swap available.
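One way to sanity-check what the container actually sees (the CPU count visible to Python can differ from the host's, and on Linux cgroup/affinity limits may restrict it further) is a quick probe like this sketch:

```python
# Probe the CPU resources visible to the current Python process.
# Inside a Linux container, sched_getaffinity reflects affinity/cgroup
# limits and can be lower than os.cpu_count(); the call does not exist
# on macOS, hence the hasattr guard.
import os

print("os.cpu_count():", os.cpu_count())
if hasattr(os, "sched_getaffinity"):
    print("usable CPUs:", len(os.sched_getaffinity(0)))
```

Running this both natively and inside the container confirms how many threads Numba's parallel backend can realistically use in each environment.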
```python
import ctypes
import timeit

import numba
import numpy
from numba import extending
from numpy.typing import NDArray

# Resolve the Cython implementations of gammaln and xlogy from
# scipy.special so they can be called from nopython-mode Numba code.
_PTR = ctypes.POINTER
_dble = ctypes.c_double
_ptr_dble = _PTR(_dble)

gammaln_functype = ctypes.CFUNCTYPE(_dble, _dble)
cython_gammaln = gammaln_functype(
    extending.get_cython_function_address("scipy.special.cython_special", "gammaln")
)

xlogy_functype = ctypes.CFUNCTYPE(_dble, _dble, _dble)
cython_xlogy = xlogy_functype(
    extending.get_cython_function_address(
        "scipy.special.cython_special", "__pyx_fuse_1xlogy"
    )
)

@numba.vectorize([numba.float64(numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_gammaln(x):
    return cython_gammaln(x)

@numba.vectorize([numba.float64(numba.float64, numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_xlogy(x, y):
    return cython_xlogy(x, y)

@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_logpmf(x: NDArray[numpy.int_], n: int, p: NDArray[numpy.float64]) -> float:
    """Calculate the multinomial log-PMF using vectorised scipy special functions."""
    difference = numpy.sum(numba_xlogy(x, p) - numba_gammaln(x + 1))
    return cython_gammaln(n + 1) + difference

#
# Benchmark
#
N_TESTS = 10000
n = 10
p = numpy.array([0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.83333333])
x = numpy.array([1, 1, 1, 1, 6])

# Warm up: trigger JIT compilation outside the timed region
numba_logpmf(x, n, p)

print(timeit.timeit(lambda: numba_logpmf(x, n, p), number=N_TESTS))
```
Issue Analytics
- Created: a year ago
- Comments: 16 (5 by maintainers)
I am glad you managed to resolve this issue. I have no idea what package could be missing that would incur such a performance penalty.
Meanwhile, I don’t see any reason why compiled functions would be “forgotten” in Docker. Aren’t they stored in
__pycache__
? I ran 4 test attempts on the same dataset (just hitting F5 on an API endpoint that uses timezonefinder under the hood), so any cache, if it was created, should be in place.