
Numba appears ~10X slower running in docker versus native on OSX

See original GitHub issue

TLDR: The same numba code appears roughly 10X slower when run in docker than natively on OSX, using timeit benchmarks.

  • Numba on OSX: 0.678s runtime
  • Numba in docker: 4.900s runtime

(N=10000)

  • I have tried using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
  • I have included a self-contained code sample to reproduce the problem, i.e. it’s possible to run it as ‘python bug.py’.

Hi! I really enjoy numba and appreciate all the work that goes into it. I wanted to report a strange phenomenon that I thought might be noteworthy.

I’m doing some numerical simulation using multinomial distributions. Since I’m doing millions of samplings, calculation of a logpmf has become a bottleneck, so I attempted to rewrite the hot code paths with numba to see if there were any potential gains there. However, I notice the same code runs much slower in a docker image than natively on OSX. The code eventually needs to run in a docker container because it will be deployed to a k8s cluster.

Perhaps I’m not writing the code in the most optimal way, but it does seem strange that running in docker could cause such a dramatic change in performance. I might expect a few percentage points of overhead from virtualisation, but not this much. I’ve attached a reproducible example below.

Potential docker considerations:

  • Using an intel mac, not arm.
  • Docker VM has 10CPUs available vs 12 for the OSX.
  • Docker VM has 1GB of swap available.
Reproduction script (bug.py):

import ctypes
import timeit

import numba
import numpy
from numba import extending
from numpy.typing import NDArray

_PTR = ctypes.POINTER
_dble = ctypes.c_double
_ptr_dble = _PTR(_dble)

# Resolve scipy's Cython-exported special functions so they can be
# called from numba nopython code via ctypes
gammaln_functype = ctypes.CFUNCTYPE(_dble, _dble)
cython_gammaln = gammaln_functype(
    extending.get_cython_function_address("scipy.special.cython_special", "gammaln")
)

xlogy_functype = ctypes.CFUNCTYPE(_dble, _dble, _dble)
cython_xlogy = xlogy_functype(
    # "__pyx_fuse_1xlogy" is the double-precision specialisation of the fused xlogy
    extending.get_cython_function_address("scipy.special.cython_special", "__pyx_fuse_1xlogy")
)


@numba.vectorize([numba.float64(numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_gammaln(x):
    return cython_gammaln(x)


@numba.vectorize([numba.float64(numba.float64, numba.float64)])
@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_xlogy(x, y):
    return cython_xlogy(x, y)


@numba.jit(nopython=True, parallel=True, fastmath=True)
def numba_logpmf(x: NDArray[numpy.int_], n: int, p: NDArray[numpy.float32]) -> float:
    """Calculate the log probability mass function using vectorised scipy special functions."""
    difference = numpy.sum(numba_xlogy(x, p) - numba_gammaln(x + 1))
    return cython_gammaln(n + 1) + difference


#
# Benchmark
#

N_TESTS = 10000

n = 10
p = numpy.array([0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.83333333])
x = numpy.array([1, 1, 1, 1, 6])

# Trigger JIT if required
numba_logpmf(x, n, p)

print(timeit.timeit(lambda: numba_logpmf(x, n, p), number=N_TESTS))
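
As a sanity check on the numbers the script produces, the multinomial log-PMF it computes can be reproduced in plain Python with math.lgamma, with no numba or scipy required. This is a reference sketch for comparison only; the name logpmf_ref is just for illustration:

```python
import math

def logpmf_ref(x, n, p):
    """Multinomial log-PMF: lgamma(n+1) + sum(x_i*log(p_i) - lgamma(x_i+1))."""
    return math.lgamma(n + 1) + sum(
        (xi * math.log(pi) if xi > 0 else 0.0) - math.lgamma(xi + 1)
        for xi, pi in zip(x, p)
    )

# Same inputs as the benchmark above (note: these p values sum to ~1.17, not 1)
print(logpmf_ref([1, 1, 1, 1, 6], 10, [0.08333333] * 4 + [0.83333333]))
```

Comparing this value against numba_logpmf's output confirms the numba path is numerically correct before worrying about its speed.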

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 16 (5 by maintainers)

Top GitHub Comments

1 reaction
esc commented, Jul 28, 2022

> Ok, I found the issue: I switched the base image from python:3.10-slim to python:3.10 and the performance was 0.069724. I assume there might be an apt package that’s not present in python:3.10-slim that could be required? Could you guess at what that might be?

I am glad you managed to resolve this issue. I have no idea what package could be missing that would incur such a performance penalty.

0 reactions
rez0n commented, Aug 11, 2022

Meanwhile, I don’t see any reason for Docker to “forget compiled functions”. Aren’t they stored in __pycache__? I ran 4 test attempts on the same dataset (just hitting F5 on an API endpoint that uses timezonefinder under the hood), so any cache (if it had been created) should be in place.

Read more comments on GitHub >

Top Results From Across the Web

  • Beating some performance into Docker for Mac (Medium): Storing a MySQL or Mongo database (in my case both at the same time) or something more comprehensive like Symfony or Wordpress will…
  • Docker extremely slow, on linux and windows: Hi guys, I was working with Docker on a Win11 machine and the application ran very very slow, around 30 seconds or even…
  • Docker-compose run significantly slower than docker run #1062: I’ve noticed that docker-compose run seems to be significantly slower than plain docker run. Based on verbose logs it looks like it’s slow…
  • Docker on M1 Max - Horrible Performance (Reddit): Docker for Mac performance continues to be horrible on the M1. Does anyone know if Docker plan to improve this situation?
  • docker on OSX slow volumes (Stack Overflow): except NO MORE slow downs! You will need to run this anytime you restart your computer or docker. Also note if you get…
