Reflected-list and typed List produce slow concatenate for list of equal sized arrays

Hi, I am trying to take advantage of having N numpy arrays of shape=(1, 2) and implement a faster version of np.concatenate using Numba (for this particular case). I have tried this implementation:

import numpy as np
from numba import njit
from numba.typed import List

@njit
def _concat_equal1(arrays, out):
    # Copy the single row of each (1, 2) array into row i of the output.
    for i in range(len(arrays)):
        out[i] = arrays[i][0]

    return out

def concat_equal1(arrays):
    # Preallocate the (N, 2) result outside the compiled function.
    out = np.empty(shape=(len(arrays), 2), dtype=float)
    return _concat_equal1(arrays, out)

The problem is that when arrays is a reflected list, performance is really slow, especially compared with numpy’s “optimized” concatenate function:

a = np.random.random(size=(1000, 2))
la = [ai[np.newaxis] for ai in a]

>>> %timeit np.concatenate(la)
361 µs ± 7.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit concat_equal1(la)
10.7 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

However, when I use a numba.typed.List I get much better performance:

>>> tla = List(la)
>>> %timeit concat_equal1(tla)
77.9 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Which seems great! But constructing the typed List takes about 20 times longer than the concatenation itself:

>>> %timeit tla = List(la)
1.6 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The constructor calls append for each item, which performs really poorly.

Am I doing something wrong? Can this be achieved in other ways?

Thanks, Michael
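
One possible alternative, offered as a sketch rather than a benchmarked fix: because every array in the list has the same shape, NumPy can stack the list directly and the singleton axis can be dropped with a reshape, so no typed List has to be built. The function name concat_equal_stack is hypothetical and not part of the original issue, and whether it beats np.concatenate on a given machine would need to be measured.

```python
import numpy as np

def concat_equal_stack(arrays):
    # Hypothetical helper (not from the original issue).
    # np.array stacks N arrays of shape (1, 2) into one array of shape (N, 1, 2);
    # the reshape then drops the singleton axis to give (N, 2).
    # Assumes a non-empty list of equally shaped arrays.
    return np.array(arrays).reshape(len(arrays), -1)

a = np.random.random(size=(1000, 2))
la = [ai[np.newaxis] for ai in a]

result = concat_equal_stack(la)
```

The result should match np.concatenate(la) element for element; an empty input list is not handled by this sketch.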

Issue Analytics

Created 3 years ago
Comments: 13 (9 by maintainers)

Top GitHub Comments

esc commented, Jun 29, 2020

@mishana great! Thanks for the updates and new benchmarks, I’ll hopefully update the reallocation strategy soon!

mishana commented, Jun 26, 2020

The file https://github.com/numba/numba/blob/master/numba/cext/listobject.c (and .h) is actually the third (and final) of the files that make up the numba.typed.List implementation, alongside the two files you already mentioned: numba.typed.typedlist.py and numba.typed.listobject.py.

Here is how I usually refer to them:

c-level core datastructure: numba/cext/listobject.c

compiler (LLVM) level bindings to c-core: numba/typed/listobject.py

interpreter (or high-level) wrapper: numba/typed/typedlist.py.

And, just for completeness’ sake, the reflected list is largely in numba/cpython/listobj.py.

Nice to know, I’ll definitely take a look at them.

Basically, what you see here is that the numba.typed.List performs better when used in an @njit-compiled function, where it is faster than a regular Python list in pure Python in the interpreter. Sadly, the numba.typed.List is much slower when used in the interpreter…

I kinda disagree, because I think the superior performance of numba.typed.List in the @njit compiled “habitat” has more to do with the LLVM optimizations (e.g., loop unrolling) than the append() itself. To showcase my theory, let us look at the following timings:

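from numba import njit
from numba.typed import List

# Ten explicit appends to a regular Python list, run in the interpreter.
def init_append_10_list():
    lst = []
    lst.append(1)
    lst.append(2)
    lst.append(3)
    lst.append(4)
    lst.append(5)
    lst.append(6)
    lst.append(7)
    lst.append(8)
    lst.append(9)
    lst.append(10)

# The same ten explicit appends to a typed List, JIT-compiled.
@njit
def init_append_10_List():
    lst = List()
    lst.append(1)
    lst.append(2)
    lst.append(3)
    lst.append(4)
    lst.append(5)
    lst.append(6)
    lst.append(7)
    lst.append(8)
    lst.append(9)
    lst.append(10)

# Loop version with a regular Python list, run in the interpreter.
def init_append_10_loop_list():
    lst = []
    for i in range(10):
        lst.append(i + 1)

# Loop version with a typed List, JIT-compiled.
@njit
def init_append_10_loop_List():
    lst = List()
    for i in range(10):
        lst.append(i + 1)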


>>> %timeit init_append_10_list()
535 ns ± 3.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

>>> %timeit init_append_10_List()
807 ns ± 7.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

>>> %timeit init_append_10_loop_list()
1.04 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

>>> %timeit init_append_10_loop_List()
816 ns ± 4.31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

As you can see (and probably already know), JIT-compiling with LLVM does wonders for the performance of Python loops (in part through loop unrolling). It seems to me that the raw run time of the typed list’s append() (in an @njit-compiled function) is still about 50% slower than the CPython list implementation (in the interpreter).

Interesting, I just reviewed the current state of the cpython source code that you posted and it seems like it has been updated to produce a different re-allocation pattern, since I copied it. Perhaps it will make sense for Numba to integrate those changes?

Yes, I think the implementation has to be updated according to the cpython one. Take a look here for the description and rationale of this “List overallocation strategy” change.
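
To make the re-allocation patterns under discussion concrete, here is a small pure-Python simulation. The two capacity formulas are my recollection of CPython's list_resize in Objects/listobject.c before and after the over-allocation change, so treat them as assumptions to check against the CPython source; the function names are hypothetical, and the newer rule is simplified (it ignores the special case for growing by many items at once). Nothing here is taken from Numba's own code.

```python
def old_capacity(newsize):
    # Assumed older CPython rule: grow by about 1/8 plus a small constant.
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)

def new_capacity(newsize):
    # Assumed newer CPython rule: same growth, rounded down to a multiple of 4.
    return (newsize + (newsize >> 3) + 6) & ~3

def growth_pattern(capacity_rule, n_appends):
    # Simulate appending one item at a time and record every new capacity
    # chosen when the list runs out of allocated slots.
    allocated = 0
    pattern = []
    for size in range(1, n_appends + 1):
        if size > allocated:
            allocated = capacity_rule(size)
            pattern.append(allocated)
    return pattern

old_pattern = growth_pattern(old_capacity, 76)
new_pattern = growth_pattern(new_capacity, 76)
print(old_pattern)  # [4, 8, 16, 25, 35, 46, 58, 72, 88]
print(new_pattern)  # [4, 8, 16, 24, 32, 40, 52, 64, 76]
```

Under these assumptions both rules reallocate nine times over 76 appends; the newer one only rounds each capacity down to a multiple of four.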
