Reflected-list and typed List produce slow concatenate for list of equal sized arrays


Hi, I am trying to take advantage of having N numpy arrays of shape=(1, 2) and implement a faster version of np.concatenate using Numba (for this particular case). I have tried this implementation:

import numpy as np
from numba import njit
from numba.typed import List

@njit
def _concat_equal1(arrays, out):
    for i in range(len(arrays)):
        out[i] = arrays[i][0]

    return out

def concat_equal1(arrays):
    out = np.empty(shape=(len(arrays), 2), dtype=float)
    return _concat_equal1(arrays, out) 
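
Before timing anything, it may help to confirm that this loop really is a drop-in replacement for np.concatenate on this input shape. The following is a minimal sketch, not part of the original post: concat_rows is a hypothetical, undecorated copy of the same loop (adding @njit should not change the result), and the seeded generator is only there to make the check repeatable.

```python
import numpy as np

def concat_rows(arrays):
    # Same loop as _concat_equal1, run in the plain interpreter:
    # copy the single row of each (1, 2) array into row i of the output.
    out = np.empty(shape=(len(arrays), 2), dtype=float)
    for i in range(len(arrays)):
        out[i] = arrays[i][0]
    return out

# Build a list of 1000 arrays, each of shape (1, 2).
rng = np.random.default_rng(0)
a = rng.random(size=(1000, 2))
la = [ai[np.newaxis] for ai in a]

# The hand-written loop and np.concatenate should agree exactly.
assert np.array_equal(concat_rows(la), np.concatenate(la))
```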

The problem is that when arrays is a reflected list, the performance is really slow, especially when compared with numpy’s “optimized” concatenate function:

a = np.random.random(size=(1000, 2))
la = [ai[np.newaxis] for ai in a]

>>> %timeit np.concatenate(la)
361 µs ± 7.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit concat_equal1(la)
10.7 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

However, when I use a numba.typed.List I get much better performance:

>>> tla = List(la)
>>> %timeit concat_equal1(tla)
77.9 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Which seems great! But constructing the typed List takes about 20 times longer than the concatenation itself:

>>> %timeit tla = List(la)
1.6 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The constructor calls append for each item, which performs really poorly.

Am I doing something wrong? Can this be achieved in other ways?

Thanks, Michael

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 13 (9 by maintainers)

Top GitHub Comments

1 reaction
esc commented, Jun 29, 2020

@mishana great! Thanks for the updates and new benchmarks, I’ll hopefully update the reallocation strategy soon!

0 reactions
mishana commented, Jun 26, 2020

The file https://github.com/numba/numba/blob/master/numba/cext/listobject.c (and .h) is actually the third (and final) of the files that make up the numba.typed.List implementation, alongside the two files you already mentioned: numba.typed.typedlist.py and numba.typed.listobject.py.

Here is how I usually refer to them:

  • c-level core datastructure: numba/cext/listobject.c
  • compiler (LLVM) level bindings to c-core: numba/typed/listobject.py
  • interpreter (or high-level) wrapper: numba/typed/typedlist.py.

And, just for completeness’ sake, the reflected list is largely in numba/cpython/listobj.py.

Nice to know, I’ll definitely take a look at them.

Basically, what you see here is that numba.typed.List performs better when used in an @njit compiled function, where it is faster than a regular Python list in pure Python in the interpreter. Sadly, numba.typed.List is much slower when used in the interpreter…

I kinda disagree, because I think the superior performance of numba.typed.List in the @njit compiled “habitat” has more to do with the LLVM optimizations (e.g., loop unrolling) than the append() itself. To showcase my theory, let us look at the following timings:

from numba import njit
from numba.typed import List

def init_append_10_list():
     lst = []
     lst.append(1)
     lst.append(2)
     lst.append(3)
     lst.append(4)
     lst.append(5)
     lst.append(6)
     lst.append(7)
     lst.append(8)
     lst.append(9)
     lst.append(10)

@njit
def init_append_10_List():
     lst = List()
     lst.append(1)
     lst.append(2)
     lst.append(3)
     lst.append(4)
     lst.append(5)
     lst.append(6)
     lst.append(7)
     lst.append(8)
     lst.append(9)
     lst.append(10)

def init_append_10_loop_list():
    lst = []
    for i in range(10):
        lst.append(i + 1)

@njit
def init_append_10_loop_List():
    lst = List()
    for i in range(10):
        lst.append(i + 1)


>>> %timeit init_append_10_list()
535 ns ± 3.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

>>> %timeit init_append_10_List()
807 ns ± 7.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

>>> %timeit init_append_10_loop_list()
1.04 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

>>> %timeit init_append_10_loop_List()
816 ns ± 4.31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

As you can see (and probably already know), JIT-compiling with LLVM does wonders for Python loop performance (in part through loop unrolling). It seems to me that the raw run-time of the typed List’s append() (in an @njit compiled function) is still about 50% slower than the CPython list’s implementation in the interpreter (807 ns vs. 535 ns for the unrolled versions).
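
For anyone without IPython’s %timeit, the interpreter-side half of this comparison can be reproduced with the standard timeit module. This is only a sketch: the function names are hypothetical, the functions return the list so the two variants can be checked against each other, and absolute numbers will differ from machine to machine.

```python
import timeit

def append_unrolled():
    # Ten explicit appends, no loop overhead in the interpreter.
    lst = []
    lst.append(1)
    lst.append(2)
    lst.append(3)
    lst.append(4)
    lst.append(5)
    lst.append(6)
    lst.append(7)
    lst.append(8)
    lst.append(9)
    lst.append(10)
    return lst

def append_loop():
    # Same ten appends, driven by a Python-level for loop.
    lst = []
    for i in range(10):
        lst.append(i + 1)
    return lst

if __name__ == "__main__":
    number = 100_000
    for fn in (append_unrolled, append_loop):
        # Take the best of three repeats to reduce noise from other processes.
        best = min(timeit.repeat(fn, number=number, repeat=3))
        print(f"{fn.__name__}: {best / number * 1e9:.0f} ns per call")
```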

Interesting, I just reviewed the current state of the cpython source code that you posted and it seems like it has been updated to produce a different re-allocation pattern, since I copied it. Perhaps it will make sense for Numba to integrate those changes?

Yes, I think the implementation should be updated to match the CPython one. Take a look here for the description and rationale of this “List overallocation strategy” change.
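
To make the difference in re-allocation patterns concrete, here is a small pure-Python simulation of the two growth rules. The formulas are an assumption: they are written from memory of list_resize in CPython’s Objects/listobject.c (before and after the overallocation change) and are not quoted anywhere in this thread. The function names are made up for illustration, and the sketch only models appending one item at a time.

```python
def old_growth(newsize):
    # Assumed older CPython rule: grow by ~1/8 plus a small constant.
    return (newsize >> 3) + (3 if newsize < 9 else 6) + newsize

def new_growth(newsize):
    # Assumed newer CPython rule: same idea, rounded down to a multiple of 4.
    return (newsize + (newsize >> 3) + 6) & ~3

def capacities(grow, n_appends):
    # Record every capacity the list passes through while appending
    # n_appends items one at a time, starting from an empty list.
    allocated = 0
    seen = [0]
    for size in range(1, n_appends + 1):
        if size > allocated:
            allocated = grow(size)
            seen.append(allocated)
    return seen

print(capacities(old_growth, 73))
# [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
print(capacities(new_growth, 73))
# [0, 4, 8, 16, 24, 32, 40, 52, 64, 76]
```

Under these assumed rules, both variants trigger a similar number of reallocations for small lists; the newer one simply keeps capacities aligned to multiples of 4.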

