numpy.matmul is slow
See original GitHub issue.
On numpy current master (da6e4c71), np.matmul is apparently not using BLAS:
>>> import numpy as np
>>> x = np.random.rand(5, 512, 512)
>>> %timeit np.matmul(x, x)
1 loops, best of 3: 526 ms per loop
>>> def xmul(a, b):
...     out = np.empty_like(a)
...     for j in range(a.shape[0]):
...         out[j] = np.dot(a[j], b[j])
...     return out
>>> %timeit xmul(x, x)
10 loops, best of 3: 28 ms per loop
Of course, it’s a preliminary feature, but probably best to have an issue for this.
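A quick sanity check (not part of the original report) when matmul timings look off is to confirm which BLAS NumPy was built against; np.show_config() prints the linked libraries, which rules out an unoptimized reference BLAS as the cause:
>>> import numpy as np
>>> np.show_config()  # lists the BLAS/LAPACK libraries NumPy was linked against (e.g. OpenBLAS or MKL)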
Top GitHub Comments
It would be nice if OpenBLAS would handle small matrices better. It’s certainly possible – they can see the size, and decide to do something small and simple if their normal heavy-weight setup isn’t going to be worthwhile. I wouldn’t hold my breath though; this has been a weakness of theirs for years.
There’s some discussion of adding a standard batched GEMM interface to the next version of BLAS, but I wouldn’t hold my breath on that either: https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit#
If you need optimal speed for large stacks of small matrices on numpy right now, I'd try np.einsum (e.g. z = np.einsum("ink,ikm->inm", x, y)), or possibly try the anaconda builds of numpy that use MKL, to check whether MKL handles small matrices better than OpenBLAS does.
Intel MKL supports ?gemm_batch.
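A minimal sketch (not from the issue; the 1000-by-8-by-8 shapes are illustrative) checking that the einsum workaround matches the explicit np.dot loop for a stack of small matrices:

import numpy as np

x = np.random.rand(1000, 8, 8)   # a large stack of small matrices
y = np.random.rand(1000, 8, 8)

# Batched multiply via einsum: keep the batch axis i, contract over k.
z_einsum = np.einsum("ink,ikm->inm", x, y)

# Reference: loop over the batch axis, calling the BLAS-backed np.dot per slice.
z_loop = np.empty_like(z_einsum)
for j in range(x.shape[0]):
    z_loop[j] = np.dot(x[j], y[j])

assert np.allclose(z_einsum, z_loop)

For matrices this small, the einsum path avoids the per-call overhead of the loop over np.dot, which is why it was suggested as a stopgap here.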