
numpy.matmul is slow

See original GitHub issue

On numpy's current master (da6e4c71), np.matmul is apparently not using BLAS:

>>> import numpy as np
>>> x = np.random.rand(5, 512, 512)
>>> %timeit np.matmul(x, x)
1 loops, best of 3: 526 ms per loop
>>> def xmul(a, b):
...     out = np.empty_like(a)
...     for j in range(a.shape[0]):
...         out[j] = np.dot(a[j], b[j])
...     return out
>>> %timeit xmul(x, x)
10 loops, best of 3: 28 ms per loop

Of course, it’s a preliminary feature, but probably best to have an issue for this.

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Reactions: 6
  • Comments: 31 (19 by maintainers)

Top GitHub Comments

3 reactions
njsmith commented, Dec 22, 2018

It would be nice if OpenBLAS would handle small matrices better. It’s certainly possible – they can see the size, and decide to do something small and simple if their normal heavy-weight setup isn’t going to be worthwhile. I wouldn’t hold my breath though; this has been a weakness of theirs for years.

There’s some discussion of adding a standard batched GEMM interface to the next version of BLAS, but I wouldn’t hold my breath on that either: https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit#

If you need optimal speed for large stacks of small matrices in numpy right now, I'd try np.einsum (e.g. z = np.einsum("ink,ikm->inm", x, y)), or possibly try the Anaconda builds of numpy that use MKL, to check whether MKL handles small matrices better than OpenBLAS does.
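For concreteness, a minimal self-contained benchmark of the einsum approach for the "large stack of small matrices" case (the shapes and the bench helper are illustrative; absolute timings depend on your NumPy version and BLAS build):

import time
import numpy as np

# A large stack of small matrices, the case described above.
x = np.random.rand(10000, 4, 4)
y = np.random.rand(10000, 4, 4)

def bench(fn, reps=10):
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps * 1e3  # mean ms per call

# Batched product via einsum: sum over k independently for each stack index i.
t_einsum = bench(lambda: np.einsum("ink,ikm->inm", x, y))
# Python-level loop that makes one BLAS call per 2-D slice.
t_loop = bench(lambda: np.array([a @ b for a, b in zip(x, y)]))
print(f"einsum: {t_einsum:.2f} ms/call, loop over 2-D dot: {t_loop:.2f} ms/call")

For tiny matrices like these, the per-call overhead of dispatching thousands of separate BLAS GEMMs tends to dominate, which is why a single einsum contraction can win despite not using BLAS internally.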

3 reactions
bordingj commented, Dec 22, 2016

Intel MKL supports ?gemm_batch
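(In MKL's naming, ? is the precision prefix, e.g. dgemm_batch for double precision.) NumPy doesn't expose this entry point, but a rough ctypes sketch against MKL's C interface could look like the following. cblas_dgemm_batch is a documented MKL routine, but the library name "libmkl_rt.so" and the assumption of the default LP64 build (32-bit MKL_INT) are environment-dependent, so treat this as illustrative rather than production code:

import ctypes
import numpy as np

mkl = ctypes.CDLL("libmkl_rt.so")  # name/path is platform-specific (assumption)

CBLAS_ROW_MAJOR = 101
CBLAS_NO_TRANS = 111

def dgemm_batch(a, b):
    # C[i] = A[i] @ B[i] for stacks of square float64 matrices,
    # issued as a single group to cblas_dgemm_batch.
    a = np.ascontiguousarray(a, dtype=np.float64)
    b = np.ascontiguousarray(b, dtype=np.float64)
    batch, n, _ = a.shape
    c = np.empty_like(a)

    def ptrs(arr):  # array of per-matrix data pointers
        return (ctypes.c_void_p * batch)(*(m.ctypes.data for m in arr))

    def ints(v):    # one-element MKL_INT array (assumes LP64: 32-bit int)
        return (ctypes.c_int * 1)(v)

    def dbls(v):    # one-element double array
        return (ctypes.c_double * 1)(v)

    mkl.cblas_dgemm_batch(
        ctypes.c_int(CBLAS_ROW_MAJOR),
        ints(CBLAS_NO_TRANS), ints(CBLAS_NO_TRANS),  # transa, transb arrays
        ints(n), ints(n), ints(n),                   # m, n, k arrays (one group)
        dbls(1.0),                                   # alpha array
        ptrs(a), ints(n),                            # A pointers, lda array
        ptrs(b), ints(n),                            # B pointers, ldb array
        dbls(0.0),                                   # beta array
        ptrs(c), ints(n),                            # C pointers, ldc array
        ctypes.c_int(1),                             # group_count
        ints(batch),                                 # group_size array
    )
    return c

For the stacks discussed in this thread, this replaces one GEMM call per slice with a single batched call, which is exactly the pattern ?gemm_batch is designed for.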

Read more comments on GitHub

Top Results From Across the Web

numpy matmul very slow when one of the two is np.array ...
This seems to indicate that A.real has the same memory address as A, while A.astype(np.float64) does not. Could this be causing this...

Faster Matrix Multiplications in Numpy - Benjamin Johnston
If your matrix multiplications are using a single core, then you may be using a slow BLAS. You can get over 2x performance...

numpy matmul very slow when one of the two is np.array ...
I discovered that when matmul-ing two numpy arrays, if one of the two is the real or imaginary part of a bigger...

What Should I Use for Dot Product and Matrix Multiplication?
matmul for earlier versions. Table of contents. What are dot product and matrix multiplications? What is available for NumPy arrays? (1) element ...

Why is this simple function twice as slow as its Python version
Anyway, if the code is essentially dependent on matrix multiplication performance, you will not get a very huge speedup with Julia relative to...
