numpy.matmul is slow
See original GitHub issue.
On numpy current master (da6e4c71), np.matmul is apparently not using BLAS:
>>> import numpy as np
>>> x = np.random.rand(5, 512, 512)
>>> %timeit np.matmul(x, x)
1 loops, best of 3: 526 ms per loop
>>> def xmul(a, b):
...     out = np.empty_like(a)
...     for j in range(a.shape[0]):
...         out[j] = np.dot(a[j], b[j])
...     return out
>>> %timeit xmul(x, x)
10 loops, best of 3: 28 ms per loop
Of course, it’s a preliminary feature, but probably best to have an issue for this.
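A quick sanity check (not part of the original report) when matmul timings look off is to confirm which BLAS NumPy was built against; np.show_config() prints the linked libraries, which rules out an unoptimized reference BLAS as the cause:
>>> import numpy as np
>>> np.show_config()  # lists the BLAS/LAPACK libraries NumPy was linked against (e.g. OpenBLAS or MKL)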
Top GitHub Comments
It would be nice if OpenBLAS would handle small matrices better. It’s certainly possible – they can see the size, and decide to do something small and simple if their normal heavy-weight setup isn’t going to be worthwhile. I wouldn’t hold my breath though; this has been a weakness of theirs for years.
There’s some discussion of adding a standard batched GEMM interface to the next version of BLAS, but I wouldn’t hold my breath on that either: https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit#
If you need optimal speed for large stacks of small matrices on numpy right now, I'd try np.einsum (e.g. z = np.einsum("ink,ikm->inm", x, y)), or possibly try the anaconda builds of numpy that use MKL, to check whether MKL handles small matrices better than OpenBLAS does.
Intel MKL supports ?gemm_batch.
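A minimal sketch (not from the issue; the 1000-by-8-by-8 shapes are illustrative) checking that the einsum workaround matches the explicit np.dot loop for a stack of small matrices:

import numpy as np

x = np.random.rand(1000, 8, 8)   # a large stack of small matrices
y = np.random.rand(1000, 8, 8)

# Batched multiply via einsum: keep the batch axis i, contract over k.
z_einsum = np.einsum("ink,ikm->inm", x, y)

# Reference: loop over the batch axis, calling the BLAS-backed np.dot per slice.
z_loop = np.empty_like(z_einsum)
for j in range(x.shape[0]):
    z_loop[j] = np.dot(x[j], y[j])

assert np.allclose(z_einsum, z_loop)

For matrices this small, the einsum path avoids the per-call overhead of the loop over np.dot, which is why it was suggested as a stopgap here.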