PERF Optimize dot product order
When multiplying 3 or more matrices, the order of parenthesization doesn't affect the result, but it can have a very significant impact on the number of operations and on performance; see https://en.wikipedia.org/wiki/Matrix_chain_multiplication
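As a quick illustration (the shapes below are made up purely for demonstration), both parenthesizations compute the same product, but their operation counts differ by roughly three orders of magnitude:

```python
import numpy as np
from timeit import timeit

rng = np.random.RandomState(0)
A = rng.rand(10, 5_000)
B = rng.rand(5_000, 10)
C = rng.rand(10, 5_000)

# (A @ B) @ C needs ~1e6 scalar multiplications (small 10 x 10 intermediate),
# while A @ (B @ C) needs ~5e8 (huge 5_000 x 5_000 intermediate).
print(timeit(lambda: (A @ B) @ C, number=5))
print(timeit(lambda: A @ (B @ C), number=5))
```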
For matrix multiplication of dense arrays there is numpy.linalg.multi_dot, and I think we should use it. To find existing occurrences where it could be used, see for instance the result of
git grep 'dot(.*dot'
sklearn/datasets/_samples_generator.py: return np.dot(np.dot(u, s), v.T)
sklearn/datasets/_samples_generator.py: X = np.dot(np.dot(U, 1.0 + np.diag(generator.rand(n_dim))), Vt)
sklearn/decomposition/_fastica.py: w -= np.dot(np.dot(w, W[:j].T), W[:j])
sklearn/decomposition/_fastica.py: return np.dot(np.dot(u * (1. / np.sqrt(s)), u.T), W)
sklearn/decomposition/_fastica.py: S = np.dot(np.dot(W, K), X).T
sklearn/decomposition/_nmf.py: norm_WH = trace_dot(np.dot(np.dot(W.T, W), H), H)
sklearn/decomposition/_nmf.py: denominator = np.dot(np.dot(W.T, W), H)
sklearn/discriminant_analysis.py: self.coef_ = np.dot(self.means_, evecs).dot(evecs.T)
sklearn/gaussian_process/_gpc.py: s_1 = .5 * a.T.dot(C).dot(a) - .5 * R.T.ravel().dot(C.ravel())
sklearn/gaussian_process/_gpc.py: s_3 = b - K.dot(R.dot(b)) # Line 14
sklearn/linear_model/_bayes.py: coef_ = np.dot(X.T, np.dot(
sklearn/linear_model/_logistic.py: ret[:n_features] = X.T.dot(dX.dot(s[:n_features]))
sklearn/linear_model/_ridge.py: AXy = A.dot(X_op.T.dot(y))
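For illustration, the first occurrence above could be rewritten with multi_dot as follows (the shapes here are invented for the example and are not the ones _samples_generator.py actually uses):

```python
import numpy as np
from numpy.linalg import multi_dot

rng = np.random.RandomState(0)
u = rng.rand(500, 10)
s = rng.rand(10, 10)
v = rng.rand(500, 10)

# nested form currently in the code
out_nested = np.dot(np.dot(u, s), v.T)
# multi_dot chooses the cheapest parenthesization from the shapes
out_multi = multi_dot([u, s, v.T])

assert np.allclose(out_nested, out_multi)
```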
Ideally each replacement should be benchmarked.
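A minimal benchmarking sketch using timeit, with shapes picked arbitrarily (real benchmarks would need the shapes each estimator actually encounters):

```python
from timeit import timeit
import numpy as np
from numpy.linalg import multi_dot

rng = np.random.RandomState(0)
W = rng.rand(2_000, 50)
H = rng.rand(50, 3_000)

nested = lambda: np.dot(np.dot(W.T, W), H)   # as in _nmf.py
chained = lambda: multi_dot([W.T, W, H])

print("nested np.dot:", timeit(nested, number=20))
print("multi_dot:    ", timeit(chained, number=20))
```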
For matrix multiplication with safe_sparse_dot, using a combination of sparse and dense matrices, some of this could apply as well, though defining a general heuristic is probably a bit more difficult there.
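A toy illustration of why the order also matters in the mixed sparse/dense case (the shapes, density, and the trailing vector are arbitrary choices for the example, not a proposed heuristic):

```python
import numpy as np
import scipy.sparse as sp
from timeit import timeit
from sklearn.utils.extmath import safe_sparse_dot

rng = np.random.RandomState(0)
X = sp.random(10_000, 2_000, density=0.01, format="csr", random_state=rng)
W = rng.rand(2_000, 2_000)
v = rng.rand(2_000)

# (X @ W) @ v materializes a large dense 10_000 x 2_000 intermediate ...
left_first = lambda: safe_sparse_dot(safe_sparse_dot(X, W), v)
# ... while X @ (W @ v) only ever builds a length-2_000 vector.
right_first = lambda: safe_sparse_dot(X, np.dot(W, v))

print("left first: ", timeit(left_first, number=5))
print("right first:", timeit(right_first, number=5))
```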
Top GitHub Comments
Turns out multi_dot can be slower [1] than dot depending on the size and variation in the size of the arrays. However, multi_dot uses a much simpler logic to identify the right order if the dot product is on 3 matrices [2]. Considering that most of the nested dot products in the code seem to have 3 matrices, maybe multi_dot can provide performance gains.

[1] https://stackoverflow.com/questions/45852228/how-is-numpy-multi-dot-slower-than-numpy-dot
[2] https://github.com/numpy/numpy/blob/94721320b1e13fd60046dc8bd0d343c54c2dd2e9/numpy/linalg/linalg.py#L2664
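For reference, the 3-matrix special case mentioned above reduces to comparing two closed-form costs computed from the shapes; a rough sketch in the spirit of the linked numpy code (not a verbatim copy):

```python
import numpy as np

def dot_three(A, B, C):
    # Choose the cheaper of (AB)C and A(BC) from the shapes alone,
    # in the spirit of numpy.linalg.multi_dot's 3-array special case.
    a0, a1 = A.shape
    b1 = B.shape[1]
    c1 = C.shape[1]
    cost_left = a0 * b1 * (a1 + c1)    # cost of (AB) plus (AB)C
    cost_right = a1 * c1 * (a0 + b1)   # cost of (BC) plus A(BC)
    if cost_left < cost_right:
        return np.dot(np.dot(A, B), C)
    return np.dot(A, np.dot(B, C))
```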
Yes, that makes sense. I got these errors when I changed them in the wrong places, such as in _ridge.py. I pushed the changes for FastICA, NMF, BayesianRidge, and ARDRegression in #17737. Do you want the changes for other modules that have not been benchmarked yet to go in a separate PR?