PERF Optimize dot product order

When multiplying three or more matrices, the order of the parentheses doesn’t change the result, but it can have a very significant impact on the number of operations and therefore on performance; see https://en.wikipedia.org/wiki/Matrix_chain_multiplication
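To make the cost difference concrete, here is a small sketch that counts scalar multiplications for the two ways of parenthesizing a product of three matrices. The shapes and the helper name `matmul_cost` are illustrative assumptions and are not taken from the scikit-learn code base.

```python
def matmul_cost(shape_a, shape_b):
    """Scalar multiplications needed for a dense (n, k) @ (k, m) product."""
    n, k = shape_a
    k2, m = shape_b
    assert k == k2, "inner dimensions must match"
    return n * k * m


# Hypothetical shapes: A is tall and thin, B is short and wide, C is a column.
A, B, C = (1000, 10), (10, 1000), (1000, 1)

# (A @ B) @ C builds a 1000 x 1000 intermediate first.
left_first = matmul_cost(A, B) + matmul_cost((A[0], B[1]), C)

# A @ (B @ C) builds a 10 x 1 intermediate first.
right_first = matmul_cost(B, C) + matmul_cost(A, (B[0], C[1]))

print(left_first, right_first)
```

With these shapes the first order needs 11,000,000 multiplications and the second only 20,000, even though both produce the same 1000 x 1 result.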

For matrix multiplication of dense arrays there is numpy.linalg.multi_dot, and I think we should use it. To find existing occurrences where it could be used, see for instance the result of

git grep 'dot(.*dot'
sklearn/datasets/_samples_generator.py:    return np.dot(np.dot(u, s), v.T)
sklearn/datasets/_samples_generator.py:    X = np.dot(np.dot(U, 1.0 + np.diag(generator.rand(n_dim))), Vt)
sklearn/decomposition/_fastica.py:    w -= np.dot(np.dot(w, W[:j].T), W[:j])
sklearn/decomposition/_fastica.py:    return np.dot(np.dot(u * (1. / np.sqrt(s)), u.T), W)
sklearn/decomposition/_fastica.py:                S = np.dot(np.dot(W, K), X).T
sklearn/decomposition/_nmf.py:            norm_WH = trace_dot(np.dot(np.dot(W.T, W), H), H)
sklearn/decomposition/_nmf.py:        denominator = np.dot(np.dot(W.T, W), H)
sklearn/discriminant_analysis.py:        self.coef_ = np.dot(self.means_, evecs).dot(evecs.T)
sklearn/gaussian_process/_gpc.py:            s_1 = .5 * a.T.dot(C).dot(a) - .5 * R.T.ravel().dot(C.ravel())
sklearn/gaussian_process/_gpc.py:            s_3 = b - K.dot(R.dot(b))  # Line 14
sklearn/linear_model/_bayes.py:            coef_ = np.dot(X.T, np.dot(
sklearn/linear_model/_logistic.py:        ret[:n_features] = X.T.dot(dX.dot(s[:n_features]))
sklearn/linear_model/_ridge.py:        AXy = A.dot(X_op.T.dot(y))

Ideally each replacement should be benchmarked.
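A benchmark for one replacement could look roughly like the sketch below. The array shapes, the random data, and the function names `nested_dot` and `with_multi_dot` are assumptions made for illustration; a real benchmark should use shapes typical of the estimator being changed.

```python
import timeit

import numpy as np
from numpy.linalg import multi_dot

rng = np.random.default_rng(0)
# Hypothetical shapes chosen so that the multiplication order matters.
A = rng.standard_normal((500, 20))
B = rng.standard_normal((20, 500))
C = rng.standard_normal((500, 5))


def nested_dot():
    # Existing pattern: always evaluates (A @ B) first.
    return np.dot(np.dot(A, B), C)


def with_multi_dot():
    # Candidate replacement: lets NumPy pick the order.
    return multi_dot([A, B, C])


# Both variants must agree up to floating-point rounding.
same_result = np.allclose(nested_dot(), with_multi_dot())

# Take the best of several repeats to reduce timing noise.
t_nested = min(timeit.repeat(nested_dot, number=20, repeat=3))
t_multi = min(timeit.repeat(with_multi_dot, number=20, repeat=3))
print(f"nested dot: {t_nested:.4f}s  multi_dot: {t_multi:.4f}s  equal: {same_result}")
```

Timings depend on the machine and the BLAS build, so no expected numbers are given here.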

For matrix multiplication with safe_sparse_dot on a combination of sparse and dense matrices, some of this could apply as well, though defining a general heuristic is probably a bit more difficult there.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

2 reactions
postmalloc commented, Jun 25, 2020

Turns out multi_dot can be slower[1] than dot depending on the size and variation in the size of the arrays. However, multi_dot uses a much simpler logic to identify the right order if the dot product is on 3 matrices[2]. Considering that most of the nested dot products in the code seem to have 3 matrices, maybe multi_dot can provide performance gains.

[1] https://stackoverflow.com/questions/45852228/how-is-numpy-multi-dot-slower-than-numpy-dot
[2] https://github.com/numpy/numpy/blob/94721320b1e13fd60046dc8bd0d343c54c2dd2e9/numpy/linalg/linalg.py#L2664
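The simpler logic for three matrices amounts to comparing the cost of the two possible parenthesizations and picking the cheaper one. The sketch below illustrates that idea; it is not a copy of NumPy's implementation, and the function name `dot_three` is hypothetical.

```python
import numpy as np


def dot_three(A, B, C):
    """Multiply three dense 2-D arrays using the cheaper parenthesization."""
    a0, a1b0 = A.shape  # A is (a0, a1b0); B has a1b0 rows
    b1c0, c1 = C.shape  # C is (b1c0, c1); B has b1c0 columns
    # Scalar multiplications for (A @ B) @ C.
    cost_left = a0 * b1c0 * (a1b0 + c1)
    # Scalar multiplications for A @ (B @ C).
    cost_right = a1b0 * c1 * (a0 + b1c0)
    if cost_left < cost_right:
        return np.dot(np.dot(A, B), C)
    return np.dot(A, np.dot(B, C))


# Small usage example with arrays of ones.
A = np.ones((4, 2))
B = np.ones((2, 3))
C = np.ones((3, 1))
result = dot_three(A, B, C)
print(result.shape)
```

Because the decision needs only two cost formulas, its overhead is small compared with the general dynamic-programming search required for longer chains.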

1 reaction
postmalloc commented, Jun 26, 2020

All of the places where we use multi_dot should use dense data exclusively

Yes, that makes sense. I got these errors when I changed them in the wrong places, such as in _ridge.py.

Yes, please it would be easier for reviewers to evaluate if it’s in one branch…

I pushed the changes for FastICA, NMF, BayesianRidge, and ARDRegression in #17737. Do you want the changes for other modules that have not been benchmarked yet to go in a separate PR?
