PERF: use higher level BLAS functions
Now that we have access to the Cython bindings for BLAS functions from scipy, we have access to level 3 functions that weren't present in the vendored CBLAS. It would be interesting to check whether they could be used in some places instead of a loop of lower-level functions.
For example, in cd_fast I saw this:

    # XtA = np.dot(X.T, R) - beta * w
    for i in range(n_features):
        XtA[i] = (_dot(n_samples, &X[0, i], 1, &R[0], 1)
                  - beta * w[i])

This is a loop of dot (level 1), which could be replaced by a single gemv (level 2).
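To make the idea concrete, here is a rough sketch of the two variants in plain Python. It uses the Python-level wrappers in scipy.linalg.blas (ddot, dgemv) rather than the Cython bindings the actual change would use, so it only illustrates the equivalence. The function names xta_dot_loop and xta_gemv, the random data and the Fortran-ordered X are assumptions made for this illustration, not code from cd_fast.

```python
import numpy as np
from scipy.linalg.blas import ddot, dgemv


def xta_dot_loop(X, R, w, beta):
    """Level 1 variant: one ddot call per feature column (hypothetical helper)."""
    n_features = X.shape[1]
    XtA = np.empty(n_features)
    for i in range(n_features):
        XtA[i] = ddot(X[:, i], R) - beta * w[i]
    return XtA


def xta_gemv(X, R, w, beta):
    """Level 2 variant: a single dgemv call (hypothetical helper).

    dgemv computes alpha * op(A) @ x + beta_blas * y. With trans=1,
    alpha=1.0, beta_blas=-beta and y=w this is X.T @ R - beta * w.
    overwrite_y is left at its default, so w itself is not modified.
    """
    return dgemv(1.0, X, R, beta=-beta, y=w, trans=1)


# Illustrative data only; X is stored in Fortran (column-major) order.
rng = np.random.default_rng(0)
X = np.asfortranarray(rng.standard_normal((50, 8)))
R = rng.standard_normal(50)
w = rng.standard_normal(8)
beta = 0.5

result_loop = xta_dot_loop(X, R, w, beta)
result_gemv = xta_gemv(X, R, w, beta)
```

Both variants should agree up to floating-point rounding. Whether the gemv version is actually faster inside cd_fast would still need to be benchmarked.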
Issue Analytics
- Created 5 years ago
- Reactions:2
- Comments:10 (9 by maintainers)
Top GitHub Comments
Hm, do we want benchmarks for each of them? If so, doing them one by one might be best.
Other potential candidates include…
https://github.com/scikit-learn/scikit-learn/blob/5f0263fe7f1ce66a91a3af01a54caad7ac546443/sklearn/linear_model/cd_fast.pyx#L499
https://github.com/scikit-learn/scikit-learn/blob/5f0263fe7f1ce66a91a3af01a54caad7ac546443/sklearn/linear_model/cd_fast.pyx#L588-L589
https://github.com/scikit-learn/scikit-learn/blob/5f0263fe7f1ce66a91a3af01a54caad7ac546443/sklearn/linear_model/cd_fast.pyx#L759-L765
https://github.com/scikit-learn/scikit-learn/blob/301076e77b648ea3d715eb823ac006ec0d88e8c3/sklearn/cluster/_k_means.pyx#L68-L71
https://github.com/scikit-learn/scikit-learn/blob/301076e77b648ea3d715eb823ac006ec0d88e8c3/sklearn/cluster/_k_means.pyx#L130-L133