
Performance of the RBF kernel in epsilon-SVR (SVM)


Describe the bug

My simple implementation of the RBF kernel is significantly faster than the default implementation when n_samples << n_features. How does this happen? The effect diminishes when n_samples >> n_features; in that case both implementations have similar runtimes.

Steps/Code to Reproduce

Gist: n_samples << n_features
Gist: n_samples >> n_features
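
The gists are the canonical reproducers; the following is only a minimal sketch of the kind of comparison they make. The shapes, gamma, and timing harness here are illustrative assumptions, not the gists' exact code:

```python
import numpy as np
from time import perf_counter
from sklearn.svm import SVR
from sklearn.metrics.pairwise import euclidean_distances

rng = np.random.RandomState(0)
n_samples, n_features = 500, 20_000  # n_samples << n_features
X = rng.randn(n_samples, n_features)
y = rng.randn(n_samples)
gamma = 1.0 / n_features

def naive_rbf(A, B):
    # "Naive" Python RBF kernel: exp(-gamma * ||a - b||^2), built from
    # euclidean_distances (GEMM-based) and numpy.exp (SIMD-vectorized).
    return np.exp(-gamma * euclidean_distances(A, B, squared=True))

for name, svr in [
    ("libsvm built-in rbf", SVR(kernel="rbf", gamma=gamma)),
    ("naive Python kernel", SVR(kernel=naive_rbf)),
]:
    tic = perf_counter()
    svr.fit(X, y)
    print(f"{name}: {perf_counter() - tic:.2f} s")
```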

Expected Results

The default C implementation is faster than a naive implementation in Python.

Actual Results

The naive implementation in Python is 10x faster than the default C implementation when n_samples << n_features.

Versions

System:
    python: 3.8.12 (default, Aug 30 2021, 00:00:00)  [GCC 11.2.1 20210728 (Red Hat 11.2.1-1)]
executable: /home/t/.cache/pypoetry/virtualenvs/mutation-prediction-VhT0dLh3-py3.8/bin/python
   machine: Linux-5.14.10-200.fc34.x86_64-x86_64-with-glibc2.2.5

Python dependencies:
          pip: 21.0.1
   setuptools: 54.1.2
      sklearn: 0.24.2
        numpy: 1.21.0
        scipy: 1.7.0
       Cython: None
       pandas: 1.2.5
   matplotlib: 3.4.2
       joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True

I ran the benchmark on a Ryzen 7 2700 Octa-Core (Hyperthreading enabled) with 64 GB RAM.

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
Turakar commented on Oct 20, 2021

Thank you for your comprehensive answer, @ogrisel! Glad you find this interesting. The potential of SIMD and memory optimizations is stunning. I once tried ThunderSVM and did not see a large performance increase, which might have had two reasons: a) I have at most 500 samples but thousands of features, and b) the SVM is trained single-threaded (pinned with taskset -c), because I parallelize at the level of the hyperparameter optimization (HPO). But I will take a second look at it and at some alternative libraries.

From my side, I’ve got all the information I hoped for from this issue, so I leave it up to you whether you want to keep this issue open.

2 reactions
ogrisel commented on Oct 20, 2021

This is an interesting finding. Thanks for the feedback.

While it is surprising, it can probably be explained by the fact that the implementations of both sklearn.metrics.pairwise.euclidean_distances and numpy.exp have improved over the years, while the C implementation in our bundled libsvm has not improved as much.

If you look at the source code of euclidean_distances you will observe that it uses efficient vectorized operations via what we call the “GEMM trick”: ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>, where the cross term <x, y> is a BLAS GEMM operation that is very efficiently implemented with vector instructions in the OpenBLAS, BLIS or MKL shipped with numpy or scipy.
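
For illustration, a self-contained sketch of the trick with toy shapes (not a benchmark, and not the library's actual source):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 500)
Y = rng.randn(80, 500)

# Naive pairwise squared distances via broadcasting (memory-hungry, slow).
naive = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

# "GEMM trick": ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>.
# The cross term X @ Y.T is a single BLAS GEMM call, which is heavily
# vectorized in OpenBLAS / BLIS / MKL.
sq_norms_X = (X ** 2).sum(axis=1)[:, np.newaxis]
sq_norms_Y = (Y ** 2).sum(axis=1)[np.newaxis, :]
gemm = sq_norms_X + sq_norms_Y - 2.0 * (X @ Y.T)

assert np.allclose(naive, gemm)
```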

numpy.exp, for its part, has recently been optimized in numpy to also use vectorized instructions: https://numpy.org/devdocs/reference/simd/simd-optimizations.html.

@jjerphan is currently working on a low-level refactoring of the computation routines that will probably allow us to make sklearn.metrics.pairwise.euclidean_distances and pairwise_kernels run even more efficiently, especially on multicore CPUs.

We are aware that sklearn.svm.SVR is very suboptimal, but your findings suggest it might be possible to optimize it sooner than we anticipated.

As Gael said above, it's likely that for very large numbers of features a linear model would be even more efficient and accurate than an RBF-kernel SVM. Still, it would be interesting to see whether we can get a speed-up when the number of samples grows with a medium-sized number of features (e.g. a few hundred).

Without waiting for us to implement those optimizations consistently in this class, you might be interested in trying https://github.com/Xtra-Computing/thundersvm or https://intel.github.io/scikit-learn-intelex/, which can be orders of magnitude faster (sometimes 100x or more) than scikit-learn's current implementation of SVR, especially on machines with many cores.
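
For example, scikit-learn-intelex is enabled through its documented patching mechanism; a minimal sketch, assuming the package is installed (whether SVR actually gets accelerated depends on your hardware and parameters):

```python
# Enable Intel's accelerated implementations before importing the estimator.
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.svm import SVR  # now resolves to the patched, accelerated SVR

svr = SVR(kernel="rbf")
```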
