For the RBF Kernel in gaussian_process, the calculation of the gradient seems incorrect?
Description
The RBF kernel has the form: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * length_scale^2))
Therefore, the gradient with respect to the parameter length_scale
should be:
gradient = k(x_i, x_j) * ||x_i - x_j||^2 / length_scale^3
However, the current implementation appears to return: gradient = k(x_i, x_j) * ||x_i - x_j||^2 / length_scale^2
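To make the two formulas concrete, here is a small pure-NumPy sketch (independent of scikit-learn) that evaluates both candidate gradients on the example data from the reproduction below; the names grad_derived and grad_impl are chosen for illustration only:

```python
import numpy as np

# Example data and length scale from the reproduction below.
X = np.array([[1., 2.], [3., 4.], [5., 6.]])
l = 2.0

# Pairwise squared distances ||x_i - x_j||^2.
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# RBF kernel matrix: exp(-||x_i - x_j||^2 / (2 l^2)).
K = np.exp(-0.5 * d2 / l ** 2)

# Gradient d k / d length_scale as derived above.
grad_derived = K * d2 / l ** 3

# What the implementation appears to return instead.
grad_impl = K * d2 / l ** 2
```

Evaluating grad_derived reproduces the "Expected Results" array below, while grad_impl reproduces the "Actual Results".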
Steps/Code to Reproduce
Example:
import numpy as np
from sklearn.gaussian_process.kernels import RBF
np.random.seed(1)
X = np.array([[1,2], [3,4], [5,6]])
sk_kernel = RBF(2.0)
K_grad = sk_kernel(X, eval_gradient=True)[1][:,:,0]
Expected Results
K_grad =
array([[ 0. , 0.36787944, 0.07326256],
[ 0.36787944, 0. , 0.36787944],
[ 0.07326256, 0.36787944, 0. ]])
Actual Results
K_grad =
array([[ 0. , 0.73575888, 0.14652511],
[ 0.73575888, 0. , 0.73575888],
[ 0.14652511, 0.73575888, 0. ]])
Versions
Darwin-14.5.0-x86_64-i386-64bit
Python 3.6.3 (default, Oct 8 2017, 15:07:13) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.19.0
I agree that's what it seems to be doing. Could you submit a PR with a test? I'm also curious how you found this bug. If we do not already, we should probably have tests comparing these analytical gradients to numerical estimates.
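A numerical check along those lines can be sketched in pure NumPy; rbf and numeric_grad are hypothetical helpers written for this sketch, not scikit-learn API:

```python
import numpy as np

def rbf(X, length_scale):
    """RBF kernel matrix: exp(-||x_i - x_j||^2 / (2 * length_scale^2))."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def numeric_grad(X, length_scale, eps=1e-6):
    """Central finite difference of the kernel w.r.t. length_scale."""
    return (rbf(X, length_scale + eps) - rbf(X, length_scale - eps)) / (2 * eps)

X = np.array([[1., 2.], [3., 4.], [5., 6.]])
l = 2.0
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# Analytical gradient claimed in the issue: k * ||x_i - x_j||^2 / l^3.
analytic = rbf(X, l) * d2 / l ** 3

# The finite-difference estimate agrees with the l^3 formula.
assert np.allclose(analytic, numeric_grad(X, l), atol=1e-7)
```

The same pattern (central differences against the analytical expression) generalizes to a regression test over random inputs and length scales.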
Yes, this is fixed by #18115. Thanks @cmarmo
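For context on the resolution (an assumption worth checking against the current scikit-learn docstrings): kernel gradients in scikit-learn are reported with respect to the log of the hyperparameter, in which case the "Actual Results" above follow directly from the chain rule:

```python
import numpy as np

X = np.array([[1., 2.], [3., 4.], [5., 6.]])
l = 2.0
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * d2 / l ** 2)

# Chain rule: d k / d log(l) = l * (d k / d l) = K * d2 / l**2,
# which matches the "Actual Results" reported in the issue.
grad_wrt_log = (K * d2 / l ** 3) * l
```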