Predicted standard deviation values of Gaussian Processes are only within [0, 1]

See original GitHub issue

I am using the Gaussian Processes of scikit-learn to estimate the behavior of a black-box likelihood function f(.). I don’t know the range of values that f(.) produces, but I know empirically that they can run from as low as -Yxxxx (where Y could be any integer except 0) up to 0. I therefore cannot normalize the output values of f(.), since I don’t know the range this black-box function can produce (it could be even lower than -99999).

I am using scikit-learn’s Gaussian Process module to fit the underlying black-box function, and then gp.predict to get the estimated mean and standard deviation at some unobserved points. However, I noticed that all of the predicted standard deviation values are in the range (0, 1), rather than more meaningful values such as 500 or 1000 that I could interpret against the predicted means. As a result I cannot use these SD values in my plot: the predicted means normally fall in ranges such as (-15000, 0), while the corresponding standard deviations predicted by the GP are in (0, 1), so the plot shows a curve with no visible uncertainty around the predicted means. gp.predict does not seem to take an argument that produces the standard deviations I expect, and it looks like the Gaussian Process Regression package expects the target values to be normalized as well. Is there a way in scikit-learn to make gp.predict output standard deviation values in the right range? And why would gp.predict return means in the right, sensible range but not standard deviations?
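One possible reading of the (0, 1) range, sketched here under the assumption that no kernel was passed to the regressor: the scikit-learn documentation gives the default for kernel=None as a constant kernel times an RBF kernel, both with fixed hyperparameters, so the prior variance k(x, x) is 1 everywhere. Because a GP's posterior variance cannot exceed its prior variance, the predicted standard deviation stays at or below 1 whatever the magnitude of the targets. The data below is hypothetical and only illustrates the bound.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Assumption: this reproduces the documented default used when kernel=None.
default_kernel = ConstantKernel(1.0, constant_value_bounds="fixed") * RBF(
    1.0, length_scale_bounds="fixed"
)

# Hypothetical data with large-magnitude targets, similar in scale to the question.
rng = np.random.RandomState(0)
X_train = rng.uniform(0, 1, (10, 1))
y_train = -15000.0 * X_train.ravel()
X_test = rng.uniform(0, 1, (30, 1))

# Prior variance k(x, x) at every test point: all ones for this kernel.
prior_var = default_kernel.diag(X_test)

gp = GaussianProcessRegressor(kernel=default_kernel, alpha=1e-2, random_state=1)
gp.fit(X_train, y_train)
_, post_std = gp.predict(X_test, return_std=True)

print("prior variance (min, max):", prior_var.min(), prior_var.max())
print("largest predicted std:", post_std.max())
```

If this reading applies, the standard deviations are expressed in the kernel's unit-variance scale rather than in the units of y.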

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 27 (16 by maintainers)

Top GitHub Comments

2 reactions
rth commented, Nov 18, 2019

But, thinking about it, I might start confirming it was a bug by standardizing the X and fitting/predicting again, and seeing if the scale of std is more or less unchanged

Yes, it does indeed look like a bug. In the following example, the predicted standard deviation is identical regardless of the mean of y (or the y_scale value), even with mostly default parameters:

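import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Mostly default settings: no kernel is passed, only the noise term alpha is set.
gp = GaussianProcessRegressor(random_state=1, alpha=1e-2)


def load_data(n_samples, random_seed=0, y_scale=1):
    """Draw X uniformly from [0, 1) and return the linear target y = X * y_scale."""
    rng = np.random.RandomState(random_seed)
    X = rng.uniform(0, 1, [n_samples, 1])
    y = X * y_scale
    return X, y


# Change this value and re-run to compare the predicted stds across target scales.
y_scale = 1000

X_train, y_train = load_data(n_samples=10, random_seed=0, y_scale=y_scale)

gp.fit(X_train, y_train)

# A different seed gives test points that were not seen during fitting.
X_test, y_test = load_data(n_samples=30, random_seed=1, y_scale=y_scale)
y_mean, y_std = gp.predict(X_test, return_std=True)
print("predicted means\n", y_mean)
print("\n")
print("predicted stds\n", y_std)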

Someone would need to investigate why this is happening in the code. I also don’t have much availability at the moment…
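Until the cause is pinned down, one workaround is to standardize y by hand and convert the predictions back. The sketch below is not the fix adopted in scikit-learn. The helper name predict_rescaled is hypothetical, and the data mirrors the linear toy example above. Note that alpha then applies to the standardized targets, and that the helper assumes the targets are not all identical (otherwise the scale would be zero).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def predict_rescaled(X_train, y_train, X_test, alpha=1e-2):
    """Fit on standardized targets, then map mean and std back to y units.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    y_mu, y_sigma = y_train.mean(), y_train.std()
    gp = GaussianProcessRegressor(alpha=alpha, random_state=1)
    gp.fit(X_train, (y_train - y_mu) / y_sigma)
    mean_s, std_s = gp.predict(X_test, return_std=True)
    # The mean needs both shift and scale; the std only needs the scale.
    return mean_s * y_sigma + y_mu, std_s * y_sigma


rng = np.random.RandomState(0)
X_train = rng.uniform(0, 1, (10, 1))
X_test = rng.uniform(0, 1, (30, 1))

# Same inputs, targets differing only by a factor of 1000.
mean_1, std_1 = predict_rescaled(X_train, 1.0 * X_train.ravel(), X_test)
mean_1000, std_1000 = predict_rescaled(X_train, 1000.0 * X_train.ravel(), X_test)

print("max std at y_scale=1:   ", std_1.max())
print("max std at y_scale=1000:", std_1000.max())
```

With the default fixed kernel, this is only a unit conversion: the standardized-scale std depends on where the training inputs lie, not on the target values. GaussianProcessRegressor also has a normalize_y constructor parameter; whether it rescales the returned std may depend on the installed version, so the sketch does not rely on it.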

1 reaction
plgreenLIRU commented, Dec 4, 2019

OK I’ll do a small PR to help illustrate the point and then we can talk about where to go from there.


