
Predicted standard deviation values of Gaussian Processes are only within [0, 1]


I am using scikit-learn’s Gaussian Processes to estimate the behavior of a black-box likelihood function f(.). I don’t know the range of the values that f(.) produces, but I know empirically that they can be as low as -Yxxxx (where Y can be any integer except 0) and as high as 0. Therefore I cannot normalize the output values of f(.), since I don’t know the range of values this black-box function can produce (it could be even lower than -99999).
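Standardizing the targets does not require knowing the full range of f(.): it only uses the mean and standard deviation of the values observed so far. A minimal sketch, assuming a hypothetical quadratic stand-in for the black box with outputs in roughly (-15000, 0). The GP is fit on standardized targets, and the predictions are mapped back: the mean is scaled and shifted, while the standard deviation is only scaled.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

# Hypothetical stand-in for the black-box f(.): outputs roughly in (-15000, 0).
rng = np.random.RandomState(0)
X_train = rng.uniform(0, 10, size=(20, 1))
y_train = -150 * X_train[:, 0] ** 2

# Standardize using statistics of the observed sample only.
y_mu, y_sigma = y_train.mean(), y_train.std()
z_train = (y_train - y_mu) / y_sigma

gp = GaussianProcessRegressor(
    kernel=ConstantKernel(1.0) * RBF(1.0), alpha=1e-2, random_state=0
)
gp.fit(X_train, z_train)

X_test = np.linspace(0, 10, 50).reshape(-1, 1)
mean_z, std_z = gp.predict(X_test, return_std=True)

# Undo the affine transform: the mean is scaled and shifted, the std only scaled.
mean = mean_z * y_sigma + y_mu
std = std_z * y_sigma
print(std.min(), std.max())
```

The same idea applies to any affine rescaling of y: if new observations change the sample statistics, the GP is simply refit with the updated mean and standard deviation.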

I am using scikit-learn’s Gaussian Process module to fit the underlying black-box function and then use gp.predict to estimate the mean and standard deviation at some unobserved points. However, I noticed that all of the predicted standard deviation values lie in the range (0, 1) instead of more meaningful values such as 500 or 1000 that I could easily interpret given the predicted means. The predicted means are typically in ranges such as (-15000, 0), while the corresponding standard deviations predicted by the GP are in (0, 1), so a plot shows a curve with no visible uncertainty around the predicted means and I cannot use these SD values. gp.predict doesn’t seem to take an argument that would give the standard deviations I expect, and it looks like the Gaussian Process Regression module expects the inputs to be normalized as well. Is there a way in scikit-learn to make gp.predict output standard deviations on the right scale? And why would gp.predict return means in the right, sensible range but not standard deviations?
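One common way to get standard deviations in the units of y is to give the kernel a learnable amplitude and let the regressor standardize y internally. A sketch, assuming a recent scikit-learn release in which predict() un-normalizes both the mean and the std when normalize_y=True; the data-generating function and the hyperparameter bounds below are illustrative choices, not values from this thread.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF, WhiteKernel

# Hypothetical noisy stand-in for the black box, outputs roughly in (-15000, 0).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(25, 1))
y = -150 * X[:, 0] ** 2 + rng.normal(0, 200, size=25)

# Learnable amplitude (ConstantKernel), length scale (RBF) and noise (WhiteKernel).
kernel = (
    ConstantKernel(1.0, constant_value_bounds=(1e-3, 1e3))
    * RBF(1.0, length_scale_bounds=(1e-2, 1e2))
    + WhiteKernel(1e-2, noise_level_bounds=(1e-4, 1e1))
)

# normalize_y=True standardizes y with the training mean and std;
# predict() then maps mean and std back to the original units of y.
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X, y)

X_new = np.linspace(0, 10, 100).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
print(gp.kernel_)
print(std.min(), std.max())
```

Because WhiteKernel is part of the kernel here, the returned std includes the fitted noise level, i.e. it describes noisy observations rather than only the latent function.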

Issue Analytics

Created: 4 years ago
Reactions: 1
Comments: 27 (16 by maintainers)

Top GitHub Comments

2 reactions

rth commented, Nov 18, 2019

Quoting an earlier comment: “But, thinking about it, I might start by confirming it was a bug by standardizing X, fitting/predicting again, and seeing whether the scale of the std is more or less unchanged.”
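The check proposed in that quote can be sketched as follows: fit the default regressor once on raw X and once on standardized X, then compare the scale of the predicted std. This assumes the default kernel (no kernel argument), whose hyperparameters scikit-learn keeps fixed; the toy data mirrors the example below but is otherwise hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X_train = rng.uniform(0, 1, size=(10, 1))
y_train = 1000 * X_train[:, 0]
X_test = rng.uniform(0, 1, size=(30, 1))

# Default kernel: fixed amplitude and length scale, so nothing is optimized.
gp_raw = GaussianProcessRegressor(alpha=1e-2).fit(X_train, y_train)
_, std_raw = gp_raw.predict(X_test, return_std=True)

# Same model, but X is standardized with statistics from the training inputs.
scaler = StandardScaler().fit(X_train)
gp_scaled = GaussianProcessRegressor(alpha=1e-2).fit(scaler.transform(X_train), y_train)
_, std_scaled = gp_scaled.predict(scaler.transform(X_test), return_std=True)

# Both stay within [0, 1] even though y is of order 1000.
print(std_raw.max(), std_scaled.max())
```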

Yes, it does indeed look like a bug. In the following example, the predicted standard deviation (y_std) is identical regardless of the mean of y (i.e., of the y_scale value), even with mostly default parameters:

from sklearn.gaussian_process import GaussianProcessRegressor
import numpy as np

gp = GaussianProcessRegressor(random_state=1, alpha=1e-2)


def load_data(n_samples, random_seed=0, y_scale=1):
    rng = np.random.RandomState(random_seed)
    X = rng.uniform(0, 1, [n_samples, 1])
    y = X * y_scale
    return X, y


y_scale = 1000

X_train, y_train = load_data(n_samples=10, random_seed=0, y_scale=y_scale)

gp.fit(X_train, y_train)

X_test, y_test = load_data(n_samples=30, random_seed=1, y_scale=y_scale)
y_mean, y_std = gp.predict(X_test, return_std=True)
print("predicted means\n", y_mean)
print("\n")
print("predicted stds\n", y_std)
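To check the claim directly, the example above can be wrapped in a function and run for two values of y_scale. In this sketch, predicted_std is a hypothetical helper, and y is flattened to 1-D (an assumption made to keep output shapes simple across scikit-learn versions).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def load_data(n_samples, random_seed=0, y_scale=1):
    rng = np.random.RandomState(random_seed)
    X = rng.uniform(0, 1, [n_samples, 1])
    return X, (X * y_scale).ravel()  # 1-D targets


def predicted_std(y_scale):
    """Fit the default GP for a given y_scale and return the predicted stds."""
    X_train, y_train = load_data(10, random_seed=0, y_scale=y_scale)
    X_test, _ = load_data(30, random_seed=1, y_scale=y_scale)
    gp = GaussianProcessRegressor(random_state=1, alpha=1e-2).fit(X_train, y_train)
    return gp.predict(X_test, return_std=True)[1]


std_1 = predicted_std(1)
std_1000 = predicted_std(1000)
print(np.allclose(std_1, std_1000))
```

Scaling y by a factor of 1000 leaves the predicted std unchanged, while the predicted mean scales with y.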

Someone would need to investigate why this is happening in the code. I also don’t have much availability at the moment…
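A likely explanation, assuming the default kernel that scikit-learn uses when kernel=None (ConstantKernel(1.0) * RBF(1.0), both hyperparameters fixed) and normalize_y=False: the posterior mean and variance at a test point x_* are

$$
\mu(x_*) = \mathbf{k}_*^\top (K + \alpha I)^{-1} \mathbf{y},
\qquad
\sigma^2(x_*) = k(x_*, x_*) - \mathbf{k}_*^\top (K + \alpha I)^{-1} \mathbf{k}_*,
\qquad
k(x, x') = 1.0 \cdot \exp\!\left(-\tfrac{\lVert x - x' \rVert^2}{2}\right)
$$

where K_{ij} = k(x_i, x_j) and (k_*)_i = k(x_i, x_*). The variance never involves y. Since k(x_*, x_*) = 1 and the subtracted quadratic form is nonnegative, 0 ≤ σ(x_*) ≤ 1. Multiplying y by a constant c multiplies μ by c but leaves σ unchanged, which is consistent with the behavior described in this comment. If the kernel amplitude were learned, or y were standardized and σ rescaled by the training standard deviation, σ would come back in the units of y.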

1 reaction

plgreenLIRU commented, Dec 4, 2019

OK I’ll do a small PR to help illustrate the point and then we can talk about where to go from there.
