
Prediction is dependent on other predicted data points

See original GitHub issue

🐛 Bug

This might not be a bug but rather my misunderstanding of how GPyTorch works. My understanding is that, conditional on the training data (i.e. the observations), the prediction at any given test point is independent of which other test points we ask about. However, I see that including certain test points in a prediction request adds noise to the predictions at other points, which I assume is due to some numerically unstable optimization?
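
For reference, the exact GP posterior marginal at a single test point x_* depends only on x_* and the training data (X, y), not on which other test points are in the batch:

    \mu(x_*) = k(x_*, X) (K_{XX} + \Sigma)^{-1} y
    \sigma^2(x_*) = k(x_*, x_*) - k(x_*, X) (K_{XX} + \Sigma)^{-1} k(X, x_*)

where \Sigma is the fixed noise covariance. So any dependence between test points has to come from an approximation rather than the math itself.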

To reproduce

Unfortunately, I don’t have a reproducer I can share, but I can clearly describe the data and model:

model:

ExactGPModel(
  (likelihood): FixedNoiseGaussianLikelihood(
    (noise_covar): FixedGaussianNoise()
  )
  (mean_module): ConstantMean()
  (covar_module): ScaleKernel(
    (base_kernel): RBFKernel(
      (lengthscale_prior): NormalPrior()
      (raw_lengthscale_constraint): GreaterThan(5.000E+01)
      (distance_module): Distance()
    )
    (raw_outputscale_constraint): Positive()
  )
  (feature_extractor): CustomLengthScaleExtractor()
)

where CustomLengthScaleExtractor is a piecewise monotonic function that just transforms x before it passes through the GP (I don't think it's super relevant to the problem, but I can explain more). Essentially this is a 1-dimensional GP with data points (x, y) such that every x is an integer in [-20, 627] and y is between 0 and 12. The FixedGaussianNoise is pretty much all set to 1, with some data points having noise of 3.
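
For concreteness, construction along these lines would reproduce the printout above (a sketch only; the prior parameters, the placeholder targets, and the body of CustomLengthScaleExtractor are my assumptions, not the poster's actual values):

import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(
                lengthscale_prior=gpytorch.priors.NormalPrior(100.0, 50.0),  # placeholder prior
                lengthscale_constraint=gpytorch.constraints.GreaterThan(50.0),
            )
        )
        # self.feature_extractor = CustomLengthScaleExtractor()  # poster's custom transform

    def forward(self, x):
        # x = self.feature_extractor(x)  # piecewise monotonic transform would apply here
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

train_x = torch.arange(-20, 628).to(torch.float32)  # integers in [-20, 627]
train_y = torch.rand(len(train_x)) * 12             # placeholder targets in [0, 12]
noise = torch.ones(len(train_x))                    # fixed noise of 1 (3 for some points)
likelihood = gpytorch.likelihoods.FixedNoiseGaussianLikelihood(noise=noise)
gpmodel = ExactGPModel(train_x, train_y, likelihood)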

Stack trace/error message

This is if we ask only for data points inside the range:

xpred = torch.arange(-20, 500).to(torch.float32)
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    f_preds = gpmodel(xpred)  # gpmodel is the trained model described above

mean = f_preds.mean.detach().cpu().numpy()
std = f_preds.stddev.detach().cpu().numpy()
plt.figure(figsize=(8, 6))
plt.plot(xpred.numpy(), mean, '-', color='gray')

plt.fill_between(xpred.numpy(), mean - std, mean + std,
                 color='gray', alpha=0.2)
plt.xlim(-5, 5)
plt.ylim(10, 11)

(figure: within-range predictions; the mean and ±1σ band look smooth)

However when I expand the range, I get some noisy predictions:

xpred = torch.arange(-20, 1000).to(torch.float32)
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    f_preds = gpmodel(xpred)

mean = f_preds.mean.detach().cpu().numpy()
std = f_preds.stddev.detach().cpu().numpy()
plt.figure(figsize=(8, 6))
plt.plot(xpred.numpy(), mean, '-', color='gray')

plt.fill_between(xpred.numpy(), mean - std, mean + std,
                 color='gray', alpha=0.2)
plt.xlim(-5, 5)
plt.ylim(10, 11)
plt.tight_layout()

(figure: with the expanded range, the same region now shows visibly noisy predictions)

I also get the following warning (which I guess might be because of fast_pred_var):

..site-packages/gpytorch/distributions/multivariate_normal.py:263: NumericalWarning: Negative variance values detected. This is likely due to numerical instabilities. Rounding negative variances up to 1e-06.
  NumericalWarning,
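
One way to check whether LOVE is responsible (a diagnostic sketch I'm adding, not part of the original report) is to recompute the same predictions without the fast_pred_var flag, which falls back to the exact predictive covariance:

# Diagnostic: recompute without LOVE (fast_pred_var) and compare variances.
gpmodel.train(); gpmodel.eval()  # toggling modes clears cached prediction strategies
with torch.no_grad():
    f_preds_exact = gpmodel(xpred)
print((f_preds.stddev - f_preds_exact.stddev).abs().max())  # large gap implicates LOVE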

Expected Behavior

I expected that as we ask for a larger range of predictions (one that includes the earlier values), the predictions on the earlier set shouldn't change. In other words, if I predict on A, and then predict on A ∪ B, the two predictions for A (definitely in mean, and probably in variance?) should be the same. Please let me know if I'm just misunderstanding!
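
A concrete way to test this expectation (my own sketch, reusing gpmodel from above) is to compare the overlapping marginals when predicting on a set and on a superset of it:

A = torch.arange(-20, 500).to(torch.float32)
AB = torch.arange(-20, 1000).to(torch.float32)  # superset: A plus extra points
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    pred_A, pred_AB = gpmodel(A), gpmodel(AB)
# For an exact GP, both of these should print True
print(torch.allclose(pred_A.mean, pred_AB.mean[:len(A)], atol=1e-4))
print(torch.allclose(pred_A.stddev, pred_AB.stddev[:len(A)], atol=1e-4))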

I was sure that this error starts occurring once you ask for predictions outside the model.train_inputs range, but it seems that it starts even before that (which in my case is at x=627):

(figure: the noise appears even before the prediction range reaches the edge of the training inputs at x=627)

Of course, as you let the prediction range go out to much larger numbers, the predictions at x=0 become nonsense:

(figure: with a much larger prediction range, the predictions at x=0 become nonsense)

System information

Please complete the following information:

  • GPyTorch version: 1.4.1
  • PyTorch version: 1.7.1+cu101
  • OS: RHEL

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 6

Top GitHub Comments

2 reactions
wjmaddox commented, Jul 27, 2021

Anyway, I'm pretty confident that what you're seeing as the predictions changing is really the SKI grid changing in response to x values that are outside the range of the training data.
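
(The GPRegressionModel class isn't shown in the comment; presumably it's a SKI/KISS-GP model along the lines of the GPyTorch regression tutorial. A minimal sketch of what it might look like; grid_size is guessed from the spacing of the printed grid below:)

import torch
import gpytorch

class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # ScaleKernel wraps GridInterpolationKernel, so
        # covar_module.base_kernel.grid below is the SKI grid
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.GridInterpolationKernel(
                gpytorch.kernels.RBFKernel(), grid_size=8000, num_dims=1
            )
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )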

# simulate data roughly like yours; note it is crucially on the scale [-10, 110]
train_x = (torch.rand(8000, 1) * 120 - 10).double()
train_y = (torch.exp(-train_x / 30) + torch.exp(-torch.sin((train_x - 3.5) / 1.5) / 10)
           + 10 + 0.25 * torch.randn_like(train_x)).squeeze(-1)

likelihood = gpytorch.likelihoods.GaussianLikelihood().double()
model = GPRegressionModel(train_x, train_y, likelihood)

model.train()
likelihood.train()
# (hyperparameter training loop omitted)

# now print the grid
print(model.covar_module.base_kernel.grid[0])
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426])

model.eval()
likelihood.eval()

# The gpytorch.settings.fast_pred_var flag activates LOVE (for fast variances)
# See https://arxiv.org/abs/1803.06058
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(-20, 100, 200).double()
    prediction = likelihood(model(test_x))
    mean = prediction.mean
    # Get lower and upper predictive bounds
    lower, upper = prediction.confidence_region()

# check the grid
print(model.covar_module.base_kernel.grid[0])
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426])

# the grid also changes if you put print statements in between the individual prediction calls
print(model.covar_module.base_kernel.grid[0])
print("-20: ", model(torch.tensor([-20, -10., -5., 0.]).unsqueeze(-1).double()).mean)
print(model.covar_module.base_kernel.grid[0])
print("0", model(torch.tensor([0.]).unsqueeze(-1).double()).mean)
print(model.covar_module.base_kernel.grid[0])
print("0, 10", model(torch.tensor([0., 10.]).double().unsqueeze(-1)).mean)
print(model.covar_module.base_kernel.grid[0])
####
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426],
#        dtype=torch.float64)
# -20:  tensor([12.3671, 11.8522, 11.8437, 11.7605], dtype=torch.float64,
#        grad_fn=<ViewBackward>)
# tensor([-20.0489, -20.0327, -20.0164,  ..., 110.0138, 110.0301, 110.0464],
#        dtype=torch.float64)
# 0 tensor([11.9508], dtype=torch.float64, grad_fn=<ViewBackward>)
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426],
#        dtype=torch.float64)
# 0, 10 tensor([11.9508, 11.9698], dtype=torch.float64, grad_fn=<ViewBackward>)
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426],
#        dtype=torch.float64)

The prediction changes are probably most noticeable because your data has a wide range and isn't standardized, say to [0, 1] or to have zero mean.
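
For example (my sketch, not from the comment), standardizing inputs to [0, 1] and targets to zero mean before training, and undoing the target transform after prediction:

# Rescale inputs to [0, 1] and center/scale targets; keep the stats to invert later.
x_min, x_max = train_x.min(), train_x.max()
y_mean, y_std = train_y.mean(), train_y.std()
train_x_s = (train_x - x_min) / (x_max - x_min)
train_y_s = (train_y - y_mean) / y_std

# After predicting on similarly transformed test inputs, map back:
#   mean_orig = pred.mean * y_std + y_mean
#   std_orig  = pred.stddev * y_std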

0 reactions
wjmaddox commented, Jul 27, 2021

Hi, sorry, I'm not able to access the data now that I'm taking a look at this, but I'll try simulating some data that looks roughly similar.

A priori, I'd expect that part of what's happening is that the grid is changing (e.g. expanding when you predict at certain data points) because of this code: https://github.com/cornellius-gp/gpytorch/blob/c074c2ff5ba5708761453bbd9be870c35cb57769/gpytorch/kernels/grid_interpolation_kernel.py#L158. In general, this should be nearly equivalent to setting the default grid bounds to train_x.min() and train_x.max(), but on a per-dimension basis.
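
If that's what's happening, one possible workaround (my sketch, not from the thread; grid_size is a placeholder) is to pass explicit grid_bounds when constructing the kernel. When grid_bounds is given, the grid is not dynamic, so predicting outside the training range can no longer re-grid the kernel:

covar_module = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.GridInterpolationKernel(
        gpytorch.kernels.RBFKernel(),
        grid_size=1000,                  # placeholder
        num_dims=1,
        grid_bounds=[(-20.0, 627.0)],    # pin the grid to the training-input range
    )
)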

Read more comments on GitHub >

