
Prediction is dependent on other predicted data points

See original GitHub issue

🐛 Bug

This might not be a bug but rather my misunderstanding of how GPyTorch works. My understanding is that, conditional on the training data (i.e. the observations), the prediction at any given test point is independent of which other test points we ask about. However, I see that including certain test points in a prediction request adds noise to the predictions at other points, which I assume is due to some numerically unstable optimization?
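
For reference, the exact GP posterior marginal at a single test point x_* depends only on x_* and the training data (X, y), not on which other test points are in the batch:

    \mu(x_*) = k(x_*, X) (K_{XX} + \Sigma)^{-1} y
    \sigma^2(x_*) = k(x_*, x_*) - k(x_*, X) (K_{XX} + \Sigma)^{-1} k(X, x_*)

where \Sigma is the fixed noise covariance. So any dependence between test points has to come from an approximation rather than the math itself.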

To reproduce

Unfortunately, I don’t have a reproducer I can share, but I can clearly describe the data and model:

model:

ExactGPModel(
  (likelihood): FixedNoiseGaussianLikelihood(
    (noise_covar): FixedGaussianNoise()
  )
  (mean_module): ConstantMean()
  (covar_module): ScaleKernel(
    (base_kernel): RBFKernel(
      (lengthscale_prior): NormalPrior()
      (raw_lengthscale_constraint): GreaterThan(5.000E+01)
      (distance_module): Distance()
    )
    (raw_outputscale_constraint): Positive()
  )
  (feature_extractor): CustomLengthScaleExtractor()
)

where CustomLengthScaleExtractor is a piecewise monotonic function that just transforms x before it passes through the GP (I don't think it's super relevant to the problem, but I can explain more). Essentially this is a 1-dimensional GP with data points (x, y) such that every x is an integer in [-20, 627] and y is between 0 and 12. The FixedGaussianNoise is pretty much all set to 1, with some data points having noise of 3.
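
For concreteness, construction along these lines would reproduce the printout above (a sketch only; the prior parameters, the placeholder targets, and the body of CustomLengthScaleExtractor are my assumptions, not the poster's actual values):

import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(
                lengthscale_prior=gpytorch.priors.NormalPrior(100.0, 50.0),  # placeholder prior
                lengthscale_constraint=gpytorch.constraints.GreaterThan(50.0),
            )
        )
        # self.feature_extractor = CustomLengthScaleExtractor()  # poster's custom transform

    def forward(self, x):
        # x = self.feature_extractor(x)  # piecewise monotonic transform would apply here
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

train_x = torch.arange(-20, 628).to(torch.float32)  # integers in [-20, 627]
train_y = torch.rand(len(train_x)) * 12             # placeholder targets in [0, 12]
noise = torch.ones(len(train_x))                    # fixed noise of 1 (3 for some points)
likelihood = gpytorch.likelihoods.FixedNoiseGaussianLikelihood(noise=noise)
gpmodel = ExactGPModel(train_x, train_y, likelihood)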

Stack trace/error message

This is if we ask only for data points inside the range:

xpred = torch.arange(-20, 500).to(torch.float32)
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    f_preds = gpmodel(xpred)  # gpmodel is the trained model described above

mean = f_preds.mean.detach().cpu().numpy()
std = f_preds.stddev.detach().cpu().numpy()
plt.figure(figsize=(8, 6))
plt.plot(xpred.numpy(), mean, '-', color='gray')

plt.fill_between(xpred.numpy(), mean - std, mean + std,
                 color='gray', alpha=0.2)
plt.xlim(-5, 5)
plt.ylim(10, 11)

(figure: within-range predictions; the mean and ±1σ band look smooth)

However when I expand the range, I get some noisy predictions:

xpred = torch.arange(-20, 1000).to(torch.float32)
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    f_preds = gpmodel(xpred)

mean = f_preds.mean.detach().cpu().numpy()
std = f_preds.stddev.detach().cpu().numpy()
plt.figure(figsize=(8, 6))
plt.plot(xpred.numpy(), mean, '-', color='gray')

plt.fill_between(xpred.numpy(), mean - std, mean + std,
                 color='gray', alpha=0.2)
plt.xlim(-5, 5)
plt.ylim(10, 11)
plt.tight_layout()

(figure: with the expanded range, the same region now shows visibly noisy predictions)

I also get the following warning (which I guess might be because of fast_pred_var):

..site-packages/gpytorch/distributions/multivariate_normal.py:263: NumericalWarning: Negative variance values detected. This is likely due to numerical instabilities. Rounding negative variances up to 1e-06.
  NumericalWarning,
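
One way to check whether LOVE is responsible (a diagnostic sketch I'm adding, not part of the original report) is to recompute the same predictions without the fast_pred_var flag, which falls back to the exact predictive covariance:

# Diagnostic: recompute without LOVE (fast_pred_var) and compare variances.
gpmodel.train(); gpmodel.eval()  # toggling modes clears cached prediction strategies
with torch.no_grad():
    f_preds_exact = gpmodel(xpred)
print((f_preds.stddev - f_preds_exact.stddev).abs().max())  # large gap implicates LOVE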

Expected Behavior

I expected that as we ask for a larger range of predictions (one that includes the earlier values), the predictions on the earlier set shouldn't change. In other words, if I predict on A, and then predict on A ∪ B, the two predictions for A (definitely in mean, and probably in variance?) should be the same. Please let me know if I'm just misunderstanding!
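
A concrete way to test this expectation (my own sketch, reusing gpmodel from above) is to compare the overlapping marginals when predicting on a set and on a superset of it:

A = torch.arange(-20, 500).to(torch.float32)
AB = torch.arange(-20, 1000).to(torch.float32)  # superset: A plus extra points
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    pred_A, pred_AB = gpmodel(A), gpmodel(AB)
# For an exact GP, both of these should print True
print(torch.allclose(pred_A.mean, pred_AB.mean[:len(A)], atol=1e-4))
print(torch.allclose(pred_A.stddev, pred_AB.stddev[:len(A)], atol=1e-4))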

I was sure that this error starts occurring once you ask for predictions outside the model.train_inputs range, but it seems that it starts even before that (which in my case is at x=627):

(figure: the noise appears even before the prediction range reaches the edge of the training inputs at x=627)

Of course, as you let the prediction range go out to much larger numbers, the predictions at x=0 become nonsense:

(figure: with a much larger prediction range, the predictions at x=0 become nonsense)

System information

Please complete the following information:

  • GPyTorch version: 1.4.1
  • PyTorch version: 1.7.1+cu101
  • OS: RHEL

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 6

Top GitHub Comments

2 reactions
wjmaddox commented, Jul 27, 2021

Anyway, I'm pretty confident that what you're seeing as the predictions changing is really the SKI grid changing in response to x values that are outside the range of the training data.
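
(The GPRegressionModel class isn't shown in the comment; presumably it's a SKI/KISS-GP model along the lines of the GPyTorch regression tutorial. A minimal sketch of what it might look like; grid_size is guessed from the spacing of the printed grid below:)

import torch
import gpytorch

class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # ScaleKernel wraps GridInterpolationKernel, so
        # covar_module.base_kernel.grid below is the SKI grid
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.GridInterpolationKernel(
                gpytorch.kernels.RBFKernel(), grid_size=8000, num_dims=1
            )
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )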

# simulate data roughly like yours; note it is crucially on the scale [-10, 110]
train_x = (torch.rand(8000, 1) * 120 - 10).double()
train_y = (torch.exp(-train_x / 30) + torch.exp(-torch.sin((train_x - 3.5) / 1.5) / 10)
           + 10 + 0.25 * torch.randn_like(train_x)).squeeze(-1)

likelihood = gpytorch.likelihoods.GaussianLikelihood().double()
model = GPRegressionModel(train_x, train_y, likelihood)

model.train()
likelihood.train()
# (hyperparameter training loop omitted)

# now print the grid
print(model.covar_module.base_kernel.grid[0])
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426])

model.eval()
likelihood.eval()

# The gpytorch.settings.fast_pred_var flag activates LOVE (for fast variances)
# See https://arxiv.org/abs/1803.06058
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(-20, 100, 200).double()
    prediction = likelihood(model(test_x))
    mean = prediction.mean
    # Get lower and upper predictive bounds
    lower, upper = prediction.confidence_region()

# check the grid
print(model.covar_module.base_kernel.grid[0])
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426])

# the grid also changes if you put print statements in between the individual prediction calls
print(model.covar_module.base_kernel.grid[0])
print("-20: ", model(torch.tensor([-20, -10., -5., 0.]).unsqueeze(-1).double()).mean)
print(model.covar_module.base_kernel.grid[0])
print("0", model(torch.tensor([0.]).unsqueeze(-1).double()).mean)
print(model.covar_module.base_kernel.grid[0])
print("0, 10", model(torch.tensor([0., 10.]).double().unsqueeze(-1)).mean)
print(model.covar_module.base_kernel.grid[0])
####
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426],
#        dtype=torch.float64)
# -20:  tensor([12.3671, 11.8522, 11.8437, 11.7605], dtype=torch.float64,
#        grad_fn=<ViewBackward>)
# tensor([-20.0489, -20.0327, -20.0164,  ..., 110.0138, 110.0301, 110.0464],
#        dtype=torch.float64)
# 0 tensor([11.9508], dtype=torch.float64, grad_fn=<ViewBackward>)
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426],
#        dtype=torch.float64)
# 0, 10 tensor([11.9508, 11.9698], dtype=torch.float64, grad_fn=<ViewBackward>)
# tensor([ -9.9977,  -9.9827,  -9.9677,  ..., 110.0126, 110.0276, 110.0426],
#        dtype=torch.float64)

The prediction changes are probably most noticeable because your data has a wide range and isn't standardized, say to [0, 1] or to have zero mean.
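
For example (my sketch, not from the comment), standardizing inputs to [0, 1] and targets to zero mean before training, and undoing the target transform after prediction:

# Rescale inputs to [0, 1] and center/scale targets; keep the stats to invert later.
x_min, x_max = train_x.min(), train_x.max()
y_mean, y_std = train_y.mean(), train_y.std()
train_x_s = (train_x - x_min) / (x_max - x_min)
train_y_s = (train_y - y_mean) / y_std

# After predicting on similarly transformed test inputs, map back:
#   mean_orig = pred.mean * y_std + y_mean
#   std_orig  = pred.stddev * y_std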

0 reactions
wjmaddox commented, Jul 27, 2021

Hi, sorry, I'm not able to access the data now that I'm taking a look at this, but I'll try simulating some data that looks roughly similar.

A priori, I'd expect that part of what's happening is that the grid is changing (e.g. expanding when you predict at certain data points) because of this code: https://github.com/cornellius-gp/gpytorch/blob/c074c2ff5ba5708761453bbd9be870c35cb57769/gpytorch/kernels/grid_interpolation_kernel.py#L158. In general, this should be nearly equivalent to setting the default grid bounds to train_x.min() and train_x.max(), but on a per-dimension basis.
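
If that's what's happening, one possible workaround (my sketch, not from the thread; grid_size is a placeholder) is to pass explicit grid_bounds when constructing the kernel. When grid_bounds is given, the grid is not dynamic, so predicting outside the training range can no longer re-grid the kernel:

covar_module = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.GridInterpolationKernel(
        gpytorch.kernels.RBFKernel(),
        grid_size=1000,                  # placeholder
        num_dims=1,
        grid_bounds=[(-20.0, 627.0)],    # pin the grid to the training-input range
    )
)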

Read more comments on GitHub >

