Optimizer returns nan with linear kernel and then breaks
Howdy folks,
I am evaluating GPyTorch, coming from GPy.
I am trying to reproduce a GP regression example from GPy in GPyTorch. More specifically, I am trying to perform GP regression on the Mauna Loa CO2 data, as shown at the bottom of this notebook (Chapter 7):
https://nbviewer.jupyter.org/github/gpschool/gpss18/blob/master/labs/GPSS_Lab1_2018.ipynb
I started out with an additive combination of an RBF kernel and a linear kernel. During the optimization I get a lot of NaNs, and eventually it breaks with the following message:
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_gp.py", line 291, in __call__
predictive_mean, predictive_covar = self.prediction_strategy.exact_prediction(full_mean, full_covar)
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 288, in exact_prediction
self.exact_predictive_covar(test_test_covar, test_train_covar),
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 355, in exact_predictive_covar
test_train_covar)
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 572, in _exact_predictive_covar_inv_quad_form_root
self._sub_strategies, precomputed_cache, test_train_covar.evaluate_kernel().lazy_tensors
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 538, in _sub_strategies
for lazy_tensor in self.train_prior_dist.lazy_covariance_matrix.lazy_tensors:
AttributeError: 'LazyEvaluatedKernelTensor' object has no attribute 'lazy_tensors'
Here is my code:
import torch
import gpytorch
from matplotlib import pyplot as plt
import json
import numpy as np
# First load the data
# Here is an opportunity to dump the data to file so we can use it in other GP frameworks like GPyTorch
infile = open("mauna_loa_co2_data.json", 'r')
data = json.load(infile,)
infile.close()
# Training data (X = input, Y = observation)
X, Y = np.array(data['X']), np.array(data['Y'])
# Test data (Xtest = input, Ytest = observations)
Xtest, Ytest = np.array(data['Xtest']), np.array(data['Ytest'])
# Set up our plotting environment
plt.figure(figsize=(14, 8))
# Plot the training data in blue and the test data in red
plt.plot(X, Y, "b.", Xtest, Ytest, "r.")
# Annotate plot
plt.legend(labels=["training data", "test data"])
plt.xlabel("year"), plt.ylabel("CO$_2$ (PPM)"), plt.title("Monthly mean CO$_2$ at the Mauna Loa Observatory, Hawaii");
plt.show()
# reduce the datasize by only using every other training point:
x_train = torch.from_numpy(X[::2]).float()
y_train = torch.from_numpy(Y[::2]).float()
x_test = torch.from_numpy(Xtest).float()
y_test = torch.from_numpy(Ytest).float()
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module_1 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.covar_module_2 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.LinearKernel())
        self.covar_module = self.covar_module_1 + self.covar_module_2

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(x_train, y_train, likelihood)
# Find optimal model hyperparameters
model.train()
likelihood.train()
# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1, weight_decay=0.0)
# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
training_iter = 500
for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(x_train)
    # Calc loss and backprop gradients
    loss = -mll(output, y_train)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f ' % (
        i + 1, training_iter, loss.item()
    ))
    optimizer.step()
# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    x_new = torch.cat((x_train, x_test), 0)
    observed_pred = likelihood(model(x_new))
with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))
    # Get upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    ax.plot(x_train.numpy(), y_train.numpy(), 'k*')
    ax.plot(x_test.numpy(), y_test.numpy(), 'r*')
    # Plot predictive means as blue line
    ax.plot(x_new.numpy(), observed_pred.mean.numpy(), 'b-')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(x_new.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
plt.show()
Does anybody have any suggestions?
In the meantime I’ll keep digging.
Thanks in advance,
Galto
Top GitHub Comments
Hi @Galto2000,
Looking at everything, this appeared to just be a data normalization issue (i.e., since the data wasn’t normalized, the default hyperparameter initialization was so bad it ran into problems).
Simply z-scoring the training labels and data led to reasonable predictions, without NaNs during optimization.
If it helps, one little trick I like to use for making sure at least the features are normalized is to do something like:
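A rough sketch of that kind of input scaling (assuming the x_train and x_test tensors from the code above; the maintainer's exact snippet may differ):

# Illustrative sketch, not the maintainer's exact code: min-max scale the
# inputs to [0, 1] using the training-set statistics, so the default
# lengthscale initialization is on a sensible scale.
x_min, x_max = x_train.min(), x_train.max()
x_train = (x_train - x_min) / (x_max - x_min)
x_test = (x_test - x_min) / (x_max - x_min)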
Of course, with this dataset you’ll also need to normalize the labels so that the initial outputscale (of the RBFKernel) and variance (of the LinearKernel) are more sensible:
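A rough sketch of z-scoring the labels (again assuming the y_train tensor from the code above; the exact snippet may differ):

# Illustrative sketch: z-score the training labels so the default outputscale
# and noise initializations are reasonable; keep the statistics so predictions
# can be mapped back to ppm afterwards.
y_mean, y_std = y_train.mean(), y_train.std()
y_train = (y_train - y_mean) / y_std
# Later: predictions_in_ppm = observed_pred.mean * y_std + y_mean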
@Galto2000 Probably your best bet is the docs for Kernel itself: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/kernels/kernel.py plus a few examples like rbf or scale.
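A minimal sketch, not from this thread, of what subclassing Kernel generally looks like, loosely following the rbf and scale kernel sources linked above (the kernel name and the dot-product form are purely illustrative):

import gpytorch

class DotProductKernel(gpytorch.kernels.Kernel):
    # forward() receives x1 (n x d) and x2 (m x d) and returns their covariance;
    # diag=True asks for only the diagonal of k(x1, x2).
    def forward(self, x1, x2, diag=False, **params):
        covar = x1 @ x2.transpose(-2, -1)
        if diag:
            return covar.diagonal(dim1=-2, dim2=-1)
        return covar

Such a kernel could then be dropped in wherever gpytorch.kernels.LinearKernel() is used in the model above.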
If it ends up taking too much time, I should be able to get around to it soonish.