Optimizer returns nan with linear kernel and then breaks
Howdy folks,
I am evaluating GPyTorch, coming from GPy.
I am trying to reproduce a GP regression example from GPy in GPyTorch. More specifically, I am trying to perform GP regression on the Mauna Loa CO2 data, as shown at the bottom of this notebook (Chapter 7):
https://nbviewer.jupyter.org/github/gpschool/gpss18/blob/master/labs/GPSS_Lab1_2018.ipynb
I started out with an additive combination of an RBF kernel and a linear kernel. During the optimization I get a lot of NaNs, and eventually it breaks with the following message:
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_gp.py", line 291, in __call__
predictive_mean, predictive_covar = self.prediction_strategy.exact_prediction(full_mean, full_covar)
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 288, in exact_prediction
self.exact_predictive_covar(test_test_covar, test_train_covar),
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 355, in exact_predictive_covar
test_train_covar)
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 572, in _exact_predictive_covar_inv_quad_form_root
self._sub_strategies, precomputed_cache, test_train_covar.evaluate_kernel().lazy_tensors
File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 538, in _sub_strategies
for lazy_tensor in self.train_prior_dist.lazy_covariance_matrix.lazy_tensors:
AttributeError: 'LazyEvaluatedKernelTensor' object has no attribute 'lazy_tensors'
Here is my code:
import torch
import gpytorch
from matplotlib import pyplot as plt
import json
import numpy as np
# First load the data
# Here is an opportunity to dump the data to file so we can use it in other GP frameworks like GPyTorch
infile = open("mauna_loa_co2_data.json", 'r')
data = json.load(infile,)
infile.close()
# Training data (X = input, Y = observation)
X, Y = np.array(data['X']), np.array(data['Y'])
# Test data (Xtest = input, Ytest = observations)
Xtest, Ytest = np.array(data['Xtest']), np.array(data['Ytest'])
# Set up our plotting environment
plt.figure(figsize=(14, 8))
# Plot the training data in blue and the test data in red
plt.plot(X, Y, "b.", Xtest, Ytest, "r.")
# Annotate plot
plt.legend(labels=["training data", "test data"])
plt.xlabel("year"), plt.ylabel("CO$_2$ (PPM)"), plt.title("Monthly mean CO$_2$ at the Mauna Loa Observatory, Hawaii");
plt.show()
# reduce the datasize by only using every other training point:
x_train = torch.from_numpy(X[::2]).float()
y_train = torch.from_numpy(Y[::2]).float()
x_test = torch.from_numpy(Xtest).float()
y_test = torch.from_numpy(Ytest).float()
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module_1 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.covar_module_2 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.LinearKernel())
        self.covar_module = self.covar_module_1 + self.covar_module_2

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(x_train, y_train, likelihood)
# Find optimal model hyperparameters
model.train()
likelihood.train()
# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1, weight_decay=0.0)
# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
training_iter = 500
for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(x_train)
    # Calc loss and backprop gradients
    loss = -mll(output, y_train)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f ' % (
        i + 1, training_iter, loss.item()
    ))
    optimizer.step()
# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    x_new = torch.cat((x_train, x_test), 0)
    observed_pred = likelihood(model(x_new))
with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))
    # Get upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    ax.plot(x_train.numpy(), y_train.numpy(), 'k*')
    ax.plot(x_test.numpy(), y_test.numpy(), 'r*')
    # Plot predictive means as blue line
    ax.plot(x_new.numpy(), observed_pred.mean.numpy(), 'b-')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(x_new.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
plt.show()
Does anybody have any suggestions?
In the meantime I’ll keep digging.
Thanks in advance,
Galto
Top GitHub Comments
Hi @Galto2000,
Looking at everything, this appeared to just be a data normalization issue (i.e., since the data wasn’t normalized, the default hyperparameter initialization was so bad it ran into problems).
Simply z-scoring the training labels and data led to reasonable predictions, without NaNs during optimization.
If it helps, one little trick I like to use for making sure at least the features are normalized is to do something like:
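A rough sketch of that kind of input scaling (assuming the x_train and x_test tensors from the code above; the maintainer's exact snippet may differ):

# Illustrative sketch, not the maintainer's exact code: min-max scale the
# inputs to [0, 1] using the training-set statistics, so the default
# lengthscale initialization is on a sensible scale.
x_min, x_max = x_train.min(), x_train.max()
x_train = (x_train - x_min) / (x_max - x_min)
x_test = (x_test - x_min) / (x_max - x_min)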
Of course, with this dataset you’ll also need to normalize the labels so that the initial outputscale (of the RBFKernel) and variance (of the LinearKernel) are more sensible:
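A rough sketch of z-scoring the labels (again assuming the y_train tensor from the code above; the exact snippet may differ):

# Illustrative sketch: z-score the training labels so the default outputscale
# and noise initializations are reasonable; keep the statistics so predictions
# can be mapped back to ppm afterwards.
y_mean, y_std = y_train.mean(), y_train.std()
y_train = (y_train - y_mean) / y_std
# Later: predictions_in_ppm = observed_pred.mean * y_std + y_mean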
@Galto2000 Probably your best bet is the docs for Kernel itself: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/kernels/kernel.py plus a few examples like rbf or scale.
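A minimal sketch, not from this thread, of what subclassing Kernel generally looks like, loosely following the rbf and scale kernel sources linked above (the kernel name and the dot-product form are purely illustrative):

import gpytorch

class DotProductKernel(gpytorch.kernels.Kernel):
    # forward() receives x1 (n x d) and x2 (m x d) and returns their covariance;
    # diag=True asks for only the diagonal of k(x1, x2).
    def forward(self, x1, x2, diag=False, **params):
        covar = x1 @ x2.transpose(-2, -1)
        if diag:
            return covar.diagonal(dim1=-2, dim2=-1)
        return covar

Such a kernel could then be dropped in wherever gpytorch.kernels.LinearKernel() is used in the model above.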
If it ends up taking too much time, I should be able to get around to it soonish.