
Optimizer returns nan with linear kernel and then breaks


Howdy folks,

I am evaluating GPyTorch, coming from GPy.

I am trying to reproduce a GP regression example from GPy in GPyTorch. More specifically, I am trying to perform GP regression on the Mauna Loa CO2 data, as shown at the bottom of this notebook (Chapter 7):

https://nbviewer.jupyter.org/github/gpschool/gpss18/blob/master/labs/GPSS_Lab1_2018.ipynb

I started out with an additive combination of an RBF kernel and a linear kernel. During optimization I get a lot of NaNs, and eventually it breaks with the following message:

 File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_gp.py", line 291, in __call__
    predictive_mean, predictive_covar = self.prediction_strategy.exact_prediction(full_mean, full_covar)
  File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 288, in exact_prediction
    self.exact_predictive_covar(test_test_covar, test_train_covar),
  File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 355, in exact_predictive_covar
    test_train_covar)
  File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 572, in _exact_predictive_covar_inv_quad_form_root
    self._sub_strategies, precomputed_cache, test_train_covar.evaluate_kernel().lazy_tensors
  File "/home/raven.ravenind.net/bmg/anaconda3/envs/gpytorch_env/lib/python3.6/site-packages/gpytorch/models/exact_prediction_strategies.py", line 538, in _sub_strategies
    for lazy_tensor in self.train_prior_dist.lazy_covariance_matrix.lazy_tensors:
AttributeError: 'LazyEvaluatedKernelTensor' object has no attribute 'lazy_tensors'

Here is my code:

import torch
import gpytorch
from matplotlib import pyplot as plt
import json
import numpy as np

# Load the data (dumped to a JSON file earlier so it can be reused in other
# GP frameworks like GPyTorch)
with open("mauna_loa_co2_data.json", 'r') as infile:
    data = json.load(infile)

# Training data (X = input, Y = observation)
X, Y = np.array(data['X']), np.array(data['Y'])

# Test data (Xtest = input, Ytest = observations)
Xtest, Ytest = np.array(data['Xtest']), np.array(data['Ytest'])


# Set up our plotting environment
plt.figure(figsize=(14, 8))

# Plot the training data in blue and the test data in red
plt.plot(X, Y, "b.", Xtest, Ytest, "r.")

# Annotate plot
plt.legend(labels=["training data", "test data"])
plt.xlabel("year"), plt.ylabel("CO$_2$ (PPM)"), plt.title("Monthly mean CO$_2$ at the Mauna Loa Observatory, Hawaii");

plt.show()


# Reduce the data size by only using every other training point:
x_train = torch.from_numpy(X[::2]).float()
y_train = torch.from_numpy(Y[::2]).float()
x_test = torch.from_numpy(Xtest).float()
y_test = torch.from_numpy(Ytest).float()


class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)

        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module_1 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.covar_module_2 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.LinearKernel())
        self.covar_module = self.covar_module_1 + self.covar_module_2

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(x_train, y_train, likelihood)

# Find optimal model hyperparameters
model.train()
likelihood.train()


# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1, weight_decay=0.0)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iter = 500
for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(x_train)
    # Calc loss and backprop gradients
    loss = -mll(output, y_train)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f  ' % (
        i + 1, training_iter, loss.item()
    ))
    optimizer.step()


# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    x_new = torch.cat((x_train, x_test),0)
    observed_pred = likelihood(model(x_new))

with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))

    # Get upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    ax.plot(x_train.numpy(), y_train.numpy(), 'k*')
    ax.plot(x_test.numpy(), y_test.numpy(), 'r*')
    # Plot predictive means as blue line
    ax.plot(x_new.numpy(), observed_pred.mean.numpy(), 'b-')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(x_new.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.legend(['Observed Data', 'Mean', 'Confidence'])

plt.show()

Does anybody have any suggestions?

In the meantime, I’ll keep digging.

Thanks in advance,

Galto

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
jacobrgardner commented on Aug 28, 2019

Hi @Galto2000,

Looking at everything, this appeared to just be a data normalization issue (i.e., since the data wasn’t normalized, the default hyperparameter initialization was so bad that it ran into problems).

Simply z-scoring the training labels and data led to reasonable predictions, without NaNs during optimization.

If it helps, one little trick I like to use for making sure at least the features are normalized is to do something like:

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        ...
        self.normalizer = torch.nn.BatchNorm1d(num_features=train_x.size(-1), affine=False)

    def forward(self, x):
        x = self.normalizer(x)
        ...

Of course, with this dataset you’ll also need to normalize the labels so that the initial outputscale (of the RBFKernel) and variance (of the LinearKernel) are more sensible:

mu = y_train.mean()
std = y_train.std()

y_train = (y_train - mu) / std
y_test = (y_test - mu) / std
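
For completeness, a minimal sketch (not from the original comment) that z-scores the inputs outside the model instead of using the BatchNorm1d trick above, reusing the variable names from the original script:

# Z-score the inputs using statistics from the training set only,
# mirroring the label normalization above
x_mu = x_train.mean(0)
x_std = x_train.std(0)

x_train = (x_train - x_mu) / x_std
x_test = (x_test - x_mu) / x_std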

0 reactions
jacobrgardner commented on Sep 9, 2019

@Galto2000 Probably your best bet is the docs for Kernel itself: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/kernels/kernel.py

plus a few examples like rbf or scale.
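
As a rough illustration of what those docs cover (a sketch, not taken from the thread; the kernel name and the simple dot-product form are placeholder assumptions), a custom kernel subclasses gpytorch.kernels.Kernel and implements forward:

class DotProductKernel(gpytorch.kernels.Kernel):
    # Minimal custom kernel: k(x1, x2) = x1 . x2, with no learned parameters
    def forward(self, x1, x2, diag=False, **params):
        # x1 has shape (... x n x d), x2 has shape (... x m x d)
        prod = x1.matmul(x2.transpose(-2, -1))
        if diag:
            # Only the diagonal of the covariance matrix is requested
            return prod.diagonal(dim1=-2, dim2=-1)
        return prod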

If it ends up taking too much time, I should be able to get around to it soonish.

