
[Question] Cannot reproduce GPy with GPyTorch for 3D problem


Hi!

I’m currently working on porting a simple GPy model to GPyTorch. It’s for a regression task in which the inputs X are 3D coordinates with components in the range [0, 7] and the targets are values in the range [-158, 3067]. At the moment I’m not interested in extrapolation, only interpolation.

The GPy model I’m using is defined as follows:

import numpy as np
import GPy

# Note: Domain is defined elsewhere in the project; only len(domain) (the input dimensionality) is used here.
def build_model(X_data: np.ndarray, y_data: np.ndarray, domain: Domain, mode: int):
    """Return GP model given a set of data and a domain.
    """
    if mode == 1:
        k1 = GPy.kern.RBF(input_dim=len(domain))
        k2 = GPy.kern.Bias(input_dim=len(domain))
        kernel = k1 + k2
        kernel['rbf.lengthscale'].set_prior(GPy.priors.Gamma(a=1, b=2))
    else:
        raise ValueError(f'Unknown mode: {mode}')
    model = GPy.models.GPRegression(X_data, y_data, kernel, normalizer=GPy.util.normalizer.Standardize())
    model.optimize()
    _ = model.optimize_restarts(verbose=True)
    return model

This yields the following model: [image: gpy_model]

and the predicted function values in the range [2.5, 7] (the function grows rapidly as it approaches zero) look like this: [image: gpy_pes]

For the GPyTorch model, I’ve followed the ExactGP regression tutorial. I’ve modified the model slightly to z-scale the targets, due to the large range of my y values (I do the same for the X positions, but that doesn’t seem to have an effect). Here is my model:

import numpy as np
import torch
import gpytorch
import matplotlib.pyplot as plt

class SimpleGPyTorch(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, mode, standardize):
        super(SimpleGPyTorch, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # Single RBF kernel, no bias kernel
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        # self.covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, ard_num_dims=3)
        self.mode = mode # either 1 for CPU or 2 for GPU
        self.standardize = standardize
        if standardize:
            # Save mean/std of input for standardization
            self.y_mean = train_y.mean()
            self.y_std = train_y.std()
            self.x_mean = train_x.mean(dim=0, keepdim=True)
            self.x_std = train_x.std(dim=0, keepdim=True)
            self.standardize_training_data()
        

    def forward(self, x):
        mean = self.mean_module(x)
        covar = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean, covar)


    def optimize(self, training_iter=125, verbose=False, plot=True):
        model = self
        likelihood = self.likelihood
        X = self.train_inputs[0]
        y = self.train_targets

        model.train()
        likelihood.train()
        optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
        # optimizer = FullBatchLBFGS(model.parameters(), lr=0.01)

        mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
        loss_trace = []
        
        def closure():
            optimizer.zero_grad()
            output = model(X)
            loss = -mll(output, y) # reach MLE through gradient descent
            return loss
        # Tighten the CG tolerance, raise the CG iteration cap (the original listed cg_tolerance twice;
        # the second was presumably meant to be max_cg_iterations), and enlarge the preconditioner.
        with gpytorch.settings.cg_tolerance(0.01), gpytorch.settings.max_cg_iterations(10000), gpytorch.settings.max_preconditioner_size(100):
            for i in range(training_iter):
                # closure() zeroes the gradients and recomputes the negative marginal log likelihood
                loss = closure()
                loss.backward()
                optimizer.step()

                # options = {'closure': closure, 'current_loss': loss, 'max_ls': 20}
                # loss, _, _, _, _, _, _, fail = optimizer.step(options)
                if verbose and i%50 == 0:
                    print(f'Iteration {i} - Loss: {loss.item():.3f} - Lengthscale: {model.covar_module.base_kernel.lengthscale.item():.3f} - Noise: {model.likelihood.noise.item():.3f}')
                loss_trace.append(loss.detach().numpy())

                # if fail: 
                #     break
            if plot:
                _, ax = plt.subplots(figsize=(8,6))
                ax.set_xlabel("Training iteration")
                ax.set_ylabel("Marginal Log Likelihood Loss")
                ax.plot(loss_trace)
        return loss_trace


    def standardize_training_data(self):
        # Standardize targets
        self.train_targets -= self.y_mean
        self.train_targets /= self.y_std
        # Standardize features
        train_x = self.train_inputs[0]
        train_x -= self.x_mean
        train_x /= self.x_std
        self.train_inputs = (train_x,)


    def predict(self, x):
        self.eval()
        self.likelihood.eval()
                
        with torch.no_grad(), gpytorch.settings.fast_pred_var(): 
            x_pred = torch.from_numpy(x).type(torch.FloatTensor)

            if self.standardize:
                # Standardize prediction features
                x_pred = (x_pred - self.x_mean) / self.x_std
                
        
            prediction = self.likelihood(self(x_pred))
            mean = prediction.mean.detach().numpy()
            var = prediction.variance.detach().numpy()

            if self.standardize:
                # Rescale prediction to original training data scale
                original_mean = self.y_mean.detach().numpy()
                original_std = self.y_std.detach().numpy()
                mean = mean*original_std + original_mean
                var = var*original_std**2 # Variance is stationary and is only changed by a factor - https://github.com/scikit-learn/scikit-learn/blob/2beed55847ee70d363bdbfe14ee4401438fba057/sklearn/gaussian_process/_gpr.py#L355
            return mean, var

which yields the following training curve and model parameters: [images: gpytorch_mll, gpytorch_params]

and the predicted y values: [image: gpytorch]

I’ve tried different optimizers (Adam, SGD, and LBFGS from PyTorch as well as PyTorch-LBFGS), different learning rates, a gamma prior on the lengthscale, changing the kernel (to the SpectralMixture kernel), and increasing the accuracy of the CG solves and the preconditioner size. I’ve also tried restarting training to avoid getting stuck in local minima, but in those cases I’ve found that the model sometimes converges to predicting a flat function.
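
For reference, here is a minimal sketch (not the exact code from my experiments) of how the gamma prior on the lengthscale can be attached in GPyTorch; I’m assuming GPy’s a and b map to the prior’s concentration and rate:

import gpytorch

# Hedged sketch: GammaPrior(1.0, 2.0) is assumed to mirror GPy.priors.Gamma(a=1, b=2)
covar_module = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.RBFKernel(
        lengthscale_prior=gpytorch.priors.GammaPrior(1.0, 2.0)
    )
)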

My guess is that the poor performance of the GPyTorch model is due to the model not converging very well during training. I see much better results when capping the range of y values to, for instance, [0, 50] instead of the full range, but this is not necessary for GPy.

As a sanity check, I’ve tested these exact models on both a toy problem (a noisy sine) and a simpler version of the problem above (2D positions instead of 3D), and in both cases GPy and GPyTorch agree almost perfectly. For the 3D problem, I’ve also tried implementing the same model using scikit-learn, and there I get results very similar to GPy’s. Is there something I’m missing?

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 13 (2 by maintainers)

Top GitHub Comments

1 reaction
elindgren commented, Sep 22, 2021

I think I managed to get it working. Using botorch in combination with a positive constraint on the noise and increasing the max Cholesky size to be larger than my dataset gives results very similar to GPy. The hyperparameters differ from GPy’s, but the predicted model is very similar:

[image: z-benzene]
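
For completeness, here is a minimal sketch of the combination described above (a reconstruction, not my exact script; the tensors are placeholders, and the GreaterThan(1e-6) bound is my assumption for the positive noise constraint):

import torch
import gpytorch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll  # named fit_gpytorch_model in older botorch releases
from gpytorch.mlls import ExactMarginalLogLikelihood
from gpytorch.constraints import GreaterThan

train_X = torch.rand(500, 3, dtype=torch.float64)  # placeholder 3D inputs
train_Y = torch.rand(500, 1, dtype=torch.float64)  # placeholder targets, shape (n, 1)

model = SingleTaskGP(train_X, train_Y)
# Enforce a positive lower bound on the observation noise
model.likelihood.noise_covar.register_constraint("raw_noise", GreaterThan(1e-6))
mll = ExactMarginalLogLikelihood(model.likelihood, model)

# Make the Cholesky threshold exceed the dataset size so exact solves are used instead of CG
with gpytorch.settings.max_cholesky_size(train_X.shape[0] + 1):
    fit_gpytorch_mll(mll)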

I had also made an error when calculating the MAE and RMSE earlier, so GPy and GPyTorch were closer than they appeared. Now the two are almost identical.

[image: mae-gpytorch]

The model predicting a flat function was also down to an error on my part: I had forgotten to copy my input tensors before passing them to the model. Since I z-scale both the input data and the targets in place on model creation, both went to zero after a few models had been created, because the same tensors were erroneously reused for every model.
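
In other words (a minimal illustration of the fix, with placeholder data):

import torch
import gpytorch

train_x = torch.rand(100, 3)  # placeholder 3D training inputs
train_y = torch.rand(100)     # placeholder training targets

# Clone before constructing each model: standardize_training_data() scales the tensors
# in place, so reusing the same tensors corrupts them for every subsequent model.
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SimpleGPyTorch(train_x.clone(), train_y.clone(), likelihood, mode=1, standardize=True)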

@Balandat, @wjmaddox, and @jacobrgardner, thank you for taking the time to help me investigate my problem! And thank you for a very nice and well-documented Gaussian process package!

0 reactions
jacobrgardner commented, Sep 21, 2021

@elindgren the lower MLL could be expected. GPy effectively uses a noise lower bound of 1e-6, in the form of jitter added when computing the Cholesky decomposition: https://github.com/SheffieldML/GPy/blob/3e19a85575687e37fd6f61174115d7c94d2c96e6/GPy/util/linalg.py#L65
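
Numerically, that jitter just amounts to adding a small multiple of the identity to the kernel matrix before factorizing (a minimal illustration, not GPy’s actual code; GPyTorch exposes a comparable knob through gpytorch.settings.cholesky_jitter):

import torch

K = torch.rand(5, 5)
K = K @ K.T                      # a PSD kernel matrix that may be poorly conditioned
jitter = 1e-6
L = torch.linalg.cholesky(K + jitter * torch.eye(K.shape[0]))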

Removing the lower bound in GPyTorch entirely will let you get potentially much smaller noises than 1e-6 (and therefore potentially lower losses), though at the potential cost of numerical instability as the conditioning of the kernel matrix gets worse. The NumericalWarnings with jitter of 1e-8 don’t seem too bad though: 1e-8 is still a really small amount of jitter.

If you want the warnings to go away, you could probably run everything in fp64 instead of torch’s default fp32 by adding torch.set_default_dtype(torch.float64) at the top of your script.

Alternatively, if you want to set the same lower bound, you can use a GreaterThan(1e-6) constraint.
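
For example, a minimal sketch of constructing the likelihood with that bound:

import gpytorch
from gpytorch.constraints import GreaterThan

# A 1e-6 lower bound on the noise, matching the effective floor GPy enforces via jitter
likelihood = gpytorch.likelihoods.GaussianLikelihood(noise_constraint=GreaterThan(1e-6))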

It’s not surprising that the lengthscale gets smaller as the noise decreases: smaller lengthscales cause the GP fit to interpolate the training data more tightly, which makes sense in a lower noise regime.

