FixedNoiseGaussianLikelihood Vs GaussianLikelihood with noise constraint: how to properly model a noise-free GP?

Hello,

I am implementing a Gaussian process emulator (GPE) in GPyTorch. In my case, this is essentially a Gaussian process whose mean function is a linear regression model and whose covariance function is a simple kernel (e.g. RBF). The linear regressor’s weights and bias, the scale kernel’s outputscale and the kernel lengthscales are all meant to be tuned jointly during training.

So far I have been using the ExactGP formulation with GaussianLikelihood, which, following the tutorial, is the standard starting point for predicting a scalar output of the form

y = f(X) + epsilon

where epsilon follows a zero-mean normal distribution. With this formulation, the likelihood noise term is also tuned during training.

However, since I want to predict the output of a computer code that is deterministic (the observed data are not corrupted by any noise), I’d like to use a noise-free GPE formulation to make predictions at new (never simulated) input points. In other words, I want the uncertainty around the predictions to come only from the posterior distribution of the latent function, without adding the variance of the noise modeled by epsilon.

This is what you get when you do:

predictions = model(X_new)

instead of

predictions = model.likelihood(model(X_new))

The problem is that, even when you predict with the first call, the likelihood noise term is still learned during training, and in this case it is not something you want to model.
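A minimal sketch of why the two calls differ, written in plain PyTorch rather than GPyTorch (zero prior mean, unit-outputscale RBF kernel with a fixed lengthscale of 0.2; the helper names rbf and gp_predict are illustrative, not library APIs): the noise variance enters the training covariance, and therefore the latent posterior, no matter how you predict; predicting through the likelihood only adds that variance on top of the latent posterior variance.

```python
import torch


def rbf(a: torch.Tensor, b: torch.Tensor, lengthscale: float = 0.2) -> torch.Tensor:
    """Unit-outputscale RBF kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return torch.exp(-0.5 * sq_dist / lengthscale**2)


def gp_predict(x_train, y_train, x_new, noise):
    """Zero-mean exact GP: posterior of latent f and of y = f + epsilon at x_new."""
    n = x_train.shape[0]
    # The noise enters the training covariance whichever way you predict later.
    K = rbf(x_train, x_train) + noise * torch.eye(n, dtype=x_train.dtype)
    L = torch.linalg.cholesky(K)
    K_star = rbf(x_new, x_train)                            # (m, n)
    alpha = torch.cholesky_solve(y_train.unsqueeze(-1), L)  # K^{-1} y, (n, 1)
    mean = (K_star @ alpha).squeeze(-1)
    v = torch.cholesky_solve(K_star.T, L)                   # K^{-1} K_*^T, (n, m)
    latent_var = rbf(x_new, x_new).diagonal() - (K_star * v.T).sum(-1)
    # Predicting "through the likelihood" just adds the noise variance.
    predictive_var = latent_var + noise
    return mean, latent_var, predictive_var


x_train = torch.linspace(0, 1, 6, dtype=torch.float64).unsqueeze(-1)
y_train = torch.sin(6 * x_train).squeeze(-1)
x_new = torch.tensor([[0.1], [0.5], [0.9]], dtype=torch.float64)
mean, latent_var, predictive_var = gp_predict(x_train, y_train, x_new, noise=1e-2)
print(latent_var, predictive_var)
```

With a very small noise, the posterior mean at the training inputs reproduces the targets and the latent variance there is close to zero, which is the interpolating behavior expected from an emulator of a deterministic code.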

To remove it, the first thing I tried was FixedNoiseGaussianLikelihood, used as follows:

noise_level = 0.0
noise = noise_level*torch.ones(X_train.shape[0])
likelihood = gpytorch.likelihoods.FixedNoiseGaussianLikelihood(noise=noise, learn_additional_noise=False)

Although I get an R2 score of 1.000 after training when predicting the training points themselves, I don’t think I am doing things properly, since this likelihood was designed to model heteroskedastic noise. Moreover, it lets the user add extra noise to the likelihood when predicting at new input points.

What I like about this formulation is that no noise is learned (learn_additional_noise=False), but at the same time it doesn’t feel like the proper way to build a noise-free GP. It also sometimes leads to errors, especially when I increase the number of samples in the training dataset. Another annoyance is that it raises a warning, because I do not pass any additional noise to the likelihood when predicting at new points:

UserWarning: You have passed data through a FixedNoiseGaussianLikelihood that did not match the size of the fixed noise, *and* you did not specify noise. This is treated as a no-op.

so I have to predict using

predictions = model(X_new)

(that is, bypassing the likelihood, which should not change much anyway).
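The errors themselves are not shown, so this is only an assumption about their cause: with zero noise, the RBF Gram matrix of densely spaced inputs becomes numerically singular as the number of samples grows, and a Cholesky factorization can then fail. This plain-PyTorch sketch (1-D inputs on [0, 1], lengthscale 0.5; the helper names are hypothetical) compares the smallest eigenvalue with and without a small diagonal jitter.

```python
import torch


def rbf_gram(x: torch.Tensor, lengthscale: float) -> torch.Tensor:
    """Unit-outputscale RBF Gram matrix for inputs x of shape (n, d)."""
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    return torch.exp(-0.5 * sq_dist / lengthscale**2)


def check_gram(n: int, lengthscale: float = 0.5, jitter: float = 0.0):
    """Smallest eigenvalue of K + jitter * I and whether its Cholesky succeeds."""
    x = torch.linspace(0, 1, n, dtype=torch.float64).unsqueeze(-1)
    K = rbf_gram(x, lengthscale) + jitter * torch.eye(n, dtype=x.dtype)
    min_eig = torch.linalg.eigvalsh(K).min().item()
    # cholesky_ex does not raise; info == 0 means the factorization succeeded.
    cholesky_ok = torch.linalg.cholesky_ex(K).info.item() == 0
    return min_eig, cholesky_ok


for n in (10, 50, 200):
    for jitter in (0.0, 1e-6):
        min_eig, ok = check_gram(n, jitter=jitter)
        print(f"n={n:4d} jitter={jitter:g} min_eig={min_eig:.2e} cholesky_ok={ok}")
```

Whether you call it jitter or a small fixed likelihood noise, a diagonal term keeps the smallest eigenvalue away from zero, at the cost of no longer interpolating the data exactly.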

So I went back to GaussianLikelihood and found something interesting. If you do not initialize the likelihood noise term, it gets tuned during training. If you initialize it to a value above the lower bound of its GreaterThan constraint (1e-4 by default), the noise still gets tuned. However, when I initialized it to exactly the lower bound, I noticed that it did not update at all, at any epoch of the training loop.
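One plausible explanation, assuming the constraint uses the default softplus transform (noise = lower_bound + softplus(raw_noise)): setting the noise exactly to the lower bound requires softplus(raw_noise) = 0, i.e. raw_noise = -inf, where the derivative of softplus is zero, so the optimizer never moves it. This plain-PyTorch sketch reproduces that mechanism; inv_softplus, raw_from_noise and noise_grad are hypothetical helpers, and the toy loss is only there to produce a gradient.

```python
import torch
import torch.nn.functional as F

# Assumed parametrization (softplus-based GreaterThan constraint):
#   noise = LOWER_BOUND + softplus(raw_noise)
LOWER_BOUND = 1e-8


def inv_softplus(y: torch.Tensor) -> torch.Tensor:
    """Inverse of softplus for y >= 0: returns x with softplus(x) == y."""
    return torch.log(torch.expm1(y))


def raw_from_noise(noise: float) -> torch.Tensor:
    """Unconstrained raw value that maps to the requested noise."""
    return inv_softplus(torch.tensor(noise - LOWER_BOUND, dtype=torch.float64))


def noise_grad(noise_init: float) -> float:
    """d(loss)/d(raw_noise) for a toy loss that pulls the noise towards 1e-2."""
    raw = raw_from_noise(noise_init).requires_grad_(True)
    noise = LOWER_BOUND + F.softplus(raw)
    loss = (noise - 1e-2) ** 2
    loss.backward()
    return raw.grad.item()


print(raw_from_noise(LOWER_BOUND))  # raw value exactly at the bound
print(noise_grad(1e-3))             # non-zero: the optimizer can move the noise
print(noise_grad(LOWER_BOUND))      # zero: the noise is stuck at the bound
```

This is consistent with the frozen noise observed here, but it relies on a raw value of -inf rather than on an explicit "fixed" setting, so it is a fragile way to hold the noise constant.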

To give some context, this is the GPE formulation I am using:

import numpy as np
import torch
import gpytorch


class AffineMean(gpytorch.means.Mean):
	def __init__(self, in_dim, batch_shape=torch.Size()):
		super().__init__()
		# nn.Parameter already tracks gradients, so requires_grad is not needed here
		self.register_parameter(name='weight', parameter=torch.nn.Parameter(torch.zeros((*batch_shape, in_dim), dtype=torch.float)))
		self.register_parameter(name='bias', parameter=torch.nn.Parameter(torch.zeros((*batch_shape, 1), dtype=torch.float)))

	def forward(self, x):
		# x: (..., n, in_dim) -> mean of shape (..., n); also works for batched inputs
		return x.matmul(self.weight.unsqueeze(-1)).squeeze(-1) + self.bias


class ExactGPModel(gpytorch.models.ExactGP):
	def __init__(self, in_dim, train_x, train_y, likelihood):
		super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
		self.mean_module = AffineMean(in_dim)
		self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=in_dim))

	def forward(self, x):
		mean_x = self.mean_module(x)
		covar_x = self.covar_module(x)
		return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

and this is how I initialize the model:

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(in_dim, X_train, y_train, likelihood)

lsc_inf = np.log(0.1)
lsc_sup = np.log(10.0)
noise_level = 1e-8
hyperparameters = {
	'likelihood.noise_covar.raw_noise_constraint': gpytorch.constraints.GreaterThan(noise_level),
	'likelihood.noise_covar.noise': torch.tensor(noise_level),
	# set the constrained lengthscale (not raw_lengthscale): log-uniform in [0.1, 10]
	'covar_module.base_kernel.lengthscale': torch.exp((lsc_sup - lsc_inf)*torch.rand(in_dim) + lsc_inf),
	'covar_module.outputscale': torch.tensor(1.0)
}
if not np.isclose(data_mean, 0.0):
	hyperparameters['mean_module.bias'] = torch.tensor(data_mean)
model.initialize(**hyperparameters)
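A side note on the lengthscale initialization, assuming the default positivity constraint on raw_lengthscale uses a softplus transform: lsc_inf and lsc_sup are on a log scale, but a raw parameter is mapped through softplus, not exp, so uniform raw values in [log 0.1, log 10] correspond to lengthscales only in roughly [0.095, 2.40]. To get log-uniform lengthscales in [0.1, 10], exponentiate the samples and assign them to the constrained lengthscale. This plain-PyTorch check compares the two mappings; sample_log_uniform is a hypothetical helper.

```python
import math

import torch
import torch.nn.functional as F

LSC_INF, LSC_SUP = math.log(0.1), math.log(10.0)


def sample_log_uniform(in_dim: int, generator: torch.Generator) -> torch.Tensor:
    """Uniform samples on [log 0.1, log 10]."""
    u = torch.rand(in_dim, generator=generator, dtype=torch.float64)
    return (LSC_SUP - LSC_INF) * u + LSC_INF


g = torch.Generator().manual_seed(0)
u = sample_log_uniform(13, g)

# Lengthscales the kernel would use if u were written to raw_lengthscale
# (softplus transform assumed).
via_softplus = F.softplus(u)
# Log-uniform lengthscales in [0.1, 10].
via_exp = torch.exp(u)

# Softplus of the interval endpoints: [log(1.1), log(11)].
print(F.softplus(torch.tensor([LSC_INF, LSC_SUP], dtype=torch.float64)))
```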

These are the results of training with FixedNoiseGaussianLikelihood and noise_level = 0.0:

Bias: tensor([0.0409])
Weight: tensor([-0.0698, -0.0846, -0.0331,  0.0130,  0.5209,  0.0260,  0.0269, -0.5383,
         0.6175,  0.0662,  0.4325,  0.0672, -0.1155])
Outputscale: 0.5184513330459595
Lengthscale: tensor([[1.2976, 2.1197, 3.0498, 0.6758, 0.7924, 1.5327, 1.9657, 0.2474, 1.5582,
         0.3736, 2.8948, 1.9096, 0.5094]])
Likelihood noise: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.])

These are the results of training with GaussianLikelihood, with ‘likelihood.noise_covar.noise’ initialized to the lower bound of the noise term’s GreaterThan constraint (here changed to 1e-8):

Bias: tensor([0.0409])
Weight: tensor([-0.0698, -0.0846, -0.0331,  0.0130,  0.5209,  0.0260,  0.0269, -0.5383,
         0.6175,  0.0662,  0.4325,  0.0672, -0.1155])
Outputscale: 0.5184513330459595
Lengthscale: tensor([[1.2976, 2.1197, 3.0498, 0.6758, 0.7924, 1.5327, 1.9657, 0.2474, 1.5582,
         0.3736, 2.8948, 1.9096, 0.5094]])
Likelihood noise: tensor([1.0000e-08])

Can someone please advise me on how to correctly model a noise-free GP?

Thank you

Issue Analytics

Created 3 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction

jacobrgardner commented, Jul 12, 2020

@stelong No, you’ll need to change the constraint so that whatever value you set is within its bounds. That said, I wouldn’t set it too low, because you’ll run into numerical issues.

With a likelihood noise that small, it won’t much matter whether you predict through the likelihood or not.

0 reactions

stelong commented, Jul 17, 2020

Hi @eytan, sorry for the late reply.

I have already spent some time writing my own wrapper around GPyTorch for Gaussian process emulation, and I have managed to get high cross-validated R2 scores on my dataset (which, as I mentioned, is built from model evaluations). The trained emulators also predict inside the input parameter space with low uncertainty. I haven’t tested them outside the ranges of the training input parameters, because for my research I am only interested in interpolation for the moment.

Honestly, I didn’t know about using acquisition functions to build the training dataset in the first place; I simply used Latin hypercube designs. I guess I could have needed fewer simulations with a more sophisticated strategy, such as the acquisition-function approach.
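For reference, a basic Latin hypercube design, as mentioned in this reply, can be sketched in plain PyTorch. This assumes the inputs are rescaled to the unit cube, and latin_hypercube is a hypothetical helper; it is a plain random LHS, not an optimized (e.g. maximin) design. Each dimension is split into n equal-width strata, and every stratum receives exactly one point.

```python
import torch


def latin_hypercube(n: int, dim: int, generator=None) -> torch.Tensor:
    """n points in [0, 1)^dim with exactly one point per stratum in each dimension."""
    # Random position inside each stratum, independently per point and dimension.
    u = torch.rand(n, dim, generator=generator, dtype=torch.float64)
    # Independently shuffle the stratum order for every dimension.
    strata = torch.stack(
        [torch.randperm(n, generator=generator) for _ in range(dim)], dim=1
    )
    return (strata + u) / n


g = torch.Generator().manual_seed(0)
X_unit = latin_hypercube(10, 3, generator=g)
# Map to hypothetical parameter ranges: lower + (upper - lower) * X_unit.
lower = torch.tensor([0.0, 1.0, -5.0], dtype=torch.float64)
upper = torch.tensor([1.0, 2.0, 5.0], dtype=torch.float64)
X_design = lower + (upper - lower) * X_unit
```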

BoTorch looks nice and ready to use. My wrapper now does something similar:

model = GP()
model.fit(X_train, y_train)
mean, std = model.predict(X_test)

though only after a lot of coding. It will probably be worth comparing the two implementations.

I appreciate your and @jacobrgardner’s advice, thanks!
