FixedNoiseGaussianLikelihood Vs GaussianLikelihood with noise constraint: how to properly model a noise-free GP?
Hello,
I am working on implementing a Gaussian process emulator (GPE) in GPyTorch. In my case, this is essentially a Gaussian process whose mean function is a linear regression model and whose covariance function is a simple kernel (e.g. RBF). The linear regressor weights and bias, the scale kernel outputscale, and the kernel lengthscales are supposed to be tuned concurrently during training.
I have so far been using the ExactGP formulation with GaussianLikelihood, which according to the tutorial is the way to get started with predicting a scalar output of the form
y = f(X) + epsilon
with epsilon being zero-mean Gaussian noise. Using this formulation, the likelihood noise term is tuned as well during the training process.
However, since my intent is to predict a computer code output which is supposed to be deterministic (observed data not polluted by any noise), I’d like to simply use a noise-free GPE formulation to make predictions at new (never simulated) input points. I basically would like the uncertainty around the predictions to be the one from the latent function posterior distribution without adding the variance of the noise modeled using epsilon.
This is what you get when you do:
predictions = model(X_new)
instead of
predictions = model.likelihood(model(X_new))
The catch is that even in the first case the likelihood noise term is still trained, and in this case that noise is something you don’t want to model.
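To make the difference between the two calls concrete, here is the standard exact-GP predictive algebra, written as a sketch. The symbols are introduced here for illustration and do not appear in the post: m is the mean function, k the kernel, K = k(X, X) the training covariance, k_* = k(X, x_*) the cross-covariance with a new point x_*, and σ² the likelihood noise.

```latex
\begin{aligned}
\mu_* &= m(x_*) + k_*^\top \left(K + \sigma^2 I\right)^{-1} \bigl(y - m(X)\bigr) \\
\operatorname{Var}[f_* \mid X, y] &= k(x_*, x_*) - k_*^\top \left(K + \sigma^2 I\right)^{-1} k_* \\
\operatorname{Var}[y_* \mid X, y] &= \operatorname{Var}[f_* \mid X, y] + \sigma^2
\end{aligned}
```

The second line corresponds to model(X_new) and the third to model.likelihood(model(X_new)). Since σ² sits inside the inverse in both cases, skipping the likelihood at prediction time only drops the final additive term; the trained noise still shapes the latent posterior.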
In order to remove it, the first thing I have tried is to use the FixedNoiseGaussianLikelihood in this way:
noise_level = 0.0
noise = noise_level*torch.ones(X_train.shape[0])
likelihood = gpytorch.likelihoods.FixedNoiseGaussianLikelihood(noise=noise, learn_additional_noise=False)
Although I get a nice R2 score = 1.000 after training when predicting points in the training dataset itself, I think that I am not doing things properly, since this formulation was designed to model heteroskedastic noise. Moreover, this formulation allows the user to insert additional noise in the likelihood when predicting at new input points.
What I like about this formulation is that I am not learning any noise (learn_additional_noise=False), but at the same time I feel like this shouldn’t be the proper way to go for noise-free GPs. Also, this formulation sometimes leads to errors, especially when I increase the number of samples in the training dataset. Another annoying thing is that it raises a warning, because I do not specify any additional noise in the likelihood when predicting at new points:
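The perfect training-set score is what exact conditioning on noise-free data would be expected to produce. As a sketch in standard GP notation (the symbols m, K and y are introduced here for illustration, and K = k(X, X) is assumed invertible), setting the noise to zero and evaluating the posterior at the training inputs X gives:

```latex
\begin{aligned}
\mathbb{E}[f(X) \mid X, y] &= m(X) + K K^{-1} \bigl(y - m(X)\bigr) = y \\
\operatorname{Cov}[f(X) \mid X, y] &= K - K K^{-1} K = 0
\end{aligned}
```

Under these assumptions the identity holds for any hyperparameter values, so an R2 of 1 on the training points says little about the quality of the fit. Held-out or cross-validated points are the more informative check.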
UserWarning: You have passed data through a FixedNoiseGaussianLikelihood that did not match the size of the fixed noise, *and* you did not specify noise. This is treated as a no-op.
so that I have to predict using
predictions = model(X_new)
(ignoring the likelihood, not that it will change much).
So I went back to the GaussianLikelihood and found out something interesting. If you do not initialize the likelihood noise term, it gets tuned during the training process. If you initialize it to a value above the lower bound given by the GreaterThan constraint (default is 1e-4), the noise still gets tuned. However, when I initialized it to exactly the lower bound, I noticed that it did not update at all from one epoch to the next in the training loop.
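One plausible explanation for the frozen noise, sketched in plain Python rather than GPyTorch: assuming the GreaterThan constraint maps the raw parameter through softplus (noise = lower_bound + softplus(raw_noise), which I believe is the default transform), a noise value exactly at the bound corresponds to raw_noise = -inf, where the derivative of the transform is zero. The function names below are illustrative, not GPyTorch API.

```python
import math


def softplus(raw):
    """Numerically stable softplus: log(1 + exp(raw))."""
    return max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))


def sigmoid(raw):
    """Derivative of softplus with respect to raw."""
    if raw >= 0:
        return 1.0 / (1.0 + math.exp(-raw))
    z = math.exp(raw)
    return z / (1.0 + z)


def inv_softplus(value):
    """Inverse of softplus; a value of 0 is only reached at raw = -inf."""
    if value == 0.0:
        return -math.inf
    return value + math.log(-math.expm1(-value))


def raw_from_noise(noise, lower_bound):
    """Raw (unconstrained) parameter for a given constrained noise."""
    return inv_softplus(noise - lower_bound)


def noise_from_raw(raw, lower_bound):
    """Constrained noise for a given raw parameter (assumed transform)."""
    return lower_bound + softplus(raw)


lower_bound = 1e-8  # same bound as in the post

# Initializing exactly at the bound vs. somewhat above it.
raw_at_bound = raw_from_noise(lower_bound, lower_bound)
raw_above = raw_from_noise(1e-2, lower_bound)

# d(noise)/d(raw): the factor every gradient passes through on its way to raw.
grad_at_bound = sigmoid(raw_at_bound)
grad_above = sigmoid(raw_above)

print("at bound:", raw_at_bound, grad_at_bound)
print("above bound:", raw_above, grad_above)
```

If this is what happens, the noise is not really "fixed" by design; the optimizer just never receives a usable gradient for it. A more explicit route, offered as a suggestion rather than the maintainers' recommendation, would be to initialize the noise slightly above the bound and exclude the raw parameter from optimization, for example by calling requires_grad_(False) on likelihood.noise_covar.raw_noise.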
To give some context, this is the GPE formulation I am using:
import torch
import gpytorch


class AffineMean(gpytorch.means.Mean):
    """Linear-regression mean function: weight . x + bias."""

    def __init__(self, in_dim, batch_shape=torch.Size()):
        super().__init__()
        self.register_parameter(name='weight', parameter=torch.nn.Parameter(torch.zeros((*batch_shape, in_dim), requires_grad=True, dtype=torch.float)))
        self.register_parameter(name='bias', parameter=torch.nn.Parameter(torch.zeros((*batch_shape, 1), requires_grad=True, dtype=torch.float)))

    def forward(self, x):
        # x has shape (n, in_dim); the result has shape (n,)
        return torch.matmul(self.weight, torch.transpose(x, 0, 1)) + self.bias


class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, in_dim, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = AffineMean(in_dim)
        # RBF kernel with one lengthscale per input dimension (ARD), scaled by an outputscale
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=in_dim))

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
and this is how I initialize the model:
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(in_dim, X_train, y_train, likelihood)
lsc_inf = np.log(0.1)
lsc_sup = np.log(10.0)
noise_level = 1e-8
hyperparameters = {
    'likelihood.noise_covar.raw_noise_constraint': gpytorch.constraints.GreaterThan(noise_level),
    'likelihood.noise_covar.noise': torch.tensor(noise_level),
    'covar_module.base_kernel.raw_lengthscale': (lsc_sup - lsc_inf)*torch.rand(in_dim) + lsc_inf,
    'covar_module.outputscale': torch.tensor(1.0)
}
if not np.isclose(data_mean, 0.0):
    hyperparameters['mean_module.bias'] = torch.tensor(data_mean)
model.initialize(**hyperparameters)
This is the result of the training using the FixedNoiseGaussianLikelihood, noise_level = 0.0:
Bias: tensor([0.0409])
Weigth: tensor([-0.0698, -0.0846, -0.0331, 0.0130, 0.5209, 0.0260, 0.0269, -0.5383,
0.6175, 0.0662, 0.4325, 0.0672, -0.1155])
Outputscale: 0.5184513330459595
Lengthscale: tensor([[1.2976, 2.1197, 3.0498, 0.6758, 0.7924, 1.5327, 1.9657, 0.2474, 1.5582,
0.3736, 2.8948, 1.9096, 0.5094]])
Likelihood noise: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0.])
These are the results of training with the GaussianLikelihood, with ‘likelihood.noise_covar.noise’ initialized to the lower bound of the noise term’s GreaterThan constraint (in this case changed to 1e-8):
Bias: tensor([0.0409])
Weigth: tensor([-0.0698, -0.0846, -0.0331, 0.0130, 0.5209, 0.0260, 0.0269, -0.5383,
0.6175, 0.0662, 0.4325, 0.0672, -0.1155])
Outputscale: 0.5184513330459595
Lengthscale: tensor([[1.2976, 2.1197, 3.0498, 0.6758, 0.7924, 1.5327, 1.9657, 0.2474, 1.5582,
0.3736, 2.8948, 1.9096, 0.5094]])
Likelihood noise: tensor([1.0000e-08])
Can someone please advise me on how to correctly model a noise-free GP?
Thank you
Top GitHub Comments
@stelong No, you’ll need to change the constraint so that whatever value you set is in bounds. That said, I wouldn’t set it too low, because you’ll run into numerical issues.
With a likelihood noise that small, it won’t much matter whether you predict through the likelihood or not.
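A rough illustration of the numerical concern, under toy assumptions that are not taken from the thread: two training inputs a small distance apart, and an RBF kernel with unit outputscale and unit lengthscale. The 2x2 matrix K + noise * I then has eigenvalues (1 + noise) ± r, with r = exp(-distance² / 2), so its condition number is available in closed form without any linear algebra library.

```python
import math


def rbf_correlation_gap(distance, lengthscale=1.0):
    """Return 1 - r, where r = exp(-0.5 * (distance / lengthscale)**2).

    expm1 keeps precision when the two points are very close (r near 1).
    """
    return -math.expm1(-0.5 * (distance / lengthscale) ** 2)


def condition_number(distance, noise, lengthscale=1.0):
    """Condition number of [[1 + noise, r], [r, 1 + noise]]."""
    gap = rbf_correlation_gap(distance, lengthscale)  # 1 - r
    largest = 2.0 - gap + noise   # (1 + noise) + r
    smallest = gap + noise        # (1 + noise) - r
    return largest / smallest


distance = 1e-3  # hypothetical spacing between two nearby design points
for noise in (1e-4, 1e-8, 0.0):
    cond = condition_number(distance, noise)
    print(f"noise={noise:g}  cond={cond:.3e}")
```

In this toy setting the default lower bound of 1e-4 acts like jitter and keeps the matrix far better conditioned than 1e-8 or zero. With many design points the effect compounds, which may be related to the errors seen with larger training sets, although that is a guess. The price of the larger noise is only an additive noise term in the predictive variance when predicting through the likelihood.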
Hi @eytan , sorry for the late reply.
I have already spent some time writing my custom wrapper around gpytorch for Gaussian process emulation and I have managed to get nice high cross-validated R2 scores on my dataset (which is built using model evaluations as I was mentioning). The trained emulators are also able to predict inside the input parameter space with low uncertainty. I haven’t tested the emulators outside the training input parameters’ ranges because for my research purposes I am only interested in interpolation tasks for the moment.
I honestly didn’t know about using acquisition functions to build the training dataset in the first place. I simply used Latin hypercube designs for that, but I guess I could have run fewer simulations with a more sophisticated strategy such as an acquisition function.
BoTorch looks nice and ready to use, also my wrapper now does something like
after a lot of coding though. I guess it will be worth looking at how the two implementations compare with each other.
I appreciate your and @jacobrgardner’s advice, thanks!