Training / Evaluation Question
Hello, I’ve been playing around with gpytorch recently and it is great! I just have a few questions about training and evaluation, in the context of exact GPs for regression.
I have something like this for training
```python
for i in range(training_iter):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        torch.mean(model.likelihood.noise)
    ))
    optimizer.step()
```
and something like this for evaluation
```python
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    marginal_likelihood = mll(model(train_x), train_y)
    print(marginal_likelihood.item())
```
These, unfortunately, give different values. Is there a reason for this? Shouldn’t they both capture the marginal log likelihood of the training data?
The related question is how folks are training their GPs. The number of iterations seems arbitrary in the docs, so I was wondering if there is some termination condition people generally use. I sometimes see the loss increasing and oscillating (which is totally fine - if I wanted to treat it as a NN, I would lower the learning rate, etc.), but I figured that since GPs have closed forms, there should be a better way to do this than hyperparameter-tuning the learning rate and early stopping.
Issue Analytics
- Created: 3 years ago
- Reactions: 1
- Comments: 7 (2 by maintainers)
Top GitHub Comments
I would definitely use the `mll` as the output in train mode, as that is the objective function you are training through. I don’t think it would make much sense to use your training set for early stopping in eval mode.
I’m going to close this for now, feel free to reopen or open a new issue for further questions.
Gradient descent is only changing the hyperparameters of the kernel. It won’t cause the prior to be the same as the posterior. Those will always be different, since one is the distribution before seeing data and the other is a distribution conditioned on data.
The marginal log likelihood of a GP is computed from the prior mean and prior covariance matrix among the training points. Hence `mll` expects the prior multivariate normal distribution as its input. Note that you could pass any multivariate normal distribution to `mll` and it’ll run without errors. This is why you are able to pass in `model(x)`, which is always a multivariate normal distribution, even when `model` is in `.eval()` mode.