[Question] Conditional model output when using SGD
I am trying to use a multitask Gaussian process for Bayesian optimization with a dataset of about 10,000 records. To speed up training, I use mini-batch SGD to fit the hyperparameters of the model's kernel and mean: at each step I randomly sample a subset of my training data, set it as the model's training data, compute the marginal log-likelihood, and step the optimizer. A sketch of this setup is shown below.
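For concreteness, here is a minimal sketch of that training loop, assuming a standard GPyTorch exact multitask GP. The model class, dataset shapes, batch size, learning rate, and step count are all illustrative assumptions, not details from the original issue:

```python
import torch
import gpytorch


# A standard GPyTorch exact multitask GP (illustrative model definition)
class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, num_tasks):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=num_tasks
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=num_tasks, rank=1
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)


num_tasks, batch_size = 2, 256
full_x = torch.randn(10_000, 1)              # placeholder for the real dataset
full_y = torch.randn(10_000, num_tasks)

likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=num_tasks)
model = MultitaskGPModel(full_x[:batch_size], full_y[:batch_size], likelihood, num_tasks)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model.train()
likelihood.train()
for step in range(500):
    # Swap a random mini-batch in as the model's training data,
    # then take one SGD step on the negative marginal log-likelihood
    idx = torch.randperm(full_x.size(0))[:batch_size]
    x_batch, y_batch = full_x[idx], full_y[idx]
    model.set_train_data(x_batch, y_batch, strict=False)
    optimizer.zero_grad()
    loss = -mll(model(x_batch), y_batch)
    loss.backward()
    optimizer.step()
```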
Before moving the model into eval(), I load the whole dataset into the model using set_train_data(). However, when I make predictions, the posterior mean does not match the training examples, and the width of the confidence interval does not shrink as I approach the training points, which I would expect it to.
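Continuing the sketch above, the prediction step described here would look roughly like this (the test grid is an illustrative placeholder):

```python
# Swap the full dataset back in before conditioning, then predict in eval mode
model.set_train_data(full_x, full_y, strict=False)
model.eval()
likelihood.eval()

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(-3, 3, 100).unsqueeze(-1)
    pred = likelihood(model(test_x))          # posterior predictive distribution
    mean = pred.mean                          # shape (100, num_tasks)
    lower, upper = pred.confidence_region()   # roughly mean +/- 2 stddev
```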
My guess is that this is related to how MultitaskGaussianLikelihood works. Is that the case? Since I am using SGD, I would like the accuracy of the conditioned model's predictions on the training data to be independent of the hyperparameter values reached during training. Is this possible?
Top GitHub Comments
Yes, what’s likely happening is that hyperparameter optimization lands in a local “optimum” with a short lengthscale and large noise in the five-data-point case (an “underfit” GP), which is why you’re severely under-fitting. In that regime, conditioning on the observed data changes little, and the GP’s posterior looks much like the prior.
Here, I profiled the marginal log-likelihood for the 5-data-point regime to visualize this; note the large trough of low MLL at a noise of around 0.5 (I truncated values at 3 because they skyrocket past there). A sketch of one way to compute such a profile follows.
[MLL profile plot from the original comment omitted]
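This is a hedged sketch of one way to produce such a profile, not necessarily how it was done in the comment. It uses a single-task GaussianLikelihood and a made-up 5-point toy dataset for simplicity, fixes the hyperparameters on a grid, and evaluates the exact negative MLL at each point:

```python
import torch
import gpytorch

# Toy 5-point dataset (illustrative, not from the original issue)
train_x = torch.linspace(0, 1, 5).unsqueeze(-1)
train_y = torch.sin(6 * train_x).squeeze(-1)

likelihood = gpytorch.likelihoods.GaussianLikelihood()


class ToyGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


model = ToyGP(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
model.train()

# Evaluate the negative MLL on a (lengthscale, noise) grid
lengthscales = torch.linspace(0.05, 2.0, 50)
noises = torch.linspace(0.01, 1.0, 50)
profile = torch.empty(len(lengthscales), len(noises))
with torch.no_grad():
    for i, ls in enumerate(lengthscales):
        for j, nz in enumerate(noises):
            model.covar_module.lengthscale = ls
            likelihood.noise = nz
            profile[i, j] = -mll(model(train_x), train_y)

# Truncate at 3, mirroring the plot described above
profile = profile.clamp(max=3.0)
```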
To prevent this from occurring, beyond what I suggested previously, you could also place a prior or constraint on the likelihood noise (which changes the optimization landscape) so that it cannot grow too large during optimization.
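For example, a minimal sketch of constraining the noise of a MultitaskGaussianLikelihood; the specific bounds and the Gamma prior parameters here are illustrative choices, not recommendations from the thread:

```python
import gpytorch

likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(
    num_tasks=2,
    # Hard bounds keeping the noise away from large values (illustrative bounds)
    noise_constraint=gpytorch.constraints.Interval(1e-4, 0.1),
    # Soft prior concentrating mass on small noise values (illustrative parameters)
    noise_prior=gpytorch.priors.GammaPrior(1.1, 10.0),
)
```

A constraint rules the bad region out entirely, while a prior merely penalizes it; which is appropriate depends on how confident you are about plausible noise levels in your data.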
Thank you so much for the explanation, it really helped me understand how things work under the hood. I have just run some tests with priors and constraints on the noise parameter and I am now able to get the results I was expecting.
Additionally, thank you for the link to the paper on SGD and for recommending KroneckerMultiTaskGP (I might come back with additional questions about this later!).