[Question] Conditional model output when using SGD
I am trying to use a multitask Gaussian process for Bayesian optimization with a dataset of about 10,000 records. To speed up training, I use mini-batch SGD to fit the hyperparameters of the model's kernel and mean: at each step I randomly sample a subset of my training data, set it as the model's training data, compute the marginal log-likelihood, and step the optimizer. A sketch of this setup is shown below.
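For concreteness, here is a minimal sketch of that training loop, assuming a standard GPyTorch exact multitask GP. The model class, dataset shapes, batch size, learning rate, and step count are all illustrative assumptions, not details from the original issue:

```python
import torch
import gpytorch


# A standard GPyTorch exact multitask GP (illustrative model definition)
class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, num_tasks):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=num_tasks
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=num_tasks, rank=1
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)


num_tasks, batch_size = 2, 256
full_x = torch.randn(10_000, 1)              # placeholder for the real dataset
full_y = torch.randn(10_000, num_tasks)

likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=num_tasks)
model = MultitaskGPModel(full_x[:batch_size], full_y[:batch_size], likelihood, num_tasks)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model.train()
likelihood.train()
for step in range(500):
    # Swap a random mini-batch in as the model's training data,
    # then take one SGD step on the negative marginal log-likelihood
    idx = torch.randperm(full_x.size(0))[:batch_size]
    x_batch, y_batch = full_x[idx], full_y[idx]
    model.set_train_data(x_batch, y_batch, strict=False)
    optimizer.zero_grad()
    loss = -mll(model(x_batch), y_batch)
    loss.backward()
    optimizer.step()
```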
Before moving the model into eval(), I load the whole dataset into the model using set_train_data(). However, when I make predictions, the posterior mean does not match the training examples, and the width of the confidence interval does not shrink as I approach the training points, which I would expect it to.
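Continuing the sketch above, the prediction step described here would look roughly like this (the test grid is an illustrative placeholder):

```python
# Swap the full dataset back in before conditioning, then predict in eval mode
model.set_train_data(full_x, full_y, strict=False)
model.eval()
likelihood.eval()

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(-3, 3, 100).unsqueeze(-1)
    pred = likelihood(model(test_x))          # posterior predictive distribution
    mean = pred.mean                          # shape (100, num_tasks)
    lower, upper = pred.confidence_region()   # roughly mean +/- 2 stddev
```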
My guess is that this is related to how MultitaskGaussianLikelihood works. Is that the case? Since I am using SGD, I would like the accuracy of the conditioned model's predictions on the training data to be independent of the hyperparameter values reached during training. Is this possible?
Top GitHub Comments
Yes, what’s likely happening is that hyperparameter optimization lands in a local “optimum” with a short lengthscale and large noise in the five-data-point case (an “underfit” GP), which is why you’re severely under-fitting. In that regime, conditioning on the observed data changes little, and the GP’s posterior looks much like the prior.
Here, I profiled the marginal log-likelihood for the 5-data-point regime to visualize this; note the large trough of low MLL at a noise of around 0.5 (I truncated values at 3 because they skyrocket past there). A sketch of one way to compute such a profile follows.
[MLL profile plot from the original comment omitted]
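This is a hedged sketch of one way to produce such a profile, not necessarily how it was done in the comment. It uses a single-task GaussianLikelihood and a made-up 5-point toy dataset for simplicity, fixes the hyperparameters on a grid, and evaluates the exact negative MLL at each point:

```python
import torch
import gpytorch

# Toy 5-point dataset (illustrative, not from the original issue)
train_x = torch.linspace(0, 1, 5).unsqueeze(-1)
train_y = torch.sin(6 * train_x).squeeze(-1)

likelihood = gpytorch.likelihoods.GaussianLikelihood()


class ToyGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


model = ToyGP(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
model.train()

# Evaluate the negative MLL on a (lengthscale, noise) grid
lengthscales = torch.linspace(0.05, 2.0, 50)
noises = torch.linspace(0.01, 1.0, 50)
profile = torch.empty(len(lengthscales), len(noises))
with torch.no_grad():
    for i, ls in enumerate(lengthscales):
        for j, nz in enumerate(noises):
            model.covar_module.lengthscale = ls
            likelihood.noise = nz
            profile[i, j] = -mll(model(train_x), train_y)

# Truncate at 3, mirroring the plot described above
profile = profile.clamp(max=3.0)
```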
To prevent this from occurring, beyond what I suggested previously, you could also place a prior or constraint on the likelihood noise (which changes the optimization landscape) so that it cannot grow too large during optimization.
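For example, a minimal sketch of constraining the noise of a MultitaskGaussianLikelihood; the specific bounds and the Gamma prior parameters here are illustrative choices, not recommendations from the thread:

```python
import gpytorch

likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(
    num_tasks=2,
    # Hard bounds keeping the noise away from large values (illustrative bounds)
    noise_constraint=gpytorch.constraints.Interval(1e-4, 0.1),
    # Soft prior concentrating mass on small noise values (illustrative parameters)
    noise_prior=gpytorch.priors.GammaPrior(1.1, 10.0),
)
```

A constraint rules the bad region out entirely, while a prior merely penalizes it; which is appropriate depends on how confident you are about plausible noise levels in your data.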
Thank you so much for the explanation, it really helped me understand how things work under the hood. I have just run some tests with priors and constraints on the noise parameter and I am now able to get the results I was expecting.
Additionally, thank you for the link to the paper on SGD and for recommending KroneckerMultiTaskGP (I might come back with additional questions about this later!).