[Question] Odd Behavior for simple GP/BO
I'm trying to fit a Gaussian process to a simple polynomial without noise using Ax's get_botorch function, but I'm seeing some unexpected behavior: in certain cases the GP fails to fit the points, even though I am using an ExactGP with a noiseless Expected Improvement acquisition function. I've modified the "Using a custom BoTorch model with Ax" tutorial to evaluate the function y = x1**2 instead of the Branin function, and to use the (analytic) ExpectedImprovement acquisition function instead of qNoisyExpectedImprovement, since I thought that might be causing the issue. I've attached plots of several past outcomes (generated by calling render(plot_slice(...))), and I'd be happy to post my code if necessary.
Issue Analytics
- Created: 4 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Great, thanks for your help!
Thanks.
So I ran this a few times in a notebook and was able to reproduce it. If there are only 2-3 data points, a constant fit with high variance is to be expected: there is just not enough information in 3 data points for a non-parametric model with a Matern kernel to make much sense of them.
The fact that in your example this also occurs for more than 3 data points is most likely because you do not put a prior on the lengthscales of your kernel. As a result, a priori any lengthscale is equally likely to the model. However, Ax by default normalizes the inputs to the unit cube [0, 1]^d, so lengthscales greater than 1 are pretty meaningless. That's why the SingleTaskGP BoTorch model that Ax uses by default puts a prior on the lengthscales with very little probability mass above 1. If you do the same thing, that is, add a lengthscale_prior=GammaPrior(3.0, 6.0) argument to your MaternKernel, I don't see this behavior anymore for n > 3 data points in your example.

More generally speaking, if you have prior knowledge that your function has a particular structure (e.g. is a polynomial of some degree, monotonic, periodic, ...), using a different, more parametric kernel (e.g. a polynomial kernel) would be the way to go, and would result in more interpretable fits (assuming the function indeed satisfies the structural assumptions).
I hope this helps.