[Question] Odd Behavior for simple GP/BO
I'm trying to fit a Gaussian process to a simple polynomial without noise using Ax's get_botorch function, but I'm seeing some unexpected behavior: in certain cases the GP fails to fit the points, even though I am using an ExactGP with a noiseless Expected Improvement acquisition function. I've modified the "Using a custom BoTorch model with Ax" tutorial to evaluate the function y = x1**2 instead of the Branin function, and to use the (analytic) ExpectedImprovement acquisition function instead of qNoisyExpectedImprovement, since I thought that might be causing the issue. I've attached plots of several past outcomes (generated by calling render(plot_slice(...))), and I'd be happy to post my code if necessary.
Issue Analytics
- Created: 4 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Great, thanks for your help!
Thanks.
So I ran this a few times in a notebook and was able to reproduce it. If there are only 2-3 data points, a constant fit with high variance is to be expected: there is just not enough information in 3 data points for a non-parametric model with a Matern kernel to make much sense of them.
The fact that in your example this also occurs for more than 3 data points is most likely because you do not put a prior on the lengthscales of your kernel. As a result, a priori any lengthscale is equally likely to the model. However, Ax by default normalizes the inputs to the unit cube [0, 1]^d, so lengthscales greater than 1 are pretty meaningless. That's why the SingleTaskGP BoTorch model that Ax uses by default puts a prior on the lengthscales with very little probability mass above 1. If you do the same thing, that is, add a lengthscale_prior=GammaPrior(3.0, 6.0) argument to your MaternKernel, I don't see this behavior anymore for n > 3 data points in your example.

More generally speaking, if you have prior knowledge that your function has a particular structure (e.g. is a polynomial of some degree, monotonic, periodic, ...), using a different, more parametric kernel (e.g. a polynomial kernel) would be the way to go, and would result in more interpretable fits (assuming the function indeed satisfies the structural assumptions).
I hope this helps.