Hyperparameter optimization fitting weird values on very small data (5-10 samples)
I've run into a somewhat weird case where optimizing the kernel parameters leads to very high values of the variance with both the RBF and Matern52 kernels. Regardless of whether I normalize the inputs, I get about the same negative log likelihood (NLL) and very similar kernel parameters for both kernels, and adding more data points doesn't change this.
Here’s a smaller test case with just five data points
import numpy as np
import GPy

X = np.array([
[0.032, 0.745],
[0.428, 0.916],
[0.427, 0.574],
[0.938, 0.425],
[0.999, 0.999],
], dtype=np.float32)
Y = np.array([
[1.5244],
[2.2617],
[1.577],
[1.79],
[3.0],
], dtype=np.float32)
m = GPy.models.GPRegression(X, Y, normalizer=True)
m.optimize()
print(m)
m = GPy.models.GPRegression(X, Y)
m.optimize()
print(m)
Name : GP regression
Objective : 1.2008143563132885
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
GP_regression. | value | constraints | priors
rbf.variance | 151063.25336423848 | +ve |
rbf.lengthscale | 191.19406798472033 | +ve |
Gaussian_noise.variance | 5.562684646268137e-309 | +ve |
Name : GP regression
Objective : -1.776857320119305
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
GP_regression. | value | constraints | priors
rbf.variance | 45818.36132621084 | +ve |
rbf.lengthscale | 191.2225103915742 | +ve |
Gaussian_noise.variance | 5.562684646268137e-309 | +ve |
Changing the last X item from 0.999 to 0.999999 increases the variance by an order of magnitude
Notice how rbf.variance is on the order of tens to hundreds of thousands. Very minor changes in the data cause it to change by large amounts. For example, here's a changed version of the first dataset where the variance suddenly jumps to over 600k:
X = np.array([
[0.032, 0.745],
[0.428, 0.916],
[0.427, 0.574],
[0.938, 0.425],
[0.999999, 0.999],
], dtype=np.float32)
Y = np.array([
[1.5244],
[2.2617],
[1.577],
[1.79],
[3.0],
], dtype=np.float32)
m = GPy.models.GPRegression(X, Y, normalizer=True)
m.optimize()
print(m.param_array)
# [6.34150113e+005 3.92048931e+002 5.56268465e-309]
Small change drops the variance from about 634000 to 0.65
I created these 5 points by trying to remove as many as I could from the following dataset while keeping the variance extremely high. Here's another modified example, where the first coordinate of the second and third points is changed from 0.428 and 0.427 to 0.40 and 0.44, and suddenly the resulting variance is just 0.65, instead of the previous 634150.
X = np.array([
[0.032, 0.745],
[0.40, 0.916],
[0.44, 0.574],
[0.938, 0.425],
[0.999999, 0.999],
], dtype=np.float32)
Y = np.array([
[1.5244],
[2.2617],
[1.577],
[1.79],
[3.0],
], dtype=np.float32)
m = GPy.models.GPRegression(X, Y, normalizer=True)
m.optimize()
print(m.param_array)
# [0.65296766 0.06262393 0.34703868]
Same problem with 10 data points
Here is yet another example, but with 10 points instead of 5. I actually started with this dataset and tried to simplify and remove points as much as I could while keeping the weird behavior. In this case, the variance goes up to 3.7 million.
X = np.array([
[0.9098032 , 0.10719252],
[0.07689123, 0. ],
[0.92378354, 0.29961774],
[0.93852574, 0.42585665],
[1. , 1. ],
[0.03271898, 0.7458292 ],
[0.4280041 , 0.9168633 ],
[0.42700773, 0.57499546],
[0.4229433 , 0.9811548 ],
[0.90882766, 0.49437103]
], dtype=np.float32)
Y = np.array([
[1.12418827],
[0.07689123],
[1.52301903],
[1.79023902],
[3. ],
[1.52437748],
[2.26173076],
[1.57699869],
[2.38525289],
[1.89756974]
], dtype=np.float32)
m = GPy.models.GPRegression(X, Y, normalizer=True)
m.optimize()
print(m.param_array)
# [3.78870122e+006 1.75181624e+003 5.56268465e-309]
m = GPy.models.GPRegression(X, Y)
m.optimize()
print(m.param_array)
# [1.92090099e+006 1.62450526e+003 5.56268465e-309]
Overflows, underflows, restarts and invalid values
As I've been trying to figure this out for a few hours, I'm also sometimes getting numerical errors, specifically warnings like these:
/home/darth/projects/bopt/.venv/lib/python3.6/site-packages/GPy/kern/src/stationary.py:168: RuntimeWarning: overflow encountered in true_divide
/home/darth/projects/bopt/.venv/lib/python3.6/site-packages/GPy/kern/src/rbf.py:51: RuntimeWarning: overflow encountered in square
/home/darth/projects/bopt/.venv/lib/python3.6/site-packages/GPy/kern/src/rbf.py:54: RuntimeWarning: invalid value encountered in multiply
I don't have all the errors, but I've seen both overflows and underflows occur sometimes. Usually running optimize_restarts() with a larger number of restarts causes a few of them to fail, though the optimizer still converges to roughly the same values (even 1e7). Note that this happens even if I use np.float64.
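One way to keep going when individual restarts fail is the robust flag of optimize_restarts, which catches exceptions from failed restarts instead of aborting the whole loop. A minimal sketch, reusing the X and Y above; the restart count is an arbitrary choice:
import GPy

m = GPy.models.GPRegression(X, Y, normalizer=True)
# robust=True catches exceptions from individual restarts
# (e.g. on overflow/underflow) rather than aborting the whole loop
m.optimize_restarts(num_restarts=20, robust=True, verbose=False)
print(m)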
How do I diagnose this? What is actually going on?
Prior to running into this issue I had actually been working on my own implementation of GPs, and tried to use GPy as a solution to my own numerical issues, but it seems that either I'm doing something fundamentally wrong, or GPs are just a lot more fragile than I expected.
I'm not sure if there is a more stable way to optimize the kernel. What I'm ultimately after is Bayesian optimization, so I don't really need to fit the kernel perfectly in case the data is a bit weird (which I don't see it being in this case), but I'd like to avoid pathological cases where the optimization blows up.
Is there some way to make the optimization more robust, or “safer”, in the sense of putting a stronger prior towards smoother GPs or noisy inputs? Or is there something that I’m just fundamentally missing?
edit: I realized something after waking up today. The data in question is actually linear (sampled from a nearly noiseless linear function). I'm not sure whether that is the cause of the numerical instability, but it is definitely worth mentioning.
One thought I had is to constrain the Gaussian_noise to a value higher than 0, but I'm not sure if that is a good solution, or if there is something better to try.
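A minimal sketch of that idea, putting a lower bound on the noise variance so the optimizer cannot drive it to (numerically) zero; the bounds here are placeholder values, not a recommendation:
import GPy

m = GPy.models.GPRegression(X, Y, normalizer=True)
# keep the noise variance away from zero; 1e-4 and 1.0 are arbitrary bounds
m.Gaussian_noise.variance.constrain_bounded(1e-4, 1.0)
m.optimize()
print(m)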
Top GitHub Comments
I'm guessing that the problem is that several hyperparameter configurations lead to very similar marginal likelihoods, so the ML estimation finds different solutions each time you run it. It's not the GP that's fragile but the use of a single (non-Bayesian/probabilistic) setting for these hyperparameters. One can imagine the marginal likelihood landscape is either very flat (so it doesn't care where it optimises) or very hilly with lots of local maxima/minima*.
In order of correctness: (1) Something I've meant to do for years was to get the CCD stuff sorted out. I think there are clearly enough people who need it that I should probably get that done soon! To summarise: this effectively finds the posterior given different hyperparameter configurations and then weighted-averages these (weighted by the associated marginal likelihoods); the upshot is that uncertainty over the variance in this case would appear in the uncertainty in the posterior (over models). This is the right thing to do. Sorry I've not done it yet!
(2) A quick solution, if you have a prior belief about what the hyperparameters should be, is to put a prior over the hyperparameters; that will shift where the ML solution ends up. I can't remember exactly how to do this, something like m.kern.variance.prior = GPy.priors.Gamma(1, 0.1) (see the sketch after this list for one way to do it).
(3) You can put severely strict “priors” that restrict the range a parameter can take, e.g. k.lengthscale.constrain_bounded(1,10).
(4) It might also be worth looking at optimize_restarts, which tries several times from different starting points, to ensure the ML solution isn't a poor local minimum.
(5) Or just use optimize but set your hyperparameters beforehand to be near where you think the right answer is.
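A rough sketch of what (2)-(4) might look like in GPy, using set_prior to attach the hyperparameter priors; the Gamma parameters and bounds below are illustrative placeholders, not recommendations:
import GPy

m = GPy.models.GPRegression(X, Y, normalizer=True)

# (2) prior over a hyperparameter; Gamma.from_EV(mean, variance) is a
#     convenient constructor -- the values here are placeholders
m.kern.variance.set_prior(GPy.priors.Gamma.from_EV(1.0, 10.0))

# (3) hard bounds on the range a parameter can take
m.kern.lengthscale.constrain_bounded(0.01, 10.0)

# (4) restart the optimiser from several starting points
m.optimize_restarts(num_restarts=10, verbose=False)
print(m)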
Hope some of these help. Sorry again (1) isn’t implemented yet. Mike.
*I think it optimises the negative log likelihood, so technically it's minima we're looking for, I think… bit confusing.
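One way to visualise the landscape described above is to scan the negative log marginal likelihood over a grid of hyperparameter values while holding the noise fixed. A rough sketch; the nll_surface helper and the fixed noise value are assumptions for illustration, not part of GPy:
import numpy as np
import GPy

def nll_surface(X, Y, variances, lengthscales, noise=1e-2):
    # negative log marginal likelihood on a (variance, lengthscale) grid
    m = GPy.models.GPRegression(X, Y, normalizer=True)
    m.Gaussian_noise.variance = noise  # hold the noise fixed for the scan
    surface = np.zeros((len(variances), len(lengthscales)))
    for i, v in enumerate(variances):
        for j, l in enumerate(lengthscales):
            m.rbf.variance = v
            m.rbf.lengthscale = l
            surface[i, j] = -m.log_likelihood()
    return surface

# e.g. evaluate on np.logspace grids and plot with matplotlib's contourf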
@darthdeus Hi Jakub, could you share how you made the likelihood surface (image https://i.imgur.com/io1JMHy.png)? I googled around to find the function for this without success. Thanks.