Hyperparameter optimization fitting weird values on very small data (5-10 samples)
I've run into a somewhat weird case where optimizing the kernel parameters leads to very high values of the variance with both the RBF and Matern52 kernels. Regardless of whether I normalize the inputs, I get about the same negative log likelihood (NLL) and very similar kernel parameters for both kernels, and adding more data points doesn't change this.
Here’s a smaller test case with just five data points
import numpy as np
import GPy

X = np.array([
[0.032, 0.745],
[0.428, 0.916],
[0.427, 0.574],
[0.938, 0.425],
[0.999, 0.999],
], dtype=np.float32)
Y = np.array([
[1.5244],
[2.2617],
[1.577],
[1.79],
[3.0],
], dtype=np.float32)
m = GPy.models.GPRegression(X, Y, normalizer=True)
m.optimize()
print(m)
m = GPy.models.GPRegression(X, Y)
m.optimize()
print(m)
Name : GP regression
Objective : 1.2008143563132885
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
GP_regression. | value | constraints | priors
rbf.variance | 151063.25336423848 | +ve |
rbf.lengthscale | 191.19406798472033 | +ve |
Gaussian_noise.variance | 5.562684646268137e-309 | +ve |
Name : GP regression
Objective : -1.776857320119305
Number of Parameters : 3
Number of Optimization Parameters : 3
Updates : True
Parameters:
GP_regression. | value | constraints | priors
rbf.variance | 45818.36132621084 | +ve |
rbf.lengthscale | 191.2225103915742 | +ve |
Gaussian_noise.variance | 5.562684646268137e-309 | +ve |
Changing the last X item from 0.999 to 0.999999 increases the variance by an order of magnitude
Notice how rbf.variance is on the order of tens to hundreds of thousands. Very minor changes in the data cause it to change by large amounts. For example, here's a changed version of the first dataset where the variance suddenly jumps to over 600k:
X = np.array([
[0.032, 0.745],
[0.428, 0.916],
[0.427, 0.574],
[0.938, 0.425],
[0.999999, 0.999],
], dtype=np.float32)
Y = np.array([
[1.5244],
[2.2617],
[1.577],
[1.79],
[3.0],
], dtype=np.float32)
m = GPy.models.GPRegression(X, Y, normalizer=True)
m.optimize()
print(m.param_array)
# [6.34150113e+005 3.92048931e+002 5.56268465e-309]
Small change drops the variance from about 634000 to 0.65
I created these 5 points by trying to remove as many as I could from the following dataset while keeping the variance extremely high. Here's another modified example, where the first coordinate of the second and third points is changed from 0.428 and 0.427 to 0.40 and 0.44, and suddenly the resulting variance is just 0.65, instead of the previous 634150.
X = np.array([
[0.032, 0.745],
[0.40, 0.916],
[0.44, 0.574],
[0.938, 0.425],
[0.999999, 0.999],
], dtype=np.float32)
Y = np.array([
[1.5244],
[2.2617],
[1.577],
[1.79],
[3.0],
], dtype=np.float32)
m = GPy.models.GPRegression(X, Y, normalizer=True)
m.optimize()
print(m.param_array)
# [0.65296766 0.06262393 0.34703868]
Same problem with 10 data points
Here is yet another example, but with 10 points instead of 5. I actually started with this dataset and tried to simplify and remove points as much as I could while keeping the weird behavior. In this case, the variance goes up to 3.7 million.
X = np.array([
[0.9098032 , 0.10719252],
[0.07689123, 0. ],
[0.92378354, 0.29961774],
[0.93852574, 0.42585665],
[1. , 1. ],
[0.03271898, 0.7458292 ],
[0.4280041 , 0.9168633 ],
[0.42700773, 0.57499546],
[0.4229433 , 0.9811548 ],
[0.90882766, 0.49437103]
], dtype=np.float32)
Y = np.array([
[1.12418827],
[0.07689123],
[1.52301903],
[1.79023902],
[3. ],
[1.52437748],
[2.26173076],
[1.57699869],
[2.38525289],
[1.89756974]
], dtype=np.float32)
m = GPy.models.GPRegression(X, Y, normalizer=True)
m.optimize()
print(m.param_array)
# [3.78870122e+006 1.75181624e+003 5.56268465e-309]
m = GPy.models.GPRegression(X, Y)
m.optimize()
print(m.param_array)
# [1.92090099e+006 1.62450526e+003 5.56268465e-309]
Overflows, underflows, restarts and invalid values
As I've been trying to figure this out for a few hours, I'm also sometimes getting numerical errors, specifically warnings like these:
/home/darth/projects/bopt/.venv/lib/python3.6/site-packages/GPy/kern/src/stationary.py:168: RuntimeWarning: overflow encountered in true_divide
/home/darth/projects/bopt/.venv/lib/python3.6/site-packages/GPy/kern/src/rbf.py:51: RuntimeWarning: overflow encountered in square
/home/darth/projects/bopt/.venv/lib/python3.6/site-packages/GPy/kern/src/rbf.py:54: RuntimeWarning: invalid value encountered in multiply
I don't have all the errors, but I've seen both overflows and underflows occur sometimes. Usually running optimize_restarts() with a larger number of restarts causes a few of them to fail, though the optimizer still converges to roughly the same values (even 1e7). Note that this happens even if I use np.float64.
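One way to keep going when individual restarts fail is the robust flag of optimize_restarts, which catches exceptions from failed restarts instead of aborting the whole loop. A minimal sketch, reusing the X and Y above; the restart count is an arbitrary choice:
import GPy

m = GPy.models.GPRegression(X, Y, normalizer=True)
# robust=True catches exceptions from individual restarts
# (e.g. on overflow/underflow) rather than aborting the whole loop
m.optimize_restarts(num_restarts=20, robust=True, verbose=False)
print(m)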
How do I diagnose this? What is actually going on?
Prior to running into this issue I had actually been working on my own implementation of GPs, and tried to use GPy as a solution to my own numerical issues, but it seems that either I'm doing something fundamentally wrong, or GPs are just a lot more fragile than I expected.
I'm not sure if there is a more stable way to optimize the kernel. What I'm ultimately after is Bayesian optimization, so I don't really need to fit the kernel perfectly in case the data is a bit weird (which I don't see it being in this case), but I'd like to avoid pathological cases where the optimization blows up.
Is there some way to make the optimization more robust, or “safer”, in the sense of putting a stronger prior towards smoother GPs or noisy inputs? Or is there something that I’m just fundamentally missing?
edit: I realized something after waking up today. The data in question is actually linear (sampled from a nearly noiseless linear function). I'm not sure whether that is the cause of the numerical instability, but it is definitely worth mentioning.
One thought I had is to constrain the Gaussian_noise to a value higher than 0, but I'm not sure if that is a good solution, or if there is something better to try.
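A minimal sketch of that idea, putting a lower bound on the noise variance so the optimizer cannot drive it to (numerically) zero; the bounds here are placeholder values, not a recommendation:
import GPy

m = GPy.models.GPRegression(X, Y, normalizer=True)
# keep the noise variance away from zero; 1e-4 and 1.0 are arbitrary bounds
m.Gaussian_noise.variance.constrain_bounded(1e-4, 1.0)
m.optimize()
print(m)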
Top GitHub Comments
I'm guessing that the problem is that several hyperparameter configurations lead to very similar marginal likelihoods, so the ML estimation finds different solutions each time you run it. It's not the GP that's fragile but the use of a single (non-Bayesian/probabilistic) setting for these hyperparameters. One can imagine the marginal likelihood landscape is either very flat (so it doesn't care where it optimises) or very hilly with lots of local maxima/minima*.
In order of correctness: (1) Something I've meant to do for years was to get the CCD stuff sorted out. I think there are clearly enough people who need it that I should probably get that done soon! To summarise: this effectively finds the posterior given different hyperparameter configurations and then weighted-averages these (weighted by the associated marginal likelihoods); the upshot is that uncertainty over the variance in this case would appear in the uncertainty in the posterior (over models). This is the right thing to do. Sorry I've not done it yet!
(2) A quick solution, if you have a prior belief about what the hyperparameters should be, is to put a prior over the hyperparameters; that will shift where the ML solution ends up. I can't remember exactly how to do this, something like m.kern.variance.prior = GPy.priors.Gamma(1, 0.1) (see the sketch after this list for one way to do it).
(3) You can put severely strict “priors” that restrict the range a parameter can take, e.g. k.lengthscale.constrain_bounded(1,10).
(4) It might also be worth looking at optimize_restarts, which tries several times from different starting points, to ensure the ML solution isn't a poor local minimum.
(5) Or just use optimize but set your hyperparameters beforehand to be near where you think the right answer is.
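A rough sketch of what (2)-(4) might look like in GPy, using set_prior to attach the hyperparameter priors; the Gamma parameters and bounds below are illustrative placeholders, not recommendations:
import GPy

m = GPy.models.GPRegression(X, Y, normalizer=True)

# (2) prior over a hyperparameter; Gamma.from_EV(mean, variance) is a
#     convenient constructor -- the values here are placeholders
m.kern.variance.set_prior(GPy.priors.Gamma.from_EV(1.0, 10.0))

# (3) hard bounds on the range a parameter can take
m.kern.lengthscale.constrain_bounded(0.01, 10.0)

# (4) restart the optimiser from several starting points
m.optimize_restarts(num_restarts=10, verbose=False)
print(m)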
Hope some of these help. Sorry again (1) isn’t implemented yet. Mike.
*I think it optimises the negative log likelihood, so technically it's minima we're looking for, I think… bit confusing.
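One way to visualise the landscape described above is to scan the negative log marginal likelihood over a grid of hyperparameter values while holding the noise fixed. A rough sketch; the nll_surface helper and the fixed noise value are assumptions for illustration, not part of GPy:
import numpy as np
import GPy

def nll_surface(X, Y, variances, lengthscales, noise=1e-2):
    # negative log marginal likelihood on a (variance, lengthscale) grid
    m = GPy.models.GPRegression(X, Y, normalizer=True)
    m.Gaussian_noise.variance = noise  # hold the noise fixed for the scan
    surface = np.zeros((len(variances), len(lengthscales)))
    for i, v in enumerate(variances):
        for j, l in enumerate(lengthscales):
            m.rbf.variance = v
            m.rbf.lengthscale = l
            surface[i, j] = -m.log_likelihood()
    return surface

# e.g. evaluate on np.logspace grids and plot with matplotlib's contourf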
@darthdeus Hi Jakub, could you share how you made the likelihood surface (image https://i.imgur.com/io1JMHy.png)? I googled around to find the function for this without success. Thanks.