White Noise kernel addition with grid interpolation
Hi all,
I’m unable to run a non-exact GP model with a white noise kernel added to the base kernel. Specifically, I tried the Kronecker classification and regression examples, along with the additive classification example, substituting in:
self.base_covar_module = RBFKernel(log_lengthscale_bounds=(-5, 6)) + WhiteNoiseKernel(input_variance)
where input_variance is:
input_variance = torch.squeeze(torch.from_numpy(numpy.random.rand(len(train_y), 1) / 100.))
In the Kronecker example I receive an error related to dimension size:
The expanded size of the tensor (1) must match the existing size (900) at non-singleton dimension 1
And with the additive classification example I receive the error:
‘SumLazyVariable’ object has no attribute ‘repeat’
Any thoughts on implementing kernel addition at the grid inducing points?
Top GitHub Comments
Hmmmm.
This is a little trickier actually. In classification, you don’t define the grid interpolation kernel; the GridInducingVariationalGP handles it for you (because it needs to learn things related to the inducing points). See our kissgp classification example, and note that we just call the RBF kernel. The problem, of course, is that we can’t just add the WhiteNoiseKernel to the RBF kernel, because the WhiteNoiseKernel is ill-defined for the inducing points.
I think it’s technically easy enough to solve: we basically want to add the DiagLazyVariable that WhiteNoiseKernel returns to the test covariance here:
https://github.com/cornellius-gp/gpytorch/blob/1578f80b18056b0d1cc6d0386048f1fd83499c49/gpytorch/models/grid_inducing_variational_gp.py#L88-L90
Something like:
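(The snippet from the original comment wasn’t preserved here. Below is a minimal sketch of the idea, assuming the 2018-era gpytorch API — GaussianRandomVariable, DiagLazyVariable — and borrowing the hypothetical subclass name from the next paragraph; the internals of GridInducingVariationalGP.__call__ are glossed over.)

```python
from gpytorch.lazy import DiagLazyVariable
from gpytorch.models import GridInducingVariationalGP
from gpytorch.random_variables import GaussianRandomVariable


class GridInducingPlusWhiteNoiseVariationalGP(GridInducingVariationalGP):
    """Sketch: run the KISS-GP interpolation as usual, then add a fixed
    white-noise diagonal to the *data* covariance only, so the inducing
    point (grid) kernel never sees the noise term."""

    def __init__(self, grid_size, grid_bounds, noise_variances):
        super(GridInducingPlusWhiteNoiseVariationalGP, self).__init__(grid_size, grid_bounds)
        # One variance per training point, as in the original post
        self.register_buffer('noise_variances', noise_variances)

    def __call__(self, inputs, **kwargs):
        # The base class handles the grid interpolation / variational machinery
        output = super(GridInducingPlusWhiteNoiseVariationalGP, self).__call__(inputs, **kwargs)
        # Add the same DiagLazyVariable a WhiteNoiseKernel would return,
        # but only to the test/data covariance
        covar = output.covar() + DiagLazyVariable(self.noise_variances)
        return GaussianRandomVariable(output.mean(), covar)
```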
This could be accomplished by extending GridInducingVariationalGP to a GridInducingPlusWhiteNoiseVariationalGP or something. However, this is obviously a little unsatisfactory from a usability standpoint. Maybe @gpleiss and I can think about whether we can better support kernels during variational inference that need to operate on the data kernel but not on the inducing point kernel.
This is now possible to do for variational inference by way of #335. The WhiteNoiseKernel still won’t apply to the scalable methods like it does in the exact GP case, but all that’s necessary is a new variational strategy that adds the white noise at the end of the forward method of whatever base variational strategy is being used. Something like:
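(Again, the snippet itself wasn’t preserved; the following is a rough sketch assuming the post-#335 API — MultivariateNormal, DiagLazyTensor — with a made-up wrapper name and constructor:)

```python
import gpytorch
from gpytorch.distributions import MultivariateNormal
from gpytorch.lazy import DiagLazyTensor


class WhiteNoiseVariationalStrategy(gpytorch.Module):
    """Sketch: wrap any base variational strategy and tack a fixed
    white-noise diagonal onto the distribution it returns. The noise is
    applied to the data covariance only, never to the inducing points."""

    def __init__(self, base_variational_strategy, noise_variances):
        super(WhiteNoiseVariationalStrategy, self).__init__()
        self.base_variational_strategy = base_variational_strategy
        self.register_buffer('noise_variances', noise_variances)

    def __call__(self, x, **kwargs):
        # Whatever the base strategy produces (e.g. a VariationalStrategy's
        # predictive distribution at x)...
        function_dist = self.base_variational_strategy(x, **kwargs)
        # ...gets the white-noise diagonal added at the very end
        covar = function_dist.lazy_covariance_matrix + DiagLazyTensor(self.noise_variances)
        return MultivariateNormal(function_dist.mean, covar)
```

A model would then build its usual strategy and wrap it, e.g. self.variational_strategy = WhiteNoiseVariationalStrategy(base_strategy, input_variance).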