interpretation of scale parameter as variance?
I apologize if this is a dumb question, but I wanted to understand whether the scale parameters can be interpreted as a decomposition of variance components. I modified your multi-GPU example that uses the protein dataset so that the model is a sum of an RBF and a linear kernel, as follows:
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, n_devices):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # Each component gets its own ScaleKernel so that it has its own outputscale.
        self.covar_module_1 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.covar_module_2 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.LinearKernel())
        base_covar_module = self.covar_module_1 + self.covar_module_2
        # `output_device` is assumed to be defined at module level, as in the multi-GPU example.
        self.covar_module = gpytorch.kernels.MultiDeviceKernel(
            base_covar_module, device_ids=range(n_devices),
            output_device=output_device
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
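For completeness, here is a minimal usage sketch; the data tensors and device setup are assumptions rather than code from the original issue, and `output_device` must exist before the model is constructed since `__init__` refers to it:

import torch
import gpytorch

# Hypothetical stand-ins for the (standardized) protein data from the example.
train_x = torch.randn(1000, 9)
train_y = torch.randn(1000)

n_devices = torch.cuda.device_count()
output_device = torch.device('cuda:0')
train_x, train_y = train_x.to(output_device), train_y.to(output_device)

likelihood = gpytorch.likelihoods.GaussianLikelihood().to(output_device)
model = ExactGPModel(train_x, train_y, likelihood, n_devices).to(output_device)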
When I estimate the normalized scale parameters, I get the following:
print('noise: %.3f \n'
      'rbf kernel scale: %.3f \n'
      'rbf kernel length parameter: %.3f \n'
      'linear kernel scale: %.3f \n'
      'linear kernel variance: %.3f' %
      (model.likelihood.noise.item(),
       model.covar_module_1.outputscale.item(),
       model.covar_module_1.base_kernel.lengthscale.item(),
       model.covar_module_2.outputscale.item(),
       model.covar_module_2.base_kernel.variance.item()
       ))
noise: 0.077
rbf kernel scale: 0.818
rbf kernel length parameter: 0.299
linear kernel scale: 0.693
linear kernel variance: 0.693
Since y has been standardized, am I to interpret noise^2, rbf kernel scale^2, and linear kernel variance^2 as a decomposition of the variance of y into components corresponding to the kernel terms?
Summing these terms gives 0.077^2 + 0.818^2 + 0.693^3 = 1.00786, which is close to 1 but not exactly 1.
The reason I ask is that variance decomposition models are used a great deal in statistical genetics, and I am trying to understand whether these GPyTorch estimates are giving me the same thing. This page may be useful for some background on variance decomposition.
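As an informal check on that reading, the fitted variance components can be compared against the empirical variance of the standardized targets. A minimal sketch, assuming the trained `model` and standardized `train_x`/`train_y` from above (this is not code from the original issue):

import torch

with torch.no_grad():
    noise_var = model.likelihood.noise.item()
    # For the RBF component, k(x, x) = 1, so its outputscale is the prior signal variance.
    rbf_var = model.covar_module_1.outputscale.item()
    # For the linear component, the prior variance depends on the inputs:
    # diag(ScaleKernel(LinearKernel)) is roughly outputscale * variance * ||x||^2.
    lin_var = (model.covar_module_2.outputscale.item()
               * model.covar_module_2.base_kernel.variance.item()
               * train_x.pow(2).sum(-1).mean().item())

total = noise_var + rbf_var + lin_var
print('noise + rbf + linear prior variance: %.3f' % total)
print('empirical Var(y): %.3f' % train_y.var().item())  # ~1 if y was standardized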
Top GitHub Comments
Kind of. The scale kernel’s “signal variance” relative to the likelihood’s “noise variance” can roughly be viewed as a signal-to-noise ratio: the higher the signal variance is relative to the noise, the more the GP is actually trying to “fit” variations in the data rather than explain them away as noise. E.g., if noise >> signal, you’ll typically end up with a very flat mean function and near-constant predictive variances, but if signal >> noise, the GP is actually trying to fit those variations.
When you have multiple kernels involved, things get a little messier, but you can still think of it generally in terms of “total signal variance” compared to noise, and kernel components with higher signal variances are contributing more to the model fit.
I believe there are some theoretical discussions of these properties in the literature if you wanted them to be more formal. Does that make sense?
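To make that comparison concrete, here is a small sketch (not from the thread) that treats each ScaleKernel’s outputscale as that component’s signal variance, which is exact for the RBF term and only a rough proxy for the linear term, and reports it relative to the noise:

noise = model.likelihood.noise.item()
rbf_signal = model.covar_module_1.outputscale.item()
linear_signal = model.covar_module_2.outputscale.item()
total_signal = rbf_signal + linear_signal

# A higher signal-to-noise ratio means the GP is fitting structure in the data
# rather than explaining variation away as observation noise.
print('signal-to-noise ratio: %.2f' % (total_signal / noise))
print('rbf share of signal variance: %.1f%%' % (100 * rbf_signal / total_signal))
print('linear share of signal variance: %.1f%%' % (100 * linear_signal / total_signal))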
I thought I did! I am using the following training code, based on your examples.
Then I run the following:
Why don’t I try it again just to make sure; I don’t want you to waste time unnecessarily. I will have to work on it next week, but I will let you know once I try it out.