
interpretation of scale parameter as variance?


I apologize if this is a dumb question, but I wanted to understand whether the scale parameter can be interpreted as a decomposition of variance components. I modified your multi-GPU example, which uses the protein dataset, so that the model is a sum of an RBF kernel and a linear kernel, as follows:

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, n_devices):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # Additive covariance: a scaled RBF kernel plus a scaled linear kernel
        self.covar_module_1 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.covar_module_2 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.LinearKernel())
        base_covar_module = self.covar_module_1 + self.covar_module_2

        # output_device is assumed to be defined in the enclosing scope,
        # as in the multi-GPU example this is adapted from
        self.covar_module = gpytorch.kernels.MultiDeviceKernel(
            base_covar_module, device_ids=range(n_devices),
            output_device=output_device
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

When I estimate the scale parameters I get the following:

print('noise: %.3f\n'
      'rbf kernel scale: %.3f\n'
      'rbf kernel length parameter: %.3f\n'
      'linear kernel scale: %.3f\n'
      'linear kernel variance: %.3f' %
      (model.likelihood.noise.item(),
       model.covar_module_1.outputscale.item(),
       model.covar_module_1.base_kernel.lengthscale.item(),
       model.covar_module_2.outputscale.item(),
       model.covar_module_2.base_kernel.variance.item()))
noise: 0.077
rbf kernel scale: 0.818
rbf kernel length parameter: 0.299
linear kernel scale: 0.693
linear kernel variance: 0.693

Since y has been standardized, am I to interpret noise^2, rbf kernel scale^2, and linear kernel variance^2 as the decomposition of the variance of y into components corresponding to their kernel components?

Summing these terms gives 0.077^2 + 0.818^2 + 0.693^3 = 1.00786, which is close to 1 but not exactly 1.

The reason I ask is that variance decomposition models are used a great deal in statistical genetics, and I am trying to understand whether these GPyTorch estimates give me the same thing. This page may be useful for some background on variance decomposition:

https://limix.readthedocs.io/en/latest/vardec.html
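
One way to sanity-check this interpretation, assuming the trained model, likelihood, train_x, and train_y from the code above, would be to compare the model's prior variance of y at the training inputs against the empirical variance of the standardized targets. Note that for the linear kernel the per-point prior variance depends on x, so the decomposition is not a single constant:

import torch

# A minimal sketch, assuming model, likelihood, train_x, and train_y are the
# trained objects from the snippets above (the names come from that code).
with torch.no_grad():
    # Per-point prior variance contributed by each additive component.
    # For ScaleKernel(RBFKernel) the diagonal is constant and equals the outputscale;
    # for ScaleKernel(LinearKernel) it is outputscale * variance * (x . x), so it varies with x.
    rbf_diag = model.covar_module_1(train_x, train_x, diag=True)
    lin_diag = model.covar_module_2(train_x, train_x, diag=True)
    noise = likelihood.noise

    prior_var_y = rbf_diag + lin_diag + noise  # model's prior Var[y | x] at each training point
    print('mean prior variance of y under the model: %.3f' % prior_var_y.mean().item())
    print('empirical variance of standardized y:     %.3f' % train_y.var().item())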

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

1 reaction
jacobrgardner commented, Oct 1, 2019

Kind of. The scale kernel “signal variance” relative to the likelihood “noise variance” can roughly be viewed as a signal-to-noise ratio: the higher the signal variance is relative to the noise, the more the GP is trying to actually “fit” variations in the data rather than explain them as noise. E.g., if noise >> signal, then you’ll typically end up with a very flat mean function and constant variances, but if signal >> noise, then the GP is actually trying to fit those variations.

When you have multiple kernels involved, things get a little messier, but you can still think of it generally in terms of “total signal variance” compared to noise, and kernel components with higher signal variances are contributing more to the model fit.

I believe there are some theoretical discussions of these properties in the literature if you wanted them to be more formal. Does that make sense?
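
As a rough way to put numbers on that view, one could compare the summed output scales to the learned noise and look at each kernel's share of the signal variance, for example along these lines (a sketch using the hyperparameter names from the model in the question; for the linear kernel this ignores the input-dependent x . x factor):

# Rough sketch using the fitted hyperparameters from the model in the question.
noise_var = model.likelihood.noise.item()
rbf_signal = model.covar_module_1.outputscale.item()
lin_signal = model.covar_module_2.outputscale.item()  # ignores the linear kernel's x-dependent factor

total_signal = rbf_signal + lin_signal
print('total signal variance / noise variance: %.2f' % (total_signal / noise_var))
print('RBF share of signal variance:    %.1f%%' % (100 * rbf_signal / total_signal))
print('linear share of signal variance: %.1f%%' % (100 * lin_signal / total_signal))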

0 reactions
cmlakhan commented, Oct 18, 2019

I thought I did! I am using the following training code, based on your examples.

import gpytorch
# FullBatchLBFGS is the optimizer from the LBFGS.py helper used in the GPyTorch multi-GPU example
from LBFGS import FullBatchLBFGS


def train(train_x, train_y, n_devices, output_device, checkpoint_size, preconditioner_size, n_training_iter):

    likelihood = gpytorch.likelihoods.GaussianLikelihood().to(output_device)
    model = ExactGPModel(train_x, train_y, likelihood, n_devices).to(output_device)
    model.double()
    model.train()
    likelihood.train()

    optimizer = FullBatchLBFGS(model.parameters(), lr=0.1)
    # "Loss" for GPs - the marginal log likelihood
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

    with gpytorch.beta_features.checkpoint_kernel(checkpoint_size), gpytorch.settings.max_preconditioner_size(preconditioner_size):

        def closure():
            optimizer.zero_grad()
            output = model(train_x)
            loss = -mll(output, train_y)
            return loss

        # Initial loss and gradient; the optimizer re-evaluates the closure internally
        loss = closure()
        loss.backward()

        for i in range(n_training_iter):
            options = {'closure': closure, 'current_loss': loss, 'max_ls': 20}
            loss, _, _, _, _, _, _, fail = optimizer.step(options)

            print('Iter %d/%d - Loss: %.10f' % (
                i + 1, n_training_iter, loss.item()
            ))

            if fail:
                print('Convergence reached!')
                break

    print(f"Finished training on {train_x.size(0)} data points using {n_devices} GPUs.")
    return model, likelihood

Then I run the following:


model, likelihood = train(train_x, train_y,
                          n_devices=n_devices,
                          output_device=output_device,
                          checkpoint_size=checkpoint_size,
                          preconditioner_size=preconditioner_size,
                          n_training_iter=50)


Why don’t I try it again just to make sure? I don’t want you to waste time unnecessarily. I will have to work on it next week, but I will let you know once I try it out.

Read more comments on GitHub.

