
interpretation of scale parameter as variance?


I apologize if this is a dumb question, but I wanted to understand whether the scale parameter can be interpreted as a decomposition of variance components. I modified your multi-GPU example, which uses the protein dataset, so that the model is a sum of an RBF kernel and a linear kernel, as follows:

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, n_devices):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # Additive covariance: a scaled RBF kernel plus a scaled linear kernel
        self.covar_module_1 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.covar_module_2 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.LinearKernel())
        base_covar_module = self.covar_module_1 + self.covar_module_2

        # output_device is assumed to be defined in the enclosing scope,
        # as in the multi-GPU example this is adapted from
        self.covar_module = gpytorch.kernels.MultiDeviceKernel(
            base_covar_module, device_ids=range(n_devices),
            output_device=output_device
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

When I estimate the scale parameters I get the following:

print('noise: %.3f\n'
      'rbf kernel scale: %.3f\n'
      'rbf kernel length parameter: %.3f\n'
      'linear kernel scale: %.3f\n'
      'linear kernel variance: %.3f' %
      (model.likelihood.noise.item(),
       model.covar_module_1.outputscale.item(),
       model.covar_module_1.base_kernel.lengthscale.item(),
       model.covar_module_2.outputscale.item(),
       model.covar_module_2.base_kernel.variance.item()))
noise: 0.077
rbf kernel scale: 0.818
rbf kernel length parameter: 0.299
linear kernel scale: 0.693
linear kernel variance: 0.693

Since y has been standardized, am I to interpret noise^2, rbf kernel scale^2, and linear kernel variance^2 as the decomposition of the variance of y into components corresponding to their kernel components?

Summing these terms gives 0.077^2 + 0.818^2 + 0.693^3 = 1.00786, which is close to 1 but not exactly 1.

The reason I ask is that variance decomposition models are used a great deal in statistical genetics, and I am trying to understand whether these GPyTorch estimates give me the same thing. This page may be useful for some background on variance decomposition:

https://limix.readthedocs.io/en/latest/vardec.html
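
One way to sanity-check this interpretation, assuming the trained model, likelihood, train_x, and train_y from the code above, would be to compare the model's prior variance of y at the training inputs against the empirical variance of the standardized targets. Note that for the linear kernel the per-point prior variance depends on x, so the decomposition is not a single constant:

import torch

# A minimal sketch, assuming model, likelihood, train_x, and train_y are the
# trained objects from the snippets above (the names come from that code).
with torch.no_grad():
    # Per-point prior variance contributed by each additive component.
    # For ScaleKernel(RBFKernel) the diagonal is constant and equals the outputscale;
    # for ScaleKernel(LinearKernel) it is outputscale * variance * (x . x), so it varies with x.
    rbf_diag = model.covar_module_1(train_x, train_x, diag=True)
    lin_diag = model.covar_module_2(train_x, train_x, diag=True)
    noise = likelihood.noise

    prior_var_y = rbf_diag + lin_diag + noise  # model's prior Var[y | x] at each training point
    print('mean prior variance of y under the model: %.3f' % prior_var_y.mean().item())
    print('empirical variance of standardized y:     %.3f' % train_y.var().item())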

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

1 reaction
jacobrgardner commented, Oct 1, 2019

Kind of. The scale kernel “signal variance” relative to the likelihood “noise variance” can roughly be viewed as a signal-to-noise ratio: the higher the signal variance is relative to the noise, the more the GP is trying to actually “fit” variations in the data rather than explain them as noise. E.g., if noise >> signal, then you’ll typically end up with a very flat mean function and constant variances, but if signal >> noise, then the GP is actually trying to fit those variations.

When you have multiple kernels involved, things get a little messier, but you can still think of it generally in terms of “total signal variance” compared to noise, and kernel components with higher signal variances are contributing more to the model fit.

I believe there are some theoretical discussions of these properties in the literature if you wanted them to be more formal. Does that make sense?
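
As a rough way to put numbers on that view, one could compare the summed output scales to the learned noise and look at each kernel's share of the signal variance, for example along these lines (a sketch using the hyperparameter names from the model in the question; for the linear kernel this ignores the input-dependent x . x factor):

# Rough sketch using the fitted hyperparameters from the model in the question.
noise_var = model.likelihood.noise.item()
rbf_signal = model.covar_module_1.outputscale.item()
lin_signal = model.covar_module_2.outputscale.item()  # ignores the linear kernel's x-dependent factor

total_signal = rbf_signal + lin_signal
print('total signal variance / noise variance: %.2f' % (total_signal / noise_var))
print('RBF share of signal variance:    %.1f%%' % (100 * rbf_signal / total_signal))
print('linear share of signal variance: %.1f%%' % (100 * lin_signal / total_signal))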

0 reactions
cmlakhan commented, Oct 18, 2019

I thought I did! I am using the following training code, based on your examples.

import gpytorch
# FullBatchLBFGS is the optimizer from the LBFGS.py helper used in the GPyTorch multi-GPU example
from LBFGS import FullBatchLBFGS


def train(train_x, train_y, n_devices, output_device, checkpoint_size, preconditioner_size, n_training_iter):

    likelihood = gpytorch.likelihoods.GaussianLikelihood().to(output_device)
    model = ExactGPModel(train_x, train_y, likelihood, n_devices).to(output_device)
    model.double()
    model.train()
    likelihood.train()

    optimizer = FullBatchLBFGS(model.parameters(), lr=0.1)
    # "Loss" for GPs - the marginal log likelihood
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

    with gpytorch.beta_features.checkpoint_kernel(checkpoint_size), gpytorch.settings.max_preconditioner_size(preconditioner_size):

        def closure():
            optimizer.zero_grad()
            output = model(train_x)
            loss = -mll(output, train_y)
            return loss

        # Initial loss and gradient; the optimizer re-evaluates the closure internally
        loss = closure()
        loss.backward()

        for i in range(n_training_iter):
            options = {'closure': closure, 'current_loss': loss, 'max_ls': 20}
            loss, _, _, _, _, _, _, fail = optimizer.step(options)

            print('Iter %d/%d - Loss: %.10f' % (
                i + 1, n_training_iter, loss.item()
            ))

            if fail:
                print('Convergence reached!')
                break

    print(f"Finished training on {train_x.size(0)} data points using {n_devices} GPUs.")
    return model, likelihood

Then I run the following:


model, likelihood = train(train_x, train_y,
                          n_devices=n_devices,
                          output_device=output_device,
                          checkpoint_size=checkpoint_size,
                          preconditioner_size=preconditioner_size,
                          n_training_iter=50)


Why don’t I try it again just to make sure? I don’t want you to waste time unnecessarily. I will have to work on it next week, but I will let you know once I try it out.

Read more comments on GitHub.

