
[Question] Implementing Sparse GPs for Batch Independent MultiOutputs


Question

This is not a bug, so apologies for the incorrect label; a docs/examples label might have been a better fit.

I am trying to implement a sparse GP that takes a multidimensional input and has independent multidimensional outputs. For the regular exact GP case, I understand that adding batch_shape=torch.Size([dim_output]) to the mean and kernel modules (along with num_tasks=dim_output to the likelihood) does the trick.
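For concreteness, this is the pattern I mean (a minimal sketch; dim_output is a placeholder for the number of outputs):

```python
# The batch_shape pattern that works for an exact batch-independent
# multi-output GP: dim_output independent GPs over shared inputs.
# (dim_output is a placeholder, not defined in the snippet below.)
mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([dim_output]))
covar_module = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.RBFKernel(batch_shape=torch.Size([dim_output])),
    batch_shape=torch.Size([dim_output]),
)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=dim_output)
```

However, for a sparse GP built with an InducingPointKernel, adding batch_shape=torch.Size([dim_output]) to the kernel raises the following error: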

```
Traceback (most recent call last):
  File "multioutput-sgp-example.py", line 57, in <module>
    model = BatchIndependentMultitaskSGPModel(train_x, train_y, likelihood)
  File "multioutput-sgp-example.py", line 41, in __init__
    self.covar_module = gpytorch.kernels.InducingPointKernel(
TypeError: __init__() got an unexpected keyword argument 'batch_shape'
```

Without batch_shape on the InducingPointKernel, I get the following instead:

```
Traceback (most recent call last):
  File "multioutput-sgp-example.py", line 82, in <module>
    loss = -mll(output, train_y)
  File "/usr/lib/python3.8/site-packages/gpytorch/module.py", line 24, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/usr/lib/python3.8/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 55, in forward
    res = res.add(added_loss_term.loss(*params))
  File "/usr/lib/python3.8/site-packages/gpytorch/mlls/inducing_point_kernel_added_loss_term.py", line 18, in loss
    return 0.5 * (diag / noise_diag).sum()
RuntimeError: The size of tensor a (100) must match the size of tensor b (36) at non-singleton dimension 1
```

100 is my sample size and 6 is my output dimension.

To reproduce

**Code snippet to reproduce**

```python
import math
import torch
import gpytorch

train_x = torch.randn(100, 9)

train_y = torch.stack([
    torch.sin(train_x[:,0] * (2 * math.pi)) + torch.randn(100) * 0.2,
    torch.cos(train_x[:,1] * (2 * math.pi)) + torch.randn(100) * 0.2,
    torch.sin(train_x[:,2] * (2 * math.pi)) + torch.randn(100) * 0.2,
    torch.cos(train_x[:,3] * (2 * math.pi)) + torch.randn(100) * 0.2,
    torch.sin(train_x[:,4] * (2 * math.pi)) + torch.randn(100) * 0.2,
    torch.cos(train_x[:,5] * (2 * math.pi)) + torch.randn(100) * 0.2,
], -1)

# Test inputs must share train_x's 9-dimensional input space
test_x = torch.randn(51, 9)

# Load data onto GPU
if torch.cuda.is_available():
    train_x = train_x.cuda()
    train_y = train_y.cuda()
    test_x = test_x.cuda()

class BatchIndependentMultitaskSGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)

        # Number of inducing points for the sparse approximation
        num_inducing_points = 30
        
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([6]))
        self.base_covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([6])),
            batch_shape=torch.Size([6])
        )

        self.covar_module = gpytorch.kernels.InducingPointKernel(
            self.base_covar_module,
            inducing_points=train_x[:num_inducing_points],
            #batch_shape=torch.Size([6]),    
            likelihood=likelihood
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal.from_batch_mvn(
            gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
        )


likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=6)
model = BatchIndependentMultitaskSGPModel(train_x, train_y, likelihood)

# Load model onto GPU
if torch.cuda.is_available():
    model = model.cuda()
    likelihood = likelihood.cuda()
    
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # includes the MultitaskGaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iterations = 50

for i in range(training_iterations):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    optimizer.step()
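
# For completeness, a sketch of the standard GPyTorch prediction pattern,
# so the test_x defined above is actually used (an assumed continuation,
# not part of the original failing snippet):
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    predictions = likelihood(model(test_x))
    mean = predictions.mean                         # expected shape: (51, 6)
    lower, upper = predictions.confidence_region()
```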
  

**Stack trace/error message**

See the error messages above.

Expected Behavior

Ideally, I should be able to use sparse GPs in a multi-input/multi-output fashion: adding an InducingPointKernel to an existing multi-input/multi-output exact GP (thereby turning it into a sparse GP model) should leave it able to train and predict properly.
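For reference, GPyTorch's variational models already support this batch-independent multitask setup. Below is a minimal sketch patterned after the multitask SVGP example in the GPyTorch docs, assuming a variational (SVGP) approximation is an acceptable substitute for the SGPR-style InducingPointKernel:

```python
# Minimal sketch: batch-independent multitask sparse *variational* GP.
# Assumes the SVGP approximation may stand in for InducingPointKernel.
num_tasks, num_inducing, dim_input = 6, 30, 9

class BatchIndependentMultitaskSVGPModel(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points):
        # One variational distribution per task via the batch dimension
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(-2), batch_shape=torch.Size([num_tasks])
        )
        # Wrap the batched strategy so outputs form a multitask distribution
        variational_strategy = gpytorch.variational.IndependentMultitaskVariationalStrategy(
            gpytorch.variational.VariationalStrategy(
                self, inducing_points, variational_distribution, learn_inducing_locations=True
            ),
            num_tasks=num_tasks,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_tasks]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_tasks])),
            batch_shape=torch.Size([num_tasks]),
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# One set of learnable inducing points per task
inducing_points = torch.randn(num_tasks, num_inducing, dim_input)
model = BatchIndependentMultitaskSVGPModel(inducing_points)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=num_tasks)
# Variational models train against the ELBO, not the exact marginal log likelihood
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))
```

The trade-off is that the inducing locations become variational parameters and training maximizes the ELBO rather than the exact marginal log likelihood.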

System information

Please complete the following information:

  • GPyTorch Version: 1.0.1
  • PyTorch Version: 1.4.0
  • OS: Arch Linux

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

jacobrgardner commented on Feb 15, 2020 (1 reaction):

Yeah I’ll take a look soon. Sorry about that, February has been really crazy for us with both ICML recently and then UAI next week, so we’re really building a backlog of things to look at over the past month…

monabf commented on Oct 9, 2020:

@jacobrgardner is this now fixed to work with the Batch Independent Multitask GP model, or can we still only use sparse models with batch mode GP?


