[Question] Implementing multi-output multi-task approximate GP
I am looking into implementing a model that produces multiple correlated outputs for multiple tasks (multi-task multi-output, MTMO). For this type of model, I assume that the input tensor has shape `n x (d+1)` (d inputs plus the additional task index) while the output tensor has shape `n x o` (where o is the number of correlated outputs). Additionally, all outputs are observed simultaneously, but not all tasks are. The training data for this model would look like this:
```python
import torch

n_training = 100  # illustrative number of training points
train_x = torch.cat([torch.rand(n_training, 1), (torch.rand(n_training, 1) > 0.5).float()], dim=1)
train_y = torch.stack([
    torch.sin(5.5 * train_x[:, 0]) * train_x[:, 0] ** 2 * train_x[:, -1] + torch.cos(5.5 * train_x[:, 0]) * train_x[:, 0] ** 2 * (1 - train_x[:, -1]),
    torch.cos(5.5 * train_x[:, 0]) * train_x[:, 0] * 2 * train_x[:, -1] + torch.sin(5.5 * train_x[:, 0]) * train_x[:, 0] * 2 * (1 - train_x[:, -1]),
], dim=-1)
```
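With these definitions, `train_x` has shape `n_training x 2` (one input feature plus the task index) and `train_y` has shape `n_training x 2` (two correlated outputs), matching the `n x (d+1)` / `n x o` layout described above.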
For the exact GP case, the model looks like this:
```python
import torch
from torch import Tensor

from gpytorch.distributions import MultitaskMultivariateNormal
from gpytorch.kernels import IndexKernel, MaternKernel
from gpytorch.lazy import KroneckerProductLazyTensor
from gpytorch.likelihoods import MultitaskGaussianLikelihood
from gpytorch.means import ConstantMean, MultitaskMean
from gpytorch.models import ExactGP


class MultiOutputMultiTaskGP(ExactGP):
    def __init__(
        self,
        train_X: Tensor,
        train_Y: Tensor,
        likelihood: MultitaskGaussianLikelihood = None,
        rank=None,
        lik_rank=None,
    ) -> None:
        num_tasks = train_Y.shape[-1]
        batch_shape, ard_num_dims = train_X.shape[:-2], train_X.shape[-1]
        if lik_rank is None:
            lik_rank = rank
        # self._validate_tensor_args(X=train_X, Y=train_Y)  # from BoTorch's GPyTorchModel
        # mixin; not available on a plain ExactGP, so commented out here
        if likelihood is None:
            likelihood = MultitaskGaussianLikelihood(
                num_tasks=num_tasks,
                rank=lik_rank if lik_rank is not None else 0,
            )
        super().__init__(train_X, train_Y, likelihood)
        self._rank = rank if rank is not None else num_tasks
        # Mean over the correlated outputs
        self.mean_module = MultitaskMean(ConstantMean(), num_tasks=num_tasks)
        # Kernel over the input features
        self.data_kernel = MaternKernel()
        # Kernel over the (Hadamard-style) task index stored in the last input column
        self.task_kernel = IndexKernel(num_tasks=len(torch.unique(train_X[..., -1])))
        # Kernel over the correlated outputs
        self.output_kernel = IndexKernel(num_tasks=num_tasks)
        self.to(train_X)

    def forward(self, x: Tensor) -> MultitaskMultivariateNormal:
        mean_x = self.mean_module(x)
        task_term = self.task_kernel(x[..., -1].long())
        data_and_task_x = self.data_kernel(x[..., :-1]).mul(task_term)
        output_x = self.output_kernel.covar_matrix
        covar_x = KroneckerProductLazyTensor(data_and_task_x, output_x)
        return MultitaskMultivariateNormal(mean_x, covar_x)
```
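For context, a minimal training loop for this exact model might look like the sketch below. This is not part of the original issue; the optimizer settings and iteration count are arbitrary, and it assumes the `train_x`/`train_y` tensors generated above:

```python
import gpytorch

model = MultiOutputMultiTaskGP(train_x, train_y)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)

model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # illustrative settings
for _ in range(100):
    optimizer.zero_grad()
    output = model(train_x)         # prior at the training inputs
    loss = -mll(output, train_y)    # negative marginal log likelihood
    loss.backward()
    optimizer.step()
```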
I want to implement an approximate version of this model using `LMCVariationalStrategy`, but I am facing some issues:

- I am not sure whether the inducing points should include the task index. If the index has to be included, there does not seem to be a straightforward way to keep those values fixed while still learning the optimal locations of the inducing points during training (`learn_inducing_locations=True`).
- Even though I specified `batch_shape = torch.Size([p])` for all kernels and means, the forward method of the exact GP does not seem to work: the shape of `task_term` in the forward method above becomes `p x p` when the input shape is `p x n` (I believe the correct shape should be `n x n`).
Do you know how I should implement this model? Thanks
Issue Analytics
- Created: 2 years ago
- Comments: 6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@fleskovar yes, but a closely following line prevents it from working by default: https://github.com/cornellius-gp/gpytorch/blob/fc2053b0fc00517880fbc11adc7f5802242eec6a/gpytorch/models/exact_prediction_strategies.py#L232
The reason this is done is that otherwise, making predictions with the model repeatedly would either need to be done in a `torch.no_grad` context, or rapidly run out of memory due to accumulating compute graphs.

@gpleiss I don’t think it’s very simple to add a warning here. The problem is that currently you can backprop w.r.t. the test inputs just fine with the caches detached, and that’s a much more common operation (e.g., differentiating a BayesOpt acquisition function with respect to the candidate). We wouldn’t want to raise the warning every time we call backward for that purpose.

Maybe we raise a warning if (1) the user calls backward, and (2) the last set of test inputs didn’t require grad; OR (1) the user calls backward, and (2) the test inputs were equal to the train inputs, which require grad (we already test for equality in `__call__`). I think that would catch most cases (or at least more than we do now) – basically, if the test inputs require grad and are different from the train inputs, we assume that the backward was for the purpose of getting derivatives of the test inputs. Otherwise, if the test inputs don’t require grad, or they do but are actually the train inputs, we assume the backward call was for the hyperparameters and/or train inputs.
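To make the distinction above concrete, the "much more common operation" referred to is roughly the following pattern. This is an illustrative sketch reusing the exact model class and toy data from the question, not code from the thread:

```python
# Backprop w.r.t. candidate/test inputs (e.g. for acquisition optimization).
# This works even though the cached train-train terms are detached.
model = MultiOutputMultiTaskGP(train_x, train_y)
model.eval()
model.likelihood.eval()

# 5 hypothetical test points, all assigned to task 0 in the last column
test_x = torch.cat([torch.rand(5, 1), torch.zeros(5, 1)], dim=1).requires_grad_(True)
pred = model.likelihood(model(test_x))
pred.mean.sum().backward()  # gradients flow back to test_x, not to the detached caches
```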
First of all, @fleskovar and @ianhill60, I am sorry for the very slow reply!
From a practical software perspective: this will probably require a different variational strategy. However, it seems like there are lots of requests for a similar Hadamard-style multi-task SVGP model, so I’ll probably take a look at implementing that soon.
From a technical perspective: you’d probably want one set of inducing points per task.
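For reference, the multi-output (LMC) half of this is already supported. Below is a minimal sketch of a standard `LMCVariationalStrategy` model, with illustrative class name and sizes and without any handling of the Hadamard task index (the part that, per the comments above, would still need a dedicated variational strategy):

```python
import torch
import gpytorch
from gpytorch.models import ApproximateGP
from gpytorch.variational import (
    CholeskyVariationalDistribution,
    LMCVariationalStrategy,
    VariationalStrategy,
)


class LMCMultiOutputGP(ApproximateGP):
    """num_latents independent latent GPs, mixed into num_tasks correlated outputs."""

    def __init__(self, inducing_points, num_tasks, num_latents):
        # inducing_points: num_latents x m x d (one set of inducing points per latent GP)
        variational_distribution = CholeskyVariationalDistribution(
            inducing_points.size(-2), batch_shape=torch.Size([num_latents])
        )
        variational_strategy = LMCVariationalStrategy(
            VariationalStrategy(
                self, inducing_points, variational_distribution,
                learn_inducing_locations=True,
            ),
            num_tasks=num_tasks,
            num_latents=num_latents,
            latent_dim=-1,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_latents]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(batch_shape=torch.Size([num_latents])),
            batch_shape=torch.Size([num_latents]),
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


# Illustrative setup: 3 latent GPs, 16 inducing points each, 1 input feature (no task column)
inducing_points = torch.rand(3, 16, 1)
model = LMCMultiOutputGP(inducing_points, num_tasks=2, num_latents=3)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=2)
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))
```

Extending this to the Hadamard task setting (for example, giving each task its own batch of inducing points, as suggested above, and selecting the appropriate batch from the task column) is the piece that is not supported out of the box and would require a custom variational strategy.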