[Question] Implementing multi-output multi-task approximate GP
I am looking into implementing a model that produces multiple correlated outputs for multiple tasks (multi-task multi-output, MTMO). For this type of model, I assume that the input tensor has shape `n x (d+1)` (d inputs plus the additional task index) while the output tensor has shape `n x o` (where o is the number of correlated outputs). Additionally, all outputs are observed simultaneously, but not all tasks are. The training data for this model would look like this:
```python
import torch

n_training = 100  # illustrative number of training points
train_x = torch.cat([torch.rand(n_training, 1), (torch.rand(n_training, 1) > 0.5).float()], dim=1)
train_y = torch.stack([
    torch.sin(5.5 * train_x[:, 0]) * train_x[:, 0] ** 2 * train_x[:, -1] + torch.cos(5.5 * train_x[:, 0]) * train_x[:, 0] ** 2 * (1 - train_x[:, -1]),
    torch.cos(5.5 * train_x[:, 0]) * train_x[:, 0] * 2 * train_x[:, -1] + torch.sin(5.5 * train_x[:, 0]) * train_x[:, 0] * 2 * (1 - train_x[:, -1]),
], dim=-1)
```
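With these definitions, `train_x` has shape `n_training x 2` (one input feature plus the task index) and `train_y` has shape `n_training x 2` (two correlated outputs), matching the `n x (d+1)` / `n x o` layout described above.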
For the exact GP case, the model looks like this:
```python
import torch
from torch import Tensor

from gpytorch.distributions import MultitaskMultivariateNormal
from gpytorch.kernels import IndexKernel, MaternKernel
from gpytorch.lazy import KroneckerProductLazyTensor
from gpytorch.likelihoods import MultitaskGaussianLikelihood
from gpytorch.means import ConstantMean, MultitaskMean
from gpytorch.models import ExactGP


class MultiOutputMultiTaskGP(ExactGP):
    def __init__(
        self,
        train_X: Tensor,
        train_Y: Tensor,
        likelihood: MultitaskGaussianLikelihood = None,
        rank=None,
        lik_rank=None,
    ) -> None:
        num_tasks = train_Y.shape[-1]
        batch_shape, ard_num_dims = train_X.shape[:-2], train_X.shape[-1]
        if lik_rank is None:
            lik_rank = rank
        # self._validate_tensor_args(X=train_X, Y=train_Y)  # from BoTorch's GPyTorchModel
        # mixin; not available on a plain ExactGP, so commented out here
        if likelihood is None:
            likelihood = MultitaskGaussianLikelihood(
                num_tasks=num_tasks,
                rank=lik_rank if lik_rank is not None else 0,
            )
        super().__init__(train_X, train_Y, likelihood)
        self._rank = rank if rank is not None else num_tasks
        # Mean over the correlated outputs
        self.mean_module = MultitaskMean(ConstantMean(), num_tasks=num_tasks)
        # Kernel over the input features
        self.data_kernel = MaternKernel()
        # Kernel over the (Hadamard-style) task index stored in the last input column
        self.task_kernel = IndexKernel(num_tasks=len(torch.unique(train_X[..., -1])))
        # Kernel over the correlated outputs
        self.output_kernel = IndexKernel(num_tasks=num_tasks)
        self.to(train_X)

    def forward(self, x: Tensor) -> MultitaskMultivariateNormal:
        mean_x = self.mean_module(x)
        task_term = self.task_kernel(x[..., -1].long())
        data_and_task_x = self.data_kernel(x[..., :-1]).mul(task_term)
        output_x = self.output_kernel.covar_matrix
        covar_x = KroneckerProductLazyTensor(data_and_task_x, output_x)
        return MultitaskMultivariateNormal(mean_x, covar_x)
```
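For context, a minimal training loop for this exact model might look like the sketch below. This is not part of the original issue; the optimizer settings and iteration count are arbitrary, and it assumes the `train_x`/`train_y` tensors generated above:

```python
import gpytorch

model = MultiOutputMultiTaskGP(train_x, train_y)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)

model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # illustrative settings
for _ in range(100):
    optimizer.zero_grad()
    output = model(train_x)         # prior at the training inputs
    loss = -mll(output, train_y)    # negative marginal log likelihood
    loss.backward()
    optimizer.step()
```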
I want to implement an approximate version of this model using `LMCVariationalStrategy`, but I am facing some issues:

- I am not sure whether the inducing points should include the task index. If the index has to be included, there does not seem to be a straightforward way to keep those values fixed while still learning the optimal locations of the inducing points during training (`learn_inducing_locations=True`).
- Even though I specified `batch_shape = torch.Size([p])` for all kernels and means, the forward method of the exact GP does not seem to work: the shape of `task_term` in the forward method above becomes `p x p` when the input shape is `p x n` (I believe the correct shape should be `n x n`).
Do you know how I should implement this model? Thanks
Issue Analytics
- Created: 2 years ago
- Comments: 6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@fleskovar yes, but a closely following line prevents it from working by default: https://github.com/cornellius-gp/gpytorch/blob/fc2053b0fc00517880fbc11adc7f5802242eec6a/gpytorch/models/exact_prediction_strategies.py#L232
The reason this is done is that otherwise, making predictions with the model repeatedly would either need to be done in a `torch.no_grad` context, or rapidly run out of memory due to accumulating compute graphs.

@gpleiss I don’t think it’s very simple to add a warning here. The problem is that currently you can backprop w.r.t. the test inputs just fine with the caches detached, and that’s a much more common operation (e.g., differentiating a BayesOpt acquisition function with respect to the candidate). We wouldn’t want to raise the warning every time we call backward for that purpose.

Maybe we raise a warning if (1) the user calls backward, and (2) the last set of test inputs didn’t require grad; OR (1) the user calls backward, and (2) the test inputs were equal to the train inputs, which require grad (we already test for equality in `__call__`). I think that would catch most cases (or at least more than we do now) – basically, if the test inputs require grad and are different from the train inputs, we assume that the backward was for the purpose of getting derivatives of the test inputs. Otherwise, if the test inputs don’t require grad, or they do but are actually the train inputs, we assume the backward call was for the hyperparameters and/or train inputs.
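To make the distinction above concrete, the "much more common operation" referred to is roughly the following pattern. This is an illustrative sketch reusing the exact model class and toy data from the question, not code from the thread:

```python
# Backprop w.r.t. candidate/test inputs (e.g. for acquisition optimization).
# This works even though the cached train-train terms are detached.
model = MultiOutputMultiTaskGP(train_x, train_y)
model.eval()
model.likelihood.eval()

# 5 hypothetical test points, all assigned to task 0 in the last column
test_x = torch.cat([torch.rand(5, 1), torch.zeros(5, 1)], dim=1).requires_grad_(True)
pred = model.likelihood(model(test_x))
pred.mean.sum().backward()  # gradients flow back to test_x, not to the detached caches
```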
First of all, @fleskovar and @ianhill60, I am sorry for the very slow reply!
From a practical software perspective: this will probably require a different variational strategy. However, it seems like there are lots of requests for a similar Hadamard-style multi-task SVGP model, so I’ll probably take a look at implementing that soon.
From a technical perspective: you’d probably want one set of inducing points per task.
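For reference, the multi-output (LMC) half of this is already supported. Below is a minimal sketch of a standard `LMCVariationalStrategy` model, with illustrative class name and sizes and without any handling of the Hadamard task index (the part that, per the comments above, would still need a dedicated variational strategy):

```python
import torch
import gpytorch
from gpytorch.models import ApproximateGP
from gpytorch.variational import (
    CholeskyVariationalDistribution,
    LMCVariationalStrategy,
    VariationalStrategy,
)


class LMCMultiOutputGP(ApproximateGP):
    """num_latents independent latent GPs, mixed into num_tasks correlated outputs."""

    def __init__(self, inducing_points, num_tasks, num_latents):
        # inducing_points: num_latents x m x d (one set of inducing points per latent GP)
        variational_distribution = CholeskyVariationalDistribution(
            inducing_points.size(-2), batch_shape=torch.Size([num_latents])
        )
        variational_strategy = LMCVariationalStrategy(
            VariationalStrategy(
                self, inducing_points, variational_distribution,
                learn_inducing_locations=True,
            ),
            num_tasks=num_tasks,
            num_latents=num_latents,
            latent_dim=-1,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_latents]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(batch_shape=torch.Size([num_latents])),
            batch_shape=torch.Size([num_latents]),
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


# Illustrative setup: 3 latent GPs, 16 inducing points each, 1 input feature (no task column)
inducing_points = torch.rand(3, 16, 1)
model = LMCMultiOutputGP(inducing_points, num_tasks=2, num_latents=3)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=2)
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))
```

Extending this to the Hadamard task setting (for example, giving each task its own batch of inducing points, as suggested above, and selecting the appropriate batch from the task column) is the piece that is not supported out of the box and would require a custom variational strategy.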