[Question] Sparse GPs for Batch Independent MultiOutputs slow prediction?
Question
Thanks for this great package!
Not really a bug, rather a question. I need to train a GP with 20,000x7 training inputs, 20,000x5 training outputs, and 40,000x7 test inputs. My first call was to look into the implementation of sparse GPs, because that’s what I usually use with GPy. I implemented a sparse + batch independent multioutput GP model class as in issue #1043 and tested it with the UCI elevators dataset from the SGPR tutorial.
However, I noticed that while training runs as expected, prediction is unreasonably slow. Attached is a small example comparing training and prediction time with GPyTorch and GPy. The difference gets much larger as the number of output dimensions, training samples or inducing points grows.
So here are my questions:
- How come prediction takes so long with the sparse + multioutput GP model from #1043? Is there a bug somewhere, or is this actually expected?
- Is it just a bad call to try using SGPR with multioutputs? I don’t care which method I am using, I just need to be able to train large, multioutput GPs efficiently. This will mostly be done on GPU, but I would still like it to run reasonably on CPU so I can do some tests locally. With sparse GPs from GPy my use case ran in around 2h, which I would consider reasonable on CPU for tests, but with my current GPyTorch implementation prediction alone takes about 20h… I am open to suggestions!
Thanks a lot for your help.
To reproduce
import time
import urllib.request
from math import floor
import GPy
import gpytorch
import torch
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import ScaleKernel, RBFKernel, InducingPointKernel
from gpytorch.means import ConstantMean
from scipy.io import loadmat
if __name__ == '__main__':
    # Run GPyTorch SGPR + independent multioutputs example
    # https://docs.gpytorch.ai/en/v1.2.1/examples/02_Scalable_Exact_GPs/SGPR_Regression_CUDA.html
    # https://github.com/cornellius-gp/gpytorch/issues/1043
    print('Downloading \'elevators\' UCI dataset...')
    urllib.request.urlretrieve(
        'https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk',
        '../elevators.mat')
    output_size = 2
    data = torch.Tensor(loadmat('../elevators.mat')['data'])
    X = data[:, :-1]
    X = X - X.min(0)[0]
    X = 2 * (X / X.max(0)[0]) - 1
    y = data[:, -1]

    # MAKE MULTIOUTPUT DATA
    y = y.reshape(-1, 1)
    y = y.repeat(1, output_size)
    print(X.shape, y.shape)

    train_n = int(floor(0.8 * len(X)))
    train_x = X[:train_n, :].contiguous()
    train_y = y[:train_n].contiguous()
    test_x = X[train_n:, :].contiguous()
    test_y = y[train_n:].contiguous()
    if torch.cuda.is_available():
        train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()

    # CONVERT TO BATCH GP
    train_x = train_x.repeat(output_size, 1, 1)
    train_y = train_y.transpose(-2, -1)
    test_x = test_x.repeat(output_size, 1, 1)
    test_y = test_y.transpose(-2, -1)
    print(train_x.shape, train_y.shape, test_x.shape)

    class GPRegressionModel(gpytorch.models.ExactGP):
        def __init__(self, train_x, train_y, likelihood):
            super(GPRegressionModel, self).__init__(train_x, train_y,
                                                    likelihood)
            self.mean_module = ConstantMean(
                batch_shape=torch.Size([output_size]))
            self.base_covar_module = ScaleKernel(RBFKernel(
                batch_shape=torch.Size([output_size])),
                batch_shape=torch.Size([output_size]))
            self.covar_module = InducingPointKernel(
                self.base_covar_module,
                inducing_points=train_x[:, :500, :],
                likelihood=likelihood)

        def forward(self, x):
            mean_x = self.mean_module(x)
            covar_x = self.covar_module(x)
            return MultivariateNormal(mean_x, covar_x)

    likelihood = gpytorch.likelihoods.GaussianLikelihood(
        batch_shape=torch.Size([output_size]))
    model = GPRegressionModel(train_x, train_y, likelihood)
    if torch.cuda.is_available():
        model = model.cuda()
        likelihood = likelihood.cuda()

    # Train
    training_iterations = 50
    model.train()
    likelihood.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    start_whole = time.time()
    for i in range(training_iterations):
        start = time.time()
        # Zero backprop gradients
        optimizer.zero_grad()
        # Get output from model
        output = model(train_x)
        # Calc loss and backprop derivatives
        loss = -mll(output, train_y).sum()
        loss.backward()
        end = time.time()
        print('Iter %d/%d - Loss: %.3f' % (
            i + 1, training_iterations, loss.item()), 'in', str(end - start))
        optimizer.step()
        torch.cuda.empty_cache()
    end_whole = time.time()
    print('GPyTorch training time', str(end_whole - start_whole))

    model.eval()
    likelihood.eval()
    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(
                30), gpytorch.settings.fast_pred_var():
            preds = model(test_x)
    end = time.time()
    print('predict', str(test_x.shape[1]), 'in', str(end - start))
    print('Test MAE: {}'.format(torch.mean(torch.abs(preds.mean - test_y))))

    # Compare GPy
    train_x = X[:train_n, :].contiguous()
    train_y = y[:train_n].contiguous()
    test_x = X[train_n:, :].contiguous()
    test_y = y[train_n:].contiguous()
    gpykernel = GPy.kern.RBF(input_dim=train_x.numpy().shape[1], ARD=True)
    gpymodel = GPy.core.SparseGP(train_x.numpy(),
                                 train_y.numpy(),
                                 train_x.numpy()[:500, :],
                                 kernel=gpykernel,
                                 likelihood=GPy.likelihoods.Gaussian(),
                                 inference_method=GPy.inference.latent_function_inference.VarDTC())
    start = time.time()
    gpymodel.optimize(messages=True, max_iters=50)
    end = time.time()
    print('GPy training time', str(end - start))
    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x.numpy())
    end = time.time()
    print('predict', str(len(test_x)), 'in', str(end - start))
    print('Test MAE: {}'.format(torch.mean(torch.abs(torch.tensor(gpymean) -
                                                     test_y))))
On my laptop's CPU:
GPyTorch training time 182.81384825706482
predict 3320 in 6.573011159896851
Test MAE: 0.07266882807016373
GPy training time 220.74766993522644
predict 3320 in 0.06382608413696289
Test MAE: 0.06792577020335429
On GPU with Google Colab:
GPyTorch training time 12.468220949172974
predict 3320 in 0.2444145679473877
Test MAE: 0.07270058244466782
GPy training time 340.8614845275879
predict 3320 in 0.11290717124938965
Test MAE: 0.06792590803210088
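To put the prediction timings above in proportion, here is a small sketch that computes the GPyTorch-to-GPy ratio from the reported numbers. The dictionary layout and the names `timings` and `ratios` are only for illustration; the values are copied from the output above.

```python
# Prediction times in seconds for 3320 test points, copied from the output above.
timings = {
    "laptop CPU": {"gpytorch": 6.573011159896851, "gpy": 0.06382608413696289},
    "Colab GPU run": {"gpytorch": 0.2444145679473877, "gpy": 0.11290717124938965},
}

# Ratio > 1 means GPyTorch prediction was slower than GPy prediction.
ratios = {name: t["gpytorch"] / t["gpy"] for name, t in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: GPyTorch prediction took {ratio:.1f}x as long as GPy")
```

On the laptop CPU the gap is roughly two orders of magnitude, while in the Colab run it is roughly a factor of two.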
Expected Behavior
I would expect training and prediction time for both GPyTorch and GPy to be of the same order of magnitude on CPU, GPyTorch an order of magnitude faster on GPU. For training this is approximately the case, but not for prediction…
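For context on why I expect prediction to be cheap: in the textbook SGPR/DTC formulation with a zero prior mean, everything that depends on the training data can be collapsed into a vector of length m (the number of inducing points), so the predictive mean is a single matrix-vector product per test batch. Below is a minimal NumPy sketch of that idea. It is not how GPyTorch or GPy implement prediction internally; the helper names (`rbf`, `sgpr_fit_cache`, `sgpr_predict_mean`), the fixed RBF kernel and the toy data are all assumptions made for illustration.

```python
import numpy as np


def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)


def sgpr_fit_cache(X, y, Z, noise=0.1, jitter=1e-6):
    """One-off O(n m^2) work; returns an m-vector reused by every prediction.

    noise is the Gaussian noise variance, Z holds the m inducing inputs.
    """
    Kuu = rbf(Z, Z) + jitter * np.eye(len(Z))
    Kuf = rbf(Z, X)
    # (Kuu + Kuf Kfu / noise)^{-1} Kuf y / noise
    return np.linalg.solve(Kuu + Kuf @ Kuf.T / noise, Kuf @ y) / noise


def sgpr_predict_mean(X_test, Z, cache):
    """O(n_test * m) predictive mean: one kernel evaluation and one product."""
    return rbf(X_test, Z) @ cache


# Toy data (hypothetical, only to exercise the functions).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)
Z = X[:5]
X_test = rng.uniform(-1, 1, size=(20, 2))

cache = sgpr_fit_cache(X, y, Z)
mean = sgpr_predict_mean(X_test, Z, cache)
```

With the cache in hand, the cost of predicting no longer depends on the number of training points, which is why the 6.5 s prediction time on CPU surprised me.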
System information
- GPyTorch version 1.2.0
- PyTorch version 1.6.0
- Mac OS Catalina
Issue Analytics
- Created 3 years ago
- Reactions:1
- Comments:5 (3 by maintainers)
Top GitHub Comments
@gpleiss while we're at it, we could generalize all UWV + A linear operators and use the matrix inversion lemma and matrix determinant lemma. This is probably a good idea for LinearKernel too. LowRankPlusDiag? Probably any time we can do a direct solve in the same asymptotic complexity as CG we should prefer the direct solve.
Since you’ve been taking on the torch function stuff, I can get to work on this.
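To make the suggestion above concrete, here is a minimal NumPy sketch of a direct solve and log-determinant for a matrix of the form A + U W V with diagonal A, using the matrix inversion lemma and the matrix determinant lemma. This is not GPyTorch code and does not reflect how its linear operators are implemented; the function names `woodbury_solve` and `lemma_logdet` and the toy data are assumptions for illustration, and the sketch assumes W is invertible and the full matrix has a positive determinant.

```python
import numpy as np


def woodbury_solve(diag, U, W, V, b):
    """Solve (A + U W V) x = b for a vector b, where A = diag(diag).

    Matrix inversion lemma:
    (A + U W V)^{-1} = A^{-1} - A^{-1} U (W^{-1} + V A^{-1} U)^{-1} V A^{-1}
    Cost is O(n k^2) for U of shape (n, k) instead of O(n^3).
    """
    Ainv_b = b / diag
    Ainv_U = U / diag[:, None]
    cap = np.linalg.inv(W) + V @ Ainv_U  # small k x k system
    return Ainv_b - Ainv_U @ np.linalg.solve(cap, V @ Ainv_b)


def lemma_logdet(diag, U, W, V):
    """log det(A + U W V) via the matrix determinant lemma.

    det(A + U W V) = det(W^{-1} + V A^{-1} U) * det(W) * det(A)
    """
    Ainv_U = U / diag[:, None]
    cap = np.linalg.inv(W) + V @ Ainv_U
    _, logdet_cap = np.linalg.slogdet(cap)
    _, logdet_W = np.linalg.slogdet(W)
    return logdet_cap + logdet_W + np.log(diag).sum()


# Toy low-rank-plus-diagonal matrix (hypothetical sizes, V = U^T).
rng = np.random.default_rng(0)
n, k = 200, 5
U = rng.standard_normal((n, k))
L = rng.standard_normal((k, k))
W = L @ L.T + np.eye(k)            # symmetric positive definite
diag = rng.uniform(0.5, 2.0, size=n)
b = rng.standard_normal(n)

x = woodbury_solve(diag, U, W, U.T, b)
logdet = lemma_logdet(diag, U, W, U.T)
dense = np.diag(diag) + U @ W @ U.T  # only built here to check the result
```

Both routines only ever factor a k x k matrix, which is the sense in which a direct solve can match CG's asymptotic cost for this structure.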
After updating to the latest version of GPyTorch, running the exact same code above raises an error in the forward function. @gpleiss @jacobrgardner could something have gone wrong with this fix?
Thanks for your help!
Output of the example code after updating GPyTorch: