[Bug] Time series prediction with batch independent SGPR
🐛 Bug
Hi,
I still have trouble doing large-scale multioutput regression for time series, as discussed in #1349 and #1376. I agree that PR #1356 improved the time performance, which now seems satisfactory both on CPU (reasonably similar to GPy) and on GPU (much faster).
However, in my case I am using time series data (input 20k x 7, output 20k x 5), and I want to predict trajectories of my system (rollouts). I am not looking at uncertainty propagation along the rollout for now; I just want to predict the mean trajectory without considering uncertainty in the inputs.
My problem is that I need to predict each point of the rollout one after the other. But the prediction of one single point takes too long, about as long as the prediction of a few thousand points!
For example, in the following test code, the roughly 0.5 s prediction time for the full test set (3k x 18) is acceptable and comparable to GPy's, but predicting a single test point also takes around 0.5 s, which makes predicting a long rollout extremely slow.
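For concreteness, a rollout of the kind I mean looks roughly like the sketch below. This is only an illustration, not my actual code: the `rollout` helper and the convention that the predicted mean overwrites the first `output_size` input dimensions are placeholder assumptions.

```python
import gpytorch
import torch

def rollout(model, likelihood, x0, horizon, output_size):
    # Illustrative mean-only rollout: one single-point GP prediction per step.
    # x0 has shape (output_size, 1, input_size) for a batch-independent GP;
    # the state-update convention below is just a placeholder for this sketch.
    model.eval()
    likelihood.eval()
    x = x0.clone()
    means = []
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        for _ in range(horizon):
            pred = likelihood(model(x))            # one prediction call per step
            mean = pred.mean                       # shape (output_size, 1)
            means.append(mean.squeeze(-1))
            x = x.clone()
            x[..., :output_size] = mean.transpose(-2, -1)  # feed prediction back in
    return torch.stack(means)                      # (horizon, output_size)
```

Even without uncertainty propagation, every step requires one prediction call, so the per-call latency is what dominates the total rollout time.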
Is there a fix for this? Or a better way of predicting rollouts? Or should I be using something other than SGPR for multioutput time series prediction with about 150k samples? I tried Multitask SVGP, which was a bit better for my issue (0.5 s for 3k predictions, 0.05 s for a single point), but learning was much slower and the test code ran out of memory when I tried it on the GPU…
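For reference, the Multitask SVGP variant I tried was set up roughly along the lines of the independent-multitask SVGP example in the GPyTorch docs; the sketch below is an approximation of that setup (the class name and details are mine), not my exact code:

```python
import gpytorch
import torch

class IndependentMultitaskSVGP(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points, num_tasks):
        # inducing_points: (num_tasks, M, input_size) batch of inducing locations
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(-2), batch_shape=torch.Size([num_tasks]))
        variational_strategy = gpytorch.variational.IndependentMultitaskVariationalStrategy(
            gpytorch.variational.VariationalStrategy(
                self, inducing_points, variational_distribution,
                learn_inducing_locations=True),
            num_tasks=num_tasks)
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_tasks]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_tasks])),
            batch_shape=torch.Size([num_tasks]))

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

# Trained with a multitask likelihood and the variational ELBO over minibatches:
# likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=output_size)
# mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))
```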
Thanks a lot for your help!
To reproduce: example code
import time
import urllib.request
from math import floor
import GPy
import gpytorch
import torch
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import ScaleKernel, RBFKernel, InducingPointKernel
from gpytorch.means import ConstantMean
from scipy.io import loadmat
if __name__ == '__main__':
    # Run GPyTorch SGPR + independent multioutputs example: approximate
    # https://docs.gpytorch.ai/en/v1.2.1/examples/02_Scalable_Exact_GPs/SGPR_Regression_CUDA.html
    # https://github.com/cornellius-gp/gpytorch/issues/1043
    print('Downloading \'elevators\' UCI dataset...')
    urllib.request.urlretrieve(
        'https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk',
        '../elevators.mat')

    output_size = 5
    nb_inducing_points = 500
    data = torch.Tensor(loadmat('../elevators.mat')['data'])
    X = data[:, :-1]
    X = X - X.min(0)[0]
    X = 2 * (X / X.max(0)[0]) - 1
    y = data[:, -1]

    # MAKE MULTIOUTPUT DATA
    y = y.reshape(-1, 1)
    y = y.repeat(1, output_size)
    print(X.shape, y.shape)
    input_size = X.shape[1]

    train_n = int(floor(0.8 * len(X)))
    train_x = X[:train_n, :].contiguous()
    train_y = y[:train_n].contiguous()
    test_x = X[train_n:, :].contiguous()
    test_y = y[train_n:].contiguous()
    if torch.cuda.is_available():
        train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()

    # CONVERT TO BATCH GP
    train_x = train_x.repeat(output_size, 1, 1)
    train_y = train_y.transpose(-2, -1)
    test_x = test_x.repeat(output_size, 1, 1)
    test_y = test_y.transpose(-2, -1)
    print(train_x.shape, train_y.shape, test_x.shape)

    class GPRegressionModel(gpytorch.models.ExactGP):
        def __init__(self, train_x, train_y, likelihood):
            super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)
            self.mean_module = ConstantMean(batch_shape=torch.Size([output_size]))
            self.base_covar_module = ScaleKernel(
                RBFKernel(batch_shape=torch.Size([output_size])),
                batch_shape=torch.Size([output_size]))
            inducing_points = train_x[:, :nb_inducing_points, :]
            print(inducing_points.shape)
            self.covar_module = InducingPointKernel(
                self.base_covar_module,
                inducing_points=inducing_points,
                likelihood=likelihood)

        def forward(self, x):
            mean_x = self.mean_module(x)
            covar_x = self.covar_module(x)
            return MultivariateNormal(mean_x, covar_x)

    likelihood = gpytorch.likelihoods.GaussianLikelihood(
        batch_shape=torch.Size([output_size]))
    model = GPRegressionModel(train_x, train_y, likelihood)
    if torch.cuda.is_available():
        model = model.cuda()
        likelihood = likelihood.cuda()

    # Train
    training_iterations = 10
    model.train()
    likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    start_whole = time.time()
    for i in range(training_iterations):
        start = time.time()
        # Zero backprop gradients
        optimizer.zero_grad()
        # Get output from model
        output = model(train_x)
        # Calc loss and backprop derivatives
        loss = -mll(output, train_y).sum()
        loss.backward()
        end = time.time()
        print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()),
              'in', str(end - start))
        optimizer.step()
        torch.cuda.empty_cache()
    end_whole = time.time()
    print('GPyTorch training time', str(end_whole - start_whole))

    model.eval()
    likelihood.eval()
    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(30), gpytorch.settings.fast_pred_var():
            preds = model(test_x)
    end = time.time()
    print('predict', str(test_x.shape), 'in', str(end - start))

    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(30), gpytorch.settings.fast_pred_var():
            preds = model(test_x)
    end = time.time()
    print('predict 2nd time', str(test_x.shape), 'in', str(end - start))
    print('Test MAE: {}'.format(torch.mean(torch.abs(preds.mean - test_y))))

    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(30), gpytorch.settings.fast_pred_var():
            preds = model(test_x[:, 0, :])
    end = time.time()
    print('predict single point', str(test_x[:, 0, :].shape), 'in', str(end - start), '\n')
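    # NOTE: hypothetical addition, not part of my original script (and not shown
    # in the output below): time a short loop of sequential single-point
    # predictions to mimic the per-step cost of a rollout.
    n_steps = 20
    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(30), gpytorch.settings.fast_pred_var():
            for step in range(n_steps):
                _ = model(test_x[:, step:step + 1, :])
    end = time.time()
    print('predict', n_steps, 'single points sequentially in', str(end - start))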
    # Compare GPy
    # Cannot really use GPU: https://github.com/SheffieldML/GPy/issues/441
    # CONVERT BACK FROM BATCH GP
    train_x = X[:train_n, :].contiguous()
    train_y = y[:train_n].contiguous()
    test_x = X[train_n:, :].contiguous()
    test_y = y[train_n:].contiguous()
    gpykernel = GPy.kern.RBF(input_dim=train_x.numpy().shape[1], ARD=True)
    gpymodel = GPy.core.SparseGP(
        train_x.numpy(),
        train_y.numpy(),
        train_x.numpy()[:nb_inducing_points, :],
        kernel=gpykernel,
        likelihood=GPy.likelihoods.Gaussian(),
        inference_method=GPy.inference.latent_function_inference.VarDTC())

    start = time.time()
    gpymodel.optimize(messages=True, max_iters=training_iterations)
    end = time.time()
    print('GPy training time', str(end - start))

    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x.numpy())
    end = time.time()
    print('predict', test_x.shape, 'in', str(end - start))

    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x.numpy())
    end = time.time()
    print('predict 2nd time', test_x.shape, 'in', str(end - start))
    print('Test MAE: {}'.format(torch.mean(torch.abs(torch.tensor(gpymean) - test_y))))

    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x[0].reshape(1, -1).numpy())
    end = time.time()
    print('predict single point', test_x[0].reshape(1, -1).shape, 'in', str(end - start), '\n')
**Stack trace/error message**
On CPU:
GPyTorch training time 37.12532615661621
predict torch.Size([5, 3320, 18]) in 5.143052101135254
predict 2nd time torch.Size([5, 3320, 18]) in 0.6981589794158936
predict single point torch.Size([5, 18]) in 0.5465080738067627
GPy training time 52.490806102752686
predict torch.Size([3320, 18]) in 0.05931401252746582
predict 2nd time torch.Size([3320, 18]) in 0.04732799530029297
predict single point torch.Size([1, 18]) in 0.0005881786346435547
On GPU:
GPyTorch training time 1.204216480255127
predict torch.Size([5, 3320, 18]) in 0.3035309314727783
predict 2nd time torch.Size([5, 3320, 18]) in 0.0034995079040527344
predict single point torch.Size([5, 18]) in 0.002752065658569336
GPy training time 86.75350904464722
predict torch.Size([3320, 18]) in 0.1132357120513916
predict 2nd time torch.Size([3320, 18]) in 0.11741828918457031
predict single point torch.Size([1, 18]) in 0.0013453960418701172
Expected Behavior
I was expecting the prediction time for a single point to be at least one order of magnitude lower than that of 3k points, so that I can predict rollouts in a reasonable time.
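To put it concretely: at roughly 0.5 s per single-point call (the CPU numbers above), a 1,000-step rollout would take on the order of 500 s just for the means, whereas at GPy's ~0.6 ms per point the same rollout would take less than a second.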
System information
Please complete the following information:
- GPyTorch version: 1.3.1
- PyTorch version: 1.7.0
- OS: macOS Catalina
Top GitHub Comments
These are the times I get:
[timing tables for GPyTorch 1.4 and GPyTorch 1.3, each on GPU and CPU, were posted as screenshots and are not reproduced here]
So things should definitely be faster on CPU.
@gpleiss yes, even with the new release I still have similar results with the original code… Any idea why I'm not seeing that improvement? I'm puzzled.
[CPU and GPU timing results were posted as screenshots and are not reproduced here]