[Bug] Time series prediction with batch independent SGPR

🐛 Bug

Hi,

I still have trouble doing large-scale multioutput regression for time series as discussed in #1349 and #1376. I agree that PR #1356 improved the time performance, which now seems satisfactory both on CPU (reasonably similar to GPy) and on GPU (much faster).

However, in my case I am using time series data (input 20k x 7, output 20k x 5), and I want to predict trajectories of my system (rollouts). I am not looking at uncertainty propagation along the rollout for now; I just want to predict the mean trajectory without considering uncertainty in the inputs.

My problem is that I need to predict each point of the rollout one after the other. But the prediction of one single point takes too long, about as long as the prediction of a few thousand points!

For example, in the following test code, predicting the whole test set (3k x 18) takes around 0.5s, which is acceptable and comparable to GPy. But predicting a single test point also takes around 0.5s, and that makes predicting a long rollout super slow.

Is there a fix for this? Or a better way of predicting rollouts? Or should I be using something other than SGPR for multioutput time series prediction with about 150k samples? I tried Multitask SVGP, which was a bit better for my issue (0.5s for 3k predictions, 0.05s for a single point), but learning was much slower and the test code ran out of memory when I tried it on the GPU…
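
The point-by-point rollout described above can be sketched as follows. This is only an illustration, not code from the issue: `rollout`, `predict_mean`, and `toy_model` are hypothetical names, and the sketch assumes the model input is the current state concatenated with a known control input, which the issue does not specify. In practice `predict_mean` would wrap a call to the trained GP in eval mode and return its predictive mean.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def rollout(predict_mean: Callable[[Vector], Vector],
            initial_state: Sequence[float],
            controls: Sequence[Sequence[float]]) -> List[Vector]:
    """Predict a mean trajectory one step at a time.

    Each predicted state is fed back as part of the next input, so the
    steps cannot be batched: one model call per rollout step.
    """
    state = list(initial_state)
    trajectory = [state]
    for control in controls:
        # Hypothetical input layout: [state, control] concatenated.
        model_input = state + list(control)
        state = list(predict_mean(model_input))
        trajectory.append(state)
    return trajectory


def toy_model(model_input: Vector) -> Vector:
    """Stand-in for the GP mean: 1-D state x, 1-D control u, x' = 0.5*x + u."""
    x, u = model_input
    return [0.5 * x + u]


trajectory = rollout(toy_model, [1.0], [[0.0], [1.0], [0.0]])
print(trajectory)
```

Because each step needs the previous prediction, a rollout of T steps costs T single-point predictions, so the latency of one call matters more here than batched throughput.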

Thanks a lot for your help!

To reproduce: example code

import time
import urllib.request
from math import floor

import GPy
import gpytorch
import torch
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import ScaleKernel, RBFKernel, InducingPointKernel
from gpytorch.means import ConstantMean
from scipy.io import loadmat

if __name__ == '__main__':
    # Run GPyTorch SGPR + independent multioutputs example: approximate
    # https://docs.gpytorch.ai/en/v1.2.1/examples/02_Scalable_Exact_GPs/SGPR_Regression_CUDA.html
    # https://github.com/cornellius-gp/gpytorch/issues/1043
    print('Downloading \'elevators\' UCI dataset...')
    urllib.request.urlretrieve(
        'https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk',
        '../elevators.mat')
    output_size = 5
    nb_inducing_points = 500
    data = torch.Tensor(loadmat('../elevators.mat')['data'])
    X = data[:, :-1]
    X = X - X.min(0)[0]
    X = 2 * (X / X.max(0)[0]) - 1
    y = data[:, -1]
    # MAKE MULTIOUTPUT DATA
    y = y.reshape(-1, 1)
    y = y.repeat(1, output_size)
    print(X.shape, y.shape)
    input_size = X.shape[1]
    train_n = int(floor(0.8 * len(X)))
    train_x = X[:train_n, :].contiguous()
    train_y = y[:train_n].contiguous()
    test_x = X[train_n:, :].contiguous()
    test_y = y[train_n:].contiguous()
    if torch.cuda.is_available():
        train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()
    # CONVERT TO BATCH GP
    train_x = train_x.repeat(output_size, 1, 1)
    train_y = train_y.transpose(-2, -1)
    test_x = test_x.repeat(output_size, 1, 1)
    test_y = test_y.transpose(-2, -1)
    print(train_x.shape, train_y.shape, test_x.shape)


    class GPRegressionModel(gpytorch.models.ExactGP):
        def __init__(self, train_x, train_y, likelihood):
            super(GPRegressionModel, self).__init__(train_x, train_y,
                                                    likelihood)
            self.mean_module = ConstantMean(
                batch_shape=torch.Size([output_size]))
            self.base_covar_module = ScaleKernel(RBFKernel(
                batch_shape=torch.Size([output_size])),
                batch_shape=torch.Size([output_size]))
            inducing_points = train_x[:, :nb_inducing_points, :]
            print(inducing_points.shape)
            self.covar_module = InducingPointKernel(
                self.base_covar_module,
                inducing_points=inducing_points,
                likelihood=likelihood)

        def forward(self, x):
            mean_x = self.mean_module(x)
            covar_x = self.covar_module(x)
            return MultivariateNormal(mean_x, covar_x)


    likelihood = gpytorch.likelihoods.GaussianLikelihood(
        batch_shape=torch.Size([output_size]))
    model = GPRegressionModel(train_x, train_y, likelihood)
    if torch.cuda.is_available():
        model = model.cuda()
        likelihood = likelihood.cuda()
    # Train
    training_iterations = 10
    model.train()
    likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    start_whole = time.time()
    for i in range(training_iterations):
        start = time.time()
        # Zero backprop gradients
        optimizer.zero_grad()
        # Get output from model
        output = model(train_x)
        # Calc loss and backprop derivatives
        loss = -mll(output, train_y).sum()
        loss.backward()
        end = time.time()
        print('Iter %d/%d - Loss: %.3f' % (
            i + 1, training_iterations, loss.item()), 'in', str(end - start))
        optimizer.step()
        torch.cuda.empty_cache()
    end_whole = time.time()
    print('GPyTorch training time', str(end_whole - start_whole))
    model.eval()
    likelihood.eval()
    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(
                30), gpytorch.settings.fast_pred_var():
            preds = model(test_x)
    end = time.time()
    print('predict', str(test_x.shape), 'in', str(end - start))
    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(
                30), gpytorch.settings.fast_pred_var():
            preds = model(test_x)
    end = time.time()
    print('predict 2nd time', str(test_x.shape), 'in', str(end - start))
    print('Test MAE: {}'.format(torch.mean(torch.abs(preds.mean - test_y))))
    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(
                30), gpytorch.settings.fast_pred_var():
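            # NOTE: test_x[:, 0, :] drops the data dimension and has shape
            # [5, 18]; test_x[:, :1, :] would instead keep a single-point
            # batch of shape [5, 1, 18].
            preds = model(test_x[:, 0, :])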
    end = time.time()
    print('predict single point', str(test_x[:, 0, :].shape), 'in',
          str(end - start), '\n')

    # Compare GPy
    # Cannot really use GPU: https://github.com/SheffieldML/GPy/issues/441
    # CONVERT BACK FROM BATCH GP
    train_x = X[:train_n, :].contiguous()
    train_y = y[:train_n].contiguous()
    test_x = X[train_n:, :].contiguous()
    test_y = y[train_n:].contiguous()
    gpykernel = GPy.kern.RBF(input_dim=train_x.numpy().shape[1], ARD=True)
    gpymodel = GPy.core.SparseGP(train_x.numpy(),
                                 train_y.numpy(),
                                 train_x.numpy()[:nb_inducing_points, :],
                                 kernel=gpykernel,
                                 likelihood=GPy.likelihoods.Gaussian(),
                                 inference_method=GPy.inference.latent_function_inference.VarDTC())
    start = time.time()
    gpymodel.optimize(messages=True, max_iters=training_iterations)
    end = time.time()
    print('GPy training time', str(end - start))
    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x.numpy())
    end = time.time()
    print('predict', test_x.shape, 'in', str(end - start))
    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x.numpy())
    end = time.time()
    print('predict 2nd time', test_x.shape, 'in', str(end - start))
    print('Test MAE: {}'.format(
        torch.mean(torch.abs(torch.tensor(gpymean) - test_y))))
    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x[0].reshape(1, -1).numpy())
    end = time.time()
    print('predict single point', test_x[0].reshape(1, -1).shape, 'in',
          str(end - start), '\n')

Stack trace/error message

On CPU:

GPyTorch training time 37.12532615661621
predict torch.Size([5, 3320, 18]) in 5.143052101135254
predict 2nd time torch.Size([5, 3320, 18]) in 0.6981589794158936
predict single point torch.Size([5, 18]) in 0.5465080738067627 

GPy training time 52.490806102752686
predict torch.Size([3320, 18]) in 0.05931401252746582
predict 2nd time torch.Size([3320, 18]) in 0.04732799530029297
predict single point torch.Size([1, 18]) in 0.0005881786346435547

On GPU:

GPyTorch training time 1.204216480255127
predict torch.Size([5, 3320, 18]) in 0.3035309314727783
predict 2nd time torch.Size([5, 3320, 18]) in 0.0034995079040527344
predict single point torch.Size([5, 18]) in 0.002752065658569336 

GPy training time 86.75350904464722
predict torch.Size([3320, 18]) in 0.1132357120513916
predict 2nd time torch.Size([3320, 18]) in 0.11741828918457031
predict single point torch.Size([1, 18]) in 0.0013453960418701172

Expected Behavior

I was expecting the prediction time for a single point to be at least one order of magnitude lower than that of 3k points, so that I can predict rollouts in a reasonable time.
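
When comparing single-point and batched prediction times like the ones above, it may help to exclude one-off warm-up cost and, on GPU, to wait for queued kernels before reading the clock, since CUDA work is launched asynchronously. The helper below is a sketch under those assumptions; `time_call` and its `sync` parameter are hypothetical names, not part of GPyTorch.

```python
import time
from typing import Callable, Optional


def time_call(fn: Callable[[], object],
              repeats: int = 5,
              warmup: int = 1,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the mean wall-clock time of fn() in seconds.

    Warm-up calls run first and are not timed, so one-off setup or caching
    cost is excluded. sync, if given, is called before each clock read;
    on GPU one would pass torch.cuda.synchronize.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return elapsed / repeats


# Toy workload standing in for a model call.
calls = []
mean_seconds = time_call(lambda: calls.append(sum(range(1000))), repeats=3)
print(f"mean time per call: {mean_seconds:.6f} s over 3 repeats")
```

With the test code above, one might call something like `time_call(lambda: model(test_x), sync=torch.cuda.synchronize)` inside the same `torch.no_grad()` and settings contexts, passing `sync` only when running on GPU.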

System information

GPyTorch version 1.3.1
PyTorch version 1.7.0
Mac OS Catalina

Issue Analytics

Created 3 years ago
Comments: 10 (5 by maintainers)

Top GitHub Comments

gpleiss commented, Feb 26, 2021

These are the times I get:

GPyTorch 1.4 (GPU)

Train X shape: torch.Size([5, 13279, 18])
GPyTorch training time 1.6287736892700195
predict torch.Size([5, 3320, 18]) in 0.15840530395507812
predict 2nd time torch.Size([5, 3320, 18]) in 0.006852388381958008
predict single point torch.Size([5, 1, 18]) in 0.002796173095703125

GPyTorch 1.4 (CPU)

GPyTorch training time 11.376405954360962
predict torch.Size([5, 3320, 18]) in 0.8048131465911865
predict 2nd time torch.Size([5, 3320, 18]) in 0.2591075897216797
predict single point torch.Size([5, 1, 18]) in 0.18338346481323242

GPyTorch 1.3 (GPU)

GPyTorch training time 1.689206600189209
predict torch.Size([5, 3320, 18]) in 0.38948655128479004
predict 2nd time torch.Size([5, 3320, 18]) in 0.001882791519165039
predict single point torch.Size([5, 1, 18]) in 0.00189208984375

GPyTorch 1.3 (CPU)

GPyTorch training time 12.399611473083496
predict torch.Size([5, 3320, 18]) in 3.049643039703369
predict 2nd time torch.Size([5, 3320, 18]) in 0.30733466148376465
predict single point torch.Size([5, 1, 18]) in 0.22333765029907227

So things should definitely be faster on CPU.

monabf commented, Feb 23, 2021

@gpleiss yes, even with the new release I still have similar results with the original code… Any idea why I’m not seeing that improvement? I’m puzzled.

On CPU:

GPyTorch training time 36.57296395301819
predict torch.Size([5, 3320, 18]) in 2.4842982292175293
predict 2nd time torch.Size([5, 3320, 18]) in 0.8440001010894775
predict single point torch.Size([5, 18]) in 0.5074729919433594

GPy training time 60.12309408187866
predict torch.Size([3320, 18]) in 0.07687807083129883
predict 2nd time torch.Size([3320, 18]) in 0.06141304969787598
predict single point torch.Size([1, 18]) in 0.0006778240203857422

On GPU:

GPyTorch training time 0.9482972621917725
predict torch.Size([5, 3320, 18]) in 0.07795548439025879
predict 2nd time torch.Size([5, 3320, 18]) in 0.0073430538177490234
predict single point torch.Size([5, 18]) in 0.005500316619873047 

GPy training time 109.57454776763916
predict torch.Size([3320, 18]) in 0.13934540748596191
predict 2nd time torch.Size([3320, 18]) in 0.14101672172546387
predict single point torch.Size([1, 18]) in 0.0014951229095458984