[Bug] Time series prediction with batch independent SGPR


🐛 Bug

Hi,

I still have trouble doing large-scale multioutput regression for time series as discussed in #1349 and #1376. I agree that PR #1356 improved the time performance, which now seems satisfactory both on CPU (reasonably similar to GPy) and on GPU (much faster).

However, in my case I am using time series data (input 20k x 7, output 20k x 5), and I want to predict trajectories of my system (rollouts). I am not looking at uncertainty propagation along the rollout for now; I just want to predict the mean trajectory without considering uncertainty in the inputs.

My problem is that I need to predict each point of the rollout one after the other. But the prediction of one single point takes too long, about as long as the prediction of a few thousand points!
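
To make the constraint concrete, here is a minimal sketch of a mean-only rollout. The names `rollout`, `predict_mean` and `toy_model` are hypothetical and not part of GPyTorch; `toy_model` is only a stand-in for the GP posterior mean. The point is that each step consumes the previous prediction, so the calls cannot be batched and the per-call latency is paid once per step.

```python
from typing import Callable, List, Sequence

State = List[float]


def rollout(predict_mean: Callable[[State], State],
            initial_state: Sequence[float],
            n_steps: int) -> List[State]:
    """Predict a mean trajectory by feeding each prediction back as input."""
    trajectory = [list(initial_state)]
    for _ in range(n_steps):
        # Step t+1 needs the output of step t, so this loop is sequential.
        next_state = predict_mean(trajectory[-1])
        trajectory.append(list(next_state))
    return trajectory


def toy_model(state: State) -> State:
    """Hypothetical stand-in for the GP mean: halve every coordinate."""
    return [0.5 * s for s in state]


traj = rollout(toy_model, [1.0, -2.0], n_steps=3)
print(traj)
```

With a real model, `predict_mean` would wrap a single-point call such as the one timed in the test code, which is why that latency dominates the cost of a long rollout.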

For example, in the following test code, the prediction time for the test set (3k x 18) of around 0.5s is acceptable and comparable to that of GPy, but the prediction time for a single test point is also around 0.5s and that makes predicting a long rollout super slow.

Is there a fix for this? Or a better way of predicting rollouts? Or should I be using something other than SGPR for multioutput time series prediction with about 150k samples? I tried Multitask SVGP, and it was a bit better for my issue (0.5s for 3k predictions, 0.05s for a single point), but learning was much slower and the test code ran out of memory when I tried it on the GPU…

Thanks a lot for your help!

To reproduce: example code

import time
import urllib.request
from math import floor

import GPy
import gpytorch
import torch
from gpytorch.distributions import MultivariateNormal
from gpytorch.kernels import ScaleKernel, RBFKernel, InducingPointKernel
from gpytorch.means import ConstantMean
from scipy.io import loadmat

if __name__ == '__main__':
    # Run GPyTorch SGPR + independent multioutputs example: approximate
    # https://docs.gpytorch.ai/en/v1.2.1/examples/02_Scalable_Exact_GPs/SGPR_Regression_CUDA.html
    # https://github.com/cornellius-gp/gpytorch/issues/1043
    print('Downloading \'elevators\' UCI dataset...')
    urllib.request.urlretrieve(
        'https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk',
        '../elevators.mat')
    output_size = 5
    nb_inducing_points = 500
    data = torch.Tensor(loadmat('../elevators.mat')['data'])
    X = data[:, :-1]
    X = X - X.min(0)[0]
    X = 2 * (X / X.max(0)[0]) - 1
    y = data[:, -1]
    # MAKE MULTIOUTPUT DATA
    y = y.reshape(-1, 1)
    y = y.repeat(1, output_size)
    print(X.shape, y.shape)
    input_size = X.shape[1]
    train_n = int(floor(0.8 * len(X)))
    train_x = X[:train_n, :].contiguous()
    train_y = y[:train_n].contiguous()
    test_x = X[train_n:, :].contiguous()
    test_y = y[train_n:].contiguous()
    if torch.cuda.is_available():
        train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()
    # CONVERT TO BATCH GP
    train_x = train_x.repeat(output_size, 1, 1)
    train_y = train_y.transpose(-2, -1)
    test_x = test_x.repeat(output_size, 1, 1)
    test_y = test_y.transpose(-2, -1)
    print(train_x.shape, train_y.shape, test_x.shape)


    class GPRegressionModel(gpytorch.models.ExactGP):
        def __init__(self, train_x, train_y, likelihood):
            super(GPRegressionModel, self).__init__(train_x, train_y,
                                                    likelihood)
            self.mean_module = ConstantMean(
                batch_shape=torch.Size([output_size]))
            self.base_covar_module = ScaleKernel(RBFKernel(
                batch_shape=torch.Size([output_size])),
                batch_shape=torch.Size([output_size]))
            inducing_points = train_x[:, :nb_inducing_points, :]
            print(inducing_points.shape)
            self.covar_module = InducingPointKernel(
                self.base_covar_module,
                inducing_points=inducing_points,
                likelihood=likelihood)

        def forward(self, x):
            mean_x = self.mean_module(x)
            covar_x = self.covar_module(x)
            return MultivariateNormal(mean_x, covar_x)


    likelihood = gpytorch.likelihoods.GaussianLikelihood(
        batch_shape=torch.Size([output_size]))
    model = GPRegressionModel(train_x, train_y, likelihood)
    if torch.cuda.is_available():
        model = model.cuda()
        likelihood = likelihood.cuda()
    # Train
    training_iterations = 10
    model.train()
    likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    start_whole = time.time()
    for i in range(training_iterations):
        start = time.time()
        # Zero backprop gradients
        optimizer.zero_grad()
        # Get output from model
        output = model(train_x)
        # Calc loss and backprop derivatives
        loss = -mll(output, train_y).sum()
        loss.backward()
        end = time.time()
        print('Iter %d/%d - Loss: %.3f' % (
            i + 1, training_iterations, loss.item()), 'in', str(end - start))
        optimizer.step()
        torch.cuda.empty_cache()
    end_whole = time.time()
    print('GPyTorch training time', str(end_whole - start_whole))
    model.eval()
    likelihood.eval()
    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(
                30), gpytorch.settings.fast_pred_var():
            preds = model(test_x)
    end = time.time()
    print('predict', str(test_x.shape), 'in', str(end - start))
    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(
                30), gpytorch.settings.fast_pred_var():
            preds = model(test_x)
    end = time.time()
    print('predict 2nd time', str(test_x.shape), 'in', str(end - start))
    print('Test MAE: {}'.format(torch.mean(torch.abs(preds.mean - test_y))))
    start = time.time()
    with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
        with gpytorch.settings.max_root_decomposition_size(
                30), gpytorch.settings.fast_pred_var():
            preds = model(test_x[:, 0, :])
    end = time.time()
    print('predict single point', str(test_x[:, 0, :].shape), 'in',
          str(end - start), '\n')

    # Compare GPy
    # Cannot really use GPU: https://github.com/SheffieldML/GPy/issues/441
    # CONVERT BACK FROM BATCH GP
    train_x = X[:train_n, :].contiguous()
    train_y = y[:train_n].contiguous()
    test_x = X[train_n:, :].contiguous()
    test_y = y[train_n:].contiguous()
    gpykernel = GPy.kern.RBF(input_dim=train_x.numpy().shape[1], ARD=True)
    gpymodel = GPy.core.SparseGP(train_x.numpy(),
                                 train_y.numpy(),
                                 train_x.numpy()[:nb_inducing_points, :],
                                 kernel=gpykernel,
                                 likelihood=GPy.likelihoods.Gaussian(),
                                 inference_method=GPy.inference.latent_function_inference.VarDTC())
    start = time.time()
    gpymodel.optimize(messages=True, max_iters=training_iterations)
    end = time.time()
    print('GPy training time', str(end - start))
    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x.numpy())
    end = time.time()
    print('predict', test_x.shape, 'in', str(end - start))
    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x.numpy())
    end = time.time()
    print('predict 2nd time', test_x.shape, 'in', str(end - start))
    print('Test MAE: {}'.format(torch.mean(torch.abs(torch.tensor(gpymean) -
                                                     test_y))))
    start = time.time()
    gpymean, gpyvar = gpymodel.predict(test_x[0].reshape(1, -1).numpy())
    end = time.time()
    print('predict single point', test_x[0].reshape(1, -1).shape, 'in',
          str(end - start), '\n')

Stack trace/error message

On CPU:

GPyTorch training time 37.12532615661621
predict torch.Size([5, 3320, 18]) in 5.143052101135254
predict 2nd time torch.Size([5, 3320, 18]) in 0.6981589794158936
predict single point torch.Size([5, 18]) in 0.5465080738067627 

GPy training time 52.490806102752686
predict torch.Size([3320, 18]) in 0.05931401252746582
predict 2nd time torch.Size([3320, 18]) in 0.04732799530029297
predict single point torch.Size([1, 18]) in 0.0005881786346435547 

On GPU:

GPyTorch training time 1.204216480255127
predict torch.Size([5, 3320, 18]) in 0.3035309314727783
predict 2nd time torch.Size([5, 3320, 18]) in 0.0034995079040527344
predict single point torch.Size([5, 18]) in 0.002752065658569336 

GPy training time 86.75350904464722
predict torch.Size([3320, 18]) in 0.1132357120513916
predict 2nd time torch.Size([3320, 18]) in 0.11741828918457031
predict single point torch.Size([1, 18]) in 0.0013453960418701172 
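
The gap between the first and second GPyTorch `predict` timings above suggests that the first call does one-off work that later calls reuse. A small helper like the following sketch can separate the first-call cost from the steady-state cost. The names `time_calls`, `expensive_setup` and `predict` are hypothetical, and the cached function only imitates a one-off computation; it is not how GPyTorch caches internally.

```python
import time
from functools import lru_cache


def time_calls(fn, *args, repeats=3):
    """Return wall-clock durations (seconds) of `repeats` consecutive calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return durations


@lru_cache(maxsize=None)
def expensive_setup():
    """Imitates work that is done once and then reused (hypothetical)."""
    time.sleep(0.05)
    return 42


def predict(x):
    """Toy prediction: only the first call pays for the setup."""
    return expensive_setup() + x


timings = time_calls(predict, 1.0)
first, steady = timings[0], min(timings[1:])
print(f"first call: {first:.4f}s, steady state: {steady:.6f}s")
```

When timing on a GPU, note that CUDA kernels run asynchronously, so calling `torch.cuda.synchronize()` before reading the clock is generally needed for the numbers to reflect the actual compute time.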

Expected Behavior

I was expecting the prediction time for a single point to be at least one order of magnitude lower than that of 3k points, so that I can predict rollouts in a reasonable time.
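
As a rough check of that expectation, the arithmetic below uses the CPU timings reported above. The rollout length of 3320 steps is an assumption chosen only to match the size of the test set, and the variable names are illustrative.

```python
# Timings copied from the CPU run reported above (seconds).
n_test_points = 3320
gpytorch_batch = 0.6981589794158936      # "predict 2nd time", 3320 points
gpytorch_single = 0.5465080738067627     # "predict single point"
gpy_single = 0.0005881786346435547       # GPy "predict single point"

# Single-point time relative to the batched prediction of all test points.
ratio = gpytorch_single / gpytorch_batch

# Amortised cost per point when predicting the whole test set at once.
per_point_in_batch = gpytorch_batch / n_test_points

# Cost of a sequential rollout with one single-point call per step
# (assumed length: n_test_points steps).
rollout_gpytorch = n_test_points * gpytorch_single
rollout_gpy = n_test_points * gpy_single

# "At least one order of magnitude lower" would mean ratio <= 0.1.
meets_expectation = ratio <= 0.1

print(f"single/batch ratio: {ratio:.2f}")
print(f"per point in batch: {per_point_in_batch:.2e} s")
print(f"rollout, GPyTorch:  {rollout_gpytorch:.0f} s")
print(f"rollout, GPy:       {rollout_gpy:.2f} s")
print(f"meets expectation:  {meets_expectation}")
```

Under these assumptions, a rollout as long as the test set would take on the order of half an hour with the reported GPyTorch single-point time, versus about two seconds with the reported GPy time.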

System information

Please complete the following information:

  • GPyTorch version 1.3.1
  • PyTorch version 1.7.0
  • Mac OS Catalina

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
gpleiss commented, Feb 26, 2021

These are the times I get:

GPyTorch 1.4 (GPU)

Train X shape: torch.Size([5, 13279, 18])
GPyTorch training time 1.6287736892700195
predict torch.Size([5, 3320, 18]) in 0.15840530395507812
predict 2nd time torch.Size([5, 3320, 18]) in 0.006852388381958008
predict single point torch.Size([5, 1, 18]) in 0.002796173095703125

CPU

GPyTorch training time 11.376405954360962
predict torch.Size([5, 3320, 18]) in 0.8048131465911865
predict 2nd time torch.Size([5, 3320, 18]) in 0.2591075897216797
predict single point torch.Size([5, 1, 18]) in 0.18338346481323242

GPyTorch 1.3 (GPU)

GPyTorch training time 1.689206600189209
predict torch.Size([5, 3320, 18]) in 0.38948655128479004
predict 2nd time torch.Size([5, 3320, 18]) in 0.001882791519165039
predict single point torch.Size([5, 1, 18]) in 0.00189208984375

CPU

GPyTorch training time 12.399611473083496
predict torch.Size([5, 3320, 18]) in 3.049643039703369
predict 2nd time torch.Size([5, 3320, 18]) in 0.30733466148376465
predict single point torch.Size([5, 1, 18]) in 0.22333765029907227

So things should definitely be faster on CPU.

0 reactions
monabf commented, Feb 23, 2021

@gpleiss yes, even with the new release I still have similar results with the original code… Any idea why I'm not seeing that improvement? I'm puzzled.

On CPU:

GPyTorch training time 36.57296395301819
predict torch.Size([5, 3320, 18]) in 2.4842982292175293
predict 2nd time torch.Size([5, 3320, 18]) in 0.8440001010894775
predict single point torch.Size([5, 18]) in 0.5074729919433594

GPy training time 60.12309408187866
predict torch.Size([3320, 18]) in 0.07687807083129883
predict 2nd time torch.Size([3320, 18]) in 0.06141304969787598
predict single point torch.Size([1, 18]) in 0.0006778240203857422 

On GPU:

GPyTorch training time 0.9482972621917725
predict torch.Size([5, 3320, 18]) in 0.07795548439025879
predict 2nd time torch.Size([5, 3320, 18]) in 0.0073430538177490234
predict single point torch.Size([5, 18]) in 0.005500316619873047 

GPy training time 109.57454776763916
predict torch.Size([3320, 18]) in 0.13934540748596191
predict 2nd time torch.Size([3320, 18]) in 0.14101672172546387
predict single point torch.Size([1, 18]) in 0.0014951229095458984 

