Grid Kernel implementation is taking more time over vanilla Kernel

Problem

I am trying to use grid kernels instead of the default kernels to speed up GPR training. My understanding is that the grid kernel implementation exploits tensor (Kronecker) algebra and drastically reduces the computational complexity, because the Cholesky decomposition is applied to the individual matrices in the Kronecker product rather than to the full covariance matrix. I have disabled fast computations with the context manager so that Cholesky decomposition is used instead of conjugate gradients. With fast computations disabled, each training step takes significantly longer with the grid kernel than with the default kernel. With fast computations enabled (the default settings), the grid kernel is faster, as expected.

It would be helpful if someone could point me to how to use Cholesky decomposition and still get a speedup with the grid kernel.
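The complexity argument in the problem statement rests on a standard identity: the Cholesky factor of a Kronecker product is the Kronecker product of the Cholesky factors. The sketch below checks this in plain PyTorch on small random matrices. It does not use GPyTorch, and the helper `random_spd` and the matrix sizes are assumptions chosen only for illustration.

```python
import torch

torch.manual_seed(0)


def random_spd(n):
    """Return a random symmetric positive-definite n x n matrix (illustrative helper)."""
    a = torch.randn(n, n, dtype=torch.float64)
    return a @ a.T + n * torch.eye(n, dtype=torch.float64)


# Two small per-dimension matrices, standing in for the per-dimension grid factors.
K1 = random_spd(4)
K2 = random_spd(5)

# Factor each small matrix separately: O(n1^3 + n2^3) work.
L1 = torch.linalg.cholesky(K1)
L2 = torch.linalg.cholesky(K2)

# Factor the full 20 x 20 Kronecker product directly: O((n1 * n2)^3) work.
L_full = torch.linalg.cholesky(torch.kron(K1, K2))

# The Kronecker product of the small factors should equal the factor of the product.
kron_matches = torch.allclose(torch.kron(L1, L2), L_full)
print(kron_matches)
```

Note that the identity covers only the noiseless product. Exact GP training works with the product plus a noise diagonal, and that sum does not in general factor into a Kronecker product of Cholesky factors, so the saving is not automatic once the likelihood noise is added.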

Code to reproduce

The code below is taken from the Grid_GP_Regression tutorial. I use the same data for both models to compare the computation times.

import gpytorch
import torch
import math
import timeit

def train_GPR(model, likelihood, train_x, train_y, training_iter = 10, chol_flag = True):
    # Find optimal model hyperparameters
    model.train()
    likelihood.train()

    # Use the adam optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # Includes GaussianLikelihood parameters

    # "Loss" for GPs - the marginal log likelihood
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

    for i in range(training_iter):
        # Zero gradients from previous iteration
        optimizer.zero_grad()

        start_time = timeit.default_timer()
        
        if chol_flag:
            with gpytorch.settings.max_cholesky_size(11000), \
                gpytorch.settings.fast_computations(covar_root_decomposition=False, log_prob=False, solves=False):
                # Output from model
                output = model(train_x)
                # Calc loss and backprop gradients
                loss = -mll(output, train_y)
        else:
            # Output from model
            output = model(train_x)
            # Calc loss and backprop gradients
            loss = -mll(output, train_y)

        loss.backward()
        optimizer.step()

        time_taken = timeit.default_timer() - start_time

        print('Iter %d/%d - step time: %.6f s' % (i + 1, training_iter, time_taken))


#################################
### GRID GPR data
#################################

grid_bounds = [(0, 1), (0, 2)]
grid_size = 50
grid = torch.zeros(grid_size, len(grid_bounds))
for i in range(len(grid_bounds)):
    grid_diff = float(grid_bounds[i][1] - grid_bounds[i][0]) / (grid_size - 2)
    grid[:, i] = torch.linspace(grid_bounds[i][0] - grid_diff, grid_bounds[i][1] + grid_diff, grid_size)

train_x = gpytorch.utils.grid.create_data_from_grid(grid)
train_y = torch.sin((train_x[:, 0] + train_x[:, 1]) * (2 * math.pi)) + torch.randn_like(train_x[:, 0]).mul(0.01)


### Model

class GridGPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, grid, train_x, train_y, likelihood):
        super(GridGPRegressionModel, self).__init__(train_x, train_y, likelihood)
        num_dims = train_x.size(-1)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.GridKernel(gpytorch.kernels.RBFKernel(), grid=grid)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = GridGPRegressionModel(grid, train_x, train_y, likelihood)

training_iter = 10

print('Train GPR model using Grid kernal ')
train_GPR(model, likelihood, train_x, train_y, training_iter=training_iter, chol_flag=True)

#################################
### GPR data
#################################

# same as the grid GPR data 
train_x = train_x
train_y = train_y

### Model

class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)
        num_dims = train_x.size(-1)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model      = GPRegressionModel(train_x, train_y, likelihood)

print('Train GPR model without using GRID kernal')
train_GPR(model, likelihood, train_x, train_y, training_iter=training_iter, chol_flag=True)

Output with fast computations disabled (`chol_flag = True` when calling `train_GPR`)

Train GPR model using **Grid kernal** 
Iter 1/10 - step time: 1.335075 s
Iter 2/10 - step time: 0.775082 s
Iter 3/10 - step time: 0.743960 s
Iter 4/10 - step time: 0.787821 s
Iter 5/10 - step time: 0.789339 s
Iter 6/10 - step time: 0.783592 s
Iter 7/10 - step time: 0.755363 s
Iter 8/10 - step time: 0.752616 s
Iter 9/10 - step time: 0.762754 s
Iter 10/10 - step time: 0.766521 s

Train GPR model **without using GRID kernal**
Iter 1/10 - step time: 0.290277 s
Iter 2/10 - step time: 0.278644 s
Iter 3/10 - step time: 0.275472 s
Iter 4/10 - step time: 0.295164 s
Iter 5/10 - step time: 0.275632 s
Iter 6/10 - step time: 0.303502 s
Iter 7/10 - step time: 0.278693 s
Iter 8/10 - step time: 0.294948 s
Iter 9/10 - step time: 0.276134 s
Iter 10/10 - step time: 0.299402 s

Output with fast computations enabled (`chol_flag = False` when calling `train_GPR`)

Train GPR model using **Grid kernal** 
Iter 1/10 - step time: 0.627863 s
Iter 2/10 - step time: 0.024604 s
Iter 3/10 - step time: 0.024323 s
Iter 4/10 - step time: 0.024040 s
Iter 5/10 - step time: 0.024330 s
Iter 6/10 - step time: 0.023434 s
Iter 7/10 - step time: 0.023406 s
Iter 8/10 - step time: 0.023706 s
Iter 9/10 - step time: 0.023484 s
Iter 10/10 - step time: 0.023516 s

Train GPR model **without using GRID kernal**
Iter 1/10 - step time: 0.086213 s
Iter 2/10 - step time: 0.074629 s
Iter 3/10 - step time: 0.073672 s
Iter 4/10 - step time: 0.070543 s
Iter 5/10 - step time: 0.072324 s
Iter 6/10 - step time: 0.070381 s
Iter 7/10 - step time: 0.073170 s
Iter 8/10 - step time: 0.073682 s
Iter 9/10 - step time: 0.072639 s
Iter 10/10 - step time: 0.072873 s

Issue Analytics

Created 2 years ago
Comments: 7 (2 by maintainers)

Top GitHub Comments

3 reactions

jacobrgardner commented, Aug 25, 2021

I wonder if a solution here might be to refactor lazy tensors (or linear operators) to have _iterative_solve and _direct_solve so that it’s more obvious and intuitive in all situations exactly what is happening? Then “fast computations” (which feels a bit preachy anyways) should be refactored to be a setting that represents what it actually is: should we do solves using an iterative method or a direct method? If a direct method is chosen, we’ll still always do it the best way we can.

Right now, I feel like there are a lot of gotchas, and even different functions have different behaviors under different settings (e.g., inv_matmul could currently be slow even when inv_quad_logdet is fast).
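To make the distinction in this comment concrete, here is a minimal sketch of a direct solve (Cholesky) next to an iterative solve (conjugate gradients) in plain PyTorch. The function names `direct_solve` and `iterative_solve` are hypothetical and only echo the naming proposed above; they are not GPyTorch APIs, and the tolerance and test matrix are assumptions.

```python
import torch

torch.manual_seed(0)


def direct_solve(A, b):
    """Solve A x = b with a Cholesky factorization (direct method)."""
    L = torch.linalg.cholesky(A)
    return torch.cholesky_solve(b.unsqueeze(-1), L).squeeze(-1)


def iterative_solve(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with plain conjugate gradients (iterative method)."""
    x = torch.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.clone()          # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new.sqrt() < tol:   # stop once the residual norm is small
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


# A well-conditioned symmetric positive-definite test system.
n = 50
M = torch.randn(n, n, dtype=torch.float64)
A = M @ M.T + n * torch.eye(n, dtype=torch.float64)
b = torch.randn(n, dtype=torch.float64)

x_direct = direct_solve(A, b)
x_iter = iterative_solve(A, b)
solves_agree = torch.allclose(x_direct, x_iter, atol=1e-8)
print(solves_agree)
```

The iterative version only needs matrix-vector products with `A`, which is why structured matrices can make it cheap. The direct version needs a factorization, so it is only fast when the factorization itself can exploit the structure.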

1 reaction

jacobrgardner commented, Aug 25, 2021

Kind of related, but note that currently some functions will run slowly with fast_computations off. In particular, KroneckerProductAddedDiagLazyTensor doesn’t override _cholesky, nor does it inherit from a lazy tensor that does (like KroneckerProductLazyTensor). This will cause problems with InvMatmul, which currently just calls lazy_tsr.cholesky explicitly and doesn’t care that you’ve overridden root_decomposition:

https://github.com/cornellius-gp/gpytorch/blob/7648de148691635d634f1179cc80e7311b1d1864/gpytorch/functions/_inv_matmul.py#L16-L17

So we’d get really slow behavior with fast_computations off when we go to compute predictions either way I think.
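For context on what a direct method could look like for a Kronecker product plus a constant diagonal, the sketch below solves such a system through eigendecompositions of the small factors. This is a generic linear-algebra technique written in plain PyTorch under stated assumptions (two factors, a scalar noise term, row-major flattening); it is not a copy of GPyTorch's implementation, and `kron_plus_diag_solve` and `random_spd` are hypothetical names.

```python
import torch

torch.manual_seed(0)


def random_spd(n):
    """Return a random symmetric positive-definite n x n matrix (illustrative helper)."""
    a = torch.randn(n, n, dtype=torch.float64)
    return a @ a.T + n * torch.eye(n, dtype=torch.float64)


def kron_plus_diag_solve(K1, K2, noise, y):
    """Solve (kron(K1, K2) + noise * I) x = y using only the small factors."""
    n1, n2 = K1.shape[0], K2.shape[0]
    evals1, Q1 = torch.linalg.eigh(K1)
    evals2, Q2 = torch.linalg.eigh(K2)
    # Eigenvalues of the Kronecker product are all pairwise products of eigenvalues.
    denom = torch.outer(evals1, evals2) + noise      # shape (n1, n2)
    Y = y.reshape(n1, n2)                            # row-major un-flattening
    # Rotate into the eigenbasis: kron(Q1, Q2)^T y corresponds to Q1^T Y Q2.
    Y_rot = Q1.T @ Y @ Q2
    # Divide by the eigenvalues, then rotate back with kron(Q1, Q2).
    X = Q1 @ (Y_rot / denom) @ Q2.T
    return X.reshape(-1)


K1 = random_spd(4)
K2 = random_spd(5)
noise = 0.1
y = torch.randn(20, dtype=torch.float64)

x_fast = kron_plus_diag_solve(K1, K2, noise, y)

# Reference: form the dense 20 x 20 matrix and solve it directly.
dense = torch.kron(K1, K2) + noise * torch.eye(20, dtype=torch.float64)
x_dense = torch.linalg.solve(dense, y)
structured_matches_dense = torch.allclose(x_fast, x_dense)
print(structured_matches_dense)
```

The structured solve never forms the full matrix, so its cost is driven by the small eigendecompositions and a few small matrix products. A plain Cholesky call on the full matrix, as described in the comment above, gets none of that benefit.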
