[Bug] In version 0.3.6, the KISS-GP example cannot be run on CUDA

🐛 Bug

Hi, I installed the GPyTorch alpha version directly (it is compatible with my PyTorch version; see below) and started with the KISS-GP tutorial.

However, when I run the code on the GPU (a single GPU; I’m on a laptop), I get the error shown below.

To reproduce

Code snippet to reproduce

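import math

import torch as th
import gpytorch as gpt

# Illustrative training data (assumption: the report does not show how
# x_tr and y_tr were built). Both tensors live on the GPU.
x_tr = th.linspace(0, 1, 1000, device='cuda:0')
y_tr = th.sin(x_tr * (4 * math.pi)) + 0.1 * th.randn_like(x_tr)

ll = gpt.likelihoods.GaussianLikelihood().to('cuda:0')
m = GPRegressionModel(x_tr, y_tr, ll)  # GPRegressionModel is defined below
m = m.to('cuda:0')

# Find optimal model hyperparameters
m.train()
ll.train()

# Use the Adam optimizer; m.parameters() includes the GaussianLikelihood parameters
opt = th.optim.Adam([{'params': m.parameters()}], lr=0.1)

# "Loss" for GPs: the marginal log likelihood
mll = gpt.mlls.ExactMarginalLogLikelihood(ll, m)

training_iterations = 30
for i in range(training_iterations):
    opt.zero_grad()
    output = m(x_tr)
    print(x_tr.device, output, y_tr.device)
    loss = -mll(output, y_tr)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    opt.step()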
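To see which registered tensors actually moved after a call like `m.to('cuda:0')`, one option is a small helper that lists parameters and buffers living on some other device. This is a hypothetical sketch in plain PyTorch (`tensors_off_device` is not a GPyTorch or PyTorch function). Like `Module.to()` itself, it only sees registered parameters and buffers, not tensors stored as plain attributes.

```python
from typing import List, Union

import torch
from torch import nn


def tensors_off_device(module: nn.Module, device: Union[str, torch.device]) -> List[str]:
    """Return the names of registered parameters/buffers that are not on `device`.

    Hypothetical diagnostic helper. Tensors kept as plain attributes (not
    nn.Parameter and not registered with register_buffer) are invisible here,
    exactly as they are to module.to().
    """
    if not isinstance(device, torch.device):
        device = torch.device(device)
    off = []
    for name, param in module.named_parameters():
        if param.device != device:
            off.append(name)
    for name, buf in module.named_buffers():
        if buf.device != device:
            off.append(name)
    return off


# Pick the first GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Usage on a plain PyTorch layer; every registered tensor was moved, so nothing is reported.
layer = nn.Linear(3, 2).to(device)
print(tensors_off_device(layer, device))  # prints []
```

With the objects from the report one would call `tensors_off_device(m, 'cuda:0')`. If that returns an empty list while the device error persists, the mismatched tensor is most likely one that is not registered on the module at all, or one created on the CPU inside `forward`.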

Stack trace/error message

~/SJTU/research_code/TCEP/GP_scoring/gpytorch_local/gpytorch/kernels/rbf_kernel.py in forward(self, x1, x2, diag, **params)
     80             x2,
     81             self.lengthscale,
---> 82             lambda x1, x2: self.covar_dist(
     83                 x1, x2, square_dist=True, diag=False, dist_postprocess_func=postprocess_rbf, postprocess=False, **params
     84             ),

~/SJTU/research_code/TCEP/GP_scoring/gpytorch_local/gpytorch/functions/rbf_covariance.py in forward(ctx, x1, x2, lengthscale, sq_dist_func)
     10             raise ValueError("RBFCovariance cannot handle multiple lengthscales")
     11         needs_grad = any(ctx.needs_input_grad)
---> 12         x1_ = x1.div(lengthscale)
     13         x2_ = x2.div(lengthscale)
     14         unitless_sq_dist = sq_dist_func(x1_, x2_)

RuntimeError: expected device cpu and dtype Float but got device cuda:0 and dtype Float

Expected Behavior

I expected all the model parameters to be on the GPU; however, the basic model you give is

import gpytorch as gpt


class GPRegressionModel(gpt.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)

        # SKI requires a grid size hyperparameter. This util can help with that.
        # Here we are using a grid that has the same number of points as the
        # training data (a ratio of 1.0). Performance can be sensitive to this
        # parameter, so you may want to adjust it for your own problem on a
        # validation set.
        grid_size = gpt.utils.grid.choose_grid_size(train_x, 1.0)

        self.mean_module = gpt.means.ConstantMean()
        self.covar_module = gpt.kernels.GridInterpolationKernel(
            gpt.kernels.ScaleKernel(gpt.kernels.RBFKernel()),
            grid_size=grid_size, num_dims=1,
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpt.distributions.MultivariateNormal(mean_x, covar_x)

which returns a gpt.distributions.MultivariateNormal that cannot be put on the GPU. The first error goes away when I clone the code and modify RBFCovariance (gpytorch/functions/rbf_covariance.py) as follows:

import torch


class RBFCovariance(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x1, x2, lengthscale, sq_dist_func):
        if any(ctx.needs_input_grad[:2]):
            raise RuntimeError("RBFCovariance cannot compute gradients with respect to x1 and x2")
        if lengthscale.size(-1) > 1:
            raise ValueError("RBFCovariance cannot handle multiple lengthscales")
        needs_grad = any(ctx.needs_input_grad)
        # Workaround: force both inputs onto cuda:0 (hard-coded device) so they
        # match the lengthscale. The backward() method is omitted here.
        x1_ = x1.to('cuda:0').div(lengthscale)
        x2_ = x2.to('cuda:0').div(lengthscale)
        unitless_sq_dist = sq_dist_func(x1_, x2_)
        # clone because inplace operations will mess with what's saved for backward
        unitless_sq_dist_ = unitless_sq_dist.clone() if needs_grad else unitless_sq_dist
        covar_mat = unitless_sq_dist_.div_(-2.0).exp_()
        if needs_grad:
            d_output_d_input = unitless_sq_dist.mul_(covar_mat).div_(lengthscale)
            ctx.save_for_backward(d_output_d_input)
        return covar_mat

But then the interpolate method (gpytorch/utils/interpolation.py) fails, because its tensors are not on the GPU either:

~/SJTU/research_code/TCEP/GP_scoring/gpytorch_local/gpytorch/utils/interpolation.py in interpolate(self, x_grid, x_target, interp_points, eps)
    112 
    113             # get the interp. coeff. based on distances to interpolating points
--> 114             scaled_dist = lower_pt_rel_dists.unsqueeze(-1) + interp_points_flip.unsqueeze(-2)
    115             dim_interp_values = self._cubic_interpolation_kernel(scaled_dist)
    116 

RuntimeError: expected device cuda:0 and dtype Float but got device cpu and dtype Float
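Both tracebacks are the same kind of failure: an elementwise operation between tensors on different devices. A slightly less brittle variant of the `.to('cuda:0')` workaround above would follow the device of the lengthscale instead of hard-coding one. This is only a sketch: `scale_inputs` is a hypothetical helper, not the upstream fix.

```python
import torch


def scale_inputs(x1: torch.Tensor, x2: torch.Tensor, lengthscale: torch.Tensor):
    """Divide x1 and x2 by the lengthscale on the lengthscale's own device.

    Hypothetical helper: instead of hard-coding .to('cuda:0'), move the inputs
    to wherever the (registered) lengthscale parameter currently lives.
    """
    target = lengthscale.device
    return x1.to(target).div(lengthscale), x2.to(target).div(lengthscale)


lengthscale = torch.tensor([[0.5]])
x1 = torch.linspace(0, 1, 5).unsqueeze(-1)  # shape (5, 1)
x2 = torch.linspace(0, 1, 3).unsqueeze(-1)  # shape (3, 1)
x1_, x2_ = scale_inputs(x1, x2, lengthscale)
# x1_ holds 0.0, 0.5, 1.0, 1.5, 2.0 (each input divided by 0.5)
```

The trade-off is that this hides the mismatch rather than removing it: if the inputs sit on the CPU while the model is on the GPU, every call silently pays for a host-to-device copy. Registering every tensor on the module, so that `model.to(...)` moves it, avoids the copy altogether.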

System information

GPyTorch version: 0.3.6
PyTorch version: 1.2.0
OS: Ubuntu 18.04

Additional context

Issue Analytics

Created 4 years ago
Comments: 10 (2 by maintainers)

Top GitHub Comments

KeAWang commented, Dec 2, 2019

Actually, I’ve seen a similar bug myself. It might date from when I redid GridKernel and GridInterpolationKernel. I’ll look into this.
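One common way this kind of device mismatch arises in PyTorch code (an assumption here; the thread does not say what PR 983 changed) is a kernel keeping a tensor, such as its grid, as a plain attribute. `nn.Module.to()` only moves or casts registered parameters and buffers, so a plain tensor attribute stays behind. The sketch below uses a hypothetical `GridHolder` module and a dtype cast in place of `.to('cuda:0')`, so it runs without a GPU.

```python
import torch
from torch import nn


class GridHolder(nn.Module):
    """Hypothetical module illustrating how .to() treats stored tensors."""

    def __init__(self):
        super().__init__()
        grid = torch.linspace(0, 1, 4)
        # Plain attribute: not tracked by the module, so .to() ignores it.
        self.plain_grid = grid.clone()
        # Registered buffer: moved or cast together with the module by .to().
        self.register_buffer("buffer_grid", grid.clone())


# A dtype cast stands in for .to('cuda:0') so this runs on a CPU-only machine.
holder = GridHolder().to(torch.float64)
print(holder.plain_grid.dtype)   # prints torch.float32 (left behind)
print(holder.buffer_grid.dtype)  # prints torch.float64 (converted)
```

On a GPU the same pattern leaves the plain attribute on the CPU after `model.to('cuda:0')`, which produces exactly the "expected device ... but got device ..." errors in the tracebacks above.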

KeAWang commented, Dec 5, 2019

Use the branch from https://github.com/cornellius-gp/gpytorch/pull/983 for now; it fixes the issue and will be merged soon.
