[Bug] Inconsistent behavior between running with CPU and GPU tensors
🐛 Bug
I've been trying to learn how to use GPyTorch by playing around with the IndependentModelList example code on a toy example. I've noticed some odd behavior and crashes when attempting to use CUDA acceleration. In the program below, I have two options to play with: n_train and cuda.
When I leave CUDA off and n_train = 500, the program runs fine and I get a nice fit; the loss decreases steadily as well.
When I turn on CUDA with n_train = 500, the program crashes with the error RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`.
When I turn on CUDA with n_train = 250, the model trains, but the fit is noticeably worse and the loss increases after several training iterations (with no changes to the optimizer at all).
I'm not sure what's going on. I can fall back to CPU-only training, but this behavior with GPU acceleration is unexpected. Monitoring my GPU memory usage with nvidia-smi, I only see about 10% VRAM usage and 5% GPU utilization.
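A quick way to confirm the memory footprint from inside the script (rather than via nvidia-smi) is to query PyTorch's caching allocator directly; a minimal sketch, assuming a single visible device:

import torch

def report_gpu_memory(device: int = 0) -> None:
    # Memory actually handed out to tensors vs. memory reserved by the caching
    # allocator. These counters only cover PyTorch allocations, so they can read
    # lower than the per-process number nvidia-smi reports.
    allocated_mb = torch.cuda.memory_allocated(device) / 1024 ** 2
    reserved_mb = torch.cuda.memory_reserved(device) / 1024 ** 2
    total_mb = torch.cuda.get_device_properties(device).total_memory / 1024 ** 2
    print(f"allocated: {allocated_mb:.1f} MiB | reserved: {reserved_mb:.1f} MiB | total: {total_mb:.1f} MiB")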
To reproduce
** Code snippet to reproduce **
import gpytorch
import matplotlib.pyplot as plt
import numpy as np
import torch

cuda = True
n_train = 500
n_test = 100


class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=train_x.shape[-1],
                                       lengthscale_constraint=gpytorch.constraints.GreaterThan(1e-4))
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


def f(x):
    return np.stack([
        2 * x[:, 0] - 3 * x[:, 1],
        2 * np.sin(x[:, 0]) - 3 * np.cos(x[:, 1])
    ], axis=-1)


# Set up training data
train_x = np.append(np.random.uniform(low=2.0, high=4.0, size=[n_train // 2]),
                    np.random.uniform(low=5.0, high=7.0, size=[n_train // 2]))
train_x = np.stack([train_x, 2 * np.ones(train_x.shape)], axis=-1)
train_y = f(train_x) + np.stack([2.5e-1 * train_x[:, 0] * np.random.randn(len(train_x[:, 0])),
                                 2.5e-1 * train_x[:, 0] * np.random.randn(len(train_x[:, 0]))], axis=-1)
train_x = torch.from_numpy(train_x).float()
train_y = torch.from_numpy(train_y).float()

# Set up testing data
test_x = np.stack([np.linspace(0, 10, n_test), 2 * np.ones(n_test)], axis=-1)
test_y = f(test_x)
test_x = torch.from_numpy(test_x).float()
test_y = torch.from_numpy(test_y).float()

if cuda:
    train_x = train_x.to("cuda")
    train_y = train_y.to("cuda")
    test_x = test_x.to("cuda")

models, likelihoods = [], []
for dim in range(train_y.shape[-1]):
    # Create model for each output dimension
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ExactGPModel(train_x, train_y[:, dim], likelihood)
    models.append(model)
    likelihoods.append(likelihood)

model = gpytorch.models.IndependentModelList(*models)
likelihood = gpytorch.likelihoods.LikelihoodList(*likelihoods)
if cuda:
    model.cuda()
    likelihood.cuda()

model.train()
likelihood.train()
opt = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.SumMarginalLogLikelihood(likelihood, model)

training_iterations = 50
for i in range(training_iterations):
    opt.zero_grad()
    output = model(*model.train_inputs)
    # Calc loss and backprop gradients
    loss = -mll(output, model.train_targets)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    opt.step()

model.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    predictions = likelihood(*model(test_x, test_x))
    pred_y, pred_ub, pred_lb = [], [], []
    for submodel, prediction in zip(model.models, predictions):
        mean = prediction.mean
        lower, upper = prediction.confidence_region()
        if cuda:
            mean = mean.cpu()
            lower = lower.cpu()
            upper = upper.cpu()
        pred_y.append(mean.numpy())
        pred_ub.append(upper.numpy())
        pred_lb.append(lower.numpy())

pred_y = np.array(pred_y).T
pred_ub = np.array(pred_ub).T
pred_lb = np.array(pred_lb).T

train_x = train_x.cpu().numpy()
train_y = train_y.cpu().numpy()
test_x = test_x.cpu().numpy()
test_y = test_y.cpu().numpy()

fig, axs = plt.subplots(1, 2)
axs[0].scatter(train_x[:, 0], train_y[:, 0], marker="*", s=4, c="r")
axs[0].plot(test_x[:, 0], test_y[:, 0], c="k", label="ground truth")
axs[0].plot(test_x[:, 0], pred_y[:, 0], color="tab:blue", label="pred. mean")
axs[0].fill_between(test_x[:, 0], pred_lb[:, 0], pred_ub[:, 0], alpha=0.4, color="tab:blue", label=r"pred. 2$\sigma$ bound")
axs[0].legend()
axs[1].scatter(train_x[:, 0], train_y[:, 1], marker="*", s=4, c="r")
axs[1].plot(test_x[:, 0], test_y[:, 1], c="k", label="ground truth")
axs[1].plot(test_x[:, 0], pred_y[:, 1], color="tab:blue", label="pred. mean")
axs[1].fill_between(test_x[:, 0], pred_lb[:, 1], pred_ub[:, 1], alpha=0.4, color="tab:blue", label=r"pred. 2$\sigma$ bound")
axs[1].legend()
plt.show()
** Stack trace/error message **
Traceback (most recent call last):
  File "scripts/test_gp.py", line 78, in <module>
    loss.backward()
  File "/home/archie/trail/hybrid-mbrl/venv/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/archie/trail/hybrid-mbrl/venv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
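Note that CUDA kernels launch asynchronously, so the line flagged in this traceback (loss.backward()) is not necessarily where the cuBLAS call actually failed. One way to get a synchronous, more precise error location is to force blocking kernel launches before CUDA is initialized; a minimal sketch (the environment variable is standard CUDA/PyTorch behavior, its placement at the top of the repro script is just illustrative):

import os

# Must be set before the first CUDA call so that each kernel launch reports its
# error at the failing call site instead of at a later synchronization point.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import (and everything CUDA-related) only after the variable is set

Equivalently, the script can be launched as CUDA_LAUNCH_BLOCKING=1 python scripts/test_gp.py.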
Expected Behavior
Training behavior should not change between CPU and GPU, and the model fit on the same amount of data should be comparable on both.
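One thing that might be worth ruling out for the worse-fit case (n_train = 250 with CUDA) is single-precision round-off: exact GP training is numerically sensitive, and float32 results can legitimately differ between CPU and GPU kernels. Below is a minimal sketch of a double-precision variant of the repro script; this is only a hypothesis to test, not a confirmed fix, and the names match the snippet above.

# 1) Keep the data in float64: drop the .float() casts after torch.from_numpy(...),
#    or cast explicitly before the models are constructed.
train_x, train_y, test_x = train_x.double(), train_y.double(), test_x.double()

# 2) Cast the model and likelihood parameters to float64 as well, after the
#    IndependentModelList / LikelihoodList are built and before training starts.
model = model.double()
likelihood = likelihood.double()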
System information
Please complete the following information:
- GPyTorch Version 1.4.2
- PyTorch Version 1.8.1+cu111
- Ubuntu 20.04
- Nvidia driver version 465.27
- GPU: Nvidia Titan Xp
Top GitHub Comments
I downgraded to PyTorch built against CUDA 10.2 (1.8.1+cu102), which seems to have fixed things. I'm not sure if it's an issue with the newer CUDA version. I think I'm going to close this issue: things are fine with the different CUDA build, and it's not clear whether this is GPyTorch's fault.
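For anyone who hits the same error, it may help to confirm which CUDA build is actually in use and whether a plain float32 matrix multiply of comparable size fails outside of GPyTorch; a minimal sketch (the 500x500 sizes are arbitrary, chosen only to roughly match n_train = 500):

import torch

# Which CUDA toolkit the installed PyTorch wheel was built against, and which GPU it sees.
print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))

# A bare cuBLAS sgemm of roughly the same size as the failing kernel.
# If this also raises CUBLAS_STATUS_EXECUTION_FAILED, the problem sits below GPyTorch.
a = torch.randn(500, 500, device="cuda")
b = torch.randn(500, 500, device="cuda")
c = a @ b
torch.cuda.synchronize()
print("plain matmul ok:", c.shape)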