[Bug] Inconsistent behavior between running with CPU and GPU tensors

šŸ› Bug

I've been trying to learn GPyTorch by playing around with the IndependentModelList example code on a toy problem, and I've noticed some odd behavior and crashes when attempting to use CUDA acceleration. The program below has two options to play with: n_train and cuda.

When I leave CUDA off with n_train = 500, the program runs fine and I get a nice fit; the loss decreases steadily throughout training.

When I turn on CUDA with n_train = 500, the program crashes with the error RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
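
(A debugging aside, not from the original report: CUDA kernel and cuBLAS errors are reported asynchronously, so the Python line in the traceback is not necessarily where the failure actually happened. Forcing synchronous launches makes the reported location trustworthy.)

# Sketch: force synchronous CUDA launches so the traceback points at the real
# failing call. The variable is read when the CUDA context initializes, so it
# must be set before the first CUDA operation.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"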

When I turn on CUDA with n_train = 250, the model trains, but the fit is noticeably worse and the loss increases after several training iterations (with no changes to the optimizer at all).
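
(An assumption worth testing rather than a confirmed diagnosis: exact GP training repeatedly solves against the kernel matrix, which can be numerically fragile in float32 on the GPU. Re-running the script below in double precision is a quick way to rule out round-off as the cause of the climbing loss.)

# Sketch, reusing the variable names from the repro script below: cast data and
# modules to float64 before training. Module.double() converts all parameters.
train_x, train_y, test_x = train_x.double(), train_y.double(), test_x.double()
model = model.double()
likelihood = likelihood.double()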

I'm not sure what's going on. I can fall back to running on the CPU only, but this behavior with GPU acceleration is unexpected. Monitoring with nvidia-smi, I see only about 10% VRAM usage and 5% GPU utilization, so I don't think I'm running out of memory.

To reproduce

Code snippet to reproduce

import gpytorch
import matplotlib.pyplot as plt
import numpy as np
import torch

cuda = True     # toggle to run the same script on CPU (False) or GPU (True)
n_train = 500   # number of training points; the GPU run crashes at this size
n_test = 100    # number of test points used for plotting

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=train_x.shape[-1],
                                       lengthscale_constraint=gpytorch.constraints.GreaterThan(1e-4))
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# Ground-truth function: one linear output and one sinusoidal output
def f(x):
    return np.stack([
        2 * x[:, 0] - 3 * x[:, 1],
        2 * np.sin(x[:, 0]) - 3 * np.cos(x[:, 1])
    ], axis=-1)

# Set up training data
train_x = np.append(np.random.uniform(low=2.0, high=4.0, size=[n_train // 2]),
                    np.random.uniform(low=5.0, high=7.0, size=[n_train // 2]))
train_x = np.stack([train_x, 2 * np.ones(train_x.shape)], axis=-1)
train_y = f(train_x) + np.stack([2.5e-1 * train_x[:, 0] * np.random.randn(len(train_x[:, 0])),
                                 2.5e-1 * train_x[:, 0] * np.random.randn(len(train_x[:, 0]))], axis=-1)
train_x = torch.from_numpy(train_x).float()
train_y = torch.from_numpy(train_y).float()

# Set up testing data
test_x = np.stack([np.linspace(0, 10, n_test), 2 * np.ones(n_test)], axis=-1)
test_y = f(test_x)
test_x = torch.from_numpy(test_x).float()
test_y = torch.from_numpy(test_y).float()

if cuda:
    # Move the data to the GPU; test_y stays on the CPU since it is only
    # used for plotting.
    train_x = train_x.to("cuda")
    train_y = train_y.to("cuda")
    test_x = test_x.to("cuda")

models, likelihoods = [], []
for dim in range(train_y.shape[-1]):
    # Create model for each output dimension
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ExactGPModel(train_x, train_y[:, dim], likelihood)
    models.append(model)
    likelihoods.append(likelihood)
model = gpytorch.models.IndependentModelList(*models)
likelihood = gpytorch.likelihoods.LikelihoodList(*likelihoods)

if cuda:
    model.cuda()
    likelihood.cuda()

model.train()
likelihood.train()

opt = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.SumMarginalLogLikelihood(likelihood, model)

training_iterations = 50
for i in range(training_iterations):
    opt.zero_grad()
    output = model(*model.train_inputs)
    # Calc loss and backprop gradients
    loss = -mll(output, model.train_targets)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    opt.step()

model.eval()

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    predictions = likelihood(*model(test_x, test_x))

    pred_y, pred_ub, pred_lb = [], [], []
    for submodel, prediction in zip(model.models, predictions):
        mean = prediction.mean
        lower, upper = prediction.confidence_region()

        if cuda:
            mean = mean.cpu()
            lower = lower.cpu()
            upper = upper.cpu()

        pred_y.append(mean.numpy())
        pred_ub.append(upper.numpy())
        pred_lb.append(lower.numpy())
    pred_y = np.array(pred_y).T
    pred_ub = np.array(pred_ub).T
    pred_lb = np.array(pred_lb).T

train_x = train_x.cpu().numpy()
train_y = train_y.cpu().numpy()
test_x = test_x.cpu().numpy()
test_y = test_y.cpu().numpy()

fig, axs = plt.subplots(1, 2)
axs[0].scatter(train_x[:, 0], train_y[:, 0], marker="*", s=4, c="r")
axs[0].plot(test_x[:, 0], test_y[:, 0], c="k", label="ground truth")
axs[0].plot(test_x[:, 0], pred_y[:, 0], color="tab:blue", label="pred. mean")
axs[0].fill_between(test_x[:, 0], pred_lb[:, 0], pred_ub[:, 0], alpha=0.4, color="tab:blue", label=r"pred. 2$\sigma$ bound")
axs[0].legend()
axs[1].scatter(train_x[:, 0], train_y[:, 1], marker="*", s=4, c="r")
axs[1].plot(test_x[:, 0], test_y[:, 1], c="k", label="ground truth")
axs[1].plot(test_x[:, 0], pred_y[:, 1], color="tab:blue", label="pred. mean")
axs[1].fill_between(test_x[:, 0], pred_lb[:, 1], pred_ub[:, 1], alpha=0.4, color="tab:blue", label=r"pred. 2$\sigma$ bound")
axs[1].legend()
plt.show()
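
(A side note on the training loop above, my addition rather than part of the original snippet: GPyTorch's cholesky_jitter setting adds extra diagonal jitter before factorizing the kernel matrix, which sometimes stabilizes exact GP training when the loss starts climbing. The 1e-4 value is an arbitrary illustration, assuming the setting is available in this GPyTorch version.)

# Variant of the training loop above, wrapped in extra Cholesky jitter.
with gpytorch.settings.cholesky_jitter(1e-4):
    for i in range(training_iterations):
        opt.zero_grad()
        output = model(*model.train_inputs)
        loss = -mll(output, model.train_targets)
        loss.backward()
        opt.step()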

Stack trace/error message

Traceback (most recent call last):
  File "scripts/test_gp.py", line 78, in <module>
    loss.backward()
  File "/home/archie/trail/hybrid-mbrl/venv/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/archie/trail/hybrid-mbrl/venv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Expected Behavior

Training behavior should not change between CPU and GPU: a model fit on the same amount of data should give similar results on both devices, and moving to the GPU should certainly not crash.
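
A minimal way to pin down this expectation (my sketch; build_model_and_mll is a hypothetical helper wrapping the setup code above) is to compare the first-iteration loss on both devices under a fixed seed:

# Hypothetical parity check: with identical seeds, the initial loss on CPU and
# GPU should agree to float32 tolerance (~1e-5), not differ by whole units.
def first_loss(device):
    torch.manual_seed(0)
    model, likelihood, mll = build_model_and_mll(device)  # hypothetical helper
    output = model(*model.train_inputs)
    return (-mll(output, model.train_targets)).item()

print(abs(first_loss("cpu") - first_loss("cuda")))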

System information

  • GPyTorch version: 1.4.2
  • PyTorch version: 1.8.1+cu111
  • OS: Ubuntu 20.04
  • NVIDIA driver version: 465.27
  • GPU: NVIDIA Titan Xp

Top GitHub Comments

archielee commented, Jun 14, 2021

I downgraded to the PyTorch build with CUDA 10.2 (1.8.1+cu102), which seems to have fixed things. Not sure if it's an issue with the newer CUDA version.
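
(A generic check, not something from this thread, for confirming which CUDA stack the installed PyTorch wheel actually uses when swapping cu111/cu102 builds:)

# Standard PyTorch introspection calls.
import torch
print(torch.__version__)              # e.g. 1.8.1+cu102 after the downgrade
print(torch.version.cuda)             # CUDA toolkit the wheel was built against
print(torch.backends.cudnn.version())
print(torch.cuda.get_device_name(0))  # e.g. 'TITAN Xp'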

archielee commented, Jun 29, 2021

I think I'm going to close this issue; things seem fine with the different CUDA version, and it's not clear whether this is GPyTorch's fault.
