[Bug] Inconsistent behavior between running with CPU and GPU tensors
🐛 Bug
I've been trying to learn how to use GPyTorch by playing around with the IndependentModelList example code on a toy example. I've noticed some odd behavior and crashes when attempting to use CUDA acceleration. In the program below, I have two options to play with: n_train and cuda.
When I leave CUDA off and n_train = 500, the program runs fine and I get a nice fit; the loss decreases steadily as well.
When I turn on CUDA with n_train = 500, the program crashes with the error RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`.
When I turn on CUDA with n_train = 250, the model trains, but the fit is noticeably worse and the loss increases after several training iterations (with no changes to the optimizer at all).
I'm not sure what's going on. I can fall back to CPU-only training, but this behavior with GPU acceleration is unexpected. Monitoring my GPU memory usage with nvidia-smi, I only see about 10% VRAM usage and 5% GPU utilization.
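A quick way to confirm the memory footprint from inside the script (rather than via nvidia-smi) is to query PyTorch's caching allocator directly; a minimal sketch, assuming a single visible device:

import torch

def report_gpu_memory(device: int = 0) -> None:
    # Memory actually handed out to tensors vs. memory reserved by the caching
    # allocator. These counters only cover PyTorch allocations, so they can read
    # lower than the per-process number nvidia-smi reports.
    allocated_mb = torch.cuda.memory_allocated(device) / 1024 ** 2
    reserved_mb = torch.cuda.memory_reserved(device) / 1024 ** 2
    total_mb = torch.cuda.get_device_properties(device).total_memory / 1024 ** 2
    print(f"allocated: {allocated_mb:.1f} MiB | reserved: {reserved_mb:.1f} MiB | total: {total_mb:.1f} MiB")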
To reproduce
** Code snippet to reproduce **
import gpytorch
import matplotlib.pyplot as plt
import numpy as np
import torch

cuda = True
n_train = 500
n_test = 100


class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=train_x.shape[-1],
                                       lengthscale_constraint=gpytorch.constraints.GreaterThan(1e-4))
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


def f(x):
    return np.stack([
        2 * x[:, 0] - 3 * x[:, 1],
        2 * np.sin(x[:, 0]) - 3 * np.cos(x[:, 1])
    ], axis=-1)


# Set up training data
train_x = np.append(np.random.uniform(low=2.0, high=4.0, size=[n_train // 2]),
                    np.random.uniform(low=5.0, high=7.0, size=[n_train // 2]))
train_x = np.stack([train_x, 2 * np.ones(train_x.shape)], axis=-1)
train_y = f(train_x) + np.stack([2.5e-1 * train_x[:, 0] * np.random.randn(len(train_x[:, 0])),
                                 2.5e-1 * train_x[:, 0] * np.random.randn(len(train_x[:, 0]))], axis=-1)
train_x = torch.from_numpy(train_x).float()
train_y = torch.from_numpy(train_y).float()

# Set up testing data
test_x = np.stack([np.linspace(0, 10, n_test), 2 * np.ones(n_test)], axis=-1)
test_y = f(test_x)
test_x = torch.from_numpy(test_x).float()
test_y = torch.from_numpy(test_y).float()

if cuda:
    train_x = train_x.to("cuda")
    train_y = train_y.to("cuda")
    test_x = test_x.to("cuda")

models, likelihoods = [], []
for dim in range(train_y.shape[-1]):
    # Create model for each output dimension
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ExactGPModel(train_x, train_y[:, dim], likelihood)
    models.append(model)
    likelihoods.append(likelihood)

model = gpytorch.models.IndependentModelList(*models)
likelihood = gpytorch.likelihoods.LikelihoodList(*likelihoods)
if cuda:
    model.cuda()
    likelihood.cuda()

model.train()
likelihood.train()
opt = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.SumMarginalLogLikelihood(likelihood, model)

training_iterations = 50
for i in range(training_iterations):
    opt.zero_grad()
    output = model(*model.train_inputs)
    # Calc loss and backprop gradients
    loss = -mll(output, model.train_targets)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    opt.step()

model.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    predictions = likelihood(*model(test_x, test_x))
    pred_y, pred_ub, pred_lb = [], [], []
    for submodel, prediction in zip(model.models, predictions):
        mean = prediction.mean
        lower, upper = prediction.confidence_region()
        if cuda:
            mean = mean.cpu()
            lower = lower.cpu()
            upper = upper.cpu()
        pred_y.append(mean.numpy())
        pred_ub.append(upper.numpy())
        pred_lb.append(lower.numpy())

pred_y = np.array(pred_y).T
pred_ub = np.array(pred_ub).T
pred_lb = np.array(pred_lb).T

train_x = train_x.cpu().numpy()
train_y = train_y.cpu().numpy()
test_x = test_x.cpu().numpy()
test_y = test_y.cpu().numpy()

fig, axs = plt.subplots(1, 2)
axs[0].scatter(train_x[:, 0], train_y[:, 0], marker="*", s=4, c="r")
axs[0].plot(test_x[:, 0], test_y[:, 0], c="k", label="ground truth")
axs[0].plot(test_x[:, 0], pred_y[:, 0], color="tab:blue", label="pred. mean")
axs[0].fill_between(test_x[:, 0], pred_lb[:, 0], pred_ub[:, 0], alpha=0.4, color="tab:blue", label=r"pred. 2$\sigma$ bound")
axs[0].legend()
axs[1].scatter(train_x[:, 0], train_y[:, 1], marker="*", s=4, c="r")
axs[1].plot(test_x[:, 0], test_y[:, 1], c="k", label="ground truth")
axs[1].plot(test_x[:, 0], pred_y[:, 1], color="tab:blue", label="pred. mean")
axs[1].fill_between(test_x[:, 0], pred_lb[:, 1], pred_ub[:, 1], alpha=0.4, color="tab:blue", label=r"pred. 2$\sigma$ bound")
axs[1].legend()
plt.show()
** Stack trace/error message **
Traceback (most recent call last):
  File "scripts/test_gp.py", line 78, in <module>
    loss.backward()
  File "/home/archie/trail/hybrid-mbrl/venv/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/archie/trail/hybrid-mbrl/venv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
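Note that CUDA kernels launch asynchronously, so the line flagged in this traceback (loss.backward()) is not necessarily where the cuBLAS call actually failed. One way to get a synchronous, more precise error location is to force blocking kernel launches before CUDA is initialized; a minimal sketch (the environment variable is standard CUDA/PyTorch behavior, its placement at the top of the repro script is just illustrative):

import os

# Must be set before the first CUDA call so that each kernel launch reports its
# error at the failing call site instead of at a later synchronization point.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import (and everything CUDA-related) only after the variable is set

Equivalently, the script can be launched as CUDA_LAUNCH_BLOCKING=1 python scripts/test_gp.py.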
Expected Behavior
Training behavior should not change between CPU and GPU, and the model fit on the same amount of data should be comparable on both.
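One thing that might be worth ruling out for the worse-fit case (n_train = 250 with CUDA) is single-precision round-off: exact GP training is numerically sensitive, and float32 results can legitimately differ between CPU and GPU kernels. Below is a minimal sketch of a double-precision variant of the repro script; this is only a hypothesis to test, not a confirmed fix, and the names match the snippet above.

# 1) Keep the data in float64: drop the .float() casts after torch.from_numpy(...),
#    or cast explicitly before the models are constructed.
train_x, train_y, test_x = train_x.double(), train_y.double(), test_x.double()

# 2) Cast the model and likelihood parameters to float64 as well, after the
#    IndependentModelList / LikelihoodList are built and before training starts.
model = model.double()
likelihood = likelihood.double()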
System information
Please complete the following information:
- GPyTorch Version 1.4.2
- PyTorch Version 1.8.1+cu111
- Ubuntu 20.04
- Nvidia driver version 465.27
- GPU: Nvidia Titan Xp
Top GitHub Comments
I downgraded to PyTorch built against CUDA 10.2 (1.8.1+cu102), which seems to have fixed things. I'm not sure if it's an issue with the newer CUDA version. I think I'm going to close this issue: things are fine with the different CUDA build, and it's not clear whether this is GPyTorch's fault.
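For anyone who hits the same error, it may help to confirm which CUDA build is actually in use and whether a plain float32 matrix multiply of comparable size fails outside of GPyTorch; a minimal sketch (the 500x500 sizes are arbitrary, chosen only to roughly match n_train = 500):

import torch

# Which CUDA toolkit the installed PyTorch wheel was built against, and which GPU it sees.
print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))

# A bare cuBLAS sgemm of roughly the same size as the failing kernel.
# If this also raises CUBLAS_STATUS_EXECUTION_FAILED, the problem sits below GPyTorch.
a = torch.randn(500, 500, device="cuda")
b = torch.randn(500, 500, device="cuda")
c = a @ b
torch.cuda.synchronize()
print("plain matmul ok:", c.shape)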