[Bug] cublas RuntimeError in fantasy update test on CUDA
🐛 Bug
Running test.examples.test_simple_gp_regression.TestSimpleGPRegression.test_fantasy_updates
routinely results in the following cublas error: RuntimeError: cublas runtime error : an invalid numeric value was used as an argument
This only happens for the CUDA test; the CPU test runs fine. Anecdotally, it doesn't happen on every run or on every type of machine, but it happens pretty consistently.
To reproduce
Run the test on a cuda machine.
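Roughly, the failing test builds an exact GP on CUDA, conditions on a handful of new observations via `get_fantasy_model`, and then backprops through the posterior mean of the updated model. A minimal sketch of that flow follows; the model class, data, and shapes are illustrative placeholders, not copied from the test:

```python
import torch
import gpytorch


class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


device = torch.device("cuda")
train_x = torch.linspace(0, 1, 11, device=device)
train_y = torch.sin(6.0 * train_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().to(device)
model = ExactGPModel(train_x, train_y, likelihood).to(device)

# get_fantasy_model relies on the prediction caches, so compute one posterior first.
model.eval()
likelihood.eval()
_ = model(torch.rand(3, device=device))

# Condition on new ("fantasy") observations without refitting.
new_x = torch.rand(5, device=device)
new_y = torch.sin(6.0 * new_x)
fantasy_model = model.get_fantasy_model(new_x, new_y)

# Backprop through the updated posterior mean -- this is the step where the
# cublas error surfaces on CUDA.
test_x = torch.rand(4, device=device)
preds = likelihood(fantasy_model(test_x))
preds.mean.sum().backward()
```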
**Stack trace/error message**
> test_fantasy_updates_cuda (test.examples.test_simple_gp_regression.TestSimpleGPRegression) ... ERROR
>
> ======================================================================
> ERROR: test_fantasy_updates_cuda (test.examples.test_simple_gp_regression.TestSimpleGPRegression)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/data/users/balandat/fbsource/fbcode/buck-out/dev/gen/pytorch/gpytorch/test_gpytorch_examples#binary,link-tree/test/examples/test_simple_gp_regression.py", line 265, in test_fantasy_updates_cuda
> self.test_fantasy_updates(cuda=True)
> File "/data/users/balandat/fbsource/fbcode/buck-out/dev/gen/pytorch/gpytorch/test_gpytorch_examples#binary,link-tree/test/examples/test_simple_gp_regression.py", line 308, in test_fantasy_updates
> test_function_predictions.mean.sum().backward()
> File "/data/users/balandat/fbsource/fbcode/buck-out/dev/gen/pytorch/gpytorch/test_gpytorch_examples#binary,link-tree/torch/tensor.py", line 118, in backward
> torch.autograd.backward(self, gradient, retain_graph, create_graph)
> File "/data/users/balandat/fbsource/fbcode/buck-out/dev/gen/pytorch/gpytorch/test_gpytorch_examples#binary,link-tree/torch/autograd/__init__.py", line 93, in backward
> allow_unreachable=True) # allow_unreachable flag
> RuntimeError: cublas runtime error : an invalid numeric value was used as an argument at caffe2/aten/src/THC/THCBlas.cu:120
>
>
> ActivityProfiler - start thread
> ** On entry to SGER parameter number 7 had an illegal value
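For reference, the reference-BLAS signature is `SGER(M, N, ALPHA, X, INCX, Y, INCY, A, LDA)`, so "parameter number 7" is `INCY`. The message therefore likely means a zero increment reached cuBLAS, which is what a zero-stride (expanded) tensor would look like to the legacy THC BLAS bindings. A tiny illustration of such a stride (hypothetical, not taken from the failing test):

```python
import torch

# SGER(M, N, ALPHA, X, INCX, Y, INCY, A, LDA): parameter 7 is INCY.
# An expanded tensor is a broadcasted view with stride 0, which legacy
# THC code could forward to cublasSger as incy=0 -- an illegal value.
y = torch.randn(1).expand(5)
print(y.stride())  # (0,)
```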
Expected Behavior
There shouldn't be a difference between the CPU and CUDA tests; the CUDA test should pass as well.
System information
gpytorch master, pytorch master, linux
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Wow, this seems pretty serious. I guess we’ve been lucky to mostly do batched MVMs as matrix-matrix multiplies in the code?
It's actually known and due to legacy code; they are moving things over to ATen now. Will comment on the other issue.
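For context on the first comment, a small illustration of doing a batch of matrix-vector multiplies as a single batched matrix-matrix multiply, which stays on the gemm path instead of the mv/ger path (illustrative only, not code from GPyTorch):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.randn(8, 64, 64, device=device)
v = torch.randn(8, 64, device=device)

# Per-sample matrix-vector products dispatch to gemv-style kernels.
mv = torch.stack([A[i] @ v[i] for i in range(A.size(0))])

# Folding the vectors into a trailing matrix dimension dispatches to batched gemm.
mm = torch.matmul(A, v.unsqueeze(-1)).squeeze(-1)

assert torch.allclose(mv, mm, atol=1e-4)
```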