
cuda tests fail when CUDA is available but not configured

See original GitHub issue

I’m testing the build of the new release 3.1.1.

All tests accessing CUDA are failing. This is not entirely surprising in itself: my system has nvidia drivers available and a switchable nvidia card accessible via bumblebee (primusrun), but I have not specifically configured it to run CUDA, so finding CUDA_ERROR_NO_DEVICE is to be expected. The nvidia card is there for experimentation, not routine operation; the main video card is Intel.

What’s the best way to handle this situation? How can a non-CUDA build be enforced when CUDA is otherwise “available”?
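One way to force the non-CUDA path, sketched below under the assumption that the tests reach the GPU through Numba (as the traceback further down shows), is to hide all GPUs from the CUDA driver via CUDA_VISIBLE_DEVICES and to disable Numba's CUDA support via its documented NUMBA_DISABLE_CUDA environment variable. The `cuda_usable` helper is illustrative, not part of mpi4py:

```python
import os

# Hide GPUs from the CUDA driver and tell Numba to treat CUDA as unsupported.
# Both variables must be set before the CUDA driver / Numba are initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = ""   # CUDA driver enumerates no devices
os.environ["NUMBA_DISABLE_CUDA"] = "1"    # Numba reports CUDA as disabled

def cuda_usable():
    """Hypothetical guard: True only if a CUDA context can actually be created."""
    if os.environ.get("NUMBA_DISABLE_CUDA") == "1":
        return False
    try:
        from numba import cuda
        return cuda.is_available()  # False on CUDA_ERROR_NO_DEVICE, no raise
    except ImportError:
        return False

print(cuda_usable())  # False once the variables above are set
```

Exporting the same two variables in the shell before launching the test suite achieves the same effect without touching the test code.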

An example test log is:

ERROR: testAllgather (test_cco_buf.TestCCOBufInplaceSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/projects/python/build/mpi4py/test/test_cco_buf.py", line 382, in testAllgather
    buf = array(-1, typecode, (size, count))
  File "/projects/python/build/mpi4py/test/arrayimpl.py", line 459, in __init__
    self.array = numba.cuda.device_array(shape, typecode)
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 223, in _require_cuda_context
    with _runtime.ensure_context():
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 121, in ensure_context
    with driver.get_active_context():
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 393, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 280, in __getattr__
    self.initialize()
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 240, in initialize
    raise CudaSupportError("Error at driver init: \n%s:" % e)
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
[100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
-------------------- >> begin captured logging << --------------------
numba.cuda.cudadrv.driver: INFO: init
numba.cuda.cudadrv.driver: DEBUG: call driver api: cuInit
numba.cuda.cudadrv.driver: ERROR: Call to cuInit results in CUDA_ERROR_NO_DEVICE
--------------------- >> end captured logging << ---------------------
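Since the failure happens inside `numba.cuda.device_array`, one mitigation (a sketch, not mpi4py's actual test harness) is to guard CUDA-backed test cases with `unittest.skipUnless`; `numba.cuda.is_available()` returns False rather than raising when the driver reports CUDA_ERROR_NO_DEVICE. The class and helper names below are hypothetical, loosely echoing the failing test:

```python
import unittest

def cuda_available():
    """Hypothetical guard: True only when Numba can reach a CUDA device."""
    try:
        from numba import cuda
        return cuda.is_available()
    except Exception:  # ImportError, CudaSupportError, ...
        return False

@unittest.skipUnless(cuda_available(), "no usable CUDA device")
class TestCCOBufCUDA(unittest.TestCase):
    def testAllgather(self):
        from numba import cuda
        buf = cuda.device_array((4, 3), dtype="i")  # would raise without a GPU
        self.assertEqual(buf.shape, (4, 3))

# Run with: python -m unittest <this_module>
```

With no usable device the whole class is reported as skipped instead of erroring out, which is usually the desired behaviour for an optional GPU backend.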

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 43 (43 by maintainers)

Top GitHub Comments

1 reaction
drew-parsons commented, Aug 19, 2021

It seems spawn trouble has been a long-running saga! I’ll deactivate the spawn tests for now and check again later with future Open MPI releases.

1 reaction
dalcinl commented, Aug 15, 2021

@drew-parsons I guess you are using Open MPI, right? Dynamic process management has always been semi-broken. I would suggest just disabling these tests if they are giving trouble or behave erratically. Hopefully, things will be much better in the upcoming Open MPI 5.x release, as mpi4py tests are passing.

Read more comments on GitHub >

Top Results From Across the Web

Failed in test CUDA v3.8.0 - GPU - Julia Discourse
This generally means an issue with your driver, and not CUDA.jl. Unless of course you say it didn't occur with CUDA.jl 3.5, but...

Pytorch says that CUDA is not available (on Ubuntu)
The initial message I got was that the GPU is currently in use by another application. But when I looked at nvidia-smi ,...

run 'configure' on a machine that has the CUDA compiler 'nvcc ...
I am trying to train a deep neural network acoustic model using cuda. Cuda is installed on the machine and the sample Cuda...

Installation Guide for Linux - NVIDIA Documentation Center
The installation instructions for the CUDA Toolkit on Linux. CUDA® is a parallel computing platform and programming model invented by ...

Installation — CuPy 11.4.0 documentation
If you have multiple versions of CUDA Toolkit installed, CuPy will ... SciPy and Optuna are optional dependencies and will not be installed...
