
cuda tests fail when CUDA is available but not configured

See original GitHub issue

I’m testing the build of the new release 3.1.1.

All tests accessing CUDA are failing. This is not entirely surprising in itself: my system has nvidia drivers available and a switchable nvidia card accessible via bumblebee (primusrun), but I have not specifically configured it to run CUDA, so finding CUDA_ERROR_NO_DEVICE is to be expected. The nvidia card is there for experimentation, not routine operation; the main video card is Intel.

What’s the best way to handle this situation? How can a non-CUDA build be enforced when CUDA is otherwise “available”?
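One way to force the non-CUDA path, sketched below under the assumption that the tests reach the GPU through Numba (as the traceback further down shows), is to hide all GPUs from the CUDA driver via CUDA_VISIBLE_DEVICES and to disable Numba's CUDA support via its documented NUMBA_DISABLE_CUDA environment variable. The `cuda_usable` helper is illustrative, not part of mpi4py:

```python
import os

# Hide GPUs from the CUDA driver and tell Numba to treat CUDA as unsupported.
# Both variables must be set before the CUDA driver / Numba are initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = ""   # CUDA driver enumerates no devices
os.environ["NUMBA_DISABLE_CUDA"] = "1"    # Numba reports CUDA as disabled

def cuda_usable():
    """Hypothetical guard: True only if a CUDA context can actually be created."""
    if os.environ.get("NUMBA_DISABLE_CUDA") == "1":
        return False
    try:
        from numba import cuda
        return cuda.is_available()  # False on CUDA_ERROR_NO_DEVICE, no raise
    except ImportError:
        return False

print(cuda_usable())  # False once the variables above are set
```

Exporting the same two variables in the shell before launching the test suite achieves the same effect without touching the test code.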

An example test log is:

ERROR: testAllgather (test_cco_buf.TestCCOBufInplaceSelf)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/projects/python/build/mpi4py/test/test_cco_buf.py", line 382, in testAllgather
    buf = array(-1, typecode, (size, count))
  File "/projects/python/build/mpi4py/test/arrayimpl.py", line 459, in __init__
    self.array = numba.cuda.device_array(shape, typecode)
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 223, in _require_cuda_context
    with _runtime.ensure_context():
  File "/usr/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/devices.py", line 121, in ensure_context
    with driver.get_active_context():
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 393, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 280, in __getattr__
    self.initialize()
  File "/usr/lib/python3/dist-packages/numba/cuda/cudadrv/driver.py", line 240, in initialize
    raise CudaSupportError("Error at driver init: \n%s:" % e)
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
[100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
-------------------- >> begin captured logging << --------------------
numba.cuda.cudadrv.driver: INFO: init
numba.cuda.cudadrv.driver: DEBUG: call driver api: cuInit
numba.cuda.cudadrv.driver: ERROR: Call to cuInit results in CUDA_ERROR_NO_DEVICE
--------------------- >> end captured logging << ---------------------
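Since the failure happens inside `numba.cuda.device_array`, one mitigation (a sketch, not mpi4py's actual test harness) is to guard CUDA-backed test cases with `unittest.skipUnless`; `numba.cuda.is_available()` returns False rather than raising when the driver reports CUDA_ERROR_NO_DEVICE. The class and helper names below are hypothetical, loosely echoing the failing test:

```python
import unittest

def cuda_available():
    """Hypothetical guard: True only when Numba can reach a CUDA device."""
    try:
        from numba import cuda
        return cuda.is_available()
    except Exception:  # ImportError, CudaSupportError, ...
        return False

@unittest.skipUnless(cuda_available(), "no usable CUDA device")
class TestCCOBufCUDA(unittest.TestCase):
    def testAllgather(self):
        from numba import cuda
        buf = cuda.device_array((4, 3), dtype="i")  # would raise without a GPU
        self.assertEqual(buf.shape, (4, 3))

# Run with: python -m unittest <this_module>
```

With no usable device the whole class is reported as skipped instead of erroring out, which is usually the desired behaviour for an optional GPU backend.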

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 43 (43 by maintainers)

Top GitHub Comments

1 reaction
drew-parsons commented, Aug 19, 2021

It seems spawn trouble has been a long-running saga! I’ll deactivate the spawn tests for now and check again later with future Open MPI releases.

1 reaction
dalcinl commented, Aug 15, 2021

@drew-parsons I guess you are using Open MPI, right? Dynamic process management has always been semi-broken. I would suggest just disabling these tests if they are giving trouble or behave erratically. Hopefully, things will be much better in the upcoming Open MPI 5.x release, as mpi4py tests are passing.

Read more comments on GitHub >

Top Results From Across the Web

Failed in test CUDA v3.8.0 - GPU - Julia Discourse
This generally means an issue with your driver, and not CUDA.jl. Unless of course you say it didn't occur with CUDA.jl 3.5, but...

Pytorch says that CUDA is not available (on Ubuntu)
The initial message I got was that the GPU is currently in use by another application. But when I looked at nvidia-smi ,...

run 'configure' on a machine that has the CUDA compiler 'nvcc ...
I am trying to train a deep neural network acoustic model using cuda. Cuda is installed on the machine and the sample Cuda...

Installation Guide for Linux - NVIDIA Documentation Center
The installation instructions for the CUDA Toolkit on Linux. CUDA® is a parallel computing platform and programming model invented by ...

Installation — CuPy 11.4.0 documentation
If you have multiple versions of CUDA Toolkit installed, CuPy will ... SciPy and Optuna are optional dependencies and will not be installed...
