RuntimeError: Unknown: no kernel image is available for execution on the device

Hi, I got the following error. Any suggestions? Thanks.

2021-10-25 23:38:32.305016: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-10-25 23:38:38.368438: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-10-25 23:38:38.420023: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-25 23:38:38.421306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:41:00.0 name: RTX A6000 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2021-10-25 23:38:38.421351: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-25 23:38:38.422580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties: 
pciBusID: 0000:43:00.0 name: RTX A6000 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 84 deviceMemorySize: 47.54GiB deviceMemoryBandwidth: 715.34GiB/s
2021-10-25 23:38:38.422597: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-10-25 23:38:38.502267: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-10-25 23:38:38.502329: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-10-25 23:38:38.548584: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-10-25 23:38:38.568160: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-10-25 23:38:38.662025: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-10-25 23:38:38.691719: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-10-25 23:38:38.696615: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-10-25 23:38:38.696718: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-25 23:38:38.698052: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-25 23:38:38.699305: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-25 23:38:38.702210: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-25 23:38:38.703830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
ColabFold on Linux
WARNING: For a typical Google-Colab-GPU (16G) session, the max total length is ~1400 residues. You are at 1625! Run Alphafold may crash.
homooligomer: '1'
total_length: '1625'
working_directory: 'prediction_test_37769'
running mmseqs2
  0%|          | 0/150 [elapsed: 00:00 remaining: ?]
  0%|          | 0/5 [elapsed: 00:00 remaining: ?]
2021-10-25 23:44:59.124108: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-25 23:44:59.127144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-10-25 23:44:59.127179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      
2021-10-25 23:44:59.200029: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2699835000 Hz
2021-10-25 23:45:03.682089: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc:63] cuLinkAddData fails. This is usually caused by stale driver version.
2021-10-25 23:45:03.682142: E external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:985] The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.
Traceback (most recent call last):
  File "runner.py", line 662, in <module>
    prediction_result, (r, t) = cf.to(model_runner.predict(processed_feature_dict, random_seed=seed),"cpu")
  File "/data/colabfold/alphafold/model/model.py", line 134, in predict
    result, recycles = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/_src/random.py", line 122, in PRNGKey
    key = prng.seed_with_impl(impl, seed)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/_src/prng.py", line 203, in seed_with_impl
    return PRNGKeyArray(impl, impl.seed(seed))
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/_src/prng.py", line 241, in threefry_seed
    k1 = convert(lax.shift_right_logical(seed_arr, lax._const(seed_arr, 32)))
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 408, in shift_right_logical
    return shift_right_logical_p.bind(x, y)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/core.py", line 272, in bind
    out = top_trace.process_primitive(self, tracers, params)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/core.py", line 624, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 312, in apply_primitive
    **params)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/_src/util.py", line 187, in wrapper
    return cached(config._trace_context(), *args, **kwargs)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/_src/util.py", line 180, in cached
    return f(*args, **kwargs)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 335, in xla_primitive_callable
    prim.name, donated_invars, *arg_specs)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 654, in _xla_callable_uncached
    *arg_specs).compile().unsafe_call
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 770, in compile
    self.name, self.hlo(), *self.compile_args)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 798, in from_xla_computation
    compiled = compile_or_get_cached(backend, xla_computation, options)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 87, in compile_or_get_cached
    return backend_compile(backend, computation, compile_options)
  File "/data/colabfold/colabfold-conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 369, in backend_compile
    return backend.compile(built_c, compile_options=options)
RuntimeError: Unknown: no kernel image is available for execution on the device
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(66): 'status'
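
The log itself points to a workaround for the cuLinkAddData failure: set XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 and accept a longer compilation time. From a shell that would be XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 python runner.py. The minimal Python sketch below does the same from inside a script, assuming it runs before jax is imported. The helper name force_single_threaded_xla_compilation is hypothetical. The flag only bypasses the parallel CUDA linking path named in the message, so it may not help if the driver is genuinely too old for the installed toolkit.

```python
import os

# Flag suggested by the XLA error message in the log above.
FLAG = "--xla_gpu_force_compilation_parallelism=1"


def force_single_threaded_xla_compilation() -> str:
    """Append FLAG to XLA_FLAGS, keeping any flags that are already set.

    Hypothetical helper: call it before importing jax (e.g. at the very top
    of runner.py) so XLA sees the flag when its GPU backend initializes.
    """
    existing = os.environ.get("XLA_FLAGS", "").split()
    if FLAG not in existing:  # avoid adding the flag twice
        existing.append(FLAG)
    os.environ["XLA_FLAGS"] = " ".join(existing)
    return os.environ["XLA_FLAGS"]


if __name__ == "__main__":
    print(force_single_threaded_xla_compilation())
    # import jax  # import jax (and alphafold/colabfold) only after this point
```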

Top GitHub Comments

donghuachensu commented, Oct 27, 2021

After I updated CUDA to 11.5 (as reported by both nvidia-smi and nvcc), the problem disappeared and your localcolabfold worked on my workstation. Thanks!
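
The fix above brought the CUDA version reported by nvidia-smi (the highest CUDA version the installed driver supports) and by nvcc (the installed toolkit) to the same release, which fits the log's hint about a stale driver. A quick way to compare the two is sketched below in Python. The function names are hypothetical, and the parsing assumes the usual "release X.Y" line from nvcc --version and the "CUDA Version: X.Y" field in the nvidia-smi header, which may vary between versions. A toolkit newer than the driver is one plausible cause of this error, not a confirmed diagnosis.

```python
import re
import shutil
import subprocess
from typing import List, Optional, Tuple

Version = Tuple[int, int]


def parse_nvcc_release(text: str) -> Optional[Version]:
    """Extract the toolkit version from `nvcc --version` (e.g. 'release 11.5')."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_driver_cuda(text: str) -> Optional[Version]:
    """Extract the highest CUDA version the driver supports from the
    `nvidia-smi` header (e.g. 'CUDA Version: 11.5')."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def run(cmd: List[str]) -> str:
    """Return a command's stdout, or '' if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def check_cuda_versions() -> None:
    toolkit = parse_nvcc_release(run(["nvcc", "--version"]))
    driver = parse_driver_cuda(run(["nvidia-smi"]))
    print(f"nvcc toolkit: {toolkit}, driver supports up to: {driver}")
    # Tuples compare element-wise, so (11, 5) > (11, 2) is True.
    if toolkit and driver and toolkit > driver:
        print("Toolkit is newer than the driver supports; consider updating the driver.")


if __name__ == "__main__":
    check_cuda_versions()
```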

donghuachensu commented, Oct 27, 2021

Thank you for your localcolabfold!