
GPU mode error

Hi,

I got the error shown in the log below.

I'm using the DeepVariant 1.1.0 GPU Docker image on an NVIDIA GeForce RTX 3090.

Any suggestions or advice?

Thanks in advance,
Amin

```
2021-05-06 16:56:50.765879: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
I0506 16:56:52.008759 140393620989696 call_variants.py:338] Shape of input examples: [100, 221, 6]
2021-05-06 16:56:52.013998: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-06 16:56:52.046181: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2100000000 Hz
2021-05-06 16:56:52.053674: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x47507d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-05-06 16:56:52.053727: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-05-06 16:56:52.058754: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-05-06 16:56:52.188018: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x47b9240 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-05-06 16:56:52.188089: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2021-05-06 16:56:52.191811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:20:00.0 name: NVIDIA GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-05-06 16:56:52.191885: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-06 16:56:52.195656: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-05-06 16:56:52.199014: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-05-06 16:56:52.199715: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-05-06 16:56:52.203305: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-05-06 16:56:52.205473: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-05-06 16:56:52.211828: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-05-06 16:56:52.216068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-06 16:56:52.216108: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-06 16:58:21.551842: E tensorflow/core/common_runtime/session.cc:91] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
2021-05-06 16:58:21.551943: E tensorflow/c/c_api.cc:2184] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_1q2x77gk/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 502, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/tmp/Bazel.runfiles_1q2x77gk/runfiles/absl_py/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_1q2x77gk/runfiles/absl_py/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_1q2x77gk/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 492, in main
    use_tpu=FLAGS.use_tpu,
  File "/tmp/Bazel.runfiles_1q2x77gk/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 393, in call_variants
    with tf.compat.v1.Session(config=config) as sess:
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1586, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 701, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

real    1m31.942s
user    1m31.967s
sys     0m8.062s
```
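The facts that matter in a log like this are the GPU's compute capability, the CUDA runtime version TensorFlow loaded, and the final error status. A minimal Python sketch for pulling them out of a saved log, assuming one log entry per line in the format shown above; the `summarize_tf_gpu_log` helper and its regular expressions are hypothetical, not part of TensorFlow or DeepVariant:

```python
import re

# Hypothetical patterns, assuming the TensorFlow log line format shown above.
_CC_RE = re.compile(
    r"device \(\d+\): (?P<name>[^,]+), Compute Capability (?P<cc>\d+\.\d+)"
)
_CUDART_RE = re.compile(r"libcudart\.so\.(?P<ver>\d+(?:\.\d+)*)")
_STATUS_RE = re.compile(r"Status: (?P<status>[^\n]+)")


def summarize_tf_gpu_log(log_text):
    """Return GPU name, compute capability, CUDA runtime version and error status.

    Each field is None when the corresponding line is not found.
    """
    summary = {"gpu": None, "compute_capability": None, "cudart": None, "status": None}

    # First GPU device line, e.g. "device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6".
    cc = _CC_RE.search(log_text)
    if cc:
        summary["gpu"] = cc.group("name").strip()
        summary["compute_capability"] = cc.group("cc")

    # First CUDA runtime library TensorFlow reports loading, e.g. "libcudart.so.10.1".
    cudart = _CUDART_RE.search(log_text)
    if cudart:
        summary["cudart"] = cudart.group("ver")

    # First "Status: ..." message, which carries the failure reason.
    status = _STATUS_RE.search(log_text)
    if status:
        summary["status"] = status.group("status").strip()

    return summary
```

On the log in this issue it should report compute capability 8.6 for the RTX 3090 alongside `libcudart.so.10.1`, that is, a CUDA 10.1 runtime.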

Issue Analytics

Created 2 years ago
Comments: 6 (3 by maintainers)

Top GitHub Comments

gunjanbaid commented, May 11, 2021 (2 reactions)

Hi @aardes, my understanding is that cuDNN v8, CUDA 11, TF 2.5, and Python 3.8 will be needed for RTX 3090. Our code is currently not ready to be upgraded to Python 3.8, but this is something we are looking into for future releases.
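One way to read this answer: the log shows TensorFlow loading CUDA 10.1 libraries (`libcudart.so.10.1`, `libcudnn.so.7`), while the RTX 3090 reports compute capability 8.6, an architecture that CUDA toolkits before 11.1 do not target. A minimal sketch of that toolkit-version check, assuming the minimum versions in the table below (a small illustrative subset drawn from NVIDIA's CUDA release history); `MIN_CUDA_FOR_CC` and `cuda_supports` are hypothetical helpers, not part of TensorFlow or DeepVariant, and they do not check cuDNN or Python versions:

```python
# Hypothetical table: first CUDA toolkit version that targets each compute
# capability (illustrative subset based on NVIDIA's CUDA release notes).
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (GA10x, e.g. RTX 3090)
}


def parse_version(text):
    """Turn a dotted version string such as '10.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def cuda_supports(compute_capability, cuda_version):
    """Return True if cuda_version meets the table's minimum for the GPU.

    Raises KeyError for compute capabilities missing from the illustrative table.
    """
    cc = parse_version(compute_capability)
    return parse_version(cuda_version) >= MIN_CUDA_FOR_CC[cc]
```

With the values from the log, `cuda_supports("8.6", "10.1")` returns `False`, while a CUDA 11.2 build such as the one TF 2.5 ships against gives `True`. Tuples are used so that versions compare numerically rather than as strings.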

aardes commented, May 11, 2021 (0 reactions)

Looking forward to it, thanks