Unable to import libcuda.so.1 when using TFX on GPUs
I am running TFX on Kubeflow Pipelines (KFP). I added a GPU to my workload by doing the following:
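from tfx.orchestration.kubeflow import kubeflow_dag_runner

def use_gpu():
    # Build an operator function that requests one GPU for each pipeline step.
    def _set_gpu_spec(task):
        task.set_gpu_limit(1)
    # Return the inner function so that use_gpu() yields a callable, not None.
    return _set_gpu_spec

# tfx_image and pipeline are defined elsewhere in my pipeline code.
pipeline_operator_funcs = kubeflow_dag_runner.get_default_pipeline_operator_funcs()
pipeline_operator_funcs.append(use_gpu())
config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    pipeline_operator_funcs=pipeline_operator_funcs,
    kubeflow_metadata_config=kubeflow_dag_runner
    .get_default_kubeflow_metadata_config(),
    tfx_image=tfx_image,
)
kubeflow_dag_runner.KubeflowDagRunner(config=config).run(pipeline)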
However, when I run this on Kubeflow, I get the following errors:
2020-03-25 14:19:06.119947: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-03-25 14:19:06.119993: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
This is using tfx==0.21.0, which uses tensorflow==2.1.0. Note that if I run my workload on KFP without TFX (using the KFP DSL), it runs on the GPU.
Thanks
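One way to narrow this down is to check, from inside the running container, whether the dynamic loader can open libcuda.so.1 at all. The following is a minimal sketch under that assumption; the helper name can_load_shared_library is hypothetical and not part of TFX or TensorFlow. It uses only Python's standard ctypes module and reports whether the library loads, not why it is missing.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        # ctypes.CDLL asks the loader to open the library, which is the same
        # kind of lookup that failed in the TensorFlow log.
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library cannot be found or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # libcuda.so.1 is the library named in the error message.
    print("libcuda.so.1 loadable:", can_load_shared_library("libcuda.so.1"))
```

Running this in both the TFX image and the image used for the plain KFP DSL workload would show whether the two environments differ in what the loader can see.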
Hello, just curious if there’s been any more thought on this front around publishing an image like tensorflow/tfx-gpu. It seems to me to be a fairly common use case to want to use tfx/kubeflow with GPUs.