Unable to import libcuda.so.1 when using TFX on GPUs


I am running TFX on KFP. I added a GPU to my workload by doing the following.

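from tfx.orchestration.kubeflow import kubeflow_dag_runner

def use_gpu():
  # Returns an operator function that requests one GPU for each pipeline task.
  def _set_gpu_spec(task):
    task.set_gpu_limit(1)
  return _set_gpu_spec

# Start from the default operator functions and add the GPU request.
pipeline_operator_funcs = kubeflow_dag_runner.get_default_pipeline_operator_funcs()
pipeline_operator_funcs.append(use_gpu())
# tfx_image and pipeline are defined elsewhere.
config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    pipeline_operator_funcs=pipeline_operator_funcs,
    kubeflow_metadata_config=kubeflow_dag_runner.get_default_kubeflow_metadata_config(),
    tfx_image=tfx_image,
)
kubeflow_dag_runner.KubeflowDagRunner(config=config).run(pipeline)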

However, when I run this on Kubeflow, I get the following errors:

2020-03-25 14:19:06.119947: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-03-25 14:19:06.119993: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

This is using tfx==0.21.0, which uses tensorflow==2.1.0. Note that if I run my workload on KFP without TFX (using the KFP DSL), it runs on the GPU.
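
The dlerror message in the log says the dynamic loader could not find libcuda.so.1 inside the container. One way to narrow this down is to ask the loader directly from a Python shell in the same image. The following is a minimal sketch, not part of the original report; the helper name probe_shared_library is hypothetical, and it only checks whether the library can be opened, not whether CUDA initializes correctly.

```python
import ctypes


def probe_shared_library(name="libcuda.so.1"):
    """Try to load a shared library by name; return (loaded, error_message)."""
    try:
        # ctypes.CDLL asks the system dynamic loader to open the library,
        # so a missing file surfaces as an OSError with the loader's message.
        ctypes.CDLL(name)
    except OSError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    loaded, error = probe_shared_library()
    print("libcuda.so.1 loadable:", loaded)
    if not loaded:
        print("loader error:", error)
```

Running this in both the TFX image and the image used for the plain KFP DSL workload would show whether the two containers differ in what the loader can see.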

Thanks

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

7 reactions
ialdencoots commented, Sep 25, 2020

Hello, just curious if there’s been any more thought on this front around publishing an image like tensorflow/tfx-gpu. It seems to me to be a fairly common use case to want to use tfx/kubeflow with GPUs.


Top Results From Across the Web

Tensorflow cannot open libcuda.so.1 - Stack Overflow
libcuda.so.1 is a symlink to a file that is specific to the version of your NVIDIA drivers. It may be pointing to the...

Jetson Xavier NX - Tensorflow 2 container slower on GPU ...
I found out that NVIDIA provides a Docker image based on L4T with TensorFlow 1 installed. I used its Dockerfile and created a similar ......

could not open file to read numa node - You.com
1. I install jax, jaxlib-cuda102 on WSL with cuda10.2. ... Successfully opened dynamic library libcuda.so.1 2021-02-08 16:32:26.902834: E ...

Training Keras models with TensorFlow Cloud
We'll get started by installing TensorFlow Cloud, and importing the ... opened dynamic library libcuda.so.1 2021-07-27 22:07:19.524654: I ...

Failed to load the native TensorFlow runtime. ImportError
ImportError: libcuda.so.1: cannot open share, Background: one of the author's projects, on a physical machine ... from tensorflow.python.pywrap_tensorflow_internal import *
