cml workflow with gpu fails with LD_LIBRARY_PATH error

A CML GitHub workflow running in the docker://dvcorg/cml:0-dvc2-base1-gpu container fails to use the GPU. TensorFlow detects the device but cannot find several CUDA libraries on LD_LIBRARY_PATH, so it skips GPU registration:

2021-07-12 10:52:46.210323: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-07-12 10:52:47.495637: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-12 10:52:47.496273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-07-12 10:52:47.496789: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-12 10:52:47.496915: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-12 10:52:47.498149: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-07-12 10:52:47.498508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-07-12 10:52:47.501456: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-07-12 10:52:47.501637: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-12 10:52:47.501773: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-12 10:52:47.501792: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
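
As a rough diagnostic aid, the warnings above can be summarised programmatically. The following is a minimal sketch, not part of CML or TensorFlow: the helper names `missing_libraries` and `can_load` are hypothetical, and the regular expression assumes the exact `dso_loader` warning format shown in this log. It lists which libraries failed to load and which LD_LIBRARY_PATH value was in effect, and offers a way to test whether a given library can be opened by the dynamic loader.

```python
import ctypes
import re

# Matches the dso_loader warning format seen in the log above (assumed stable).
MISSING_LIB = re.compile(
    r"Could not load dynamic library '(?P<lib>[^']+)'.*?LD_LIBRARY_PATH: (?P<path>\S+)"
)


def missing_libraries(log_text):
    """Return (sorted missing library names, set of LD_LIBRARY_PATH values)."""
    libs, paths = set(), set()
    for line in log_text.splitlines():
        match = MISSING_LIB.search(line)
        if match:
            libs.add(match.group("lib"))
            paths.add(match.group("path"))
    return sorted(libs), paths


def can_load(lib_name):
    """Try to open a shared library by name, as the dynamic loader would."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    sample = (
        "W dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; "
        "dlerror: libcudnn.so.7: cannot open shared object file: "
        "No such file or directory; "
        "LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64"
    )
    libs, paths = missing_libraries(sample)
    print("missing:", libs)
    print("search path:", paths)
    for lib in libs:
        print(lib, "loadable:", can_load(lib))
```

Running `can_load` inside the container for each reported name would show whether the loader can find the library at all. Where this image actually stores the CUDA libraries is not stated in the issue, so any directory added to LD_LIBRARY_PATH would need to be confirmed by inspecting the container.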

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 17 (6 by maintainers)

Top GitHub Comments

btjones-me commented, Jul 14, 2021 (1 reaction)

Thank you for your measured response @0x2b3bfa0, that was quite a confusing and unhelpful message!

btjones-me commented, Jul 14, 2021 (1 reaction)

Ah, my mistake. What I actually did was the following, which is the reverse of what is mentioned in that comment (I missed this): https://stackoverflow.com/a/67642774


