Jax profiler won't work with Cuda 11.5


I’m trying to use the JAX profiler; however, TensorFlow throws the following error:

2022-01-21 00:07:31.649491: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcupti.so.11.4'; dlerror: libcupti.so.11.4: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.5/targets/x86_64-linux/lib/

When I run locate libcupti.so, it returns the following paths, which is why I’m using a different directory from the one given in the JAX README:

/usr/local/cuda-11.5/targets/x86_64-linux/lib/libcupti.so
/usr/local/cuda-11.5/targets/x86_64-linux/lib/libcupti.so.11.5
/usr/local/cuda-11.5/targets/x86_64-linux/lib/libcupti.so.2021.3.1

While I can’t find any documentation on this, I’m guessing that TensorFlow doesn’t support CUDA 11.5 yet, since it is trying to find the 11.4 version. I just want to confirm this is not a JAX issue before I post an issue on the TensorFlow GitHub.

If my guess is correct, is it possible to rename libcupti.so.11.5 to libcupti.so.11.4 as a total bodge job? Or must I switch to CUDA 11.4 to get the profiler to work?
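Before renaming anything, it can help to check which libcupti names the dynamic loader can actually open from the current environment. The snippet below is a minimal sketch, not part of the original issue: the helper name can_load is hypothetical, and it assumes a Linux system where LD_LIBRARY_PATH was set before Python started (the loader reads it at process start).

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open `library_name`.

    Hypothetical helper for illustration; it only reports whether
    dlopen succeeds, not whether the profiler will work with it.
    """
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is missing or one of its own
        # dependencies cannot be resolved.
        return False
    return True


# Names taken from the error message and the locate output above.
for name in ("libcupti.so.11.4", "libcupti.so.11.5", "libcupti.so"):
    status = "loadable" if can_load(name) else "not loadable"
    print(f"{name}: {status}")
```

If libcupti.so.11.5 is loadable but libcupti.so.11.4 is not, that matches the warning in the log: the installed jaxlib wheel is asking for the 11.4 name specifically.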

Thanks for any support

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
pseudo-rnd-thoughts commented, Jan 21, 2022

I built jaxlib from source and the profiler no longer throws the error. So I can confirm this is an issue with the pip install, which uses 11.4.

Is there anything I can do to help fix the error?

0 reactions
pseudo-rnd-thoughts commented, Feb 8, 2022

Sorry, I didn’t see this. You are correct: it does run the profiler even with the old version, so there isn’t actually an issue.

I think my issue was that I was using “/tmp/” as my path rather than “tmp/”, which is why I couldn’t find the results.
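The difference between the two paths is easy to miss: “/tmp/” is absolute, while “tmp/” is resolved against the current working directory. The sketch below only illustrates that resolution with the standard library; resolve_log_dir is a hypothetical helper, and the assumption is that the resulting string is what gets passed to the profiler as its log directory.

```python
import os


def resolve_log_dir(log_dir: str) -> str:
    """Return the absolute directory that `log_dir` refers to.

    Hypothetical helper: printing this before profiling shows where
    trace files will actually be written.
    """
    return os.path.abspath(log_dir)


for candidate in ("/tmp/", "tmp/"):
    print(f"{candidate!r} -> {resolve_log_dir(candidate)}")

# Assumed usage: pass the resolved path as the profiler's log directory,
# e.g. jax.profiler.trace(resolve_log_dir("tmp/")), so the output
# location does not depend on where the script was launched from.
```

Printing the resolved path once at startup is a cheap way to avoid looking for the trace in the wrong directory.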


