Jax profiler won't work with Cuda 11.5
I'm trying to use the JAX profiler, but TensorFlow throws the following error:
2022-01-21 00:07:31.649491: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcupti.so.11.4'; dlerror: libcupti.so.11.4: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.5/targets/x86_64-linux/lib/
When I run locate libcupti.so, the following paths are returned, which is why I'm using a different directory than the one given in the JAX README:
/usr/local/cuda-11.5/targets/x86_64-linux/lib/libcupti.so
/usr/local/cuda-11.5/targets/x86_64-linux/lib/libcupti.so.11.5
/usr/local/cuda-11.5/targets/x86_64-linux/lib/libcupti.so.2021.3.1
I can't find any documentation on this, but I'm guessing that TensorFlow doesn't support CUDA 11.5 yet, since it is trying to find the 11.4 version of the library. I just want to confirm this is not a JAX issue before I post an issue on the TensorFlow GitHub.
If my guess is correct, is it possible to rename libcupti.so.11.5 to libcupti.so.11.4 as a total bodge job? Or must I switch to CUDA 11.4 to get the profiler to work?
Thanks for any support
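The locate check above can also be scripted. The following is a minimal sketch, not part of the original post: the helper names find_cupti_libraries and has_cupti_version are hypothetical, and it assumes the loader only searches the directories listed in LD_LIBRARY_PATH, as the error message suggests.

```python
import os
from pathlib import Path


def find_cupti_libraries(search_path=None):
    """Return every libcupti.so* file found in the given search path.

    search_path is a string of directories joined by os.pathsep; it defaults
    to the current LD_LIBRARY_PATH (an assumption about where to look).
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for directory in filter(None, search_path.split(os.pathsep)):
        candidate = Path(directory)
        if candidate.is_dir():  # skip entries that do not exist
            found.extend(sorted(candidate.glob("libcupti.so*")))
    return found


def has_cupti_version(version, search_path=None):
    """True if a file named exactly libcupti.so.<version> is on the search path."""
    target = f"libcupti.so.{version}"
    return any(path.name == target for path in find_cupti_libraries(search_path))
```

Calling has_cupti_version("11.4") on the setup described above would be expected to return False, since only the unversioned, 11.5 and 2021.3.1 files are listed.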
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (1 by maintainers)
I built jaxlib from source and the profiler no longer throws the error, so I can confirm that this is an issue with the pip install, which uses 11.4.
Is there anything I can do to help fix the error?
Sorry, I didn't see this. You are correct: the profiler does run even with the old version, so there isn't actually an issue.
I think my issue was that I was using "/tmp/" as my path rather than "tmp/", which is why I couldn't find the results.
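For anyone hitting the same path confusion, here is a small sketch that shows where a given log directory actually points and what has been written under it. The helper names resolve_trace_dir and list_trace_files are hypothetical and not part of JAX; the only assumption is that the profiler writes its output somewhere below the directory it is given.

```python
import os


def resolve_trace_dir(log_dir):
    """Return the absolute directory that a profiler log path refers to.

    A relative path such as "tmp/" resolves against the current working
    directory, while "/tmp/" is the system temporary directory.
    """
    return os.path.abspath(os.path.expanduser(log_dir))


def list_trace_files(log_dir):
    """List every file under log_dir, relative to it (empty if it is missing)."""
    root = resolve_trace_dir(log_dir)
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            files.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(files)
```

Printing resolve_trace_dir("tmp/") next to resolve_trace_dir("/tmp/") makes the difference obvious, and passing the resolved path to jax.profiler.start_trace removes the ambiguity about where to look afterwards.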