Installing from source using Conda and CUDA could be improved

Thanks to all contributors for their efforts in creating and open-sourcing the library. For whatever it's worth, I would like to add my 2 cents on the process of installing by building from source.

I always like to install things in conda envs so that there is no clash between different software versions or required libraries.

MWE:

conda create -n jax python scipy cudnn cudatoolkit
conda list

Now the installation process:

python build/build.py --enable_cuda --cuda_path ~/miniconda3/envs/jax/lib/ --cudnn_path ~/miniconda3/envs/jax/include

Two problems arise:

1. nvcc cannot be found in path ~/miniconda3/envs/jax/lib/bin
Actually, that path is wrong; it should have been ~/miniconda3/envs/jax/bin.
Anyway, I copied nvcc from the system-wide installation (/opt/cuda/bin/nvcc) into ~/miniconda3/envs/jax/lib/bin.
So far so good.

2. Re-running the build, it complains about cuda.h:
Cuda Configuration Error: Cannot find cuda.h under ~/miniconda3/envs/jax/lib 
FAILED: Build did NOT complete successfully (4 packages loaded, 16 targets

OK, let's copy /opt/cuda/include/cuda.h into ~/miniconda3/envs/jax/lib.
Re-running the build after completely removing the Bazel cache (rm -rf ~/.cache/bazel)
gives the same error again about not being able to find cuda.h.
At this point I am out of ideas.

Anyone else having other ideas on how to resolve this?
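
For anyone debugging the same two errors, here is a minimal sketch of a pre-flight check that reports whether a candidate CUDA root contains the files the build complained about. It assumes the conventional toolkit layout, with nvcc under bin/ and cuda.h under include/, which matches the /opt/cuda paths mentioned above. The function names check_cuda_root and missing_pieces are hypothetical helpers, not part of JAX or its build script, and the real build may look in other locations.

```python
from pathlib import Path


def check_cuda_root(cuda_root):
    """Report which build-time files exist under a candidate CUDA root.

    Assumption (not confirmed by the build script): the conventional
    layout <root>/bin/nvcc and <root>/include/cuda.h.
    """
    root = Path(cuda_root).expanduser()
    expected = {
        "nvcc": root / "bin" / "nvcc",
        "cuda.h": root / "include" / "cuda.h",
    }
    return {name: path.is_file() for name, path in expected.items()}


def missing_pieces(cuda_root):
    """Return the sorted names of the expected files that are absent."""
    status = check_cuda_root(cuda_root)
    return sorted(name for name, present in status.items() if not present)


if __name__ == "__main__":
    # Candidate roots taken from the paths discussed in this issue.
    for candidate in ("~/miniconda3/envs/jax", "~/miniconda3/envs/jax/lib", "/opt/cuda"):
        print(candidate, "-> missing:", missing_pieces(candidate) or "nothing")
```

Running it against each candidate directory before invoking the build shows at a glance which root, if any, has both files in the expected places.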

Issue Analytics

Created 5 years ago
Comments: 22 (6 by maintainers)

Top GitHub Comments

kirk86 commented, Feb 8, 2019

@hawkinsp thanks a lot, especially for your patience.

Would providing prebuilt Conda packages solve your problem adequately?

I think that would be useful for a lot of ppl, not just me and would actually make jax more popular so that ppl could give it a try and report back on things that worked and things that didn’t work. That would facilitate better development of jax IMHO.

hawkinsp commented, Feb 8, 2019

The point is that you still need some system-wide install to get nvcc at build time. cudatoolkit isn’t enough. (I guess that’s what is happening for PyTorch.) TF doesn’t really support mixing two different installs, I think. I’m sure PRs would be welcome.

I think it would be better to either:

1. remove the nvcc dependency from XLA, as outlined above, or
2. even better, we provide prebuilt Conda packages so you can then install just cudatoolkit to run JAX, even if it’s not sufficient to build it.

(But note there’s a minor hurdle to the latter, too. XLA needs two unusual things from the CUDA runtime: libdevice, which happily the cudatoolkit package does have, and ptxas, which it does not. It is possible to fall back to using the NVIDIA kernel driver’s version of ptxas, and XLA will do just that, but we’ve found it is often buggy and the driver-hosted copy deadlocks from time to time.)

Would providing prebuilt Conda packages solve your problem adequately?
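
To see which of the two runtime pieces mentioned above a given environment provides, a small check along these lines may help. It is only a sketch under stated assumptions: it searches the given directory tree for files matching libdevice*.bc (an assumed naming pattern), looks for ptxas under bin/, and otherwise falls back to whatever ptxas is on PATH. The function name find_runtime_pieces is hypothetical, and XLA's own lookup logic may differ.

```python
import shutil
from pathlib import Path


def find_runtime_pieces(toolkit_root):
    """Locate libdevice bitcode files and a ptxas binary for a toolkit root.

    Assumptions: libdevice files are named libdevice*.bc somewhere under
    the root, and ptxas, if bundled, lives at <root>/bin/ptxas.
    """
    root = Path(toolkit_root).expanduser()
    # Search recursively, since packages place libdevice in different subdirectories.
    libdevice = sorted(root.rglob("libdevice*.bc")) if root.is_dir() else []
    bundled_ptxas = root / "bin" / "ptxas"
    if bundled_ptxas.is_file():
        ptxas = str(bundled_ptxas)
    else:
        # Fall back to a ptxas on PATH; this is None when there is none.
        ptxas = shutil.which("ptxas")
    return {"libdevice": [str(p) for p in libdevice], "ptxas": ptxas}


if __name__ == "__main__":
    # The conda env path used in this issue; adjust to your own environment.
    print(find_runtime_pieces("~/miniconda3/envs/jax"))
```

An empty libdevice list or a ptxas value of None points to the piece that would have to come from a separate CUDA install.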