
How to specify CUDA version in a conda package?

See original GitHub issue

How should a package maintainer specify a dependency on a specific CUDA version like 9.2 or 10.0?

As an example, here is how PyTorch does things today:

  • CUDA 8.0: conda install pytorch torchvision cuda80 -c pytorch
  • CUDA 9.2: conda install pytorch torchvision -c pytorch
  • CUDA 10.0: conda install pytorch torchvision cuda100 -c pytorch
  • No CUDA: conda install pytorch-cpu torchvision-cpu -c pytorch

I believe that NVIDIA and Anaconda handle things differently. I have zero thoughts on which way is correct, but I thought it would be useful to start a conversation around this. My hope is that we can come to some consensus on packaging conventions that helps users avoid broken environments and provides a good pattern for future package maintainers to follow.

cc @jjhelmus @msarahan @nehaljwani @stuartarchibald @seibert @sklam @soumith @kkraus14 @mike-wendt @datametrician

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 9
  • Comments: 44 (25 by maintainers)

Top GitHub Comments

21 reactions
seibert commented, Dec 20, 2018

CUDA drivers (the part that conda cannot install) are backward compatible with applications compiled with older versions of CUDA. So, for example, the CUDA 9.2 build of PyTorch would only require that CUDA >= 9.2 is present on the system. This backward compatibility also extends to the cudatoolkit (the userspace libraries supplied by NVIDIA which Anaconda already packages), where a conda environment with cudatoolkit 8.0 would work just fine with a system that has the CUDA 9.2 drivers.

So, on one hand, there is motivation (much like with glibc) to pick an arbitrarily old CUDA version, build everything with it, and rely on driver backward compatibility. On the other hand, aside from new CUDA language features (which projects may choose to ignore for compatibility reasons), building with newer CUDA versions can improve performance and add native support for newer hardware. A package compiled for CUDA 8 will not run on Volta GPUs without a lengthy JIT recompilation of all the CUDA functions in the project; this happens automatically, but it can still be a bad user experience. As an example, TensorFlow compiled with CUDA 8 can take 10+ minutes to start up on a Volta GPU.

These two conflicting desires for compatibility and performance explain why it makes sense to compile packages against a range of CUDA versions (right now, I'd say 8.0 to 10.0 or 9.0 to 10.0 would be the best choice), but that still leaves the burden on the user of knowing which CUDA version they need.

Because nearly all CUDA projects require the CUDA toolkit libraries, and Anaconda packages them, we use the cudatoolkit package as our CUDA version marker. So, for packages in Anaconda that require CUDA, we make them depend on a specific cudatoolkit version. This lets you force a specific CUDA version like so:

conda install pytorch cudatoolkit=8.0

And that will get you a PyTorch compiled with CUDA 8, rather than something else.

The CUDA driver provides a C API to query the maximum CUDA version the driver supports, so a few months ago I wrote a self-contained Python function for detecting which version of CUDA (if any) is present on the system:

https://gist.github.com/seibert/52a204395cdc84eeeaf0ce05464a636b
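
For context, a minimal sketch of that idea looks roughly like the following. This is a simplified stand-in rather than the contents of the gist itself (the function name and library search order here are illustrative):

    import ctypes

    def cuda_driver_version():
        """Return (major, minor) of the highest CUDA version the installed
        driver supports, or None if no usable driver is found."""
        # The driver library name differs by platform: libcuda on Linux,
        # nvcuda.dll on Windows.
        for name in ("libcuda.so.1", "libcuda.so", "nvcuda.dll"):
            try:
                libcuda = ctypes.CDLL(name)
                break
            except OSError:
                continue
        else:
            return None

        # cuInit must be called once before any other driver API function.
        if libcuda.cuInit(0) != 0:
            return None

        # cuDriverGetVersion encodes the version as 1000*major + 10*minor.
        version = ctypes.c_int(0)
        if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
            return None
        return version.value // 1000, (version.value % 1000) // 10

    print(cuda_driver_version())  # e.g. (9, 2) on a system with CUDA 9.2 drivers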

This was for the conda team to potentially incorporate into conda as a “marker” (I think that is the right term), so that conda could expose a cuda package in the dependency solver whose version is given by this function. That would then give everyone a standard way to refer to the system CUDA dependency.

I don’t know where this work is on the roadmap for conda (@msarahan?), but if there is additional work needed on the conda side to get this to the finish line, I’m happy to help. It would go a long way toward unifying the various approaches as well as improving the user experience.

2 reactions
jjhelmus commented, Feb 3, 2019

The addition of the micro version was intentional. NVIDIA labels CUDA releases with a micro version and, I think, has in the past released multiple micro versions for a given major.minor version. With the previous cudatoolkit packages there was no way to differentiate these releases. Including the micro version, as in cudatoolkit 10.0.130, is more specific and allows for updates if a new micro version is released. Package builders and users should still specify the version by major.minor, e.g. conda install cudatoolkit=10.0; conda will automatically provide the appropriate micro version.
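
As a quick way to confirm which micro version the solver actually selected, a small script like the following can be used (illustrative, not part of the original discussion; it assumes conda is available on PATH and relies on conda list --json returning a list of package records):

    import json
    import subprocess

    # Ask conda which cudatoolkit build is installed in the active environment.
    # A spec such as "cudatoolkit=10.0" at install time still resolves to a
    # concrete build like 10.0.130, which is what shows up here.
    result = subprocess.run(
        ["conda", "list", "--json", "^cudatoolkit$"],
        capture_output=True, text=True, check=True,
    )
    packages = json.loads(result.stdout)
    if packages:
        print(packages[0]["name"], packages[0]["version"])
    else:
        print("cudatoolkit is not installed in this environment")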

Read more comments on GitHub

Top Results From Across the Web

Managing CUDA dependencies with Conda | by David R. Pugh
Yep! You can use the conda search command to see what versions of the NVIDIA CUDA Toolkit are available from the default channels....

Working with GPU packages - Anaconda Documentation
GPU-enabled packages are built against a specific version of CUDA. Currently supported versions include CUDA 8, 9.0 and 9.2. The NVIDIA drivers...

Anaconda reading wrong CUDA version - Stack Overflow
Any solution to this? EDIT: When running this code sample: # setting device on GPU if available, else CPU device = torch.device( ...

Install conda and set up a Pytorch 1.7, CUDA 11.1 ...
The first line creates our environment called “PyTorch” and you can select the python version (I choose version 3.7). · The second line...

Compiling CUDA code while using conda environments
Conda is a powerful package manager that is commonly used to create ... This requirement only specified the major version, so to see...
