CUDA 11 support?

We have 4,000 NVIDIA A100 GPUs and would like to use DeepSpeed on them. The problem is that running setup.py prints:

 [WARNING]  sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
 [WARNING]  sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1

By the way, the llvm line is wrong, too.
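
The warning above describes a version window: at least 10.1 and below 11. As a minimal sketch of that kind of check (not DeepSpeed's actual code; `parse_version` and `cuda_in_supported_range` are hypothetical names chosen for illustration), the version string could be compared as a tuple. In a PyTorch environment the string would typically come from `torch.version.cuda`; here it is passed in directly so the example stays self-contained.

```python
def parse_version(text):
    """Turn a version string such as '10.1' or '11.0' into (major, minor)."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def cuda_in_supported_range(version, minimum=(10, 1), upper=(11, 0)):
    """Return True when minimum <= version < upper (hypothetical helper)."""
    return minimum <= parse_version(version) < upper


if __name__ == "__main__":
    # Print which of a few sample versions fall inside the window.
    for v in ("10.0", "10.1", "10.2", "11.0", "11.1"):
        print(v, cuda_in_supported_range(v))
```

Comparing tuples rather than floats avoids mistakes such as treating 10.10 as equal to 10.1.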

Issue Analytics

Created: 3 years ago
Reactions: 6
Comments: 15 (5 by maintainers)

Top GitHub Comments

surak commented, Feb 26, 2021

Other patches I have are:

Pip tries to install a newer triton, and the version does not really matter:

--- DeepSpeed-0.3.11/requirements/requirements-sparse_attn.txt.orig	2021-02-24 23:11:00.212886868 +0100
+++ DeepSpeed-0.3.11/requirements/requirements-sparse_attn.txt	2021-02-24 23:11:08.221726647 +0100
@@ -1 +1 @@
-triton==0.2.3
+triton>=0.2.3

The “or” in the llvm-config check fails when LLVM 10 is present:

--- DeepSpeed-0.3.11/op_builder/sparse_attn.py.orig	2021-02-24 23:01:30.222302088 +0100
+++ DeepSpeed-0.3.11/op_builder/sparse_attn.py	2021-02-24 23:03:24.696006596 +0100
@@ -21,7 +21,7 @@
 
     def is_compatible(self):
         # Check to see if llvm and cmake are installed since they are dependencies
-        required_commands = ['llvm-config|llvm-config-9', 'cmake']
+        required_commands = ['llvm-config', 'cmake']
         command_status = list(map(self.command_exists, required_commands))
         deps_compatible = all(command_status)
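
The patch above sidesteps the `llvm-config|llvm-config-9` alternative by checking for plain `llvm-config` only. Another option, sketched here under assumptions, is to treat each dependency as a list of acceptable command names and accept the first one found on `PATH`. This is not DeepSpeed's implementation: `first_available` and `deps_available` are hypothetical helpers, and the suffixed name `llvm-config-10` is an assumed distribution-style name. Only `shutil.which` from the standard library is used.

```python
import shutil


def first_available(candidates, which=shutil.which):
    """Return the first candidate found on PATH, or None if none is found."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


def deps_available(requirements, which=shutil.which):
    """Each requirement is a list of alternatives; every one needs at least one hit."""
    return all(first_available(alts, which) is not None for alts in requirements)


if __name__ == "__main__":
    requirements = [
        ["llvm-config", "llvm-config-9", "llvm-config-10"],  # assumed names
        ["cmake"],
    ]
    print(deps_available(requirements))
```

Passing the lookup function as a parameter keeps the check testable without depending on what is installed on the machine.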

The tensorboardX pin has already been changed:

--- DeepSpeed-0.3.11/requirements/requirements.txt.orig	2021-02-24 20:30:14.442256660 +0100
+++ DeepSpeed-0.3.11/requirements/requirements.txt	2021-02-24 20:30:51.209512682 +0100
@@ -1,6 +1,6 @@
 torch>=1.2
 torchvision>=0.4.0
 tqdm
-tensorboardX==1.8
+tensorboardX>=1.8
 ninja

jeffra commented, Feb 25, 2021

@surak: We’re actively working on adding support for A100 + CUDA 11 for sparse attention, and will hopefully post an update on this thread soon. Regarding V100 + CUDA 11, we suspect this will work as is, but we have not had access to a machine with this configuration to test it fully. Would you like to give it a try? If so, here’s a branch that allows this configuration: https://github.com/microsoft/DeepSpeed/tree/sparse-attn-cuda11