We have 4,000 NVIDIA A100 GPUs and would like to use DeepSpeed on them. Thing is, during setup.py we get:

 [WARNING]  sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
 [WARNING]  sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
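The warning describes a version window: CUDA 10.1 or newer, but older than 11. A100 GPUs need CUDA 11 or newer, so no toolkit that supports this hardware falls inside that window. The sketch below is a hypothetical re-creation of such a gate to make the comparison concrete; `sparse_attn_cuda_ok` and `parse_cuda_version` are illustrative names, not DeepSpeed's actual check.

```python
def parse_cuda_version(version: str) -> tuple:
    """Turn a string such as '10.2' or '11.0' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def sparse_attn_cuda_ok(version: str) -> bool:
    """Hypothetical gate matching the warning text: 10.1 <= CUDA < 11."""
    v = parse_cuda_version(version)
    return (10, 1) <= v < (11, 0)


for cuda in ["10.0", "10.1", "10.2", "11.0", "11.1"]:
    status = "accepted" if sparse_attn_cuda_ok(cuda) else "rejected"
    print(f"CUDA {cuda}: {status}")
```

Every 11.x version is rejected by the upper bound, which is exactly the A100 case described in this issue.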

By the way, the llvm-config line in the sparse_attn op builder is wrong, too.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 6
  • Comments: 15 (5 by maintainers)

Top GitHub Comments

surak commented, Feb 26, 2021 (1 reaction)

Other patches I have are:

Pip tries to install a newer triton, and the exact version does not really matter, so the pin can be relaxed:

--- DeepSpeed-0.3.11/requirements/requirements-sparse_attn.txt.orig	2021-02-24 23:11:00.212886868 +0100
+++ DeepSpeed-0.3.11/requirements/requirements-sparse_attn.txt	2021-02-24 23:11:08.221726647 +0100
@@ -1 +1 @@
-triton==0.2.3
+triton>=0.2.3
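An exact pin such as `triton==0.2.3` accepts only that one release, while `>=0.2.3` accepts it and anything newer. The sketch below compares the two using plain tuple comparison. It is a simplification that handles only dotted numeric versions (real pip specifiers, defined by PEP 440, also cover pre-releases, local versions and more), and the helper names are illustrative. The `0.3.0` used in the example is an arbitrary newer number, not a claim about which triton release pip selected.

```python
def version_tuple(version: str) -> tuple:
    """'0.2.3' -> (0, 2, 3); numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed: str, operator: str, required: str) -> bool:
    """Check a single '==' or '>=' requirement against an installed version."""
    have, want = version_tuple(installed), version_tuple(required)
    if operator == "==":
        return have == want
    if operator == ">=":
        return have >= want
    raise ValueError(f"unsupported operator: {operator}")


# A newer release fails the exact pin but passes the relaxed one.
print(satisfies("0.3.0", "==", "0.2.3"))  # False
print(satisfies("0.3.0", ">=", "0.2.3"))  # True
```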

The “or” in the llvm requirement (llvm-config|llvm-config-9) fails when llvm 10 is present:

--- DeepSpeed-0.3.11/op_builder/sparse_attn.py.orig	2021-02-24 23:01:30.222302088 +0100
+++ DeepSpeed-0.3.11/op_builder/sparse_attn.py	2021-02-24 23:03:24.696006596 +0100
@@ -21,7 +21,7 @@
 
     def is_compatible(self):
         # Check to see if llvm and cmake are installed since they are dependencies
-        required_commands = ['llvm-config|llvm-config-9', 'cmake']
+        required_commands = ['llvm-config', 'cmake']
         command_status = list(map(self.command_exists, required_commands))
         deps_compatible = all(command_status)
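The entry `'llvm-config|llvm-config-9'` means that either command is acceptable. The sketch below shows one way to write such an "any of these" check with the standard library's `shutil.which`; it is an illustration, not DeepSpeed's `command_exists`, and `any_command_exists` is a hypothetical name. A check like this only succeeds if one of the listed names is on `PATH`, so a versioned binary that is not listed (for example an `llvm-config-10`) would not count.

```python
import shutil


def any_command_exists(spec: str) -> bool:
    """Return True if at least one '|'-separated alternative is on PATH.

    'llvm-config|llvm-config-9' is satisfied by either command.
    """
    alternatives = [name.strip() for name in spec.split("|") if name.strip()]
    return any(shutil.which(name) is not None for name in alternatives)


def is_compatible(required_commands: list) -> bool:
    """All entries must be satisfied; each entry may list alternatives."""
    return all(any_command_exists(spec) for spec in required_commands)


# Result depends on what is installed on the current machine.
print(is_compatible(["llvm-config|llvm-config-9", "cmake"]))
```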

tensorboardX has already moved past 1.8, so the exact pin is relaxed as well:

--- DeepSpeed-0.3.11/requirements/requirements.txt.orig	2021-02-24 20:30:14.442256660 +0100
+++ DeepSpeed-0.3.11/requirements/requirements.txt	2021-02-24 20:30:51.209512682 +0100
@@ -1,6 +1,6 @@
 torch>=1.2
 torchvision>=0.4.0
 tqdm
-tensorboardX==1.8
+tensorboardX>=1.8
 ninja
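Both requirements patches make the same change by hand: an exact `==` pin becomes a `>=` lower bound. If you carry these patches across releases, a small script can apply that rewrite. This is a hypothetical helper (`relax_pins` is not part of DeepSpeed or pip), and it only touches lines of the simple form `name==version` for the packages you name, leaving every other line unchanged.

```python
import re

# Matches simple exact pins such as "triton==0.2.3" or "tensorboardX==1.8".
PIN = re.compile(r"^(?P<name>[A-Za-z0-9_.\-]+)==(?P<version>\S+)\s*$")


def relax_pins(text: str, packages: set) -> str:
    """Rewrite 'pkg==X' to 'pkg>=X' for the named packages only."""
    out = []
    for line in text.splitlines():
        match = PIN.match(line)
        if match and match.group("name") in packages:
            line = f"{match.group('name')}>={match.group('version')}"
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


requirements = "torch>=1.2\ntorchvision>=0.4.0\ntqdm\ntensorboardX==1.8\nninja\n"
print(relax_pins(requirements, {"tensorboardX", "triton"}), end="")
```

Only the tensorboardX line changes here; the existing `>=` lines, `tqdm` and `ninja` pass through untouched.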
jeffra commented, Feb 25, 2021 (1 reaction)

@surak: we’re actively working on adding support for A100 + CUDA 11 for sparse attention and will hopefully post an update on this thread soon. Regarding V100 + CUDA 11, we suspect this will work as is, but we have not had a chance or access to a machine with this config to test it fully. Would you like to give it a try? If so, here’s a branch that allows this config: https://github.com/microsoft/DeepSpeed/tree/sparse-attn-cuda11
