
Error with sparse attention

See original GitHub issue

I get this error when I enable sparse attention:

RuntimeError: Unable to JIT load the sparse_attn op due to it not being compatible due to hardware/software issue.

My nvcc --version says:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

And I am running the script on four RTX 2080 Ti GPUs.
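For context, here is a minimal sketch of what "enabling sparse attention" typically looks like in a DeepSpeed config, loosely following the DeepSpeed Sparse Attention tutorial linked further down this page; the key names and values shown are illustrative assumptions and may differ across DeepSpeed versions. Loading a config like this is what triggers the JIT build of the sparse_attn op at run time.

# Illustrative sketch only: writing out a DeepSpeed config with a
# "sparse_attention" section, loosely following the DeepSpeed Sparse
# Attention tutorial. Key names and values are assumptions and may
# differ across DeepSpeed versions.
import json

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "sparse_attention": {
        "mode": "fixed",                      # fixed block-sparse layout
        "block": 16,                          # sparsity block size
        "different_layout_per_head": True,
        "num_local_blocks": 4,
        "num_global_blocks": 1,
        "attention": "bidirectional",
        "horizontal_global_attention": False,
        "num_different_global_patterns": 4,
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)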

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
kswamy15 commented, Apr 13, 2021

Sparse Attention kernels are written in Triton and currently only work on the Tesla V100; we will soon be upgrading to handle Ampere as well. However, it is not compatible with GeForce RTX.

Why is GeForce RTX being left out? Or are you planning to handle RTX later? Is RTX a very different compute platform compared to the V100?
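One way to see which architecture the kernels would have to target on a given machine is a short PyTorch snippet (assuming torch with CUDA support is installed); the compute-capability mapping in the comments is standard NVIDIA data, not something DeepSpeed reports.

# Print the name and compute capability of each visible GPU.
# For reference: Tesla V100 = 7.0 (Volta), GeForce RTX 2080 Ti = 7.5 (Turing),
# A100 = 8.0 (Ampere). The Triton kernels mentioned above target specific
# architectures, which is why the GPU generation matters here.
import torch

for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}")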

1 reaction
RezaYazdaniAminabadi commented, Feb 16, 2021

Hi @ShivanshuPurohit,

Thanks for sending the report. The sparse-attention code is compiled through JIT, so it does not need to be pre-installed. However, I see that the CUDA version for torch is 10.2 while nvcc is 11.0; I think there might be a compatibility issue there. Can you make the CUDA versions match and try again? Thanks.
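A minimal sketch for spotting that mismatch locally, assuming torch is importable and nvcc is on the PATH; the major-version comparison at the end is a rough heuristic of mine, not an official DeepSpeed check.

# Compare the CUDA version PyTorch was built against with the CUDA
# version reported by the nvcc on PATH. A mismatch (here 10.2 vs 11.0)
# is a common reason JIT-built ops such as sparse_attn fail to load.
import re
import subprocess

import torch

torch_cuda = torch.version.cuda  # e.g. "10.2"; None for CPU-only builds
nvcc_out = subprocess.run(["nvcc", "--version"],
                          capture_output=True, text=True).stdout
match = re.search(r"release (\d+\.\d+)", nvcc_out)
nvcc_cuda = match.group(1) if match else "unknown"

print(f"torch built against CUDA {torch_cuda}; nvcc reports CUDA {nvcc_cuda}")
if torch_cuda and match and torch_cuda.split(".")[0] != nvcc_cuda.split(".")[0]:
    print("Major CUDA versions differ; consider aligning them before retrying.")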

Read more comments on GitHub

Top Results From Across the Web

Understanding BigBird's Block Sparse Attention - Hugging Face
BigBird relies on block sparse attention instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096...

DeepSpeed Sparse Attention
In this tutorial we describe how to use DeepSpeed Sparse Attention (SA) and its building-block kernels. The easiest way to use SA is...

Sparse Attention with Learning to Hash - OpenReview
To overcome these issues, this paper proposes a new strategy for sparse attention, namely LHA (Learning-to-Hash Attention), which directly learns separate ...

sparse attention and its relation with attention mask
Can anyone please explain in a clear way what is the usage of mask in attention for sparse attention? I just can not...

Efficient Content-Based Sparse Attention with Routing ...
We show that our model outperforms comparable sparse attention models on language modeling on Wikitext-103 (15.8 vs 18.3 perplexity), as well as on...
