
Error with sparse attention

See original GitHub issue

I get this error when I enable sparse attention:

RuntimeError: Unable to JIT load the sparse_attn op due to it not being compatible due to hardware/software issue.

My nvcc --version says:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

And I am running the script on four RTX 2080 Ti GPUs.
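For context, here is a minimal sketch of what "enabling sparse attention" typically looks like in a DeepSpeed config, loosely following the DeepSpeed Sparse Attention tutorial linked further down this page; the key names and values shown are illustrative assumptions and may differ across DeepSpeed versions. Loading a config like this is what triggers the JIT build of the sparse_attn op at run time.

# Illustrative sketch only: writing out a DeepSpeed config with a
# "sparse_attention" section, loosely following the DeepSpeed Sparse
# Attention tutorial. Key names and values are assumptions and may
# differ across DeepSpeed versions.
import json

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "sparse_attention": {
        "mode": "fixed",                      # fixed block-sparse layout
        "block": 16,                          # sparsity block size
        "different_layout_per_head": True,
        "num_local_blocks": 4,
        "num_global_blocks": 1,
        "attention": "bidirectional",
        "horizontal_global_attention": False,
        "num_different_global_patterns": 4,
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)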

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
kswamy15 commented, Apr 13, 2021

Sparse Attention kernels are written in Triton and currently only work on the Tesla V100; we will soon be upgrading to handle Ampere as well. However, it is not compatible with GeForce RTX.

Why is GeForce RTX being left out? Or are you planning to handle RTX later? Is RTX a very different compute platform compared to the V100?
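One way to see which architecture the kernels would have to target on a given machine is a short PyTorch snippet (assuming torch with CUDA support is installed); the compute-capability mapping in the comments is standard NVIDIA data, not something DeepSpeed reports.

# Print the name and compute capability of each visible GPU.
# For reference: Tesla V100 = 7.0 (Volta), GeForce RTX 2080 Ti = 7.5 (Turing),
# A100 = 8.0 (Ampere). The Triton kernels mentioned above target specific
# architectures, which is why the GPU generation matters here.
import torch

for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}")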

1 reaction
RezaYazdaniAminabadi commented, Feb 16, 2021

Hi @ShivanshuPurohit,

Thanks for sending the report. The sparse-attention code is compiled through JIT, so it does not need to be pre-installed. However, I see that the CUDA version for torch is 10.2 while nvcc is 11.0; I think there might be a compatibility issue there. Can you make the CUDA versions match and try again? Thanks.
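A minimal sketch for spotting that mismatch locally, assuming torch is importable and nvcc is on the PATH; the major-version comparison at the end is a rough heuristic of mine, not an official DeepSpeed check.

# Compare the CUDA version PyTorch was built against with the CUDA
# version reported by the nvcc on PATH. A mismatch (here 10.2 vs 11.0)
# is a common reason JIT-built ops such as sparse_attn fail to load.
import re
import subprocess

import torch

torch_cuda = torch.version.cuda  # e.g. "10.2"; None for CPU-only builds
nvcc_out = subprocess.run(["nvcc", "--version"],
                          capture_output=True, text=True).stdout
match = re.search(r"release (\d+\.\d+)", nvcc_out)
nvcc_cuda = match.group(1) if match else "unknown"

print(f"torch built against CUDA {torch_cuda}; nvcc reports CUDA {nvcc_cuda}")
if torch_cuda and match and torch_cuda.split(".")[0] != nvcc_cuda.split(".")[0]:
    print("Major CUDA versions differ; consider aligning them before retrying.")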

Read more comments on GitHub

Top Results From Across the Web

Understanding BigBird's Block Sparse Attention - Hugging Face
BigBird relies on block sparse attention instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096...

DeepSpeed Sparse Attention
In this tutorial we describe how to use DeepSpeed Sparse Attention (SA) and its building-block kernels. The easiest way to use SA is...

Sparse Attention with Learning to Hash - OpenReview
To overcome these issues, this paper proposes a new strategy for sparse attention, namely LHA (Learning-to-Hash Attention), which directly learns separate ...

sparse attention and its relation with attention mask
Can anyone please explain in a clear way what is the usage of mask in attention for sparse attention? I just can not...

Efficient Content-Based Sparse Attention with Routing ...
We show that our model outperforms comparable sparse attention models on language modeling on Wikitext-103 (15.8 vs 18.3 perplexity), as well as on...
