How can I use Deepspeed Sparse Attention?
I just installed DeepSpeed successfully. Is the next step then the same as usual? I mean, do I just change the --attn_types argument and run the command like this?
python train_dalle.py --vae_path ./vae.pt --image_text_folder /path/to/data --attn_types full,sparse
Or do I have to run the command like this instead?
deepspeed train_dalle.py [args...] --deepspeed
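For reference, a rough sketch of what the combined invocation might look like, assuming the dalle-pytorch flags from the first command and the deepspeed launcher plus --deepspeed flag from the second (flag spellings are copied from the question above, not verified here):

deepspeed train_dalle.py --vae_path ./vae.pt --image_text_folder /path/to/data --attn_types full,sparse --deepspeed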
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@afiaka87 Thank you for your reply! I used the dockerfile in this repo and it works now! Thank you!
@wcshin-git Most likely your versions are off. Even on the V100 you will need to be using CUDA toolkit 10.1, Python 3.7 (you can't use later versions), and PyTorch 1.6.0. These three requirements are essential to getting it to work. Normally when something requires a specific CUDA version I recommend conda; unfortunately conda doesn't help here, as it struggles to replace your system's nvcc binary.
I assume you're using a rental service or one of the cloud offerings. Do they have the ability to start a machine from a docker container? If so, PyTorch's official pytorch/pytorch-1.6-dev image is what you want for the best compatibility. Make sure you get the dev image, not the runtime one; it comes with the needed nvcc binary.
We also have a docker image in the project files. There are instructions for building it in the README.md, but you will need to have both docker and nvidia-docker installed on the machine in order to build the image.
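As a quick sanity check on those three version requirements, something like the following can confirm what your environment actually has (the torch one-liner is just one way to print the installed versions):

nvcc --version        # should report CUDA 10.1
python --version      # should report Python 3.7.x
python -c "import torch; print(torch.__version__, torch.version.cuda)"   # expect 1.6.0 and 10.1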
If none of that is an option - I’m afraid you’re somewhat on your own. Many have tried and failed to install deepspeed with sparse attention, myself included. I am happy to give advice but please try to get docker working first if you can - there are instructions in our README.md and a google search for “linux install nvidia-docker” will get you the instructions for installing docker for GPUs.
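If it helps, here is a rough sketch of how one might pull and enter a CUDA 10.1 devel image with GPU access once nvidia-docker / the NVIDIA container toolkit is installed; the specific pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel tag is my assumption for a devel image matching the versions above, not something stated in this thread:

docker pull pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
docker run --gpus all -it --rm pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel bash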
If using vast.ai - you will actually be inside of a docker container no matter what you do. As such it’s pretty easy to break an entire instance and have to nuke it. Make sure you’ve dialed the process down as best you can before doing anything significant like downloading a large dataset, as you may need to destroy the instance to try again. I personally have gotten this working on vast.ai with a multitude of setups which they offer, but using a V100 is the absolute easiest way. Trying on an RTX 3000 series card or an A100/A6000 simply won’t work without changing the DeepSpeed codebase itself and even then you’re not really guaranteed a proper implementation.
Best of luck!