How can I use Deepspeed Sparse Attention?
I just installed DeepSpeed successfully. Is the next step then the same as usual? I mean, do I just change the --attn_types argument and run the command like this?
python train_dalle.py --vae_path ./vae.pt --image_text_folder /path/to/data --attn_types full,sparse
Or do I have to run the command like this instead?
deepspeed train_dalle.py [args...] --deepspeed
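For reference, a rough sketch of what the combined invocation might look like, assuming the dalle-pytorch flags from the first command and the deepspeed launcher plus --deepspeed flag from the second (flag spellings are copied from the question above, not verified here):

deepspeed train_dalle.py --vae_path ./vae.pt --image_text_folder /path/to/data --attn_types full,sparse --deepspeed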
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@afiaka87 Thank you for your reply! I used the dockerfile in this repo and it works now! Thank you!
@wcshin-git Most likely your versions are off. Even on the V100 you will need to be using CUDA toolkit 10.1, Python 3.7 (you can't use later versions), and PyTorch 1.6.0. These three requirements are essential to getting it to work. Normally when something requires a specific CUDA version I recommend conda; unfortunately conda doesn't help here, as it struggles to replace your system's nvcc binary.
I assume you're using a rental service or one of the cloud offerings. Do they have the ability to start a machine from a docker container? If so, PyTorch's official pytorch/pytorch-1.6-dev image is what you want for the best compatibility. Make sure you get the dev image, not the runtime one; it comes with the needed nvcc binary.
We also have a docker image in the project files. There are instructions for building it in the README.md, but you will need to have both docker and nvidia-docker installed on the machine in order to build the image.
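As a quick sanity check on those three version requirements, something like the following can confirm what your environment actually has (the torch one-liner is just one way to print the installed versions):

nvcc --version        # should report CUDA 10.1
python --version      # should report Python 3.7.x
python -c "import torch; print(torch.__version__, torch.version.cuda)"   # expect 1.6.0 and 10.1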
If none of that is an option - I’m afraid you’re somewhat on your own. Many have tried and failed to install deepspeed with sparse attention, myself included. I am happy to give advice but please try to get docker working first if you can - there are instructions in our README.md and a google search for “linux install nvidia-docker” will get you the instructions for installing docker for GPUs.
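If it helps, here is a rough sketch of how one might pull and enter a CUDA 10.1 devel image with GPU access once nvidia-docker / the NVIDIA container toolkit is installed; the specific pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel tag is my assumption for a devel image matching the versions above, not something stated in this thread:

docker pull pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
docker run --gpus all -it --rm pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel bash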
If using vast.ai - you will actually be inside of a docker container no matter what you do. As such it’s pretty easy to break an entire instance and have to nuke it. Make sure you’ve dialed the process down as best you can before doing anything significant like downloading a large dataset, as you may need to destroy the instance to try again. I personally have gotten this working on vast.ai with a multitude of setups which they offer, but using a V100 is the absolute easiest way. Trying on an RTX 3000 series card or an A100/A6000 simply won’t work without changing the DeepSpeed codebase itself and even then you’re not really guaranteed a proper implementation.
Best of luck!