How can I use DeepSpeed Sparse Attention?

I have successfully installed DeepSpeed.

Is the next step the same as usual? That is, do I just change the --attn_types argument and run the command like this?

python train_dalle.py --vae_path ./vae.pt --image_text_folder /path/to/data --attn_types full,sparse

Or do I have to run it like this?

deepspeed train_dalle.py [args...] --deepspeed

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
wcshin-git commented, May 19, 2021

@afiaka87 Thank you for your reply! I used the Dockerfile in this repo, and it works now.

0 reactions
afiaka87 commented, May 17, 2021

@wcshin-git Most likely your versions are off. Even on a V100, you will need CUDA Toolkit 10.1, Python 3.7 (later versions won’t work), and PyTorch 1.6.0. All three are essential to getting it working. Normally, when something requires a specific CUDA version, I recommend conda; unfortunately, conda doesn’t help here because it struggles to replace your system’s nvcc binary.
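The version requirements in the comment above can be checked before attempting a build. The following is a minimal Python sketch, not part of the original thread; the helper names (`matches`, `detect_versions`, `check`) are hypothetical. It assumes PyTorch is importable as `torch` and reads `torch.version.cuda`, which reports the CUDA version PyTorch was built with, not the version of the system nvcc binary the comment mentions.

```python
import sys
from typing import Dict, List, Optional

# Versions named in the comment above (Python 3.7, PyTorch 1.6.0, CUDA Toolkit 10.1).
REQUIRED = {"python": "3.7", "torch": "1.6.0", "cuda": "10.1"}


def matches(found: Optional[str], required: str) -> bool:
    """True if `found` begins with every dotted component of `required`."""
    if found is None:
        return False
    # Drop local build tags such as "+cu101" before comparing components.
    found_parts = str(found).split("+")[0].split(".")
    required_parts = required.split(".")
    return found_parts[: len(required_parts)] == required_parts


def detect_versions() -> Dict[str, Optional[str]]:
    """Collect the interpreter version plus PyTorch's version and CUDA build version."""
    versions: Dict[str, Optional[str]] = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        return versions  # PyTorch not installed: torch and cuda stay None
    versions["torch"] = str(torch.__version__)
    # CUDA version PyTorch was compiled against (None for CPU-only builds);
    # this is not necessarily the version of the system nvcc.
    versions["cuda"] = torch.version.cuda
    return versions


def check(versions: Dict[str, Optional[str]]) -> List[str]:
    """Return one message per unmet requirement; an empty list means all match."""
    problems = []
    for name, required in REQUIRED.items():
        found = versions.get(name)
        if not matches(found, required):
            problems.append(f"{name}: found {found}, need {required}")
    return problems


if __name__ == "__main__":
    issues = check(detect_versions())
    print("all version requirements met" if not issues else "\n".join(issues))
```

The system nvcc can be confirmed separately with `nvcc --version`, since a matching PyTorch build does not guarantee a matching toolkit on the machine.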

I assume you’re using a rental service or one of the cloud offerings. Can they start a machine from a Docker container? If so, PyTorch’s official pytorch/pytorch-1.6-dev image is what you want for the best compatibility. Make sure you get the dev image, not the runtime image; the dev image comes with the needed nvcc binary.

We also have a Docker image in the project files. There are instructions for building it in the README.md, but you will need both Docker and nvidia-docker installed on the machine to build the image.

If none of that is an option, I’m afraid you’re somewhat on your own. Many people have tried and failed to install DeepSpeed with sparse attention, myself included. I’m happy to give advice, but please try to get Docker working first if you can. There are instructions in our README.md, and a Google search for “linux install nvidia-docker” will turn up instructions for installing Docker with GPU support.

If you are using vast.ai, you will be inside a Docker container no matter what you do. As a result, it’s easy to break an entire instance and have to destroy it. Make sure you’ve dialed in the installation process as well as you can before doing anything significant, such as downloading a large dataset, because you may need to destroy the instance and try again. I have personally gotten this working on vast.ai with many of the setups they offer, but a V100 is by far the easiest. On an RTX 30-series card or an A100/A6000, it simply won’t work without changing the DeepSpeed codebase itself, and even then a correct implementation isn’t guaranteed.
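The GPU advice above can be sketched as a check on the device’s CUDA compute capability. This is an illustrative Python sketch, not from the thread; the function names and the labels "recommended", "unsupported" and "untested" are hypothetical. It relies on the published capabilities of the cards named (V100: 7.0; A100: 8.0; RTX 30-series and A6000: 8.6) and on the fact that CUDA 10.1 predates support for compute capability 8.x, which first arrived in CUDA 11. It uses PyTorch’s `torch.cuda.get_device_capability()` if PyTorch is installed.

```python
from typing import Optional, Tuple

# Compute capabilities of the GPUs named in the comment above.
# V100 = 7.0 (Volta); A100 = 8.0 and RTX 30-series / A6000 = 8.6 (Ampere).
V100 = (7, 0)
AMPERE_MIN = (8, 0)


def assess_gpu(capability: Tuple[int, int]) -> str:
    """Classify a compute capability against the advice in the thread (labels are hypothetical)."""
    if capability == V100:
        return "recommended"  # the setup the comment calls the easiest
    if capability >= AMPERE_MIN:
        # CUDA 10.1 cannot target compute capability 8.0 or newer.
        return "unsupported"
    return "untested"  # not covered by the comment; results may vary


def current_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return the current CUDA device's compute capability, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = current_gpu_capability()
    if cap is None:
        print("no CUDA device found")
    else:
        print(f"sm_{cap[0]}{cap[1]}: {assess_gpu(cap)}")
```

Comparing capabilities as tuples keeps the ordering correct, for example (8, 6) sorts after (8, 0) and (7, 5) sorts before it.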

Best of luck!
