DeepSpeed and Generate Method
Hi @lucidrains
I'm currently testing the generate function of the TrainingWrapper class.
When I use DeepSpeed and I try to generate a sequence it gives me the following error:
AttributeError: 'DeepSpeedLight' object has no attribute 'generate'
Is it because generation can only be done outside the DeepSpeed engine?
Thank you very much, once again! 😃
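For anyone who hits the same traceback: the DeepSpeed engine only forwards training-related calls, while the model it wraps is still reachable through its module attribute, so a common workaround is to train through the engine and sample through engine.module. Below is a minimal sketch, assuming reformer-pytorch's TrainingWrapper and a recent DeepSpeed API; the model sizes, config values, and import paths are illustrative and may need adjusting to your installed versions.

```python
# Minimal sketch (not code from this thread): sampling after wrapping with DeepSpeed.
# Assumes reformer-pytorch's ReformerLM / TrainingWrapper; config values are placeholders.
import torch
import deepspeed
from reformer_pytorch import ReformerLM, TrainingWrapper

lm = ReformerLM(num_tokens=20000, dim=512, depth=6, max_seq_len=4096)
model = TrainingWrapper(lm)

# deepspeed.initialize returns an engine that wraps (and replaces) the model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config={"train_batch_size": 8},
)

# ... train through the engine: forward, engine.backward(loss), engine.step() ...

# The engine itself does not expose generate, but the original
# TrainingWrapper is still reachable via engine.module:
start_tokens = torch.zeros(1, 1, dtype=torch.long).cuda()
sample = engine.module.generate(start_tokens, seq_len=256)
```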
Issue Analytics
- State:
- Created 4 years ago
- Comments: 14 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I understand, @lucidrains. The frustration is at its peak when something you have put so much effort into does not work as expected.
However, I'm going to keep running experiments, testing the library's behavior under different circumstances, and see how it goes. It may be that for longer sequences this is the only way to address the memory issue. I do understand that Reformer, as you say, does not learn as well as full attention, but in many cases, for longer sequences, you cannot even consider using a full-attention transformer. At the end of the day, it's better to accept a drop in performance than to be unable to run the experiments and develop a solution at all.
With all that said, I still believe there is great value in your work and in the time you have generously spent on this project. I, along with the whole community, am very grateful to you.
Thank you again for your tips & tricks!
Cheers, Cal
Hi @lucidrains! Thank you for the great news! No need to apologize; this is the heart of open source: all the users looking at your code help improve it and catch mistakes here and there. At the end of the day, we are all human beings.
I also plan to use the Sinkhorn repo you made, and I'm very curious to see its performance. If you had to choose for long sequences, which technique would you recommend: Reformer or your implementation of Sinkhorn attention? I saw that in the latter you also added reversibility and chunking. Which repo would you use for an encoder/decoder architecture?
Thank you so much for your effort!