DeepSpeed and Generate Method
Hi @lucidrains
I'm currently testing the generate function of the TrainingWrapper class.
When I use DeepSpeed and I try to generate a sequence it gives me the following error:
AttributeError: 'DeepSpeedLight' object has no attribute 'generate'
Is it because generation can only be done outside the DeepSpeed engine?
Thank you very much, once again! 😃
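For anyone who hits the same traceback: the DeepSpeed engine only forwards training-related calls, while the model it wraps is still reachable through its module attribute, so a common workaround is to train through the engine and sample through engine.module. Below is a minimal sketch, assuming reformer-pytorch's TrainingWrapper and a recent DeepSpeed API; the model sizes, config values, and import paths are illustrative and may need adjusting to your installed versions.

```python
# Minimal sketch (not code from this thread): sampling after wrapping with DeepSpeed.
# Assumes reformer-pytorch's ReformerLM / TrainingWrapper; config values are placeholders.
import torch
import deepspeed
from reformer_pytorch import ReformerLM, TrainingWrapper

lm = ReformerLM(num_tokens=20000, dim=512, depth=6, max_seq_len=4096)
model = TrainingWrapper(lm)

# deepspeed.initialize returns an engine that wraps (and replaces) the model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config={"train_batch_size": 8},
)

# ... train through the engine: forward, engine.backward(loss), engine.step() ...

# The engine itself does not expose generate, but the original
# TrainingWrapper is still reachable via engine.module:
start_tokens = torch.zeros(1, 1, dtype=torch.long).cuda()
sample = engine.module.generate(start_tokens, seq_len=256)
```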
Issue Analytics
- State:
- Created 4 years ago
- Comments: 14 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I understand, @lucidrains. The frustration is at its peak when something you have put so much effort into does not work as expected.
However, I'm going to keep running experiments, testing the library's behavior under different circumstances, and see how it goes. It may be that for longer sequences this is the only way to address the memory issue. I do understand that Reformer, as you say, does not learn as well as full attention, but in many cases, for longer sequences, you cannot even consider using a full-attention transformer. At the end of the day, it's better to accept a drop in performance than to be unable to run the experiments and develop a solution at all.
With all that said, I still believe there is great value in your work and in the time you have generously spent on this project. I, along with the whole community, am very grateful to you.
Thank you again for your tips & tricks!
Cheers, Cal
Hi @lucidrains! Thank you for the great news! No need to apologize; this is the heart of open source: all the users looking at your code help improve it and catch mistakes here and there. At the end of the day, we are all human beings.
I also plan to use the Sinkhorn repo you made, and I'm very curious to see its performance. If you had to choose for long sequences, which technique would you recommend: Reformer or your implementation of Sinkhorn attention? I saw that in the latter you also added reversibility and chunking. Which repo would you use for an encoder/decoder architecture?
Thank you so much for your effort!