
Extending Encoder Decoder to GPT-2

See original GitHub issue

Adding GPT2 initialization for the EncoderDecoder model, as pointed out in the issue comment quoted below.

Currently, only Bert works as a decoder. We might add GPT2 in a couple of weeks. Note that no model has cross-attention layers unless it is already an encoder-decoder model (like Bart or T5), in which case it does not make sense to use the encoder-decoder wrapper anyway. When a stand-alone model such as Bert is used in the wrapper, its cross-attention layers are initialized with random weights and will have to be fine-tuned. I agree that this should be made clearer in the documentation!

_Originally posted by @patrickvonplaten in https://github.com/huggingface/transformers/issues/4517#issuecomment-638058577_
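
For context, a minimal sketch of what the wrapper supported at the time of this issue: warm-starting a BERT2BERT model from two BERT checkpoints. This assumes a transformers release that ships EncoderDecoderModel and uses the public bert-base-uncased checkpoint; as the comment above notes, the cross-attention weights start out random and need seq2seq fine-tuning before generation is useful.

    # Minimal sketch, assuming a transformers release with EncoderDecoderModel
    # and the public bert-base-uncased checkpoint.
    from transformers import BertTokenizer, EncoderDecoderModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    # Warm-start a BERT2BERT model. The BERT checkpoints contain no
    # cross-attention layers, so those are randomly initialized here and must
    # be fine-tuned on a seq2seq task.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "bert-base-uncased", "bert-base-uncased"
    )
    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    # With untrained cross-attention the output is essentially noise; the call
    # only demonstrates the wiring.
    inputs = tokenizer("GPT2 is not yet supported as the decoder.", return_tensors="pt")
    generated = model.generate(inputs.input_ids, max_length=16)
    print(tokenizer.decode(generated[0], skip_special_tokens=True))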

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 14 (5 by maintainers)

Top GitHub Comments

4 reactions
djw1809 commented, Jul 8, 2020

Got sidetracked with other research; coming back to it in several days. It's working on my end, it just needs to play nice with the rest of the repo.

On Tue, Jul 7, 2020 at 3:32 PM Mihai Ilie notifications@github.com wrote:

@patrickvonplaten (https://github.com/patrickvonplaten) Hello Patrick, I am watching the EncoderDecoder model from transformers with much interest 😃. Any updates on supporting GPT2 with EncoderDecoder?


– Dylan Weber, Research Assistant | PhD Candidate, School of Math and Statistical Sciences, WXLR642/BYENG593, Arizona State University

4 reactions
patrickvonplaten commented, Jun 12, 2020

It’s on the roadmap 😃


Top Results From Across the Web

  • Leveraging Pre-trained Language Model Checkpoints for ...
    In essence, an encoder-decoder model is the combination of a stand-alone encoder, such as BERT, and a stand-alone decoder model, such as GPT2... (a hedged sketch of this pairing follows after this list)
  • Generating captions with ViT and GPT2 using 🤗 Transformers
    Using Encoder Decoder models in HF to combine vision and text.
  • Why does GPT-2 Exclude the Transformer Encoder?
    It works just like a traditional language model as it takes word vectors as input and produces estimates for the probability of the...
  • Understanding the GPT-2 Source Code Part 2 - Medium
    An explanation of Byte Pair Encoding tokenization: bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' ')).
  • arXiv:2010.07576v1 [cs.CL] 15 Oct 2020
    architecture with both encoder and decoder duplicated from a pretrained language ... to GPT2-sw, despite it extends the latter one using...
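
The first result above describes the exact combination this issue asks for: a stand-alone encoder such as BERT paired with a stand-alone decoder such as GPT2. Below is a hedged sketch of that pairing, assuming a later transformers release in which GPT2 gained cross-attention support and can be used as the decoder inside EncoderDecoderModel; the bert-base-uncased and gpt2 checkpoint names are simply the public defaults, and the new cross-attention weights again require seq2seq fine-tuning.

    # Hedged sketch, assuming a transformers release where GPT2 can act as the
    # decoder inside EncoderDecoderModel (support landed after this issue).
    from transformers import AutoTokenizer, EncoderDecoderModel

    enc_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    dec_tok = AutoTokenizer.from_pretrained("gpt2")

    # The GPT2 checkpoint also lacks cross-attention layers, so they are added
    # with random weights and have to be fine-tuned before the model is useful.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "bert-base-uncased", "gpt2"
    )
    model.config.decoder_start_token_id = dec_tok.bos_token_id
    model.config.pad_token_id = enc_tok.pad_token_id

    inputs = enc_tok("A BERT encoder feeding a GPT2 decoder.", return_tensors="pt")
    summary_ids = model.generate(inputs.input_ids, max_length=16)
    print(dec_tok.decode(summary_ids[0], skip_special_tokens=True))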
