Extending Encoder Decoder to GPT-2
Adding GPT2 initialization for the EncoderDecoder model, as pointed out in the issue below.
Currently, only Bert works as a decoder. We might add GPT2 in a couple of weeks. Note that a model has no cross-attention layers unless it is already an encoder-decoder model (like Bart or T5), in which case it does not make sense to use the encoder-decoder wrapper. The wrapper therefore initializes the cross-attention layers with random weights, which will have to be fine-tuned. I agree that this should be made clearer in the documentation!
_Originally posted by @patrickvonplaten in https://github.com/huggingface/transformers/issues/4517#issuecomment-638058577_
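The point about cross-attention can be checked directly. The following is a minimal sketch, not the implementation discussed in the issue: it assumes a later `transformers` release in which `GPT2Config` honours `add_cross_attention` (that support did not exist when the comment above was written), and it uses tiny, arbitrary config sizes so nothing is downloaded. Because the model is built from configs, every weight here is random; with pretrained checkpoints only the cross-attention parameters listed below would start from random values.

```python
from transformers import (
    BertConfig,
    EncoderDecoderConfig,
    EncoderDecoderModel,
    GPT2Config,
)

# Tiny illustrative sizes (assumptions, chosen only to keep the model small).
encoder_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
decoder_config = GPT2Config(vocab_size=100, n_embd=32, n_layer=1, n_head=2)

# Marks the decoder config with is_decoder=True and add_cross_attention=True.
config = EncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)
model = EncoderDecoderModel(config=config)

# Cross-attention parameters exist only on the decoder side; a stand-alone
# checkpoint has no pretrained values for them.
decoder_cross_attention = [
    name
    for name, _ in model.decoder.named_parameters()
    if "crossattention" in name
]
encoder_cross_attention = [
    name
    for name, _ in model.encoder.named_parameters()
    if "crossattention" in name
]

print("decoder cross-attention tensors:", len(decoder_cross_attention))
print("encoder cross-attention tensors:", len(encoder_cross_attention))
```

With real checkpoints, `EncoderDecoderModel.from_encoder_decoder_pretrained` plays the same role, and the parameters collected in `decoder_cross_attention` are the ones that need fine-tuning.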
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:14 (5 by maintainers)
Top GitHub Comments
Got sidetracked with other research; coming back to it in several days. It is working on my end and just needs to play nicely with the rest of the repo.
On Tue, Jul 7, 2020 at 3:32 PM Mihai Ilie notifications@github.com wrote:
– Dylan Weber, Research Assistant | PhD Candidate School of Math and Statistical Sciences WXLR642/BYENG593 Arizona State University
It’s on the roadmap 😃