Pad token for GPT2 and OpenAIGPT models
See original GitHub issue
❓ Questions & Help
I noticed that, out of all the models, pad_token is not set only for OpenAIGPTModel and GPT2Model. I get the warning "Using pad_token, but it is not set yet." and pad_token_id is None.
Is there any specific reason why that is so?
If not, what is the appropriate padding token to be used for these models?
Thanks
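For context, a minimal sketch of how this shows up (assuming a recent transformers release; exact logging behaviour may differ across versions):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# GPT-2 ships without a dedicated padding token, so both attributes are unset.
print(tokenizer.pad_token)     # None
print(tokenizer.pad_token_id)  # None; accessing it is what triggers the
                               # "Using pad_token, but it is not set yet." warning
```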
Because GPT2 and GPT are causal LMs, you don’t need to pad shorter sentences in batches. It is important, though, that the loss on these “unnecessary” tokens is not calculated. You should set all labels corresponding to “PADDED” tokens to -100. In the code snippet you can see, in the map_to_encoder_decoder_inputs function, how the labels are set to -100 for attention_mask = 0: https://huggingface.co/patrickvonplaten/bert2gpt2-cnn_dailymail-fp16#training-script
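A minimal sketch of that idea (my own simplified version, not the linked map_to_encoder_decoder_inputs function; reusing the EOS token as the pad token is a common convention rather than something required by the model):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 has no dedicated pad token; reusing EOS for padding is a common choice.
tokenizer.pad_token = tokenizer.eos_token

texts = ["a short example", "a noticeably longer example sentence"]
batch = tokenizer(texts, padding=True, return_tensors="pt")

# For causal LM training the labels are the input ids, but every position that
# corresponds to padding (attention_mask == 0) is set to -100 so that the
# cross-entropy loss ignores it.
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100
```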