Config specifies max_position_embeddings as 1024
Hi!
I noticed that the PRIMERA configs specify `max_position_embeddings: 1024`. Is this intentional? AFAICT the HuggingFace library treats this value as the maximum position embedding size of the encoder, i.e. `max_encoder_position_embeddings`, which for PRIMERA is 4096.
E.g. in their `run_summarization.py` script, they appear to treat `max_position_embeddings` as `max_encoder_position_embeddings`, since they compare it to `max_source_length`.
So I am wondering whether `max_position_embeddings` should be set to 4096 in the PRIMERA configs; otherwise it causes problems when trying to use PRIMERA with the existing HF example scripts.
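For illustration, a paraphrased sketch of the kind of check the script performs (not the exact script code, and `allenai/PRIMERA` is assumed here as the Hub checkpoint id):

```python
# Paraphrased sketch: run_summarization.py compares the config's
# max_position_embeddings against the requested --max_source_length.
# With the shipped PRIMERA config this reads 1024, so a 4096-token source
# length trips the warn/resize path even though the encoder
# (max_encoder_position_embeddings) already supports 4096 positions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("allenai/PRIMERA")  # hub id assumed
max_source_length = 4096  # what one would pass as --max_source_length

if (
    hasattr(config, "max_position_embeddings")
    and config.max_position_embeddings < max_source_length
):
    print(
        f"max_position_embeddings={config.max_position_embeddings} < "
        f"max_source_length={max_source_length}: the script would warn and try to "
        f"resize, although the encoder supports "
        f"{config.max_encoder_position_embeddings} positions."
    )
```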
Hi John and Jay,
Thanks for pointing out the issue. It is not intentional to set `max_position_embeddings` to 1024; it's just the default value. I have not used `run_summarization.py` before, so I didn't know that the script treats `max_position_embeddings` as the maximum position embedding size of the encoder, which is specified as `max_encoder_position_embeddings` in our case. I'll update the config files to make them consistent with the script.
I can confirm that this affects `run_summarization.py` and is inconsistent with the semantics of other Hugging Face configs.
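Until the configs are updated, a minimal workaround sketch (assuming the `allenai/PRIMERA` checkpoint on the Hub and the standard `AutoConfig`/`AutoModelForSeq2SeqLM` APIs; I have not verified every code path in the script) is to override the misleading value in the loaded config before constructing the model:

```python
# Workaround sketch: patch max_position_embeddings to the encoder's real limit
# so that length checks against max_source_length see 4096 instead of 1024.
from transformers import AutoConfig, AutoModelForSeq2SeqLM

config = AutoConfig.from_pretrained("allenai/PRIMERA")  # hub id assumed
config.max_position_embeddings = config.max_encoder_position_embeddings  # 4096

# LEDConfig does not use max_position_embeddings for weight shapes, so loading
# the pretrained weights with the patched config should be unaffected.
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/PRIMERA", config=config)
```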