
Config specifies max_position_embeddings as 1024

See original GitHub issue

Hi!

I noticed that the PRIMERA configs specify max_position_embeddings: 1024. Is this intentional? AFAICT the HuggingFace library treats this as the maximum position embedding size of the encoder, i.e. max_encoder_position_embeddings, which for PRIMERA is 4096.

E.g., in their run_summarization.py script, they appear to treat max_position_embeddings as max_encoder_position_embeddings, since they compare it to max_source_length.
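
For concreteness, here is a minimal, paraphrased sketch of that kind of guard (not the exact upstream code; the config values are the ones quoted in this issue). The script reads config.max_position_embeddings as the encoder's limit, so PRIMERA's published value of 1024 trips the check even though the encoder actually supports 4096.

    from types import SimpleNamespace

    # Stand-in for the published PRIMERA config, using the values quoted in this issue.
    config = SimpleNamespace(
        max_position_embeddings=1024,
        max_encoder_position_embeddings=4096,
    )

    def check_source_length(config, max_source_length):
        """Paraphrased guard in the spirit of run_summarization.py (not the exact
        upstream code): it treats max_position_embeddings as the encoder's limit
        and compares it against --max_source_length."""
        limit = getattr(config, "max_position_embeddings", None)
        if limit is not None and max_source_length > limit:
            raise ValueError(
                f"max_source_length ({max_source_length}) exceeds "
                f"max_position_embeddings ({limit})"
            )

    try:
        check_source_length(config, max_source_length=4096)
    except ValueError as err:
        print(err)  # fires because the config reports 1024, not 4096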

So I am wondering if max_position_embeddings should be set to 4096 in the PRIMERA configs; otherwise it causes problems when trying to use the model with existing HF example scripts.
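
Until the configs are updated, one workaround is to override the field when loading the model. This is only a sketch: the Hub id allenai/PRIMERA is an assumption on my part, and the field names are the ones discussed above; adjust them to the checkpoint you actually use.

    from transformers import AutoConfig, AutoModelForSeq2SeqLM

    model_id = "allenai/PRIMERA"  # assumed Hub id; substitute your checkpoint

    config = AutoConfig.from_pretrained(model_id)
    # Align max_position_embeddings with the encoder's real limit (4096 per this
    # issue) so HF example scripts that read this field accept long inputs.
    config.max_position_embeddings = config.max_encoder_position_embeddings

    model = AutoModelForSeq2SeqLM.from_pretrained(model_id, config=config)

With the patched value in place, a max_source_length of 4096 should no longer trip the length check.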

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6

Top GitHub Comments

1 reaction
Wendy-Xiao commented on May 12, 2022

Hi John and Jay,

Thanks for pointing out the issue. It was not intentional to set max_position_embeddings to 1024; it's just the default value.

I have not used run_summarization.py before, so I didn't know that the script treats max_position_embeddings as the maximum position embedding size for the encoder, which is specified as max_encoder_position_embeddings in our case.

I'll update it in the config files to make it consistent with the script.
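
For anyone who needs the fix before the updated files land, a local copy of the config can be patched and re-saved; a sketch, with the Hub id and output path as placeholders:

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("allenai/PRIMERA")  # placeholder Hub id
    config.max_position_embeddings = 4096  # match max_encoder_position_embeddings
    config.save_pretrained("./primera-patched")  # writes the corrected config.json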

1 reaction
jayded commented on May 10, 2022

I can confirm that this affects run_summarization.py and is inconsistent with the semantics of other Huggingface configs.

Read more comments on GitHub >

Top Results From Across the Web

  • perplexity too big for gpt2 wikitext evaluation · Issue #6 - GitHub
  • How to train a Language Model with Megatron-LM
  • megatron-lm - PyPI
  • arXiv:2202.01145v1 [cs.CL] 2 Feb 2022
  • KERPLE: Kernelized Relative Positional Embedding for ...
