
Config specifies max_position_embeddings as 1024

See original GitHub issue

Hi!

I noticed that the PRIMERA configs specify max_position_embeddings: 1024. Is this intentional? AFAICT the HuggingFace library treats this as the maximum position embedding size of the encoder, i.e. max_encoder_position_embeddings, which for PRIMERA is 4096.

E.g., in their run_summarization.py script, they appear to treat max_position_embeddings as max_encoder_position_embeddings, since they compare it to max_source_length.
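
For concreteness, here is a minimal, paraphrased sketch of that kind of guard (not the exact upstream code; the config values are the ones quoted in this issue). The script reads config.max_position_embeddings as the encoder's limit, so PRIMERA's published value of 1024 trips the check even though the encoder actually supports 4096.

    from types import SimpleNamespace

    # Stand-in for the published PRIMERA config, using the values quoted in this issue.
    config = SimpleNamespace(
        max_position_embeddings=1024,
        max_encoder_position_embeddings=4096,
    )

    def check_source_length(config, max_source_length):
        """Paraphrased guard in the spirit of run_summarization.py (not the exact
        upstream code): it treats max_position_embeddings as the encoder's limit
        and compares it against --max_source_length."""
        limit = getattr(config, "max_position_embeddings", None)
        if limit is not None and max_source_length > limit:
            raise ValueError(
                f"max_source_length ({max_source_length}) exceeds "
                f"max_position_embeddings ({limit})"
            )

    try:
        check_source_length(config, max_source_length=4096)
    except ValueError as err:
        print(err)  # fires because the config reports 1024, not 4096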

So I am wondering if max_position_embeddings should be set to 4096 in the PRIMERA configs; otherwise it causes problems when trying to use the model with existing HF example scripts.
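
Until the configs are updated, one workaround is to override the field when loading the model. This is only a sketch: the Hub id allenai/PRIMERA is an assumption on my part, and the field names are the ones discussed above; adjust them to the checkpoint you actually use.

    from transformers import AutoConfig, AutoModelForSeq2SeqLM

    model_id = "allenai/PRIMERA"  # assumed Hub id; substitute your checkpoint

    config = AutoConfig.from_pretrained(model_id)
    # Align max_position_embeddings with the encoder's real limit (4096 per this
    # issue) so HF example scripts that read this field accept long inputs.
    config.max_position_embeddings = config.max_encoder_position_embeddings

    model = AutoModelForSeq2SeqLM.from_pretrained(model_id, config=config)

With the patched value in place, a max_source_length of 4096 should no longer trip the length check.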

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6

Top GitHub Comments

1 reaction
Wendy-Xiao commented on May 12, 2022

Hi John and Jay,

Thanks for pointing out the issue. It was not intentional to set max_position_embeddings to 1024; it's just the default value.

I have not used run_summarization.py before, so I didn't know that the script treats max_position_embeddings as the maximum position embedding size for the encoder, which is specified as max_encoder_position_embeddings in our case.

I'll update it in the config files to make it consistent with the script.
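
For anyone who needs the fix before the updated files land, a local copy of the config can be patched and re-saved; a sketch, with the Hub id and output path as placeholders:

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("allenai/PRIMERA")  # placeholder Hub id
    config.max_position_embeddings = 4096  # match max_encoder_position_embeddings
    config.save_pretrained("./primera-patched")  # writes the corrected config.json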

1 reaction
jayded commented on May 10, 2022

I can confirm that this affects run_summarization.py and is inconsistent with the semantics of other Huggingface configs.

Read more comments on GitHub >

Top Results From Across the Web

  • perplexity too big for gpt2 wikitext evaluation · Issue #6 - GitHub
  • How to train a Language Model with Megatron-LM
  • megatron-lm - PyPI
  • arXiv:2202.01145v1 [cs.CL] 2 Feb 2022
  • KERPLE: Kernelized Relative Positional Embedding for ...
