
Pre-training parameters

See original GitHub issue

Hi,

I’m currently training a BERT model from scratch using the same parameters as specified in scripts/cheatsheet.txt.

@ibeltagy Could you confirm that these parameters are up to date? 🤔

The loss looks fine, but I'm wondering why training both the 128 and 512 sequence length models on 3B tokens with a v3-8 TPU is a lot faster than your reported training time.
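For context, here is a back-of-the-envelope sketch of the token math behind that comparison. The step counts, batch size, and two-phase 128/512 split below are assumptions based on the standard BERT recipe, not the actual values in scripts/cheatsheet.txt:

```python
# Rough estimate of tokens seen during two-phase BERT pre-training.
# All hyperparameters here are assumed (standard BERT-style recipe),
# not the actual values from scripts/cheatsheet.txt.

phases = [
    # (steps, batch_size, seq_len) -- assumed values
    (900_000, 256, 128),  # phase 1: short sequences
    (100_000, 256, 512),  # phase 2: long sequences
]

total_tokens = sum(steps * batch * seq_len for steps, batch, seq_len in phases)
print(f"Tokens processed: {total_tokens / 1e9:.1f}B")  # ~42.6B

# Over a ~3B-token corpus that is roughly 14 epochs; wall-clock time
# then depends mostly on TPU throughput at each sequence length, which
# is why observed training time can differ a lot from a reported figure.
print(f"Epochs over a 3B-token corpus: {total_tokens / 3e9:.1f}")
```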

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5

Top GitHub Comments

ibeltagy commented on Nov 7, 2019 (1 reaction):

250 tfrecords, each file is 800-900MB (around 4000 papers)
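Taken together, those figures pin down the rough corpus scale. A quick sanity check (the per-shard numbers are from the comment above; the totals are just derived arithmetic):

```python
# Derived from the figures in the comment above: 250 shards,
# ~800-900 MB each, ~4,000 papers per shard.
num_shards = 250
mb_per_shard = (800 + 900) / 2   # midpoint assumption
papers_per_shard = 4_000

total_gb = num_shards * mb_per_shard / 1024
total_papers = num_shards * papers_per_shard

print(f"~{total_gb:.0f} GB of tfrecords")   # ~208 GB
print(f"~{total_papers:,} papers")          # ~1,000,000 papers

# Note: pre-masked tfrecords (with duplicated training instances) are
# typically much larger than the raw text they were generated from.
```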

stefan-it commented on Nov 7, 2019 (0 reactions):

Thanks Iz ❤️ Just one last question on that pre-training topic: what was the number of tfrecords (and their corresponding text size per shard)? 🤔
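For readers wondering how a corpus ends up sharded like that, here is a minimal sketch of round-robin tfrecord sharding. The shard count, file naming, and helper function are hypothetical illustrations, not the repo's actual preprocessing script:

```python
# Minimal sketch: distribute serialized tf.train.Example records across
# N tfrecord shards. Shard count and naming are assumptions only.
import tensorflow as tf

def write_shards(serialized_examples, num_shards=250,
                 prefix="pretrain_data"):
    """Round-robin the examples across num_shards tfrecord files,
    keeping shard sizes roughly equal."""
    writers = [
        tf.io.TFRecordWriter(f"{prefix}_{i:03d}.tfrecord")
        for i in range(num_shards)
    ]
    for idx, example in enumerate(serialized_examples):
        writers[idx % num_shards].write(example)
    for writer in writers:
        writer.close()
```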

Read more comments on GitHub.
