Pre-training parameters
Hi,
I’m currently training a BERT model from scratch using the same parameters as specified in scripts/cheatsheet.txt.
@ibeltagy Could you confirm that these parameters are up to date 🤔
The loss looks fine, but I’m wondering why pre-training both the 128 and the 512 sequence-length models on 3B tokens with a v3-8 TPU is a lot faster than your reported training time.
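For context, here is a minimal sketch of what such a two-phase setup typically looks like with the upstream google-research/bert run_pretraining.py script. The bucket paths, batch sizes, and step counts below are illustrative placeholders, not the exact values from scripts/cheatsheet.txt:

```bash
# Phase 1: train most steps at max_seq_length=128 (values are placeholders)
python run_pretraining.py \
  --input_file=gs://my-bucket/pretrain_tfrecords_128/*.tfrecord \
  --output_dir=gs://my-bucket/model_128 \
  --do_train=True \
  --bert_config_file=bert_config.json \
  --train_batch_size=256 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=900000 \
  --num_warmup_steps=10000 \
  --learning_rate=1e-4 \
  --use_tpu=True \
  --tpu_name=my-tpu

# Phase 2: continue from the phase-1 checkpoint at max_seq_length=512
python run_pretraining.py \
  --input_file=gs://my-bucket/pretrain_tfrecords_512/*.tfrecord \
  --output_dir=gs://my-bucket/model_512 \
  --init_checkpoint=gs://my-bucket/model_128/model.ckpt-900000 \
  --do_train=True \
  --bert_config_file=bert_config.json \
  --train_batch_size=64 \
  --max_seq_length=512 \
  --max_predictions_per_seq=77 \
  --num_train_steps=100000 \
  --num_warmup_steps=1000 \
  --learning_rate=2e-5 \
  --use_tpu=True \
  --tpu_name=my-tpu
```

The two-phase split (short sequences for most steps, long sequences for the remainder) is the usual reason 128 and 512 runs have very different wall-clock costs, so differences in the phase lengths alone can change total training time substantially.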
Issue Analytics
- State:
- Created: 4 years ago
- Comments: 5
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks Iz ❤️ Just a last question on that pre-training topic: what was the number of tfrecords (and their corresponding text size per shard) 🤔
250 tfrecords, each file is 800-900MB (around 4000 papers)
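For scale, a quick back-of-envelope check of those shard numbers (my own arithmetic, not figures stated in the thread):

```bash
# Rough corpus size implied by the shard counts above (approximate).
echo "papers:    $((250 * 4000))"   # 250 shards x ~4000 papers  -> ~1,000,000 papers
echo "size (MB): $((250 * 850))"    # 250 shards x ~850 MB/shard -> ~212,500 MB (~212 GB)
```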