80min training time to fine-tune BERT-base on the SQuAD dataset instead of 24min?

See original GitHub issue

I just fine-tuned BERT-base on the SQuAD dataset on an AWS EC2 p3.2xlarge instance (Deep Learning AMI) with a single Tesla V100 16GB:

I used the config in your README:

export SQUAD_DIR=/path/to/SQUAD

python run_squad.py \
  --bert_model bert-base-uncased \
  --do_train \
  --do_predict \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/

It took 80min. According to your README:

This example code fine-tunes BERT on the SQuAD dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large) on a single tesla V100 16GB.

What explains this difference? Is there any way to accelerate training to 24 min as well? Thanks

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
thomwolf commented, Feb 13, 2019

You should use 16-bit training (the --fp16 argument). You can use dynamic loss scaling, or tune the loss scale yourself if the results are not the best.
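The "dynamic loss scaling" mentioned here is the mechanism that makes fp16 training stable: the loss is multiplied by a large factor before backprop so small fp16 gradients don't underflow to zero, and the factor is lowered whenever gradients overflow. A minimal sketch of that logic, with illustrative names (this is not the script's or any library's actual API):

```python
class DynamicLossScaler:
    """Toy dynamic loss scaler: scale the loss up before backprop so
    small fp16 gradients don't underflow; back off on overflow."""

    def __init__(self, init_scale=2.0 ** 15, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without overflow

    def scale_loss(self, loss):
        # Backprop through loss * scale; gradients come out scaled too,
        # so the optimizer must divide them by self.scale before stepping.
        return loss * self.scale

    def update(self, overflow):
        if overflow:
            # Gradients hit inf/nan: halve the scale and skip this step.
            self.scale /= 2.0
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                # A long run of clean steps: try a larger scale again.
                self.scale *= 2.0
```

In practice you would not write this yourself; passing --fp16 to the example script (with a recent install of the repo's fp16 dependencies) enables the equivalent machinery, and "tuning the loss scale yourself" corresponds to fixing `scale` to a constant instead of updating it.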

0 reactions
thomwolf commented, Jun 20, 2019

You can have a look at the README examples, but the score should be a lot higher, around 88-90. Maybe your batch size is too small; look at the README for more information.
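If GPU memory is what forces a small per-step batch, the usual workaround is gradient accumulation: run several forward/backward passes before each optimizer step, so the effective batch size is the per-step size times the number of accumulated steps (times the number of GPUs). A trivial sketch of that arithmetic, assuming a script exposes these three knobs:

```python
def effective_batch_size(per_step_batch, accumulation_steps, n_gpus=1):
    """Examples contributing to each optimizer update when gradients
    are accumulated over several forward/backward passes."""
    return per_step_batch * accumulation_steps * n_gpus

# The README's recommended batch size 12 on one V100, no accumulation:
effective_batch_size(12, 1)  # -> 12
# The same effective size with half the per-step memory footprint:
effective_batch_size(6, 2)   # -> 12
```

So a memory-constrained run can keep the recommended effective batch size by halving the per-step batch and doubling the accumulation steps.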
