Cannot use trained BERT model from a trained checkpoint

I trained BERT with the command below and got model.ckpt.data, model.ckpt.meta, and model.ckpt.index in the output directory, along with predictions.json, etc.

python run_squad.py \
  --vocab_file=$BERT_LARGE_DIR/vocab.txt \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://some_bucket/squad_large/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --version_2_with_negative=True

I tried copying model.ckpt.meta, model.ckpt.index, and model.ckpt.data to the BERT directory and changed the run_squad.py flags as follows, so that it only predicts answers and does not train on a dataset:

python run_squad.py \
  --vocab_file=$BERT_LARGE_DIR/vocab.txt \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/model.ckpt \
  --do_train=False \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://some_bucket/squad_large/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --version_2_with_negative=True

It throws a "bucket directory/model.ckpt does not exist" error.

Top GitHub Comments

JeevaTM commented, Jul 10, 2019

satyapraffulRCG asked: "Is it supposed to create a new model.ckpt-# file each time I run a train? I trained on two sample datasets and got 2 model.ckpt but it’s not creating anymore. Thanks for your help!"

Yes, satyapraffulRCG. It is supposed to create checkpoints for each training run. For SQuAD 2.0, there were 11 checkpoints.

satyapraffulRCG commented, Jul 9, 2019

Is it supposed to create a new model.ckpt-# file each time I run a train? I trained on two sample datasets and got 2 model.ckpt but it’s not creating anymore. Thanks for your help!
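
The comments above mention files named model.ckpt-#, which suggests that the checkpoint prefix passed to --init_checkpoint may need to include the step number rather than being a bare model.ckpt. Below is a minimal sketch of picking the highest-step prefix from a directory listing. It assumes the usual TensorFlow 1.x naming (model.ckpt-<step>.index, .meta, .data-...); the helper name latest_checkpoint_prefix and the example file list are hypothetical and are not taken from the issue.

```python
import re

# Assumption: checkpoints are written as model.ckpt-<step>.index,
# model.ckpt-<step>.meta and model.ckpt-<step>.data-XXXXX-of-XXXXX.
_INDEX_RE = re.compile(r"^(model\.ckpt-(\d+))\.index$")


def latest_checkpoint_prefix(filenames):
    """Return the checkpoint prefix with the highest step, or None if absent."""
    best_step, best_prefix = -1, None
    for name in filenames:
        match = _INDEX_RE.match(name)
        if match is None:
            continue  # not a checkpoint index file
        step = int(match.group(2))  # compare steps numerically, not as strings
        if step > best_step:
            best_step, best_prefix = step, match.group(1)
    return best_prefix


# Hypothetical directory listing, for illustration only.
example_files = [
    "checkpoint",
    "model.ckpt-0.index",
    "model.ckpt-0.meta",
    "model.ckpt-1000.index",
    "model.ckpt-1000.meta",
    "model.ckpt-1000.data-00000-of-00001",
    "predictions.json",
]

print(latest_checkpoint_prefix(example_files))  # prints model.ckpt-1000
```

With a prefix like this, the prediction-only run would point --init_checkpoint at the directory that actually holds those files, followed by the prefix (for example, the output directory plus model.ckpt-1000), if the assumption about naming holds. When TensorFlow is available, tf.train.latest_checkpoint(checkpoint_dir) should return the same kind of prefix by reading the checkpoint state file in that directory.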
