Load Biobert pre-trained weights into Bert model with Pytorch bert hugging face run_classifier.py code

These are the steps I followed to get Biobert working with the existing Bert hugging face pytorch code.

  1. I downloaded the pre-trained weights ‘biobert_pubmed_pmc.tar.gz’ from the Releases page.

  2. I ran this command to convert the TF checkpoint to a PyTorch model:

python pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py --tf_checkpoint_path="biobert/pubmed_pmc_470k/biobert_model.ckpt.index" --bert_config_file="biobert/pubmed_pmc_470k/bert_config.json" --pytorch_dump_path="biobert/pubmed_pmc_470k/Pytorch/biobert.model"

This created a file ‘biobert.model’ in the specified path.

  3. As mentioned in this link, I compressed the ‘biobert.model’ file created above and ‘biobert/pubmed_pmc_470k/bert_config.json’ together into biobert_model.tar.gz.

  4. I then ran Hugging Face's run_classifier.py with the following command, using the tar.gz created above.

python pytorch-pretrained-BERT/examples/run_classifier.py --data_dir="Data/" --bert_model="biobert_model.tar.gz" --task_name="qqp" --output_dir="OutputModels/Pretrained/" --do_train --do_eval --do_lower_case

I get the error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

in the line

tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)

Am I doing something wrong?

I just wanted to run run_classifier.py code provided by hugging face with biobert pretrained weights in the same way that we run bert with it. Is there a way to do this?
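
A note on the error itself: byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which suggests that the .tar.gz archive was being read as a plain UTF-8 text file, most likely as the vocabulary. That reading is an inference from the error message, not something confirmed in the thread. The sketch below only reproduces the symptom with the standard library; the helper name and the sample payload are made up for illustration.

```python
import gzip


def decode_error_for_gzip(payload=b"[PAD]\n[UNK]\n"):
    """Gzip some bytes, then try to read the result as UTF-8 text.

    Returns the first two bytes of the archive and the decoding error
    (or None if decoding unexpectedly succeeds).
    """
    archive = gzip.compress(payload)
    try:
        archive.decode("utf-8")
    except UnicodeDecodeError as exc:
        return archive[:2], exc
    return archive[:2], None


magic, error = decode_error_for_gzip()
print(magic.hex())  # gzip magic number: 1f8b
print(error)        # the failing byte and position match the error above
```

If that is what happened here, pointing the tokenizer at an extracted folder or at the vocabulary file, instead of at the archive, should avoid the decode step that fails.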

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (1 by maintainers)

Top GitHub Comments

36 reactions
stefan-it commented, Aug 5, 2019

Try passing the extracted folder of your converted bioBERT model to the --model_name_or_path argument 😃

Here’s a short example:

  • Download the BioBERT v1.1 (+ PubMed 1M) model (or any other model) from the bioBERT repo
  • Extract the downloaded file, e.g. with tar -xzf biobert_v1.1_pubmed.tar.gz
  • Convert the bioBERT model TensorFlow checkpoint to a PyTorch and PyTorch-Transformers compatible one: pytorch_transformers bert biobert_v1.1_pubmed/model.ckpt-1000000 biobert_v1.1_pubmed/bert_config.json biobert_v1.1_pubmed/pytorch_model.bin
  • Rename the config: mv biobert_v1.1_pubmed/bert_config.json biobert_v1.1_pubmed/config.json

Then pass the folder name to the --model_name_or_path argument. You can run this simple script to check if everything works:

from pytorch_transformers import BertModel
model = BertModel.from_pretrained('biobert_v1.1_pubmed')
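
Before loading, it can help to confirm that the folder has the layout the steps above produce. The following is a minimal sketch using only the standard library. The helper name `prepare_model_dir` is hypothetical. The expected file list is an assumption: `config.json` and `pytorch_model.bin` come from the steps above, and `vocab.txt` is assumed to ship with the downloaded BioBERT archive.

```python
from pathlib import Path

# File names the loader is assumed to look for inside the model folder.
EXPECTED_FILES = ("config.json", "pytorch_model.bin", "vocab.txt")


def prepare_model_dir(model_dir):
    """Rename bert_config.json to config.json and list missing files."""
    folder = Path(model_dir)
    old_config = folder / "bert_config.json"
    new_config = folder / "config.json"
    # Same effect as the `mv` step, skipped if config.json already exists.
    if old_config.exists() and not new_config.exists():
        old_config.rename(new_config)
    return [name for name in EXPECTED_FILES if not (folder / name).exists()]


if __name__ == "__main__":
    missing = prepare_model_dir("biobert_v1.1_pubmed")
    print("missing files:", missing or "none")
```

An empty list means the folder name can be passed to --model_name_or_path. A missing `pytorch_model.bin` would mean the conversion step has not been run yet.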

6 reactions
nipunsadvilkar commented, Mar 26, 2020

@stefan-it With the new transformers-cli, the third command (the conversion step) changes as follows:

transformers-cli convert --model_type bert \
--tf_checkpoint biobert_v1.1_pubmed/model.ckpt-1000000 \
--config biobert_v1.1_pubmed/bert_config.json \
--pytorch_dump_output biobert_v1.1_pubmed/pytorch_model.bin
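
For scripted setups, the same call can be assembled in Python. This is a sketch that only builds the argument list from the flags quoted above and does not run anything. The function name `build_convert_command` is hypothetical, and whether your installed transformers version still ships the `convert` subcommand is something to verify.

```python
import shlex


def build_convert_command(model_dir, checkpoint_name, model_type="bert"):
    """Return the transformers-cli convert call as an argument list."""
    return [
        "transformers-cli", "convert",
        "--model_type", model_type,
        "--tf_checkpoint", f"{model_dir}/{checkpoint_name}",
        "--config", f"{model_dir}/bert_config.json",
        "--pytorch_dump_output", f"{model_dir}/pytorch_model.bin",
    ]


command = build_convert_command("biobert_v1.1_pubmed", "model.ckpt-1000000")
print(shlex.join(command))  # shell-ready form of the command (Python 3.8+)
```

The list can be handed to `subprocess.run(command, check=True)` once the CLI is available on the PATH.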