Load BioBERT pre-trained weights into a BERT model with the Hugging Face PyTorch run_classifier.py code

These are the steps I followed to get BioBERT working with the existing Hugging Face PyTorch BERT code.

I downloaded the pre-trained weights ‘biobert_pubmed_pmc.tar.gz’ from the Releases page.
I ran this command to convert the TF checkpoint to a PyTorch model:

python pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py --tf_checkpoint_path="biobert/pubmed_pmc_470k/biobert_model.ckpt.index" --bert_config_file="biobert/pubmed_pmc_470k/bert_config.json" --pytorch_dump_path="biobert/pubmed_pmc_470k/Pytorch/biobert.model"

This created a file ‘biobert.model’ in the specified path.

As mentioned in this link, I compressed the ‘biobert.model’ file created above and ‘biobert/pubmed_pmc_470k/bert_config.json’ together into biobert_model.tar.gz.
I then ran Hugging Face's run_classifier.py with the following command, using the tar.gz created above:

python pytorch-pretrained-BERT/examples/run_classifier.py --data_dir="Data/" --bert_model="biobert_model.tar.gz" --task_name="qqp" --output_dir="OutputModels/Pretrained/" --do_train --do_eval --do_lower_case

I get the error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

in the line

tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)

Am I doing something wrong?

I just want to run the run_classifier.py code provided by Hugging Face with the BioBERT pre-trained weights, the same way we run it with BERT. Is there a way to do this?
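
A likely reading of the error above: 0x1f 0x8b are the first two bytes of every gzip file, so a loader that opens the .tar.gz as a plain UTF-8 text file (for example, as a vocabulary file) fails on the second byte. The sketch below reproduces the same message with the standard library only. The sample vocabulary bytes are made up, and whether the tokenizer takes this exact path depends on the library version.

```python
import gzip

# Made-up stand-in for an archive; any gzip stream starts with the same bytes.
payload = gzip.compress(b"[PAD]\n[UNK]\n")

# The gzip magic number: 0x1f is valid ASCII, 0x8b is not a valid UTF-8 start byte.
print(payload[:2].hex())  # 1f8b

try:
    # Roughly what happens when a compressed file is read as UTF-8 text.
    payload.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happened, the fix is to hand the loader something that is not compressed, such as an extracted folder.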

Issue Analytics

State:
Created 4 years ago
Comments: 12 (1 by maintainers)

Top GitHub Comments

36 reactions

stefan-it commented, Aug 5, 2019

Try passing the extracted folder of your converted BioBERT model to the --model_name_or_path argument 😃

Here’s a short example:

1. Download the BioBERT v1.1 (+ PubMed 1M) model (or any other model) from the BioBERT repo.
2. Extract the downloaded file, e.g. with tar -xzf biobert_v1.1_pubmed.tar.gz
3. Convert the BioBERT TensorFlow checkpoint to a PyTorch and PyTorch-Transformers compatible one: pytorch_transformers bert biobert_v1.1_pubmed/model.ckpt-1000000 biobert_v1.1_pubmed/bert_config.json biobert_v1.1_pubmed/pytorch_model.bin
4. Rename the config file: mv biobert_v1.1_pubmed/bert_config.json biobert_v1.1_pubmed/config.json

Then pass the folder name to the --model_name_or_path argument. You can run this simple script to check whether everything works:

from pytorch_transformers import BertModel
model = BertModel.from_pretrained('biobert_v1.1_pubmed')
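
If that call fails, one quick thing to rule out is a missing file in the folder. Here is a minimal sketch, assuming the loader looks for config.json and pytorch_model.bin (the names used in the steps above) and that the tokenizer needs the vocab.txt shipped in the BioBERT archive. The helper name missing_model_files is made up for illustration.

```python
from pathlib import Path

# config.json and pytorch_model.bin come from the steps above;
# vocab.txt is an assumption about what the tokenizer expects.
EXPECTED_FILES = ("config.json", "pytorch_model.bin", "vocab.txt")


def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# An empty list means all three files are present in the folder.
print(missing_model_files("biobert_v1.1_pubmed"))
```

An empty result only says the files exist, not that the weights converted correctly, so the from_pretrained check is still the real test.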

6 reactions

nipunsadvilkar commented, Mar 26, 2020

@stefan-it With the newer transformers-cli, the third command changes as follows:

transformers-cli convert --model_type bert \
--tf_checkpoint biobert_v1.1_pubmed/model.ckpt-1000000 \
--config biobert_v1.1_pubmed/bert_config.json \
--pytorch_dump_output biobert_v1.1_pubmed/pytorch_model.bin
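
When the same conversion is repeated for several folders, the argument list can be assembled in Python instead of retyped. This is a minimal sketch, assuming transformers-cli is on the PATH and accepts exactly the flags quoted above. The helper name build_convert_command and its parameters are hypothetical.

```python
from pathlib import Path


def build_convert_command(model_dir, checkpoint_name):
    """Assemble the transformers-cli call quoted above for one model folder.

    Hypothetical helper: it only builds the argument list and runs nothing.
    """
    root = Path(model_dir)
    return [
        "transformers-cli", "convert",
        "--model_type", "bert",
        "--tf_checkpoint", (root / checkpoint_name).as_posix(),
        "--config", (root / "bert_config.json").as_posix(),
        "--pytorch_dump_output", (root / "pytorch_model.bin").as_posix(),
    ]


command = build_convert_command("biobert_v1.1_pubmed", "model.ckpt-1000000")
print(" ".join(command))
```

Passing the list to subprocess.run(command, check=True) would run the conversion. As with the CLI itself, that step needs TensorFlow installed to read the checkpoint.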
