Load BioBERT pre-trained weights into a BERT model with the Hugging Face PyTorch BERT run_classifier.py code
These are the steps I followed to get BioBERT working with the existing Hugging Face PyTorch BERT code:
- I downloaded the pre-trained weights ‘biobert_pubmed_pmc.tar.gz’ from the Releases page.
- I ran this command to convert the TensorFlow checkpoint to a PyTorch model:
python pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py --tf_checkpoint_path="biobert/pubmed_pmc_470k/biobert_model.ckpt.index" --bert_config_file="biobert/pubmed_pmc_470k/bert_config.json" --pytorch_dump_path="biobert/pubmed_pmc_470k/Pytorch/biobert.model"
This created a file ‘biobert.model’ in the specified path.
- As mentioned in this link, I compressed the ‘biobert.model’ file created above and ‘biobert/pubmed_pmc_470k/bert_config.json’ together into biobert_model.tar.gz.
- I then ran Hugging Face's run_classifier.py with the following command, using the tar.gz created above:
python pytorch-pretrained-BERT/examples/run_classifier.py --data_dir="Data/" --bert_model="biobert_model.tar.gz" --task_name="qqp" --output_dir="OutputModels/Pretrained/" --do_train --do_eval --do_lower_case
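Separately from the tokenizer error reported below, the model loader also has expectations about the archive. Assuming it extracts the .tar.gz and looks for fixed file names at the top level (pytorch_model.bin for the weights; bert_config.json or config.json for the config, depending on the pytorch_pretrained_bert release), an archive that holds the weights as biobert.model, or holds files under biobert/pubmed_pmc_470k/, would likely fail to load. The helpers below (build_archive, top_level_files, archive_problems) are hypothetical and not part of run_classifier.py; the name lists are assumptions to check against WEIGHTS_NAME and CONFIG_NAME in your installed version.

```python
import tarfile

# Assumed member names; verify against WEIGHTS_NAME / CONFIG_NAME in the
# installed pytorch_pretrained_bert release.
WEIGHTS_NAMES = ("pytorch_model.bin",)
CONFIG_NAMES = ("bert_config.json", "config.json")


def build_archive(archive_path, files):
    """Write a .tar.gz whose top-level members are named by the keys of `files`.

    `files` maps the name inside the archive to a path on disk, so a weights
    file saved as biobert.model can be stored as pytorch_model.bin.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for arcname, source_path in files.items():
            tar.add(source_path, arcname=arcname)


def top_level_files(archive_path):
    """Names of regular files at the top level of a .tar.gz archive."""
    names = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            name = member.name[2:] if member.name.startswith("./") else member.name
            if member.isfile() and "/" not in name:
                names.add(name)
    return names


def archive_problems(archive_path):
    """List missing weights/config files, assuming lookup at the archive's top level."""
    names = top_level_files(archive_path)
    problems = []
    if not names.intersection(WEIGHTS_NAMES):
        problems.append("no top-level weights file named " + " or ".join(WEIGHTS_NAMES))
    if not names.intersection(CONFIG_NAMES):
        problems.append("no top-level config file named " + " or ".join(CONFIG_NAMES))
    return problems
```

For example, build_archive("biobert_model.tar.gz", {"pytorch_model.bin": "biobert/pubmed_pmc_470k/Pytorch/biobert.model", "bert_config.json": "biobert/pubmed_pmc_470k/bert_config.json"}) stores both files at the top level under the assumed names.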
I get the error
'UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte'
on the line
tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
Am I doing something wrong?
I just want to run the run_classifier.py script provided by Hugging Face with the BioBERT pre-trained weights, the same way we run it with BERT. Is there a way to do this?
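The byte in the reported error is a clue: every gzip file begins with the two bytes 0x1f 0x8b, so "byte 0x8b in position 1" is exactly what UTF-8 decoding of a .tar.gz produces. That fits how the tokenizer appears to treat a local --bert_model value: a directory is searched for vocab.txt, while any other path is read directly as the vocabulary text file. The model loader, by contrast, can extract .tar.gz archives, which would explain why the same argument reaches the tokenizer line and fails there. The sketch below uses only the standard library; resolve_vocab_file is a hypothetical, simplified re-creation of that path handling, not a library function.

```python
import gzip
import os

# File name the tokenizer looks for when --bert_model points at a directory.
VOCAB_NAME = "vocab.txt"


def resolve_vocab_file(bert_model):
    """Hypothetical, simplified sketch of the tokenizer's local-path handling.

    A directory is joined with vocab.txt; any other local path is used as-is,
    so a .tar.gz archive ends up being opened as the vocabulary text file.
    Shortcut names such as 'bert-base-uncased', URLs and caching are omitted.
    """
    if os.path.isdir(bert_model):
        return os.path.join(bert_model, VOCAB_NAME)
    return bert_model


def is_gzip(data):
    """Every gzip stream starts with the magic bytes 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"


# Decoding gzip-compressed bytes as UTF-8 fails on the second byte (0x8b),
# which is a UTF-8 continuation byte and therefore an invalid start byte.
sample = gzip.compress(b"[PAD]\n[UNK]\n[CLS]\n")
decode_error = None
try:
    sample.decode("utf-8")
except UnicodeDecodeError as err:
    decode_error = str(err)

print(decode_error)
# 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```

Following that logic, pointing --bert_model at an extracted directory that contains vocab.txt, rather than at the .tar.gz, should let the tokenizer find its vocabulary.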

Try to pass the extracted folder of your converted BioBERT model to the --model_name_or_path argument 😃 Here's a short example:
tar -xzf biobert_v1.1_pubmed.tar.gz
pytorch_transformers bert biobert_v1.1_pubmed/model.ckpt-1000000 biobert_v1.1_pubmed/bert_config.json biobert_v1.1_pubmed/pytorch_model.bin
mv biobert_v1.1_pubmed/bert_config.json biobert_v1.1_pubmed/config.json
Then pass the folder name to the --model_name_or_path argument. You can run this simple script to check if everything works:

@stefan-it As per the new transformers-cli, the third command would change as follows: