Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Reading a BERT model using ClassificationModel fails with 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

See original GitHub issue

Describe the bug

I saved the classification model as a pickle file. When I try to read it back, loading fails with the following error:

'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

This happens because the file is opened in text mode ("r") in model_args.py, line 96. Opening it in binary mode ("rb") should fix the problem.
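The error message itself can be reproduced with the standard library alone. This is a minimal sketch, assuming the file was written with Python's pickle module using protocol 2 or later, which starts every stream with byte 0x80. It does not call simpletransformers or model_args.py, and the file name model.pkl is illustrative.

```python
import os
import pickle
import tempfile

# Pickle protocol 2+ begins the stream with the PROTO opcode, byte 0x80.
data = pickle.dumps({"model_type": "bert"}, protocol=pickle.HIGHEST_PROTOCOL)
print(hex(data[0]))  # 0x80

# Illustrative file name; any binary pickle file behaves the same way.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    f.write(data)

# Text mode ("r") decodes the bytes as UTF-8. 0x80 is a continuation byte,
# so it cannot start a character and decoding fails at position 0.
error_message = None
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Binary mode ("rb") returns raw bytes, which pickle.load can parse.
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored)
```

This only explains where the message comes from; as the maintainer's reply notes, the intended way to load a simpletransformers model is from its output directory rather than from a single pickled file.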

To Reproduce
Save a classification model and try to read it back.



Desktop:

  • OS: Databricks cluster


Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

1 reaction
ThilinaRajapakse commented, Aug 26, 2020

The saved model consists of all the files in the output directory. As the documentation says, the path you pass to ClassificationModel should be the directory itself, not a particular file.

The example from the docs.

You don’t need to manually save the model. It gets saved during training according to the parameters you set (e.g. save_steps, save_model_every_epoch).

If your model files are in /dbfs/FileStore/tables/, then you would load them like so.

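```python
from simpletransformers.classification import ClassificationModel
model = ClassificationModel("roberta", "/dbfs/FileStore/tables/")  # the directory, not a single file
```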
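If you only have the path to one of the saved files, a small guard can turn it into the directory path before loading. The helper resolve_model_dir below is hypothetical, not part of simpletransformers, and it assumes, as the maintainer describes, that the saved model is the whole output directory. The file name pytorch_model.bin and the temporary directory are illustrative stand-ins for /dbfs/FileStore/tables/.

```python
import os
import tempfile


def resolve_model_dir(path):
    """Hypothetical helper: return the directory to pass to ClassificationModel.

    Assumes the saved model is the whole output directory. If a file inside
    that directory is given, its parent directory is returned instead.
    """
    if os.path.isdir(path):
        return path
    if os.path.isfile(path):
        return os.path.dirname(os.path.abspath(path))
    raise FileNotFoundError(f"No such file or directory: {path!r}")


# A throwaway directory stands in for /dbfs/FileStore/tables/.
output_dir = tempfile.mkdtemp()
model_file = os.path.join(output_dir, "pytorch_model.bin")  # illustrative name
open(model_file, "wb").close()

print(resolve_model_dir(output_dir) == output_dir)  # True
print(resolve_model_dir(model_file) == os.path.abspath(output_dir))  # True
# model = ClassificationModel("roberta", resolve_model_dir(model_file))
```

The helper only normalizes the path; it does not check which files the directory contains, so a directory missing the saved model files would still fail when ClassificationModel tries to load it.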
0 reactions
stale[bot] commented, Nov 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


Top Results From Across the Web

'utf-8' codec can't decode byte 0x80 in position 0: invalid start ...
When I load the pretrained model from the local bin file, there is a decoding problem.

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in ...
The trick here is to understand the encoding of your data all the way through your code. At the moment, you're leaving too...

'utf-8' codec can't decode byte 0x80 in position 0: invalid start ...
It went through without any error. Now, when I try to load it, I get an error: model = gensim.models.KeyedVectors.load_word2vec_format ...

What is BERT (Language Model) and How Does It Work?
BERT, which stands for Bidirectional Encoder Representations from Transformers, is based on Transformers, a deep learning model in which every output ...

BERT - Hugging Face
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
