Error While Getting BERT Embeddings From File

Hey,

Hope you are well.

Description

I was trying to take some text from some JSON files and then create topic models for those files. In particular, I was trying to calculate the topic distribution for each of the documents.

I mostly followed what was mentioned in this issue ticket regarding the Pandas DataFrame and got it to work yesterday. However, today it keeps giving me a strange error: "ValueError: Wrong shape for input_ids (shape torch.Size([1040])) or attention_mask (shape torch.Size([1040]))"

The error occurs when I try to get BERT embeddings from the file, specifically when I run the following command:

training_bert = bert_embeddings_from_file("pre_documents.txt", "bert-base-nli-mean-tokens")

Below are the exact crash details.

ValueError                                Traceback (most recent call last)
<ipython-input-12-7cd62b310f6c> in <module>()
      6 handler.prepare()
      7 
----> 8 training_bert = bert_embeddings_from_file("pre_documents.txt", "bert-base-nli-mean-tokens") # I'm assuming the tweets are in english here.
      9 
     10 training_dataset = CTMDataset(handler.bow, training_bert, handler.idx2token)

7 frames
/usr/local/lib/python3.6/dist-packages/transformers/modeling_utils.py in get_extended_attention_mask(self, attention_mask, input_shape, device)
    260             raise ValueError(
    261                 "Wrong shape for input_ids (shape {}) or attention_mask (shape {})".format(
--> 262                     input_shape, attention_mask.shape
    263                 )
    264             )

ValueError: Wrong shape for input_ids (shape torch.Size([1040])) or attention_mask (shape torch.Size([1040]))
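To make the message easier to read: the reported shape torch.Size([1040]) is one-dimensional, while the check that raised it expects an attention mask with a batch dimension (2-D or 3-D). The following is a minimal pure-Python sketch of that kind of check, using nested lists instead of tensors. The helper names `ndim` and `check_attention_mask` are hypothetical and this is not the library's actual code.

```python
def ndim(nested):
    """Count the nesting depth of a list of lists (a stand-in for tensor.dim())."""
    depth = 0
    while isinstance(nested, list):
        depth += 1
        nested = nested[0] if nested else None
    return depth


def check_attention_mask(attention_mask):
    """Simplified, hypothetical stand-in for the shape check in the traceback.

    Accepts 2-D (batch, seq_len) or 3-D masks and rejects anything else.
    """
    dims = ndim(attention_mask)
    if dims not in (2, 3):
        raise ValueError(
            "Wrong shape for attention_mask: got {}-D, expected 2-D or 3-D".format(dims)
        )
    return dims


flat_mask = [1] * 1040        # 1-D, like torch.Size([1040]) in the traceback
batched_mask = [flat_mask]    # 2-D: (batch=1, seq_len=1040)

if __name__ == "__main__":
    print(check_attention_mask(batched_mask))  # the batched mask is accepted
    try:
        check_attention_mask(flat_mask)
    except ValueError as err:
        print("rejected:", err)
```

In other words, somewhere between tokenization and the model call the batch dimension went missing, which is consistent with a version mismatch between libraries rather than a problem in the input file.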

For reference, I’m using Google Colab.

I’m pretty confused and any help would be greatly appreciated!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
joaorafaelm commented, Sep 2, 2020

FYI, this seems to be caused by breaking changes in the transformers library in versions after 3.0.2 (https://github.com/UKPLab/sentence-transformers/issues/398). Pin the older version:

!pip uninstall transformers -y
!pip install transformers==3.0.2

This should fix the errors. Working notebook: https://github.com/joaorafaelm/notebooks/blob/master/multilang_topic_model.ipynb
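Before reinstalling, it can help to confirm which transformers version the Colab runtime actually has. Below is a minimal sketch of such a check. The helper names `parse_version` and `is_newer_than_known_good` are hypothetical, and the 3.0.2 threshold is taken only from the comment above, not from any official compatibility table.

```python
import re

# Assumption from the comment above: 3.0.2 is the last version known to work.
LAST_KNOWN_GOOD = (3, 0, 2)


def parse_version(version):
    """Turn a string like '3.0.2' into (3, 0, 2), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_newer_than_known_good(version):
    """Return True if the version is newer than the last known-good release."""
    return parse_version(version) > LAST_KNOWN_GOOD


if __name__ == "__main__":
    try:
        import transformers
    except ImportError:
        print("transformers is not installed")
    else:
        installed = transformers.__version__
        if is_newer_than_known_good(installed):
            print("transformers", installed, "is newer than 3.0.2; consider pinning it")
        else:
            print("transformers", installed, "should not be affected")
```

In Colab, the runtime usually needs to be restarted after changing the installed version so that the pinned package is the one actually imported.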

0 reactions
pspahwa commented, Sep 3, 2020

Hey,

Yeah, the issue seems to be solved now. Thanks a lot!
