Error While Getting BERT Embeddings From File
Hey,
Hope you are well.
Description
I was trying to extract text from some JSON files and then build topic models for those files. In particular, I wanted to calculate the topic distribution for each document.
I mostly followed what was mentioned in this issue ticket regarding the Pandas DataFrame and got it working yesterday. However, today it consistently fails with a strange error: "ValueError: Wrong shape for input_ids (shape torch.Size([1040])) or attention_mask (shape torch.Size([1040]))"
The error occurs when I try to get BERT embeddings from the file, specifically when I run the following command:
training_bert = bert_embeddings_from_file("pre_documents.txt", "bert-base-nli-mean-tokens")
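For reference, the surrounding setup looks roughly like this (a minimal sketch based on the contextualized-topic-models 1.x examples; the imports and CTM hyperparameters are my approximation of the notebook, not verbatim):

from contextualized_topic_models.models.ctm import CTM
from contextualized_topic_models.utils.data_preparation import TextHandler, bert_embeddings_from_file
from contextualized_topic_models.datasets.dataset import CTMDataset

# Build the bag-of-words representation from the preprocessed documents
handler = TextHandler("pre_documents.txt")
handler.prepare()

# Embed the same documents with Sentence-BERT (this is the call that crashes)
training_bert = bert_embeddings_from_file("pre_documents.txt", "bert-base-nli-mean-tokens")

training_dataset = CTMDataset(handler.bow, training_bert, handler.idx2token)

# bert_input_size=768 matches the output dimension of bert-base-nli-mean-tokens
ctm = CTM(input_size=len(handler.vocab), bert_input_size=768,
          inference_type="combined", n_components=50)
ctm.fit(training_dataset)

# Per-document topic distributions (the end goal; I believe this is the 1.x method name)
topic_distributions = ctm.get_thetas(training_dataset)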
Below are the exact crash details.
ValueError Traceback (most recent call last)
<ipython-input-12-7cd62b310f6c> in <module>()
6 handler.prepare()
7
----> 8 training_bert = bert_embeddings_from_file("pre_documents.txt", "bert-base-nli-mean-tokens") # I'm assuming the tweets are in english here.
9
10 training_dataset = CTMDataset(handler.bow, training_bert, handler.idx2token)
/usr/local/lib/python3.6/dist-packages/transformers/modeling_utils.py in get_extended_attention_mask(self, attention_mask, input_shape, device)
260 raise ValueError(
261 "Wrong shape for input_ids (shape {}) or attention_mask (shape {})".format(
--> 262 input_shape, attention_mask.shape
263 )
264 )
ValueError: Wrong shape for input_ids (shape torch.Size([1040])) or attention_mask (shape torch.Size([1040]))
For reference, I’m using Google Colab.
I’m pretty confused and any help would be greatly appreciated!

FYI, this seems to be caused by breaking changes in the transformers library above version 3.0.2 (https://github.com/UKPLab/sentence-transformers/issues/398). The following working notebook should fix the errors: https://github.com/joaorafaelm/notebooks/blob/master/multilang_topic_model.ipynb
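If you just need a quick workaround in Colab, pinning transformers to the last release before those changes should work (my assumption about the concrete fix, based on the linked sentence-transformers issue):

# Run in a Colab cell before any imports, then restart the runtime.
!pip install transformers==3.0.2

Alternatively, per that same thread, upgrading sentence-transformers to a release that supports the newer transformers API should also resolve it.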
Hey,
Yeah, the issue seems to be solved now. Thanks a lot!