Error in roberta.extract_features_aligned_to_words()
Running the following commands to extract features aligned to words throws an error:
import torch
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()
ss = 'There were 28 apples in the house. There are 54 apples in the garden.'
roberta.extract_features_aligned_to_words(ss)
The error message is as follows:
~/.cache/torch/hub/pytorch_fairseq_master/fairseq/models/roberta/hub_interface.py in extract_features_aligned_to_words(self, sentence, return_all_hiddens)
125 features = self.extract_features(bpe_toks, return_all_hiddens=return_all_hiddens)
126 features = features.squeeze(0)
--> 127 aligned_feats = alignment_utils.align_features_to_words(self, features, alignment)
128
129 # wrap in spaCy Doc
~/.cache/torch/hub/pytorch_fairseq_master/fairseq/models/roberta/alignment_utils.py in align_features_to_words(roberta, features, alignment)
92 output.append(weighted_features[j])
93 output = torch.stack(output)
---> 94 assert torch.all(torch.abs(output.sum(dim=0) - features.sum(dim=0)) < 1e-4)
95 return output

AssertionError
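For context, the assertion on line 94 checks that the word-aligned features conserve the column-wise sum of the original BPE-level features to within an absolute tolerance of 1e-4. A self-contained illustration of what that check computes (with made-up tensors, not fairseq's actual alignment code):

import torch

features = torch.randn(12, 1024)  # stand-in for the BPE-token features, one row per BPE token
output = features.clone()         # stand-in for the word-aligned features, same total "mass"

# Same check as alignment_utils.py line 94: the column-wise sums must match closely.
assert torch.all(torch.abs(output.sum(dim=0) - features.sum(dim=0)) < 1e-4)

Because the aligned rows are built by redistributing and summing the BPE rows, small floating-point error can accumulate; the comments below discuss when that pushes the difference past 1e-4.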
Issue Analytics
- Created: 4 years ago
- Reactions: 3
- Comments: 7 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I face the same problem in my usage. I think the assertion is there to make sure the weighted sum works correctly, but there can be some numerical error after all the calculations, and 1e-4 is just a threshold to ensure that error is not too big.
In my case, I simply enlarged the threshold from 1e-4 to 1e-3, and that fixed my problem.
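If you want to try the same workaround, the check lives in fairseq/models/roberta/alignment_utils.py (the cached hub copy is at the path shown in the traceback above). A minimal sketch of the one-line edit, assuming you patch that file locally:

# fairseq/models/roberta/alignment_utils.py, inside align_features_to_words()
# Original check (line 94 in the traceback above):
#     assert torch.all(torch.abs(output.sum(dim=0) - features.sum(dim=0)) < 1e-4)
# Relaxed tolerance, as described in the comment above:
assert torch.all(torch.abs(output.sum(dim=0) - features.sum(dim=0)) < 1e-3)

Note that this only loosens a sanity check; if the mismatch comes from a genuinely bad alignment (for example, stray spaces in the input), the returned features may still be misaligned.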
I have the same issue, and it's really hard to figure out which spaces need to be removed. In my case I do care about the alignments, since I'm looking to extract embeddings for some specific tokens.
I created a custom function because I don't want to use spaCy tokens and I already have gold tokens available.
Consider the below code:
This code works for simple sentences:
Outputs:
torch.Size([16, 1024])
But when I use another sentence such as:
Then I get the same error:
And it’s not clear to me if there are any extra spaces in the sentence.
Any help here?
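One unconfirmed thing worth trying, since the failures above seem tied to extra or stray spaces in the input: normalize whitespace before calling extract_features_aligned_to_words. A minimal sketch (the regex cleanup is a suggestion, not part of the fairseq API):

import re
import torch

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()

# Example sentence with a stray double space that can confuse the word-level alignment.
ss = 'There were 28 apples in the house.  There are 54 apples in the garden.'

# Collapse runs of whitespace into single spaces and strip leading/trailing spaces.
cleaned = re.sub(r'\s+', ' ', ss).strip()

# Returns a spaCy Doc (see the "wrap in spaCy Doc" step in the traceback above).
doc = roberta.extract_features_aligned_to_words(cleaned)
for tok in doc:
    print(tok, tok.vector.shape)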