How to generate BERT/Roberta word/sentence embedding?
I know the standard operation:
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
model = RobertaModel.from_pretrained('roberta-large')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0]  # (batch_size, input_len, embedding_size)

But I need a single vector for each sentence.
I am working on improving an RNN by incorporating embeddings from a BERT-like pretrained model. How do I get a sentence embedding in this case (one vector for the entire sentence)? Averaging, or some other transformation of last_hidden_states? Is add_special_tokens necessary? Any suggested papers to read?
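One common approach is mean pooling: average the token vectors in last_hidden_states, using the attention mask so padding tokens are excluded. The sketch below demonstrates the pooling arithmetic on a dummy tensor standing in for model output (shapes and the mean_pool helper are illustrative, not part of the transformers API):

```python
import torch

def mean_pool(last_hidden_states, attention_mask):
    # Zero out padding positions, then average over real tokens only.
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (last_hidden_states * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1), avoid div by 0
    return summed / counts                           # (batch, hidden)

# Dummy tensors standing in for model output: batch of 2, seq len 4, hidden 8.
hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0],
                     [1, 1, 0, 0]])
sentence_vecs = mean_pool(hidden, mask)
print(sentence_vecs.shape)  # torch.Size([2, 8])
```

With real model output you would pass last_hidden_states and the attention_mask returned by the tokenizer; the result is one fixed-size vector per sentence regardless of sequence length.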
Issue Analytics
- State:
- Created 4 years ago
- Reactions:6
- Comments:5
Hey @zjplab, for sentence embeddings I'd recommend this library: https://github.com/UKPLab/sentence-transformers, along with their paper. They explain how they obtain their sentence embeddings, as well as the pros and cons of several different methods of doing it. They provide embeddings for BERT/RoBERTa and many more models.
Hi there. A few weeks or months ago, I wrote a notebook to introduce my colleagues to doing inference with language models; in other words, how to get a sentence representation out of them. You can have a look here. It should be self-explanatory.