
How to generate BERT/Roberta word/sentence embedding?

See original GitHub issue

I know the standard procedure:

import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
model = RobertaModel.from_pretrained('roberta-large')

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids)

last_hidden_states = outputs[0]  # (batch_size, input_len, embedding_size), but I need a single vector per sentence

However, I am working on improving an RNN by incorporating embeddings from a BERT-like pretrained model. How do I get a sentence embedding in this case (one vector for the entire sentence)? By averaging, or some other transformation of last_hidden_states? Is add_special_tokens necessary? Any suggested papers to read?
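One common way to collapse last_hidden_states into a single vector per sentence is masked mean pooling: average the token vectors while ignoring padding positions. This is a minimal sketch in plain PyTorch; the dummy tensors stand in for real model outputs, and mean pooling is only one of several options (taking the first token's vector is another):

```python
import torch

def mean_pool(last_hidden_states, attention_mask):
    # Expand the mask so padding tokens contribute nothing to the average
    mask = attention_mask.unsqueeze(-1).float()       # (batch, seq_len, 1)
    summed = (last_hidden_states * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)          # (batch, 1), avoid div by zero
    return summed / counts                            # (batch, hidden)

# Dummy tensors standing in for model outputs: batch=1, seq_len=7, hidden=4.
# With a real model you would pass outputs[0] and the tokenizer's attention_mask.
hidden = torch.randn(1, 7, 4)
mask = torch.ones(1, 7, dtype=torch.long)
sentence_vec = mean_pool(hidden, mask)  # shape (1, 4)
```

The masking matters once you batch sentences of different lengths: without it, padding vectors would be averaged in and skew short sentences' embeddings.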

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 6
  • Comments: 5

Top GitHub Comments

23 reactions
cformosa commented, Feb 24, 2020

Hey @zjplab, for sentence embeddings I'd recommend this library: https://github.com/UKPLab/sentence-transformers, along with their paper. They explain how they get their sentence embeddings, as well as the pros and cons of several different ways of doing it. They have embeddings for BERT/RoBERTa and many more.

16 reactions
BramVanroy commented, Feb 24, 2020

Hi there. A few weeks or months ago, I wrote this notebook to introduce my colleagues to doing inference on LMs, in other words: how to get a sentence representation out of them. You can have a look here. It should be self-explanatory.

Read more comments on GitHub >

Top Results From Across the Web

sentence-transformers/nli-roberta-large
This model is deprecated. Please don't use it as it produces sentence embeddings of low quality. You can find recommended sentence embedding models...
How can I get RoBERTa word embeddings?
Given a sentence of the type 'Roberta is a heavily optimized version of BERT.', I need to get the embeddings for each of...
BERT Word Embeddings Tutorial
Creating word and sentence vectors from hidden states ... we will use BERT to extract features, namely word and sentence embedding vectors, ...
CLRP / How to get Text Embedding from RoBERTa
get CLS token; pool RoBERTa output (RoBERTa output = word embeddings) ... introduces how to perform sentence vectorization (text embedding) using models such as BERT. RoBERTa ...
An Intuitive Explanation of Sentence-BERT
After the sentences were inputted to BERT, because of BERT's word-level embeddings, the most common way to generate a sentence embedding was by...
