
combining TEXT.build_vocab with flair embeddings

See original GitHub issue

❓ Questions and Help

Description

Hi, we can use GloVe embeddings when building the vocabulary, with something like:

import torch

MIN_FREQ = 2

# TEXT is a torchtext Field defined earlier; build the vocabulary and
# attach the pretrained GloVe vectors to it
TEXT.build_vocab(train_data,
                 min_freq=MIN_FREQ,
                 vectors="glove.6B.300d",
                 unk_init=torch.Tensor.normal_)
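As a quick aside (standard torchtext behavior, not shown in the original post): after build_vocab, the aligned embedding matrix is stored on the vocabulary itself, so it can be pulled out as a plain tensor.

# row i of the matrix embeds the token TEXT.vocab.itos[i]
pretrained_vectors = TEXT.vocab.vectors
print(pretrained_vectors.shape)  # torch.Size([len(TEXT.vocab), 300]) for glove.6B.300d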

We can also create embeddings with the flair library, for example:

from typing import List
from flair.embeddings import (TokenEmbeddings, WordEmbeddings,
                              FlairEmbeddings, ELMoEmbeddings,
                              BertEmbeddings, StackedEmbeddings)

embedding_types: List[TokenEmbeddings] = [

    WordEmbeddings('glove'),

    # uncomment (and import CharacterEmbeddings) to add character embeddings
    # CharacterEmbeddings(),

    # contextual string embeddings from flair
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),

    # further contextual embeddings
    ELMoEmbeddings(),
    BertEmbeddings('bert-base-uncased'),
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)
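For reference, flair embeddings are applied to Sentence objects rather than to a fixed vocabulary; each token only receives its vector after embed() is called. A minimal usage sketch with the embeddings object above:

from flair.data import Sentence

sentence = Sentence('The grass is green .')
embeddings.embed(sentence)  # computes all stacked vectors in place
for token in sentence:
    print(token.text, token.embedding.shape)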

Could I use the embeddings above instead of GloVe in the build_vocab call? Is anything similar to this supported?

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
mttk commented, Nov 26, 2019

There are two options, both equally good.

Option 1: copy data from a tensor

import torch.nn as nn

vectors = None  # or initialize with a pretrained tensor of shape (num_tokens, embedding_size)
embedding = nn.Embedding(num_tokens, embedding_size, padding_idx=0)
if vectors is not None:
    embedding.weight.data.copy_(vectors)
if freeze_vectors:
    embedding.weight.requires_grad = False

Option 2: use from_pretrained

embedding = nn.Embedding.from_pretrained(vectors)
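Tying the two options back to torchtext (a sketch, not part of the original answer): the matrix that build_vocab attaches to TEXT.vocab can be handed straight to from_pretrained, with the freeze flag controlling whether the vectors are fine-tuned during training.

import torch.nn as nn

vectors = TEXT.vocab.vectors  # the (vocab_size, 300) matrix built by build_vocab
pad_idx = TEXT.vocab.stoi[TEXT.pad_token]
embedding = nn.Embedding.from_pretrained(vectors, freeze=False, padding_idx=pad_idx)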

0 reactions
mttk commented, Jan 31, 2020

@Edresson the example code would work only for the non-contextualized embeddings, in the case where you want to obtain the embedding matrix vectors. It would of course fail when applied to a concrete instance, since contextual embeddings depend on the surrounding sentence rather than on a fixed lookup table.

If you want to use flair contextual embeddings, I believe you need to load and use the dataset with their pipeline. There might be a workaround, but I’m not familiar with one at the moment as I don’t actively use flair.
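One possible workaround, sketched here under the assumption that you keep the StackedEmbeddings object from the question (this is not something the maintainers confirmed): embed each sentence with flair's pipeline and stack the per-token vectors into a tensor that a downstream PyTorch model can consume.

import torch
from flair.data import Sentence

def embed_tokens(text, embeddings):
    # contextual vectors must be recomputed per sentence; there is
    # no fixed per-word matrix to copy into an nn.Embedding layer
    sentence = Sentence(text)
    embeddings.embed(sentence)
    return torch.stack([token.embedding for token in sentence])  # (seq_len, emb_dim)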
