
combining TEXT.build_vocab with flair embeddings

See original GitHub issue

❓ Questions and Help

Description

Hi, we can use GloVe embeddings when building the vocabulary, with something like:

import torch

MIN_FREQ = 2

# TEXT is a torchtext Field defined earlier; build the vocabulary and
# attach the pretrained GloVe vectors to it
TEXT.build_vocab(train_data,
                 min_freq=MIN_FREQ,
                 vectors="glove.6B.300d",
                 unk_init=torch.Tensor.normal_)
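As a quick aside (standard torchtext behavior, not shown in the original post): after build_vocab, the aligned embedding matrix is stored on the vocabulary itself, so it can be pulled out as a plain tensor.

# row i of the matrix embeds the token TEXT.vocab.itos[i]
pretrained_vectors = TEXT.vocab.vectors
print(pretrained_vectors.shape)  # torch.Size([len(TEXT.vocab), 300]) for glove.6B.300d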

We can also create embeddings with the flair library, for example:

from typing import List
from flair.embeddings import (TokenEmbeddings, WordEmbeddings,
                              FlairEmbeddings, ELMoEmbeddings,
                              BertEmbeddings, StackedEmbeddings)

embedding_types: List[TokenEmbeddings] = [

    WordEmbeddings('glove'),

    # uncomment (and import CharacterEmbeddings) to add character embeddings
    # CharacterEmbeddings(),

    # contextual string embeddings from flair
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),

    # further contextual embeddings
    ELMoEmbeddings(),
    BertEmbeddings('bert-base-uncased'),
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)
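For reference, flair embeddings are applied to Sentence objects rather than to a fixed vocabulary; each token only receives its vector after embed() is called. A minimal usage sketch with the embeddings object above:

from flair.data import Sentence

sentence = Sentence('The grass is green .')
embeddings.embed(sentence)  # computes all stacked vectors in place
for token in sentence:
    print(token.text, token.embedding.shape)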

Could I use the embeddings above instead of GloVe in the build_vocab call? Is anything similar to this supported?

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
mttk commented, Nov 26, 2019

There are two options, both equally good.

Option 1: copy data from a tensor

import torch.nn as nn

vectors = None  # or initialize with a pretrained tensor of shape (num_tokens, embedding_size)
embedding = nn.Embedding(num_tokens, embedding_size, padding_idx=0)
if vectors is not None:
    embedding.weight.data.copy_(vectors)
if freeze_vectors:
    embedding.weight.requires_grad = False

Option 2: use from_pretrained

embedding = nn.Embedding.from_pretrained(vectors)
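Tying the two options back to torchtext (a sketch, not part of the original answer): the matrix that build_vocab attaches to TEXT.vocab can be handed straight to from_pretrained, with the freeze flag controlling whether the vectors are fine-tuned during training.

import torch.nn as nn

vectors = TEXT.vocab.vectors  # the (vocab_size, 300) matrix built by build_vocab
pad_idx = TEXT.vocab.stoi[TEXT.pad_token]
embedding = nn.Embedding.from_pretrained(vectors, freeze=False, padding_idx=pad_idx)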

0 reactions
mttk commented, Jan 31, 2020

@Edresson the example code would work only for the non-contextualized embeddings, in the case where you want to obtain the embedding matrix vectors. It would of course fail when applied to a concrete instance, since contextual embeddings depend on the surrounding sentence rather than on a fixed lookup table.

If you want to use flair contextual embeddings, I believe you need to load and use the dataset with their pipeline. There might be a workaround, but I’m not familiar with one at the moment as I don’t actively use flair.
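One possible workaround, sketched here under the assumption that you keep the StackedEmbeddings object from the question (this is not something the maintainers confirmed): embed each sentence with flair's pipeline and stack the per-token vectors into a tensor that a downstream PyTorch model can consume.

import torch
from flair.data import Sentence

def embed_tokens(text, embeddings):
    # contextual vectors must be recomputed per sentence; there is
    # no fixed per-word matrix to copy into an nn.Embedding layer
    sentence = Sentence(text)
    embeddings.embed(sentence)
    return torch.stack([token.embedding for token in sentence])  # (seq_len, emb_dim)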
