combining TEXT.build_vocab with flair embeddings
❓ Questions and Help
Description
Hi, we can use GloVe embeddings when building the vocabulary, with something like:

```python
MIN_FREQ = 2
TEXT.build_vocab(train_data,
                 min_freq=MIN_FREQ,
                 vectors="glove.6B.300d",
                 unk_init=torch.Tensor.normal_)
```
We can also create embeddings with the flair library, for example:

```python
from typing import List
from flair.embeddings import (
    TokenEmbeddings,
    WordEmbeddings,
    CharacterEmbeddings,
    FlairEmbeddings,
    ELMoEmbeddings,
    BertEmbeddings,
    StackedEmbeddings,
)

embedding_types: List[TokenEmbeddings] = [
    WordEmbeddings('glove'),
    # comment in this line to use character embeddings
    # CharacterEmbeddings(),
    # comment in these lines to use flair embeddings
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
    ELMoEmbeddings(),
    BertEmbeddings('bert-base-uncased'),
]
embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)
```
Could I pass these embeddings to `build_vocab` instead of GloVe in the first snippet? Is anything like this supported?
Issue Analytics
- State:
- Created 4 years ago
- Comments: 10 (6 by maintainers)
Top GitHub Comments
There are two options, both equally good:

Option 1: copy data from a tensor.

Option 2: use `from_pretrained`:

```python
embedding = nn.Embedding.from_pretrained(vectors)
```
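A minimal sketch of both options side by side, assuming `vectors` is a pretrained embedding matrix such as `TEXT.vocab.vectors` (the random tensor below is a stand-in with a hypothetical 5-word, 3-dimensional vocabulary):

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained embedding matrix such as TEXT.vocab.vectors.
vectors = torch.randn(5, 3)

# Option 1: create the layer, then copy the pretrained weights in place.
embedding1 = nn.Embedding(num_embeddings=5, embedding_dim=3)
embedding1.weight.data.copy_(vectors)

# Option 2: build the layer directly from the pretrained matrix.
embedding2 = nn.Embedding.from_pretrained(vectors)

# Both layers now hold identical weights.
assert torch.equal(embedding1.weight.data, embedding2.weight.data)
```

Note that `from_pretrained` freezes the weights by default; pass `freeze=False` if you want to fine-tune them during training.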
@Edresson the example code would work only for the non-contextualized embeddings, in case you want to obtain the embedding-matrix vectors. It would of course fail when applied to a concrete instance.
If you want to use flair contextual embeddings, I believe you need to load and process the dataset with flair's own pipeline. There might be a workaround, but I'm not familiar with one at the moment, as I don't actively use flair.