build_vocab with custom word embedding
Hi 😃 I want to use a custom biomedical word embedding for text classification, but I can't find how to load it.
Some old tutorials say there is a 'wv_dir' keyword argument, which I tried, but it failed:
TypeError Traceback (most recent call last)
<ipython-input-48-ac09f554719e> in <module>()
1 test_field = data.Field()
2 lang_data = datasets.LanguageModelingDataset(path='pr_data/processed_neg.txt',text_field=test_field)
----> 3 voc = torchtext.vocab.Vocab(wv_dir='bio_wordemb/PubMed-and-PMC-w2v.txt')
4
5 # test_field.build_vocab(lang_data,wv_dir='bio_wordemb/PubMed-and-PMC-w2v.txt')
TypeError: __init__() got an unexpected keyword argument 'wv_dir'
Just as we can load pretrained GloVe embeddings with TEXTFIELD.build_vocab(data, vectors='glove.6B.100d'), is there a similar way to load a custom embedding?
Any help would be much appreciated. Thanks!
Issue Analytics
- State:
- Created 6 years ago
- Reactions:4
- Comments:16 (1 by maintainers)
Top GitHub Comments
This works; xxx.vec should be a standard word2vec-format file.
Yes, you are right! There is a bit more info here.
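
For readers unsure what "standard word2vec format" means here, the following is a minimal sketch of the text layout as I understand it: an optional header line with the vocabulary size and dimension, then one token per line followed by its space-separated vector components. The toy file contents, the helper names load_word2vec_text and build_embedding_matrix, and the zero-vector fallback for missing tokens are all assumptions made for illustration; this is not torchtext's own loader, only a plain-Python approximation of what such a loader has to do.

```python
import os
import tempfile


def load_word2vec_text(path):
    """Parse a word2vec-style text file into ({token: [floats]}, dim).

    Hypothetical helper for illustration; not part of torchtext.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        first = f.readline().split()
        if len(first) == 2 and all(p.isdigit() for p in first):
            # Header line of the form "<vocab_size> <dim>"
            dim = int(first[1])
        else:
            # No header: the first line is already a token entry
            dim = len(first) - 1
            vectors[first[0]] = [float(x) for x in first[1:]]
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


def build_embedding_matrix(itos, vectors, dim):
    """Return one row per vocabulary token; unknown tokens get zeros (an assumption)."""
    return [vectors.get(tok, [0.0] * dim) for tok in itos]


# Toy file standing in for something like xxx.vec (contents are made up).
sample = "2 4\nprotein 0.1 0.2 0.3 0.4\ngene 0.5 0.6 0.7 0.8\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.vec")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    vectors, dim = load_word2vec_text(path)

# Index-to-string list, as a vocabulary object would typically hold.
itos = ["<unk>", "protein", "kinase"]
matrix = build_embedding_matrix(itos, vectors, dim)
```

The resulting matrix has shape V x D (vocabulary size by embedding dimension), with rows in the same order as the vocabulary, which is the layout an embedding layer expects. If your file is in the binary word2vec format instead, it would need converting to text first.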