
build_vocab with custom word embedding

See original GitHub issue

Hi 😃 I want to use a customized bio-word embedding for text classification, but I can't find out how.

An old tutorial mentions a 'wv_dir' keyword argument, which I tried without success:

TypeError                                 Traceback (most recent call last)
<ipython-input-48-ac09f554719e> in <module>()
      1 test_field = data.Field()
      2 lang_data = datasets.LanguageModelingDataset(path='pr_data/processed_neg.txt',text_field=test_field)
----> 3 voc = torchtext.vocab.Vocab(wv_dir='bio_wordemb/PubMed-and-PMC-w2v.txt')
      4 
      5 # test_field.build_vocab(lang_data,wv_dir='bio_wordemb/PubMed-and-PMC-w2v.txt')

TypeError: __init__() got an unexpected keyword argument 'wv_dir'

Just as we can load a pretrained GloVe embedding with TEXTFIELD.build_vocab(data, vectors='glove.6B.100d'), is there a similar way to load a customized embedding?

Any help would be much appreciated. Thanks!

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 4
  • Comments: 16 (1 by maintainers)

Top GitHub Comments

12 reactions
bebound commented, Dec 14, 2018

This works; xxx.vec should be a file in the standard word2vec format.

from torchtext.vocab import Vectors

vectors = Vectors(name='xxx.vec', cache='./')
TEXT.build_vocab(train, val, test, vectors=vectors)
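For reference, here is a minimal stdlib-only sketch of what the "standard word2vec format" mentioned above looks like and how such a file can be parsed: a header line with the vocabulary size and dimensionality, then one token per line followed by its space-separated float components. The function name, tokens, and vector values below are made up for illustration; torchtext's Vectors class performs this loading (with caching) for you.

```python
import io

def load_word2vec_text(fobj):
    """Parse a word2vec-format text stream into {token: [floats]}."""
    # Header line: "<vocab_size> <dim>"
    header = fobj.readline().split()
    n_words, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in fobj:
        # Each line: "<token> <v1> <v2> ... <v_dim>"
        parts = line.rstrip().split(" ")
        token, values = parts[0], [float(x) for x in parts[1:]]
        assert len(values) == dim, f"bad vector length for {token!r}"
        vectors[token] = values
    assert len(vectors) == n_words, "header count does not match file"
    return vectors

# Toy two-word, three-dimensional embedding in word2vec text format.
sample = io.StringIO("2 3\nprotein 0.1 0.2 0.3\ngene 0.4 0.5 0.6\n")
emb = load_word2vec_text(sample)
print(emb["gene"])  # [0.4, 0.5, 0.6]
```

If your custom embedding file lacks the header line (some tools omit it), torchtext's Vectors still handles it, but a hand-rolled parser like the one above would need the header check removed.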
2 reactions
pablogps commented, Dec 14, 2018

Yes, you are right! There is a bit more info here.


Top Results From Across the Web

PyTorch: Loading word vectors into Field vocabulary vs ...
I would like to create a PyTorch Embedding layer (a matrix of size V x D, where V is over vocabulary word...

torchtext.vocab - Read the Docs
vectors – one of or a list containing instantiations of the GloVe, CharNGram, or Vectors ... initialize out-of-vocabulary word vectors to zero vectors; ...

models.word2vec – Word2vec embeddings — gensim
This module implements the word2vec family of algorithms, using highly optimized C routines, data streaming and Pythonic interfaces. The word2vec algorithms ...

Tutorial - How to train your custom word embedding - Kaggle
In this notebook, I will demonstrate how to train your custom word2vec using Gensim. For those who are new to word embeddings and...

Using fine-tuned Gensim Word2Vec Embeddings with ...
Torchtext handles creating vector embeddings for words in your dataset in the following way. ...
