
File Format for Loading Pretrained Embeddings

See original GitHub issue

Hi,

Could you please help clarify my doubt?

I understand that the function below loads the pretrained embeddings. Its docstring says it augments the dictionary with words that have a pretrained embedding:

import os

def augment_with_pretrained(dictionary, ext_emb_path, words):
    """
    Augment the dictionary with words that have a pretrained embedding.
    If `words` is None, we add every word that has a pretrained embedding
    to the dictionary; otherwise, we only add the words that are given by
    `words` (typically the words in the development and test sets).
    """
    print('Loading pretrained embeddings from %s...' % ext_emb_path)
    assert os.path.isfile(ext_emb_path)
    # ... (snippet truncated in the original post)
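
The snippet above is cut off; going by the docstring, the rest of the function would collect the pretrained vocabulary from the file and add matching words to the dictionary. A rough sketch of that continuation (an illustration of what the docstring describes, not the repo's actual code):

    # gather the set of words that have a pretrained vector
    # (first whitespace-separated field of each line)
    pretrained = set(
        line.rstrip().split(' ')[0]
        for line in open(ext_emb_path, 'r', encoding='utf-8')
    )
    if words is None:
        # add every word that has a pretrained embedding
        for word in pretrained:
            if word not in dictionary:
                dictionary[word] = 0
    else:
        # only add the given words (e.g. dev/test words) that are covered
        for word in words:
            if word not in dictionary and word in pretrained:
                dictionary[word] = 0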

My doubt is this: I have train, dev, and test sets in CoNLL 2003 format, which is clear. How should the pretrained embedding file be saved?

I am planning to use word2vec and GloVe models, which take each word in a sentence as input and give a vector representation for each word.

How am I supposed to feed these vectors to the model? Could you please point me to the code section that reads this vector representation?

What should be the file format of the pretrained embedding file?
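
For reference: pretrained GloVe text files (and word2vec's text export, apart from its first header line giving vocabulary size and dimension) put the surface word itself on each line, not a word_id, followed by its space-separated float components. A minimal, illustrative parser (the helper name is mine, not part of the repo):

import numpy as np

def load_embedding_dict(path):
    """Parse a GloVe-style text file into {word: vector}."""
    embeddings = {}
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            # first field is the word; the rest are its float components
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# e.g. a line in such a file looks like: "the 0.418 0.24968 -0.41242 ..."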

How will the word_id pick up the vector representation while training, and which part of the code handles this?
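
A common pattern in taggers like this one (a sketch of the usual approach, not necessarily this repo's exact code) is to build the embedding matrix so that row word_id holds the pretrained vector when one exists; the training-time lookup is then plain indexing:

import numpy as np

def build_embedding_matrix(id_to_word, pretrained, dim):
    # one row per word_id; start random, overwrite the rows that have
    # a pretrained vector, leave the rest at their random initialization
    matrix = np.random.uniform(-0.1, 0.1, (len(id_to_word), dim)).astype(np.float32)
    for word_id, word in id_to_word.items():
        if word in pretrained:
            matrix[word_id] = pretrained[word]
    return matrix

# during training, the vector for a token is simply matrix[word_id]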

Should the pretrained embedding file be like word_id <tab> vector representation of the word?

Many thanks in advance for clarifying these doubts.

With regards, Raghav

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
raghavchalapathy commented, Sep 21, 2016

Thanks for your elaborate comments. The representation was output by this web API service instead. I have loaded the pretrained word2vec model in binary form and read glove.txt into a dict. Many thanks again for the response.
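
For anyone landing here later, a sketch of what that solution can look like (assuming gensim is installed; the file names below are the usual published archives and may differ on your machine):

import numpy as np
from gensim.models import KeyedVectors

# word2vec: load the binary model directly
w2v = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

# GloVe: read the plain-text file into a dict
glove = {}
with open('glove.6B.100d.txt', 'r', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip().split(' ')
        glove[parts[0]] = np.asarray(parts[1:], dtype=np.float32)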

0 reactions
1049451037 commented, May 18, 2019

@raghavchalapathy Hi, I ran into the same problem. Could you provide a more detailed solution?

Read more comments on GitHub.

Top Results From Across the Web

  • Loading Glove Pre-trained Word Embedding Model from ...: Use it as: model = load_glove_model(“path/to/txt/file/also/exclude/extension of filename.”) Alternative and Faster Way. Step 1: Once you have a text file, then ...
  • Using pre-trained word embeddings: Load pre-trained word embeddings. Let's download pre-trained GloVe embeddings (an 822M zip file). You'll need to run the following commands: ...
  • Guide to Using Pre-trained Word Embeddings in NLP: In this article, we'll take a look at how you can use pre-trained word embeddings to classify text with TensorFlow. Full code included. ...
  • Pretrained Embeddings - Wikipedia2Vec: We provide pretrained embeddings for 12 languages in binary and text format. The binary files can be loaded using the Wikipedia2Vec.load() method (see ...)
  • python 3.x - Loading pre-trained word embeddings: I am trying to load the pre-trained ...
