File Format for Loading pretrained embeddings
Hi,
Could you please help clarify a doubt?
I understand that the function below loads the pretrained embeddings. Its comment says it augments the dictionary with words that have a pretrained embedding:
def augment_with_pretrained(dictionary, ext_emb_path, words):
    """
    Augment the dictionary with words that have a pretrained embedding.
    If `words` is None, we add every word that has a pretrained embedding
    to the dictionary, otherwise, we only add the words that are given by
    `words` (typically the words in the development and test sets.)
    """
    print 'Loading pretrained embeddings from %s...' % ext_emb_path
    assert os.path.isfile(ext_emb_path)
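As a side note on what such a loader typically consumes: GloVe's text files store one word per line, followed by its space-separated float components. The following is a minimal, self-contained sketch of parsing that format into a dict; the helper name `load_text_embeddings` and the sample words are illustrative, not taken from the repo.

```python
import os
import tempfile

def load_text_embeddings(path):
    """Parse a GloVe-style text file: each line is a word followed by
    its space-separated float components."""
    embeddings = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip().split(' ')
            # First field is the word; the rest are the vector components.
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

# Write a tiny 3-dimensional example file and parse it back.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('the 0.1 0.2 0.3\n')
    f.write('cat 0.4 0.5 0.6\n')
    path = f.name

emb = load_text_embeddings(path)
print(sorted(emb))      # ['cat', 'the']
print(len(emb['the']))  # 3
os.remove(path)
```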
My doubt is: I have train, dev, and test sets in CoNLL 2003 format, which is very clear. How should the pretrained embedding file be saved?
I am planning to use word2vec and GloVe models, which take each word in a sentence as input and give a vector representation of each word.
How am I supposed to input these vectors to the model? Could you please point me to the code section that reads this vector representation?
What should be the file format of the pretrained embedding file?
How will the word_id pick the vector representation while training, and which part of the code handles this?
Should the pretrained embedding file be like word_id <tab> vector representation of the word?
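On the word_id question in general terms: the usual pattern is that the dictionary maps each word to an integer id, and an embedding matrix is built so that row i holds the vector for the word with id i; at training time a token's id simply indexes that row. A minimal sketch, assuming hypothetical names (`word_to_id`, `pretrained`, `embedding_matrix`) that are not from the repo:

```python
import random

dim = 4
word_to_id = {'<UNK>': 0, 'the': 1, 'cat': 2}
pretrained = {'the': [0.1] * dim, 'cat': [0.2] * dim}

# Build the matrix in id order: copy the pretrained vector when one
# exists, otherwise fall back to a random initialization.
embedding_matrix = []
for word, idx in sorted(word_to_id.items(), key=lambda kv: kv[1]):
    if word in pretrained:
        embedding_matrix.append(pretrained[word])
    else:
        embedding_matrix.append([random.uniform(-1, 1) for _ in range(dim)])

# During training, a token is converted to its id, and that id
# selects the matching row of the embedding matrix.
token_id = word_to_id['cat']
vector = embedding_matrix[token_id]
print(vector)  # [0.2, 0.2, 0.2, 0.2]
```

So the lookup itself is just integer indexing; the only precondition is that the matrix rows were built in the same order as the dictionary's ids.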
Many thanks in advance for clarifying these doubts.
With regards, Raghav
Issue Analytics
- Created 7 years ago
- Comments: 6 (1 by maintainers)
Top GitHub Comments
Thanks for your elaborate comments. The representation was output by this web API service instead. I have loaded pretrained models of both kinds: word2vec as a binary model, and glove.txt read into a dict. Many thanks again for the response.
@raghavchalapathy Hi, I also ran into this problem. Could you provide a more detailed solution?