Replacing Deepmatcher embeddings with own embeddings
Hi there,
I’m trying to replace the sequence embeddings for each attribute created by Deepmatcher with embeddings generated by Sentence BERT.
It seems the actual creation of the sequence embeddings happens in dataset.py, and the final weighted sequence embeddings are stored in attr_embeddings[name].
Could I get away with simply replacing these vectors with my own or would that lead to issues? Are they in the same order as the original dataset (the examples)?
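Whether the stored vectors follow the row order of the original dataset is not confirmed anywhere in this thread. One defensive option, sketched below, is to key the custom embeddings by record id and rebuild them in whatever order the examples are iterated, rather than relying on position. The helper name `align_embeddings`, the id-keyed dictionary, and the zero-vector fallback are all hypothetical and not part of Deepmatcher.

```python
from typing import Dict, List, Sequence


def align_embeddings(
    example_ids: Sequence[str],
    custom: Dict[str, List[float]],
    dim: int,
) -> List[List[float]]:
    """Return one vector per example, in the order given by example_ids.

    Hypothetical helper (not a Deepmatcher API): `custom` maps a record id
    to an externally computed embedding, e.g. from Sentence BERT.
    """
    aligned = []
    for ex_id in example_ids:
        vec = custom.get(ex_id)
        if vec is None:
            # Assumption: a missing embedding falls back to a zero vector.
            vec = [0.0] * dim
        if len(vec) != dim:
            raise ValueError(f"embedding for {ex_id!r} has length {len(vec)}, expected {dim}")
        aligned.append(list(vec))
    return aligned


# Usage: the ids arrive in a different order than the dictionary was built in.
vectors = align_embeddings(["b", "a"], {"a": [1.0, 2.0], "b": [3.0, 4.0]}, dim=2)
```

Looking up by id means the result stays correct even if the examples are shuffled or sorted somewhere between loading and embedding.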
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:7
Top GitHub Comments
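@ziqizhang To solve this, I saved the model under the name “wiki.es.bin”, because deepmatcher looks for a file called wiki.{}.bin.
import fastText  # older Python bindings; newer releases expose the module as fasttext
# Train an unsupervised fastText model on a plain-text corpus.
model_fast = fastText.train_unsupervised("text.txt")
# Save it under the file name that deepmatcher expects.
model_fast.save_model("wiki.es.bin")
Check class FastTextBinary(vocab.Vectors) in field.py: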
https://github.com/anhaidgroup/deepmatcher/blob/master/deepmatcher/data/field.py
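The naming trick in the comment above can be sketched as follows. This is a minimal illustration, assuming the wiki.{}.bin pattern is filled in with a language code such as "es"; the function names and the cache directory argument are hypothetical and are not taken from field.py.

```python
import os
from typing import Optional

# Pattern quoted in the comment above; the placeholder is assumed to be a language code.
NAME_BASE = "wiki.{}.bin"


def expected_model_filename(language: str) -> str:
    """Return the file name a custom fastText model would need to be saved under."""
    return NAME_BASE.format(language)


def find_model(cache_dir: str, language: str) -> Optional[str]:
    """Return the full path to the model file if it exists in cache_dir, else None."""
    path = os.path.join(cache_dir, expected_model_filename(language))
    return path if os.path.isfile(path) else None


# Usage: a Spanish model must be named "wiki.es.bin" to match the pattern.
print(expected_model_filename("es"))
```

Checking for the file before training starts makes a misnamed model visible immediately, rather than triggering a download attempt or a load error later.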
@MocktaiLEngineer Thanks for the hint. Makes sense but I was not aware of that option.