Replacing Deepmatcher embeddings with own embeddings

Hi there,

I’m trying to replace the sequence embeddings for each attribute created by Deepmatcher with embeddings generated by Sentence BERT.

It seems the actual creation of the sequence embeddings happens in dataset.py, and the final weighted sequence embeddings are stored in attr_embeddings[name]. Could I get away with simply replacing these vectors with my own, or would that lead to issues? Are they in the same order as the examples in the original dataset?
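One way to sidestep the ordering question is to key custom embeddings by record id instead of relying on row position. The following is only a minimal sketch, not Deepmatcher's API: `embed_attributes`, the `id` column name, and the toy `fake_encode` function are hypothetical names introduced for illustration. In practice `fake_encode` could be swapped for a Sentence-BERT encoder (for example `SentenceTransformer.encode` from the sentence-transformers package). Whether such vectors can safely be written into attr_embeddings[name] is not confirmed anywhere in this thread.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[str]], List[Vector]]


def embed_attributes(
    rows: Sequence[dict],
    attrs: Sequence[str],
    encode: Encoder,
    id_key: str = "id",  # hypothetical id column name
) -> Dict[str, Dict[Hashable, Vector]]:
    """Return {attribute: {record_id: vector}} for every requested attribute."""
    ids = [row[id_key] for row in rows]
    result: Dict[str, Dict[Hashable, Vector]] = {}
    for attr in attrs:
        # Missing values are encoded as empty strings.
        texts = [str(row.get(attr, "")) for row in rows]
        vectors = encode(texts)
        if len(vectors) != len(ids):
            raise ValueError("encoder must return one vector per input text")
        # Keying by id makes later lookups independent of row order.
        result[attr] = dict(zip(ids, vectors))
    return result


def fake_encode(texts: Sequence[str]) -> List[Vector]:
    # Stand-in for a real sentence encoder: [character count, word count].
    return [[float(len(t)), float(len(t.split()))] for t in texts]


rows = [
    {"id": 7, "title": "red apple", "brand": "acme"},
    {"id": 3, "title": "green pear", "brand": "bolt co"},
]
embeddings = embed_attributes(rows, ["title", "brand"], fake_encode)
print(embeddings["title"][3])
```

Looking vectors up by id means the result stays valid even if the dataset is shuffled or batched in a different order than the input file.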

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 7

Top GitHub Comments

2 reactions
rodrigoexe commented, Oct 22, 2020

ziqizhang, to solve this I saved the model under the name “wiki.es.bin”, because deepmatcher looks for a file called wiki.{}.bin.

model_fast = fastText.train_unsupervised("text.txt")
model_fast.save_model("wiki.es.bin")

Check the class FastTextBinary(vocab.Vectors) in field.py:

https://github.com/anhaidgroup/deepmatcher/blob/master/deepmatcher/data/field.py
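To check that a custom model is saved under the name the comment above describes, the pattern wiki.{}.bin can be filled in with a language code. This is a rough sketch that only mirrors the naming pattern quoted in this thread. The helper `expected_model_path` and the default cache directory `~/.vector_cache` are assumptions made for illustration, so verify the actual lookup location against field.py.

```python
import os

# File name pattern quoted in the comment above.
NAME_TEMPLATE = "wiki.{}.bin"


def expected_model_path(language: str, cache_dir: str = "~/.vector_cache") -> str:
    """Build the path where a fastText binary for `language` would be expected.

    `cache_dir` is an assumed default, not taken from the thread.
    """
    file_name = NAME_TEMPLATE.format(language)
    return os.path.join(os.path.expanduser(cache_dir), file_name)


path = expected_model_path("es")
print(path, "exists" if os.path.exists(path) else "missing")
```

If the file is reported as missing, saving or copying the trained model to that path (or to whichever directory field.py actually uses) is the step described in the comment.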

0 reactions
mathisloevenich commented, Jul 21, 2021

@MocktaiLEngineer Thanks for the hint. Makes sense but I was not aware of that option.
