Replacing Deepmatcher embeddings with own embeddings

Hi there,

I’m trying to replace the sequence embeddings for each attribute created by Deepmatcher with embeddings generated by Sentence BERT.

It seems the sequence embeddings are actually created in dataset.py, and the final weighted sequence embeddings are stored in attr_embeddings[name]. Could I get away with simply replacing these vectors with my own, or would that lead to issues? Are they in the same order as the examples in the original dataset?
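One way to avoid depending on ordering at all is to key the replacement vectors by record id and look them up when needed. The sketch below is a minimal illustration under that assumption. It does not say whether deepmatcher preserves example order, and embed_by_id, vectors_in_dataset_order, and toy_encode are hypothetical names; toy_encode is only a stand-in for a real Sentence-BERT encoder.

```python
from typing import Callable, Dict, List

Vector = List[float]
Record = Dict[str, object]


def embed_by_id(records: List[Record], attribute: str,
                encode: Callable[[str], Vector]) -> Dict[object, Vector]:
    """Map each record id to the embedding of one attribute's text."""
    return {rec["id"]: encode(str(rec[attribute])) for rec in records}


def vectors_in_dataset_order(records: List[Record],
                             vectors_by_id: Dict[object, Vector]) -> List[Vector]:
    """Return vectors aligned with whatever order `records` currently has."""
    return [vectors_by_id[rec["id"]] for rec in records]


def toy_encode(text: str) -> Vector:
    # Hypothetical stand-in for a sentence encoder: (character count, word count).
    return [float(len(text)), float(len(text.split()))]


records = [
    {"id": 7, "title": "acme anvil"},
    {"id": 3, "title": "acme rocket skates"},
]

by_id = embed_by_id(records, "title", toy_encode)
ordered = vectors_in_dataset_order(records, by_id)

# Even if the dataset is reordered, the lookup keeps ids and vectors aligned.
reordered = vectors_in_dataset_order(list(reversed(records)), by_id)
```

With an id-keyed lookup like this, the vectors can be re-aligned to any ordering the library ends up using, provided the ids survive preprocessing.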

Top GitHub Comments

rodrigoexe commented, Oct 22, 2020

@ziqizhang To solve this, I saved the model under the name "wiki.es.bin", because deepmatcher looks for a file called wiki.{}.bin.

import fastText  # module name as used in this comment; the PyPI package is imported as fasttext
model_fast = fastText.train_unsupervised("text.txt")
model_fast.save_model("wiki.es.bin")

Check the class FastTextBinary(vocab.Vectors) in field.py:

https://github.com/anhaidgroup/deepmatcher/blob/master/deepmatcher/data/field.py
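To make the naming convention concrete, here is a minimal sketch of how a template like wiki.{}.bin resolves to a file name for a given language code. The template string comes from the comment above; NAME_TEMPLATE and expected_model_filename are hypothetical names for illustration, and how deepmatcher actually fills in the template is an assumption here.

```python
# Template quoted in the comment above; the str.format fill-in is an assumption.
NAME_TEMPLATE = "wiki.{}.bin"


def expected_model_filename(language: str) -> str:
    """Return the file name a loader using this template would look for."""
    return NAME_TEMPLATE.format(language)


spanish_name = expected_model_filename("es")
print(spanish_name)
```

Saving a custom model under the resolved name is what lets the existing lookup pick it up without any code changes.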

mathisloevenich commented, Jul 21, 2021

@MocktaiLEngineer Thanks for the hint. Makes sense but I was not aware of that option.
