
Generate embeddings of new points


Hello, I have two questions:

1. When we use .fit() to generate the embeddings, how do we save the model and use it to create embeddings for new entities of the KG?
2. Previously this library did not raise any error for label_predicates, but now it reports an unexpected keyword argument:

kg = KG("dataset.xml", label_predicates=[rdflib.URIRef(x) for x in label_predicates])
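On the second question, recent pyRDF2Vec releases appear to have renamed the label_predicates argument of KG to skip_predicates, which takes plain IRI strings instead of rdflib.URIRef objects. A minimal sketch, assuming that rename (the predicate IRI below is only an example):

from pyrdf2vec.graphs import KG

# Assumption: newer releases renamed label_predicates to skip_predicates
# and expect a set of plain IRI strings rather than rdflib.URIRef objects.
label_predicates = ["http://www.w3.org/2000/01/rdf-schema#label"]
kg = KG("dataset.xml", skip_predicates=set(label_predicates))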

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 17 (1 by maintainers)

Top GitHub Comments

4 reactions
bsteenwi commented, Dec 8, 2021

Hmm ok fair point, seems like a bug indeed…

I’ve investigated this problem in more depth together with @GillesVandewiele, and we concluded the following. Our walk hashing procedure within pyRDF2Vec is as follows:

vertex.name if i == 0 or i % 2 == 1 or self.md5_bytes is None else hash...

More generally, we do not hash the first node within a walk, as this node is in most cases the root node (the one for which you want to generate an embedding).
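As a simplified, self-contained illustration of this hashing rule (hash_walk, the 8-byte truncation, and the digest stringification are assumptions for the sketch, not pyRDF2Vec's exact code):

import hashlib

def hash_walk(walk, md5_bytes=8):
    # Keep the root (index 0) and the predicates (odd indices) readable;
    # hash the remaining object nodes to truncated md5 digests.
    return [
        name if i == 0 or i % 2 == 1 or md5_bytes is None
        else str(hashlib.md5(name.encode()).digest()[:md5_bytes])
        for i, name in enumerate(walk)
    ]

print(hash_walk(["Belgium", "capital", "Brussels"]))
# ['Belgium', 'capital', "b'...'"]  -- the root stays readable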

During the transform function, we check whether the root node is in the wv vocab:

if not all([entity in self._model.wv for entity in entities]):
    raise ValueError(
        "The entities must have been provided to fit() first "
        "before they can be transformed into a numerical vector."
    )

But with the with_reverse option set to True, the walks are constructed differently:

if self.with_reverse:
    return [
        r_walk[:-1] + walk
        for walk in fct_search(kg, entity)
        for r_walk in fct_search(kg, entity, is_reverse=True)
    ]

Here you can see that the reverse paths are prefixed to the normal paths, resulting in a node at index 0 that differs from the root node. The root node will now be hashed somewhere within this path and cannot be found in the wv vocab in its original form.

As @GillesVandewiele suggested, it might be better to hash all subjects and predicates within pyRDF2Vec and check whether the hashed root nodes are available in the wv vocab.
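To make the failure concrete, a toy example (the entity names and walks are hypothetical) combining the reverse-walk construction with the hashing rule sketched earlier:

# Hypothetical forward and reverse walks for the root entity "Belgium":
forward = ["Belgium", "capital", "Brussels"]
reverse = ["EU", "member", "Belgium"]  # the reverse walk ends at the root

# with_reverse=True prefixes the reverse walk (minus its duplicated root):
combined = reverse[:-1] + forward
print(combined)  # ['EU', 'member', 'Belgium', 'capital', 'Brussels']

# Under the hashing rule, index 0 ('EU') stays readable, but the root
# 'Belgium' now sits at index 2 (even and non-zero), so it is md5-hashed.
# The literal string 'Belgium' therefore never enters the wv vocabulary,
# and the membership check in transform() fails.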

2 reactions
bsteenwi commented, Dec 7, 2021
from pyrdf2vec import RDF2VecTransformer
from pyrdf2vec.embedders import Word2Vec
from pyrdf2vec.graphs import KG
from pyrdf2vec.walkers import RandomWalker

RANDOM_STATE = 22

if __name__ == "__main__":
    # Remote KG accessed through the DBpedia SPARQL endpoint.
    db_kg = KG("https://dbpedia.org/sparql", skip_verify=True)
    print(db_kg)

    # Depth 1 with no limit on the number of walks, to speed up this test.
    db_transformer = RDF2VecTransformer(
        walkers=[RandomWalker(1, None, n_jobs=2, random_state=RANDOM_STATE)],
        embedder=Word2Vec(size=100),
        verbose=1,
    )
    db_entities = [
        "http://dbpedia.org/resource/Belgium",
        "http://dbpedia.org/resource/France",
    ]
    db_walk_embeddings, _ = db_transformer.fit_transform(
        db_kg,
        db_entities,
    )

    # Persist the fitted transformer to disk.
    db_transformer.save("model")

    # Restore the transformer and extend the model with a new entity.
    db_transformer = RDF2VecTransformer(
        walkers=[RandomWalker(1, None, n_jobs=2, random_state=RANDOM_STATE)],
        embedder=Word2Vec(size=100),
    ).load("model")

    input_entity = ["http://dbpedia.org/resource/Hallstatt_culture"]
    db_transformer.fit_transform(
        db_kg,
        input_entity,
        is_update=True,
    )

This works on my system without errors (pyRDF2Vec version 0.2.3); the walk depth is set to 1 to speed up this test.
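If you need the embedding of the new entity afterwards, it can presumably be retrieved with transform on the updated transformer (assuming transform mirrors the return shape of fit_transform above; this call is a sketch, not verified against 0.2.3):

new_embeddings, _ = db_transformer.transform(db_kg, input_entity)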


