Order of RDF triple embeddings
❓ Question
I have generated embeddings for RDF triple URIs (from DBpedia) using pyRDF2vec. When I pass list(set(entities)) to transformer.fit_transform(), I am not sure in what order the pyRDF2vec transformer returns the embeddings. Will this order affect the results when I concatenate these RDF embeddings with sentence context embeddings while training the model?
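`code:
from pyrdf2vec import RDF2VecTransformer
from pyrdf2vec.embedders import Word2Vec
from pyrdf2vec.graphs import KG
from pyrdf2vec.walkers import RandomWalker

def rdftriplestovec(filepath, entities):
    kg = KG(filepath)
    transformer = RDF2VecTransformer(walkers=[RandomWalker(3, None)],
                                     embedder=Word2Vec(size=500))
    # Keep only the entities that actually occur in the KG.
    entities_names = [entity.name for entity in kg._entities]
    filtered_entities = [e for e in entities if e in entities_names]
    not_found = set(entities) - set(filtered_entities)
    if not_found:
        print('entities could not be found in the KG! Removing them')
    # Note: set() removes duplicates but does not preserve the input order.
    entities = list(set(filtered_entities))
    embeddings = transformer.fit_transform(kg, entities)
    print(embeddings)
    return embeddings`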
Sample of RDF triples in the .ttl file (the predicate is of OWL type), passed as filepath to the rdftriplestovec function:
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://dbpedia.org/resource/AT&T> owl:Ontology <http://dbpedia.org/resource/Espionage>, <http://dbpedia.org/resource/Police> .
<http://dbpedia.org/resource/Actor> owl:Ontology <http://dbpedia.org/resource/Major>, <http://dbpedia.org/resource/Plea>, <http://dbpedia.org/resource/United_States> .
<http://dbpedia.org/resource/Actor_model> owl:Ontology <http://dbpedia.org/resource/Visibility> .
<http://dbpedia.org/resource/Advertising> owl:Ontology <http://dbpedia.org/resource/Indian_Americans> .
<http://dbpedia.org/resource/Afghan_National_Army> owl:Ontology <http://dbpedia.org/resource/Enemy> .
<http://dbpedia.org/resource/Ago,_Mie> owl:Ontology <http://dbpedia.org/resource/Haunt_(comics)>, <http://dbpedia.org/resource/Human_back>, <http://dbpedia.org/resource/Jesus>
Sample of the URI list I get from the DBpedia API for my dataset, passed as entities to the rdftriplestovec function:
['http://dbpedia.org/resource/United_States_House_of_Representatives', 'http://dbpedia.org/resource/Australian_Democrats', 'http://dbpedia.org/resource/Aide-de-camp', 'http://dbpedia.org/resource/United_Kingdom', 'http://dbpedia.org/resource/Even_language', 'http://dbpedia.org/resource/James_Comey', 'http://dbpedia.org/resource/Letter_(message)', 'http://dbpedia.org/resource/Jason_Chaffetz', 'http://dbpedia.org/resource/Twitter', 'http://dbpedia.org/resource/Italian_language', 'http://dbpedia.org/resource/Robb_Flynn', 'http://dbpedia.org/resource/Hillary_Clinton', 'http://dbpedia.org/resource/Breitbart_News', 'http://dbpedia.org/resource/Truth', 'http://dbpedia.org/resource/Get_(divorce_document)', 'http://dbpedia.org/resource/Inactivated_vaccine', 'http://dbpedia.org/resource/India', 'http://dbpedia.org/resource/Single_(music)', 'http://dbpedia.org/resource/November_2017_Somalia_airstrike', 'http://dbpedia.org/resource/Identified', 'http://dbpedia.org/resource/Iranian_peoples', 'http://dbpedia.org/resource/Woman', 'http://dbpedia.org/resource/Fiction', 'http://dbpedia.org/resource/Unpublished_Story', 'http://dbpedia.org/resource/Stoning']
Yes, it returns two things: embeddings and literals. Both will be numpy arrays.
Works like a charm. But as I warned, if you use set() in Python, the order will change! Try to avoid it (which I am not doing here), or store the result of converting to set() so that you can reconstruct the order.
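One way to follow this advice is to de-duplicate without set() and keep an explicit entity-to-embedding lookup. The snippet below is a minimal sketch, assuming that the i-th returned embedding belongs to the i-th entity passed in. The names dedupe_keep_order, index_embeddings and fake_fit_transform are hypothetical helpers for illustration; fake_fit_transform only stands in for transformer.fit_transform(kg, entities) so that the example runs without a knowledge graph.

```python
from typing import Dict, List, Sequence


def dedupe_keep_order(entities: Sequence[str]) -> List[str]:
    """Drop duplicates but keep first-occurrence order.

    dict preserves insertion order (Python 3.7+), unlike set().
    """
    return list(dict.fromkeys(entities))


def index_embeddings(entities: Sequence[str],
                     embeddings: Sequence[List[float]]) -> Dict[str, List[float]]:
    """Pair each entity with the embedding at the same position."""
    if len(entities) != len(embeddings):
        raise ValueError("entities and embeddings must have the same length")
    return dict(zip(entities, embeddings))


def fake_fit_transform(entities: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for transformer.fit_transform(kg, entities).

    Returns one dummy vector per entity, in the order the entities were given.
    """
    return [[float(i), float(len(e))] for i, e in enumerate(entities)]


uris = [
    "http://dbpedia.org/resource/Twitter",
    "http://dbpedia.org/resource/India",
    "http://dbpedia.org/resource/Twitter",  # duplicate
    "http://dbpedia.org/resource/Woman",
]

unique_uris = dedupe_keep_order(uris)
vectors = fake_fit_transform(unique_uris)
lookup = index_embeddings(unique_uris, vectors)

# Fetch the vector for a given entity by URI instead of relying on position.
india_vector = lookup["http://dbpedia.org/resource/India"]
```

With such a lookup, the RDF embedding for a sentence can be fetched by the URIs found in that sentence before concatenating it with the sentence context embedding, so the result no longer depends on list order.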