UnsupervisedSampler generates invalid context pairs for unsupervised HinSAGE with multiple node types

Describe the bug

Unsupervised GraphSAGE works by creating context pairs of node IDs using UnsupervisedSampler and essentially using them as links for a link prediction task. One can try unsupervised HinSAGE in an analogous way, creating context pairs of nodes and doing HinSAGE link prediction. However, this doesn’t work: HinSAGE link prediction relies on the source and target nodes having specific types, and UnsupervisedSampler generates link pairs without taking node types into account.

See https://github.com/stellargraph/stellargraph/issues/1022#issuecomment-595498135 for some possible work-arounds.

To Reproduce

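import pandas as pd
import stellargraph as sg

# Two node types, "a" and "b", with one node each (IDs 0 and 1).
a = pd.DataFrame([12], index=[0])
b = pd.DataFrame([34], index=[1])

# Two edges: 0-1 and a self-loop on node 1.
e = pd.DataFrame({"source": [0, 1], "target": [1, 1]})

G = sg.StellarGraph(nodes={"a": a, "b": b}, edges=e)

# Start the random walks only from the node of type "a".
nodes = list(a.index)
number_of_walks = 1
length = 5
unsupervised_samples = sg.data.UnsupervisedSampler(
    G, nodes=nodes, length=length, number_of_walks=number_of_walks
)
[(links, labels)] = unsupervised_samples.run(100)
print(links)

# batch_size and num_samples were not defined in the original report;
# the values below are placeholders.
batch_size = 50
num_samples = [2, 2]
generator = sg.mapper.HinSAGELinkGenerator(
    G, batch_size, num_samples, head_node_types=("a", "b")
)

# Raises ValueError when a sampled pair, such as (0, 0), is not of type (a, b).
train_gen = generator.flow(links, labels)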

Also, if the unsupervised_samples is passed directly to flow, one gets errors later, when the Sequence (train_gen) is passed to a model.fit function (see https://github.com/stellargraph/stellargraph/issues/1022#issuecomment-595047182).

Observed behavior

The print statement shows:

[[0 1]
 [0 0]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]]

The generator.flow line throws an exception:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-89-0b0e094edc41> in <module>
----> 1 train_gen = generator.flow(links, labels)

~/projects/stellargraph/stellargraph/stellargraph/mapper/sampled_link_generators.py in flow(self, link_ids, targets, shuffle, seed)
    148                 ):
    149                     raise ValueError(
--> 150                         f"Node pair ({src}, {dst}) not of expected type ({expected_src_type}, {expected_dst_type})"
    151                     )
    152 

ValueError: Node pair (0, 0) not of expected type (a, b)
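
To see how many sampled pairs would fail this check before calling flow, the type combinations can be tallied first. The following is a minimal sketch and not StellarGraph code: node_type_of is a hypothetical dict standing in for a per-node type lookup (the traceback suggests the library performs a similar lookup internally), and links is copied from the printed output above.

```python
from collections import Counter

# Hypothetical lookup from node ID to node type, mirroring the example graph
# (node 0 has type "a", node 1 has type "b").
node_type_of = {0: "a", 1: "b"}

# Context pairs copied from the printed output in this report.
links = [[0, 1], [0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1]]


def count_type_pairs(links, node_type_of):
    """Tally how often each (source type, target type) combination occurs."""
    return Counter((node_type_of[src], node_type_of[dst]) for src, dst in links)


type_counts = count_type_pairs(links, node_type_of)
print(type_counts)
```

Any combination other than the head_node_types passed to the generator, here ("a", "b"), corresponds to a pair that flow would reject.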

Expected behavior

For the code to actually work, either:

HinSAGE should support automatically configuring the UnsupervisedSampler so that it only generates pairs with the expected node types, or
there should be some way to manually configure the UnsupervisedSampler to work correctly.

In lieu of it working properly, we could potentially flag it more obviously.
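
Until the sampler supports this, one conceivable stopgap is to drop the sampled pairs whose types do not match head_node_types before calling flow. This is only a sketch under stated assumptions, not one of the work-arounds from the linked comment and not a maintainer-endorsed approach: filter_pairs_by_type and node_type_of are hypothetical names, and the labels below are illustrative rather than taken from the report.

```python
def filter_pairs_by_type(links, labels, node_type_of, head_node_types):
    """Keep only the (source, target) pairs whose node types match head_node_types."""
    expected_src, expected_dst = head_node_types
    kept_links, kept_labels = [], []
    for (src, dst), label in zip(links, labels):
        if node_type_of[src] == expected_src and node_type_of[dst] == expected_dst:
            kept_links.append((src, dst))
            kept_labels.append(label)
    return kept_links, kept_labels


# Hypothetical lookup from node ID to node type for the example graph.
node_type_of = {0: "a", 1: "b"}

# A few pairs in the shape produced by the sampler; labels are illustrative.
links = [[0, 1], [0, 0], [0, 1], [0, 1]]
labels = [1, 1, 0, 0]

kept_links, kept_labels = filter_pairs_by_type(
    links, labels, node_type_of, head_node_types=("a", "b")
)
print(kept_links, kept_labels)
```

Filtering discards samples and can shift the balance between positive and negative pairs, so it does not replace type-aware sampling.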

Environment

Operating system: Darwin-18.6.0-x86_64-i386-64bit

Python version:

3.6.9 (default, Jul 10 2019, 12:25:55) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)]

Package versions:

absl-py==0.8.0
ansiwrap==0.8.4
appdirs==1.4.3
appnope==0.1.0
astor==0.8.0
atomicwrites==1.3.0
attrs==19.3.0
backcall==0.1.0
black==19.10b0
bleach==3.1.0
boto==2.49.0
boto3==1.9.230
botocore==1.12.230
cachetools==4.0.0
certifi==2019.9.11
chardet==3.0.4
Click==7.0
coverage==4.5.4
coveralls==1.8.2
cycler==0.10.0
decorator==4.4.0
defusedxml==0.6.0
docopt==0.6.2
docutils==0.15.2
entrypoints==0.3
gast==0.2.2
gensim==3.8.0
google-auth==1.10.0
google-auth-oauthlib==0.4.1
google-pasta==0.1.7
grpcio==1.23.0
h5py==2.10.0
idna==2.8
importlib-metadata==0.23
ipykernel==5.1.3
ipython==7.9.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
isodate==0.6.0
jedi==0.15.1
Jinja2==2.10.3
jmespath==0.9.4
joblib==0.13.2
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.0.0
jupyter-core==4.6.1
Keras==2.2.5
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
llvmlite==0.30.0
Mako==1.1.0
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.1
mistune==0.8.4
more-itertools==7.2.0
mplleaflet==0.0.5
mypy==0.750
mypy-extensions==0.4.3
nbclient==0.1.0
nbconvert==5.6.1
nbformat==4.4.0
networkx==2.3
notebook==6.0.2
numba==0.46.0
numpy==1.17.2
oauthlib==3.1.0
opt-einsum==3.1.0
packaging==19.2
pandas==0.25.1
pandocfilters==1.4.2
papermill==1.2.1
parso==0.5.1
pathspec==0.6.0
pdoc3==0.7.2
pexpect==4.7.0
pickleshare==0.7.5
pluggy==0.13.1
prometheus-client==0.7.1
prompt-toolkit==2.0.10
protobuf==3.9.1
ptyprocess==0.6.0
py==1.8.0
py-cpuinfo==5.0.0
py4j==0.10.7
pyasn1==0.4.8
pyasn1-modules==0.2.7
pydot==1.4.1
Pygments==2.4.2
pyparsing==2.4.2
pyrsistent==0.15.6
pyspark==2.4.4
pyspark-stubs==2.4.0.post6
pytest==5.3.1
pytest-benchmark==3.2.2
pytest-cov==2.8.1
pytest-repeat==0.8.0
python-dateutil==2.8.0
pytz==2019.2
PyYAML==5.1.2
pyzmq==18.1.1
qtconsole==4.6.0
rdflib==4.2.2
regex==2019.12.9
requests==2.22.0
requests-oauthlib==1.3.0
rsa==4.0
s3transfer==0.2.1
scikit-learn==0.21.3
scipy==1.3.1
seaborn==0.10.0
Send2Trash==1.5.0
six==1.12.0
smart-open==1.8.4
-e git+git@github.com:stellargraph/stellargraph.git@97820bd72d1aa4e0273d28c5f6b2fe4bb0b3c840#egg=stellargraph
tenacity==6.0.0
tensorboard==2.0.2
tensorflow==2.0.0
tensorflow-estimator==2.0.1
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
textwrap3==0.9.2
toml==0.10.0
tornado==6.0.3
tqdm==4.42.1
traitlets==4.3.3
treon==0.1.3
typed-ast==1.4.0
typing-extensions==3.7.4.1
urllib3==1.25.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.6
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==0.6.0

Additional context

N/A

Original Issue

Title: Hinsage link prediction demo「AttributeError: ‘MovieLens’ object has no attribute ‘load’」

I used the demo ipynb file from demo/linkprediction/hinsage:

dataset = datasets.MovieLens()
display(HTML(dataset.description))
G, edges_with_ratings = dataset.load()

It then shows the error below:

AttributeError: 'MovieLens' object has no attribute 'load'

Issue Analytics

Created 4 years ago
Comments: 7 (3 by maintainers)

Top GitHub Comments

micaban commented, Jul 1, 2021

Is there any solution for the "Graph disconnected" error with HinSAGE? I have the same error:

ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 32, 1), dtype=tf.float32, name='input_57'), name='input_57', description="created by layer 'input_57'") at layer "reshape_70". The following previous layers were accessed without issue: []

huonw commented, Mar 11, 2020

Ah, yes, it seems there’s some difference. I apologise for suggesting it could work, clearly without having tested it. Sorry!

It looks like work-around 2 does not work at the moment.
