UnsupervisedSampler generates invalid context pairs for unsupervised HinSAGE with multiple node types

Describe the bug

Unsupervised GraphSAGE works by using UnsupervisedSampler to create context pairs of node IDs and then treating those pairs as links in a link prediction task. One can try unsupervised HinSAGE in an analogous way, creating context pairs of nodes and running HinSAGE link prediction. However, this doesn’t work: HinSAGE link prediction requires the source and target nodes to have specific types, and UnsupervisedSampler generates link pairs without taking node types into account.

See https://github.com/stellargraph/stellargraph/issues/1022#issuecomment-595498135 for some possible work-arounds.

To Reproduce

import pandas as pd
import stellargraph as sg

# Two node types, 'a' and 'b', with one node each (IDs 0 and 1).
a = pd.DataFrame([12], index=[0])
b = pd.DataFrame([34], index=[1])

e = pd.DataFrame({"source": [0, 1], "target": [1, 1]})

G = sg.StellarGraph(nodes={"a": a, "b": b}, edges=e)

nodes = list(a.index)
number_of_walks = 1
length = 5
unsupervised_samples = sg.data.UnsupervisedSampler(
    G, nodes=nodes, length=length, number_of_walks=number_of_walks
)
[(links, labels)] = unsupervised_samples.run(100)
print(links)

# batch_size and num_samples were not defined in the original report;
# the values below are placeholders.
batch_size = 50
num_samples = [2, 2]
generator = sg.mapper.HinSAGELinkGenerator(
    G, batch_size, num_samples, head_node_types=("a", "b")
)

# This call raises the ValueError shown under "Observed behavior".
train_gen = generator.flow(links, labels)

Also, if unsupervised_samples is passed directly to flow, errors occur later, when the resulting Sequence (train_gen) is passed to model.fit (see https://github.com/stellargraph/stellargraph/issues/1022#issuecomment-595047182).

Observed behavior

The print statement shows:

[[0 1]
 [0 0]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]]

The generator.flow line throws an exception:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-89-0b0e094edc41> in <module>
----> 1 train_gen = generator.flow(links, labels)

~/projects/stellargraph/stellargraph/stellargraph/mapper/sampled_link_generators.py in flow(self, link_ids, targets, shuffle, seed)
    148                 ):
    149                     raise ValueError(
--> 150                         f"Node pair ({src}, {dst}) not of expected type ({expected_src_type}, {expected_dst_type})"
    151                     )
    152 

ValueError: Node pair (0, 0) not of expected type (a, b)
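
The traceback points at a per-pair type check inside flow. As a rough illustration of that kind of check, here is a minimal sketch. It assumes a plain dict, node_types, in place of the graph's own type lookup. Both the dict and the function name check_pair_types are hypothetical and are not StellarGraph API.

```python
def check_pair_types(pairs, node_types, expected_types):
    """Raise ValueError on the first pair whose node types differ from expected_types."""
    expected_src_type, expected_dst_type = expected_types
    for src, dst in pairs:
        actual = (node_types[src], node_types[dst])
        if actual != (expected_src_type, expected_dst_type):
            raise ValueError(
                f"Node pair ({src}, {dst}) not of expected type "
                f"({expected_src_type}, {expected_dst_type})"
            )


# Node 0 has type 'a' and node 1 has type 'b', as in the reproduction graph.
# node_types is a stand-in for the graph's type lookup (an assumption).
node_types = {0: "a", 1: "b"}
pairs = [(0, 1), (0, 0), (0, 1)]

try:
    check_pair_types(pairs, node_types, ("a", "b"))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The pair (0, 0) has types (a, a), so a single such pair among the sampled links is enough to make the whole call fail.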

Expected behavior

For the code to actually work, either:

  • HinSAGE should support automatically configuring the UnsupervisedSampler so that it only generates pairs with the required node types
  • There should be some way to manually configure the unsupervised sampler to work correctly

Until it works properly, we could potentially flag the limitation more obviously.
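
One possible stop-gap, not taken from the issue thread, is to drop the sampled pairs whose node types do not match head_node_types before calling flow. The sketch below assumes a plain dict, node_types, mapping node IDs to types. The names filter_pairs_by_type and node_types are hypothetical, and the labels are made up for illustration. Filtering discards samples and changes the sampling distribution, so treat it as a sketch rather than a fix.

```python
def filter_pairs_by_type(pairs, labels, node_types, head_node_types):
    """Keep only (pair, label) entries whose (source, target) types equal head_node_types."""
    expected = tuple(head_node_types)
    kept_pairs, kept_labels = [], []
    for (src, dst), label in zip(pairs, labels):
        if (node_types[src], node_types[dst]) == expected:
            kept_pairs.append((src, dst))
            kept_labels.append(label)
    return kept_pairs, kept_labels


# node_types stands in for the graph's type lookup (an assumption).
node_types = {0: "a", 1: "b"}
pairs = [(0, 1), (0, 0), (0, 1), (0, 1)]
labels = [1, 0, 1, 0]  # illustrative labels, not taken from the issue

kept_pairs, kept_labels = filter_pairs_by_type(pairs, labels, node_types, ("a", "b"))
print(kept_pairs, kept_labels)
```

In the reproduction above, the filtered pairs and labels would then be passed to generator.flow in place of links and labels. Whether the resulting model trains usefully has not been verified here.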

Environment

Operating system: Darwin-18.6.0-x86_64-i386-64bit

Python version:

3.6.9 (default, Jul 10 2019, 12:25:55) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)]

Package versions:

absl-py==0.8.0
ansiwrap==0.8.4
appdirs==1.4.3
appnope==0.1.0
astor==0.8.0
atomicwrites==1.3.0
attrs==19.3.0
backcall==0.1.0
black==19.10b0
bleach==3.1.0
boto==2.49.0
boto3==1.9.230
botocore==1.12.230
cachetools==4.0.0
certifi==2019.9.11
chardet==3.0.4
Click==7.0
coverage==4.5.4
coveralls==1.8.2
cycler==0.10.0
decorator==4.4.0
defusedxml==0.6.0
docopt==0.6.2
docutils==0.15.2
entrypoints==0.3
gast==0.2.2
gensim==3.8.0
google-auth==1.10.0
google-auth-oauthlib==0.4.1
google-pasta==0.1.7
grpcio==1.23.0
h5py==2.10.0
idna==2.8
importlib-metadata==0.23
ipykernel==5.1.3
ipython==7.9.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
isodate==0.6.0
jedi==0.15.1
Jinja2==2.10.3
jmespath==0.9.4
joblib==0.13.2
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.0.0
jupyter-core==4.6.1
Keras==2.2.5
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
llvmlite==0.30.0
Mako==1.1.0
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.1
mistune==0.8.4
more-itertools==7.2.0
mplleaflet==0.0.5
mypy==0.750
mypy-extensions==0.4.3
nbclient==0.1.0
nbconvert==5.6.1
nbformat==4.4.0
networkx==2.3
notebook==6.0.2
numba==0.46.0
numpy==1.17.2
oauthlib==3.1.0
opt-einsum==3.1.0
packaging==19.2
pandas==0.25.1
pandocfilters==1.4.2
papermill==1.2.1
parso==0.5.1
pathspec==0.6.0
pdoc3==0.7.2
pexpect==4.7.0
pickleshare==0.7.5
pluggy==0.13.1
prometheus-client==0.7.1
prompt-toolkit==2.0.10
protobuf==3.9.1
ptyprocess==0.6.0
py==1.8.0
py-cpuinfo==5.0.0
py4j==0.10.7
pyasn1==0.4.8
pyasn1-modules==0.2.7
pydot==1.4.1
Pygments==2.4.2
pyparsing==2.4.2
pyrsistent==0.15.6
pyspark==2.4.4
pyspark-stubs==2.4.0.post6
pytest==5.3.1
pytest-benchmark==3.2.2
pytest-cov==2.8.1
pytest-repeat==0.8.0
python-dateutil==2.8.0
pytz==2019.2
PyYAML==5.1.2
pyzmq==18.1.1
qtconsole==4.6.0
rdflib==4.2.2
regex==2019.12.9
requests==2.22.0
requests-oauthlib==1.3.0
rsa==4.0
s3transfer==0.2.1
scikit-learn==0.21.3
scipy==1.3.1
seaborn==0.10.0
Send2Trash==1.5.0
six==1.12.0
smart-open==1.8.4
-e git+git@github.com:stellargraph/stellargraph.git@97820bd72d1aa4e0273d28c5f6b2fe4bb0b3c840#egg=stellargraph
tenacity==6.0.0
tensorboard==2.0.2
tensorflow==2.0.0
tensorflow-estimator==2.0.1
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
textwrap3==0.9.2
toml==0.10.0
tornado==6.0.3
tqdm==4.42.1
traitlets==4.3.3
treon==0.1.3
typed-ast==1.4.0
typing-extensions==3.7.4.1
urllib3==1.25.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.6
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==0.6.0

Additional context

N/A

Original Issue

Title: Hinsage link prediction demo "AttributeError: 'MovieLens' object has no attribute 'load'"

I ran the demo ipynb file from demo/linkprediction/hinsage:

dataset = datasets.MovieLens()
display(HTML(dataset.description))
G, edges_with_ratings = dataset.load()

It then shows the error below:

AttributeError: 'MovieLens' object has no attribute 'load'

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

micaban commented, Jul 1, 2021 (1 reaction)

Is there any solution for the Graph disconnected error with Hinsage? I have the same error:

ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 32, 1), dtype=tf.float32, name='input_57'), name='input_57', description="created by layer 'input_57'") at layer "reshape_70". The following previous layers were accessed without issue: []

huonw commented, Mar 11, 2020 (0 reactions)

Ah, yes, it seems there’s some difference. I apologise for suggesting it could work, clearly without having tested it. Sorry!

It looks like work-around 2 does not work at the moment.
