UnsupervisedSampler generates invalid context pairs for unsupervised HinSAGE with multiple node types
Describe the bug
Unsupervised GraphSAGE works by creating context pairs of node IDs using UnsupervisedSampler and essentially using them as links for a link prediction task. One can try unsupervised HinSAGE in an analogous way, creating context pairs of nodes and doing HinSAGE link prediction. However, this doesn’t work, because HinSAGE link prediction relies on the source/target nodes having specific types, and UnsupervisedSampler generates link pairs without taking their types into account at all.
See https://github.com/stellargraph/stellargraph/issues/1022#issuecomment-595498135 for some possible work-arounds.
To Reproduce
import stellargraph as sg
import pandas as pd

a = pd.DataFrame([12], index=[0])
b = pd.DataFrame([34], index=[1])
e = pd.DataFrame({"source": [0, 1], "target": [1, 1]})
G = sg.StellarGraph(nodes={'a': a, 'b': b}, edges=e)

nodes = list(a.index)
number_of_walks = 1
length = 5
# batch_size and num_samples were not defined in the original snippet;
# the values below are arbitrary placeholders.
batch_size = 2
num_samples = [2, 2]

unsupervised_samples = sg.data.UnsupervisedSampler(
    G, nodes=nodes, length=length, number_of_walks=number_of_walks
)
[(links, labels)] = unsupervised_samples.run(100)
print(links)

generator = sg.mapper.HinSAGELinkGenerator(
    G, batch_size, num_samples, head_node_types=('a', 'b')
)
train_gen = generator.flow(links, labels)
Also, if unsupervised_samples is passed directly to flow, one gets errors later, when the Sequence (train_gen) is passed to a model.fit function (see https://github.com/stellargraph/stellargraph/issues/1022#issuecomment-595047182).
Observed behavior
The print statement shows:
[[0 1]
[0 0]
[0 1]
[0 1]
[0 1]
[0 1]
[0 1]
[0 1]]
The generator.flow line throws an exception:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-89-0b0e094edc41> in <module>
----> 1 train_gen = generator.flow(links, labels)
~/projects/stellargraph/stellargraph/stellargraph/mapper/sampled_link_generators.py in flow(self, link_ids, targets, shuffle, seed)
148 ):
149 raise ValueError(
--> 150 f"Node pair ({src}, {dst}) not of expected type ({expected_src_type}, {expected_dst_type})"
151 )
152
ValueError: Node pair (0, 0) not of expected type (a, b)
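To make the mismatch concrete, the pairs printed above can be filtered by node type before they reach HinSAGELinkGenerator.flow. The following is only a minimal sketch using plain NumPy: the node_type dictionary and the filter_pairs_by_type helper are hypothetical names introduced for illustration, not part of the stellargraph API, and this is not necessarily one of the work-arounds discussed in the linked comment. The labels are placeholders, since the report does not show them.

```python
import numpy as np

# Hypothetical node-ID -> node-type lookup mirroring the graph in "To Reproduce"
# (node 0 has type "a", node 1 has type "b"). This is not a stellargraph API.
node_type = {0: "a", 1: "b"}


def filter_pairs_by_type(links, labels, node_type, head_node_types):
    """Keep only the (source, target) pairs whose types match head_node_types."""
    links = np.asarray(links)
    labels = np.asarray(labels)
    src_type, dst_type = head_node_types
    keep = np.array(
        [
            node_type[int(s)] == src_type and node_type[int(d)] == dst_type
            for s, d in links
        ],
        dtype=bool,
    )
    return links[keep], labels[keep]


# The pairs printed under "Observed behavior"; the labels are placeholders.
links = np.array([[0, 1], [0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1]])
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])

kept_links, kept_labels = filter_pairs_by_type(links, labels, node_type, ("a", "b"))
print(kept_links)  # the (0, 0) pair is dropped
```

Dropping pairs this way changes the mix of positive and negative samples, so it is at best a stop-gap rather than a replacement for type-aware sampling.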
Expected behavior
For the code to actually work, either:
- HinSAGE should support automatically configuring the UnsupervisedSampler to only get useful info
- There should be some way to manually configure the unsupervised sampler to work correctly
In lieu of it working properly, we could potentially flag it more obviously.
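As one possible shape for a more obvious flag, the check could report every type combination found in the sampled pairs at once, instead of stopping at the first bad pair. This is a sketch under assumptions: summarise_pair_types, check_pair_types and the node_type dictionary are hypothetical names for illustration and do not exist in stellargraph.

```python
from collections import Counter

# Hypothetical node-ID -> node-type lookup for the graph in "To Reproduce".
node_type = {0: "a", 1: "b"}


def summarise_pair_types(links, node_type):
    """Count how often each (source type, target type) combination occurs."""
    return Counter((node_type[int(s)], node_type[int(d)]) for s, d in links)


def check_pair_types(links, node_type, head_node_types):
    """Raise one descriptive error if any pair has unexpected types."""
    counts = summarise_pair_types(links, node_type)
    expected = tuple(head_node_types)
    n_bad = sum(n for types, n in counts.items() if types != expected)
    if n_bad:
        raise ValueError(
            f"{n_bad} of {sum(counts.values())} node pairs do not have the "
            f"expected types {expected}; type combinations found: {dict(counts)}"
        )


# The pairs printed under "Observed behavior".
links = [[0, 1], [0, 0], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1]]

try:
    check_pair_types(links, node_type, ("a", "b"))
except ValueError as err:
    print(err)
```

A summary like this would tell the user up front that the sampler is producing pairs of the wrong types, rather than surfacing a single offending pair.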
Environment
Operating system: Darwin-18.6.0-x86_64-i386-64bit
Python version:
3.6.9 (default, Jul 10 2019, 12:25:55)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)]
Package versions:
absl-py==0.8.0
ansiwrap==0.8.4
appdirs==1.4.3
appnope==0.1.0
astor==0.8.0
atomicwrites==1.3.0
attrs==19.3.0
backcall==0.1.0
black==19.10b0
bleach==3.1.0
boto==2.49.0
boto3==1.9.230
botocore==1.12.230
cachetools==4.0.0
certifi==2019.9.11
chardet==3.0.4
Click==7.0
coverage==4.5.4
coveralls==1.8.2
cycler==0.10.0
decorator==4.4.0
defusedxml==0.6.0
docopt==0.6.2
docutils==0.15.2
entrypoints==0.3
gast==0.2.2
gensim==3.8.0
google-auth==1.10.0
google-auth-oauthlib==0.4.1
google-pasta==0.1.7
grpcio==1.23.0
h5py==2.10.0
idna==2.8
importlib-metadata==0.23
ipykernel==5.1.3
ipython==7.9.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
isodate==0.6.0
jedi==0.15.1
Jinja2==2.10.3
jmespath==0.9.4
joblib==0.13.2
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.0.0
jupyter-core==4.6.1
Keras==2.2.5
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
llvmlite==0.30.0
Mako==1.1.0
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.1
mistune==0.8.4
more-itertools==7.2.0
mplleaflet==0.0.5
mypy==0.750
mypy-extensions==0.4.3
nbclient==0.1.0
nbconvert==5.6.1
nbformat==4.4.0
networkx==2.3
notebook==6.0.2
numba==0.46.0
numpy==1.17.2
oauthlib==3.1.0
opt-einsum==3.1.0
packaging==19.2
pandas==0.25.1
pandocfilters==1.4.2
papermill==1.2.1
parso==0.5.1
pathspec==0.6.0
pdoc3==0.7.2
pexpect==4.7.0
pickleshare==0.7.5
pluggy==0.13.1
prometheus-client==0.7.1
prompt-toolkit==2.0.10
protobuf==3.9.1
ptyprocess==0.6.0
py==1.8.0
py-cpuinfo==5.0.0
py4j==0.10.7
pyasn1==0.4.8
pyasn1-modules==0.2.7
pydot==1.4.1
Pygments==2.4.2
pyparsing==2.4.2
pyrsistent==0.15.6
pyspark==2.4.4
pyspark-stubs==2.4.0.post6
pytest==5.3.1
pytest-benchmark==3.2.2
pytest-cov==2.8.1
pytest-repeat==0.8.0
python-dateutil==2.8.0
pytz==2019.2
PyYAML==5.1.2
pyzmq==18.1.1
qtconsole==4.6.0
rdflib==4.2.2
regex==2019.12.9
requests==2.22.0
requests-oauthlib==1.3.0
rsa==4.0
s3transfer==0.2.1
scikit-learn==0.21.3
scipy==1.3.1
seaborn==0.10.0
Send2Trash==1.5.0
six==1.12.0
smart-open==1.8.4
-e git+git@github.com:stellargraph/stellargraph.git@97820bd72d1aa4e0273d28c5f6b2fe4bb0b3c840#egg=stellargraph
tenacity==6.0.0
tensorboard==2.0.2
tensorflow==2.0.0
tensorflow-estimator==2.0.1
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
textwrap3==0.9.2
toml==0.10.0
tornado==6.0.3
tqdm==4.42.1
traitlets==4.3.3
treon==0.1.3
typed-ast==1.4.0
typing-extensions==3.7.4.1
urllib3==1.25.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.6
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==0.6.0
Additional context
N/A
Original Issue
Title: Hinsage link prediction demo "AttributeError: 'MovieLens' object has no attribute 'load'"
I use the demo ipynb file from demo/linkprediction/hinsage:
dataset = datasets.MovieLens()
display(HTML(dataset.description))
G, edges_with_ratings = dataset.load()
It then shows the error below:
AttributeError: 'MovieLens' object has no attribute 'load'
Issue Analytics
- Created 4 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
Is there any solution for the Graph disconnected error with Hinsage? I have the same error:
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 32, 1), dtype=tf.float32, name='input_57'), name='input_57', description="created by layer 'input_57'") at layer "reshape_70". The following previous layers were accessed without issue: []
Ah, yes, it seems there’s some difference. I apologise for suggesting it could work, clearly without having tested it. Sorry!
It looks like 2 is not a work-around that currently works.