About Planetoid edge_index utilization

Hi!

Thank you again for the flexible and interesting implementation of SEAL and the possibility of using custom datasets! I really enjoy it!

In the SEAL implementation, you use the Planetoid dataset. May I ask why you replace the edge_index with split_edge['train']['edge'].t() (cf. edge index)? Is this line specific to the Planetoid dataset?

Top GitHub Comments

bits-glitch commented, Aug 2, 2021

Thank you very much for the explanation! I finally understand why you do not apply data.edge_index = split_edge['train']['edge'].t() to the OGB datasets. I discovered that the edge_index in the OGB datasets is already reduced (i.e., it contains only the training edges, not all edges of the complete graph), which I was not aware of.

For example, if we take ogbl-ddi, we get:

Train Split Edge Shape torch.Size([1067911, 2])
Test Split Edge Shape torch.Size([133489, 2])
Valid Split Shape torch.Size([133489, 2])

If I print the data object after data = dataset[0], I get Data(edge_index=[2, 2135822]). Since 2135822 / 2 = 1067911 (each link appears as two directed entries), edge_index contains only the training links.

Thanks again for the explanation!
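
The count check described in this comment can be sketched on a toy graph. This is a minimal illustration using plain Python lists rather than torch tensors; the helper names `to_undirected` and `contains_only` and the six example links are hypothetical and are not part of SEAL or OGB. Only the two ogbl-ddi numbers are taken from the comment.

```python
# Toy illustration with plain Python lists; real OGB data uses torch tensors.

def to_undirected(edges):
    """Hypothetical helper: store each (u, v) link in both directions."""
    rows, cols = [], []
    for u, v in edges:
        rows += [u, v]
        cols += [v, u]
    return [rows, cols]  # same [2, num_entries] layout as edge_index


def contains_only(edge_index, edges):
    """Hypothetical helper: True if every column is one of `edges`, in either direction."""
    allowed = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in allowed for u, v in zip(*edge_index))


# Made-up split: 4 training links, 1 validation link, 1 test link.
train_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
valid_edges = [(0, 2)]
test_edges = [(1, 4)]

# An observed graph built from the training links only.
edge_index = to_undirected(train_edges)
num_directed = len(edge_index[0])

# Same arithmetic as in the comment: directed entries / 2 == number of training links.
counts_match = num_directed // 2 == len(train_edges)
only_train = contains_only(edge_index, train_edges)

# The ogbl-ddi numbers quoted in the comment satisfy the same relation.
ddi_match = 2135822 // 2 == 1067911
```

Matching counts alone do not prove that the entries are training links, which is why the sketch also compares the link sets directly.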

muhanzhang commented, Aug 2, 2021

Because the OGB datasets already have pre-specified train/val/test splits, which you can retrieve with dataset.get_edge_split(). In contrast, for custom datasets you need to do the splitting yourself using do_edge_split(). For the OGB datasets, data = dataset[0] contains the observed network, and dataset.get_edge_split() returns its train/val/test edge splits. The train edges are a subset of the observed network.
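
For a custom dataset, the idea can be sketched roughly as follows. This is not SEAL's do_edge_split(): `toy_edge_split` is a hypothetical stand-in written with the standard library, the ring graph is made up, and the split dictionary is flattened to `split_edge["train"]` instead of the nested `split_edge['train']['edge']` tensor used in the issue. It only illustrates why the full edge list is replaced by the training links after splitting, presumably so that the held-out links are not visible in the observed graph.

```python
import random


def toy_edge_split(edges, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Hypothetical helper: shuffle links and cut them into train/valid/test."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)
    n_val = int(len(edges) * val_ratio)
    n_test = int(len(edges) * test_ratio)
    return {
        "valid": edges[:n_val],
        "test": edges[n_val:n_val + n_test],
        "train": edges[n_val + n_test:],
    }


# Full toy graph: a ring of 10 nodes, one entry per undirected link.
all_edges = [(i, (i + 1) % 10) for i in range(10)]
split_edge = toy_edge_split(all_edges)

# Rough analogue of data.edge_index = split_edge['train']['edge'].t():
# keep only the training links in the observed graph. Storing both
# directions is an assumption of this sketch.
train = split_edge["train"]
edge_index = [
    [u for u, v in train] + [v for u, v in train],
    [v for u, v in train] + [u for u, v in train],
]

# Held-out links must not appear in the observed graph in either direction.
observed = set(zip(*edge_index))
held_out = split_edge["valid"] + split_edge["test"]
leaked = [(u, v) for u, v in held_out if (u, v) in observed or (v, u) in observed]
```

With the OGB datasets this replacement step is unnecessary, because, as noted above, the edge_index returned by dataset[0] already matches the training split.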