Ability to use your own dataset

Hi @muhanzhang, thanks for sharing this great utility. I managed to run the script on the default dataset ‘ogbl-collab’, but the point is to use my own datasets. It seems that SEALDataset requires, alongside the data object, a dictionary of split edges, like the split_edge variable here:

dataset = PygLinkPropPredDataset(name=args.dataset)
data = dataset[0]
split_edge = dataset.get_edge_split()

train_dataset = eval('SEALDataset')(
    '/.',
    data,
    split_edge,
    num_hops=args.num_hops,
    percent=args.train_percent,
    split='train',
    use_coalesce=use_coalesce,
    node_label=args.node_label,
    ratio_per_hop=args.ratio_per_hop,
    max_nodes_per_hop=args.max_nodes_per_hop,
)

I looked into the code of PygLinkPropPredDataset from ogb, but the get_edge_split() method just loads the already preprocessed train, validation, and test splits. Could you please modify the script so that users can process their own datasets from a networkx graph object, or give a hint about which splitter utility to use?
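For readers with the same question, here is a minimal sketch of one way to build such a split by hand, using only the Python standard library. It is not the author's method or an ogb utility. The function name split_edges, the ratios, and the dictionary layout ("train", "valid", "test" keys holding "edge" and "edge_neg" lists) are assumptions chosen to resemble what get_edge_split() appears to return. Nodes are assumed to be integers from 0 to num_nodes - 1, and the lists would still need converting to tensors before being passed to SEALDataset.

```python
import random


def split_edges(edges, num_nodes, val_ratio=0.05, test_ratio=0.10, seed=0):
    """Randomly split an undirected edge list into train/valid/test positives
    and sample one negative (non-edge) pair per valid/test positive.

    Hypothetical helper: the returned layout is an assumption, not a
    documented contract of SEALDataset or ogb.
    """
    rng = random.Random(seed)

    # Store each undirected edge once as (small, large); drop self-loops.
    pos = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    rng.shuffle(pos)

    n_val = int(len(pos) * val_ratio)
    n_test = int(len(pos) * test_ratio)
    valid_pos = pos[:n_val]
    test_pos = pos[n_val:n_val + n_test]
    train_pos = pos[n_val + n_test:]

    # Rejection-sample node pairs that are not edges anywhere in the graph.
    existing = set(pos)
    n_neg = n_val + n_test
    max_non_edges = num_nodes * (num_nodes - 1) // 2 - len(existing)
    if n_neg > max_non_edges:
        raise ValueError("graph is too dense to sample enough negative pairs")
    neg = set()
    while len(neg) < n_neg:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        if u != v and pair not in existing:
            neg.add(pair)
    neg = sorted(neg)
    rng.shuffle(neg)

    return {
        "train": {"edge": train_pos},
        "valid": {"edge": valid_pos, "edge_neg": neg[:n_val]},
        "test": {"edge": test_pos, "edge_neg": neg[n_val:]},
    }
```

With a networkx graph G whose nodes have been relabelled to consecutive integers (for example via nx.convert_node_labels_to_integers), the call would be split_edges(G.edges(), G.number_of_nodes()). Sampling negatives against the full edge set keeps validation and test negatives from colliding with any true edge.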

Top GitHub Comments

joc32 commented, Oct 22, 2020

In the end, I solved it by downsampling the graphs and achieved very similar results. Thanks for your help, much appreciated!

muhanzhang commented, Oct 25, 2020

Hi @joc32, I am sorry, but I just found a bug that occurs when using custom datasets. The bug was caused by my not filtering out validation/test edges from the input graph when extracting subgraphs. This has been fixed in the latest version. You may need to rerun your experiments to get the true performance. This doesn’t affect the OGB datasets, though.
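To illustrate the kind of filtering described here, the following is a small standalone sketch and not the actual fix in the repository. The helper name remove_heldout_edges is hypothetical, and it assumes the graph is given as a plain list of (u, v) pairs in which an undirected edge may appear in either or both directions.

```python
def remove_heldout_edges(edge_list, heldout_edges):
    """Return edge_list without any edge that appears in heldout_edges.

    Edges are treated as undirected, so (u, v) and (v, u) are both removed.
    Hypothetical helper for illustration only.
    """
    # frozenset makes (u, v) and (v, u) compare equal.
    blocked = {frozenset(edge) for edge in heldout_edges}
    return [(u, v) for u, v in edge_list if frozenset((u, v)) not in blocked]


# Example: the observed graph should keep only the training edge (1, 2).
full_graph = [(0, 1), (1, 0), (1, 2), (2, 3)]
heldout = [(1, 0), (2, 3)]  # validation/test positives
observed = remove_heldout_edges(full_graph, heldout)
```

If held-out edges stay in the graph used for subgraph extraction, each validation or test link is already visible in its own enclosing subgraph, which would inflate the measured performance.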