Ability to use your own dataset
Hi @muhanzhang, thanks for sharing this great utility. I managed to run the script on the default dataset ‘ogbl-collab’, but the point is to use my own datasets. It seems that SEALDataset requires a dictionary of split edges alongside the data object, like the split_edge variable here:
from ogb.linkproppred import PygLinkPropPredDataset

dataset = PygLinkPropPredDataset(name=args.dataset)
data = dataset[0]
# OGB ships precomputed train/valid/test edge splits with each dataset.
split_edge = dataset.get_edge_split()
train_dataset = SEALDataset(
    '/.',
    data,
    split_edge,
    num_hops=args.num_hops,
    percent=args.train_percent,
    split='train',
    use_coalesce=use_coalesce,
    node_label=args.node_label,
    ratio_per_hop=args.ratio_per_hop,
    max_nodes_per_hop=args.max_nodes_per_hop,
)
I looked into the code of PygLinkPropPredDataset from ogb; however, the get_edge_split() method just loads the already ‘preprocessed’ train, validation and test splits. Could you please modify the script so that users can process their own datasets from a networkx graph object, or give a hint as to which splitter utility to use?
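For readers with the same question, here is a minimal sketch of how such a split dictionary could be built by hand. It is not the repository's own splitter: the function name `split_edges`, its parameters, and the default ratios are hypothetical, and it assumes an undirected graph whose nodes are integers in `[0, num_nodes)`. The key layout (`'train'`, `'valid'`, `'test'`, each with `'edge'`, plus `'edge_neg'` for validation and test) follows what `get_edge_split()` returns for ‘ogbl-collab’. OGB returns these entries as tensors of shape `[num_edges, 2]`, so the plain lists below would most likely need converting (for example with `torch.tensor(...)`) before being passed to SEALDataset.

```python
import random


def split_edges(edges, num_nodes, val_ratio=0.05, test_ratio=0.10, seed=0):
    """Randomly split undirected edges into train/valid/test and sample negatives.

    `edges` is any iterable of (u, v) pairs with integer node ids in
    [0, num_nodes), e.g. `G.edges()` of a networkx graph with integer labels.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Store each undirected edge once as (min, max); drop self-loops and duplicates.
    pos = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    rng.shuffle(pos)

    n_val = int(len(pos) * val_ratio)
    n_test = int(len(pos) * test_ratio)
    valid_pos = pos[:n_val]
    test_pos = pos[n_val:n_val + n_test]
    train_pos = pos[n_val + n_test:]

    # Negative examples: node pairs that are not connected anywhere in the graph.
    existing = set(pos)
    n_neg = n_val + n_test
    max_neg = num_nodes * (num_nodes - 1) // 2 - len(existing)
    if n_neg > max_neg:
        raise ValueError("graph is too dense to sample the requested negatives")

    neg = set()
    while len(neg) < n_neg:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        if u != v and pair not in existing:
            neg.add(pair)
    neg = sorted(neg)
    rng.shuffle(neg)

    return {
        'train': {'edge': train_pos},
        'valid': {'edge': valid_pos, 'edge_neg': neg[:n_val]},
        'test': {'edge': test_pos, 'edge_neg': neg[n_val:]},
    }
```

With a networkx graph `G` whose nodes are labelled `0..n-1`, a call might look like `split_edges(G.edges(), G.number_of_nodes())`. Rejection sampling is used for the negatives because it is simple; it slows down on very dense graphs.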
Issue Analytics
- Created 3 years ago
- Comments: 9
In the end, I solved it by downsampling the graphs and achieved very similar results. Thanks for your help, much appreciated!
Hi @joc32, I am sorry, but I just found a bug that affects custom datasets: I didn’t filter out the validation/test edges from the input graph when extracting subgraphs. This has been fixed in the latest version. You may need to rerun your experiments to get the true performance. The OGB datasets are not affected.
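To illustrate the kind of leakage described above, here is a small sketch of the general idea, not the actual fix in the repository. The helper name `training_adjacency` is hypothetical. It builds the adjacency structure used for subgraph extraction from all edges except the held-out validation/test edges, so a target link can never appear inside its own enclosing subgraph.

```python
def training_adjacency(all_edges, held_out_edges):
    """Build an undirected adjacency dict that excludes held-out edges.

    `all_edges` and `held_out_edges` are iterables of (u, v) pairs.
    Hypothetical helper for illustration only.
    """
    # Compare edges in canonical (min, max) form so direction does not matter.
    held_out = {(min(u, v), max(u, v)) for u, v in held_out_edges}

    adj = {}
    for u, v in all_edges:
        if (min(u, v), max(u, v)) in held_out:
            continue  # validation/test edges must stay invisible during extraction
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# A 4-cycle where the edge between nodes 0 and 3 is held out for testing.
graph_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
adj = training_adjacency(graph_edges, held_out_edges=[(3, 0)])
```

In this example nodes 0 and 3 are no longer direct neighbours in `adj`, so a model scoring the pair (0, 3) only sees the path through nodes 1 and 2.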