Ability to use your own dataset

Hi @muhanzhang, thanks for sharing this great utility. I managed to run the script on the default dataset 'ogbl-collab', but the point is to use my own datasets. It seems that SEALDataset requires, in addition to the data object, a dictionary of split edges, as with the split_edge variable here:

    dataset = PygLinkPropPredDataset(name=args.dataset)
    data = dataset[0]
    split_edge = dataset.get_edge_split()

    train_dataset = eval('SEALDataset')(
        '/.',
        data,
        split_edge,
        num_hops=args.num_hops,
        percent=args.train_percent,
        split='train',
        use_coalesce=use_coalesce,
        node_label=args.node_label,
        ratio_per_hop=args.ratio_per_hop,
        max_nodes_per_hop=args.max_nodes_per_hop,
    )

I looked into the code of PygLinkPropPredDataset from ogb; however, the get_edge_split() method just loads the already preprocessed train, validation and test splits. Could you please modify the script so that users can process their own datasets from a networkx graph object, or give a hint as to which splitter utility to use?
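For readers with the same question, here is a minimal sketch of a random edge splitter. It is not the repository's own code. It assumes the OGB-style layout in which split_edge maps 'train', 'valid' and 'test' to dictionaries with an 'edge' entry (plus 'edge_neg' for validation and test), and that nodes are integers from 0 to num_nodes - 1. The function name make_split_edge and its parameters are hypothetical. It returns plain Python lists, which would presumably still need converting to tensors before being passed to SEALDataset.

```python
import random


def make_split_edge(edges, num_nodes, val_frac=0.05, test_frac=0.10, seed=0):
    """Randomly split undirected edges into train/valid/test sets.

    `edges` is any iterable of (u, v) integer pairs, for example the
    edges of a networkx graph whose nodes are labelled 0..num_nodes-1.
    """
    rng = random.Random(seed)

    # Store each undirected edge once as (min, max); drop self-loops and duplicates.
    pos = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    rng.shuffle(pos)

    n_val = int(len(pos) * val_frac)
    n_test = int(len(pos) * test_frac)
    valid_pos = pos[:n_val]
    test_pos = pos[n_val:n_val + n_test]
    train_pos = pos[n_val + n_test:]

    # Sample negatives (node pairs with no edge) for validation and test.
    existing = set(pos)
    n_neg = n_val + n_test
    max_neg = num_nodes * (num_nodes - 1) // 2 - len(existing)
    if n_neg > max_neg:
        raise ValueError("graph is too dense to sample enough negative pairs")

    neg = set()
    while len(neg) < n_neg:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u == v:
            continue
        pair = (min(u, v), max(u, v))
        if pair not in existing:
            neg.add(pair)
    neg = sorted(neg)
    rng.shuffle(neg)

    return {
        "train": {"edge": train_pos},
        "valid": {"edge": valid_pos, "edge_neg": neg[:n_val]},
        "test": {"edge": test_pos, "edge_neg": neg[n_val:]},
    }
```

With a networkx graph G, the call might look like make_split_edge(G.edges(), G.number_of_nodes()), provided the node labels already satisfy the integer assumption above.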

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9

Top GitHub Comments

1 reaction
joc32 commented, Oct 22, 2020

In the end, I solved it by downsampling the graphs and achieved very similar results. Thanks for your help, much appreciated!

0 reactions
muhanzhang commented, Oct 25, 2020

Quoting joc32: "In the end, I solved it by downsampling the graphs and achieved very similar results. Thanks for your help, much appreciated!"

Hi @joc32, I am sorry, but I just found a bug that occurs when using custom datasets. The cause was that I did not filter out the validation/test edges from the input graph when extracting subgraphs. This has been fixed in the latest version. You may need to rerun your experiments to get the true performance. This does not affect the OGB datasets, though.
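The idea behind that fix can be illustrated with a short sketch. This is not the code from the repository, and the helper names build_adjacency and k_hop_nodes are hypothetical. It assumes undirected edges given as (u, v) pairs. The point is that the graph used for subgraph extraction is built from training edges only, so a held-out validation or test link never appears inside the subgraph that is meant to predict it.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def k_hop_nodes(adj, src, dst, num_hops=1):
    """Return the nodes within num_hops of either endpoint of the target link."""
    visited = {src, dst}
    frontier = {src, dst}
    for _ in range(num_hops):
        reached = set()
        for node in frontier:
            reached |= adj.get(node, set())
        frontier = reached - visited
        visited |= frontier
    return visited


train_edges = [(0, 1), (1, 2), (3, 4)]
test_edges = [(2, 3)]  # held-out link to be predicted

# Leaky: the held-out link is part of the graph used for extraction.
leaky = build_adjacency(train_edges + test_edges)
# Clean: only training edges are visible during extraction.
clean = build_adjacency(train_edges)

print(3 in leaky.get(2, set()))  # the answer is visible in the input graph
print(3 in clean.get(2, set()))  # the answer is hidden, as it should be
print(sorted(k_hop_nodes(clean, 2, 3, num_hops=1)))
```

If the held-out link stays in the input graph, the model can read the label directly from the subgraph structure, which would inflate validation and test scores.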
