
Problems using custom dataset

See original GitHub issue

Hi!

Thank you very much for providing the code for your impressive paper, and for making it possible to test SEAL on custom datasets. I would now like to use the dynamic SEAL variant on my large custom dataset.

Running: seal_link_pred.py --dataset MyGraph --num_hops 1 --dynamic_train --dynamic_val --dynamic_test

fails with an out-of-memory error:

../anaconda3/envs/pytorch_geometric/lib/python3.8/site-packages/torch_geometric/utils/train_test_split_edges.py", line 50, in train_test_split_edges
    neg_adj_mask = torch.ones(num_nodes, num_nodes, dtype=torch.uint8)
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory
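[Editor's note: the failure is easy to quantify. `train_test_split_edges` allocates a dense `num_nodes x num_nodes` uint8 mask to track negative edges, so memory grows quadratically with the node count regardless of how sparse the graph is. A minimal back-of-the-envelope sketch (the helper name is hypothetical, not part of PyTorch Geometric):]

```python
def dense_mask_bytes(num_nodes):
    """Memory needed by torch.ones(num_nodes, num_nodes, dtype=torch.uint8).

    uint8 stores one byte per cell, so the mask costs num_nodes**2 bytes.
    """
    return num_nodes * num_nodes

# A graph with 1 million nodes would need roughly 1 TB for the mask alone,
# which explains the DefaultCPUAllocator failure on large custom datasets.
print(dense_mask_bytes(1_000_000) / 1e12)  # ~1.0 (terabytes)
```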

When SEAL automatically tries to split the edges, my script crashes:

path = osp.join('dataset', args.dataset)
dataset = Planetoid(path, args.dataset)  # replaced by my dataset
split_edge = do_edge_split(dataset)
data = dataset[0]
data.edge_index = split_edge['train']['edge'].t()

What would be the best way to mitigate this? Since we need split_edge later, what is the most convenient way of handling it?

Thank you!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
SYLin117 commented, Jun 15, 2021

@jqmcginnis thanks for the reply, I would definitely try. @muhanzhang It works fine! Thanks a lot!

0 reactions
muhanzhang commented, Jun 15, 2021

@SYLin117 I added a fast_split function for splitting large custom datasets. Please let me know whether it solves your problem.
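[Editor's note: the actual fast_split implementation is not shown in this thread. As a rough illustration of the idea only (hypothetical names and ratios, not SEAL's code), a split for large graphs can avoid the dense N x N mask entirely by rejection-sampling negative pairs against a hash set of existing edges:]

```python
import random

def fast_split_sketch(edges, num_nodes, val_ratio=0.05, test_ratio=0.10, seed=0):
    """Split positive edges and sample negatives without a dense adjacency mask.

    edges: list of (u, v) tuples; num_nodes: number of nodes in the graph.
    Memory is O(num_edges), not O(num_nodes**2).
    """
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)

    n_val = int(len(edges) * val_ratio)
    n_test = int(len(edges) * test_ratio)
    val = edges[:n_val]
    test = edges[n_val:n_val + n_test]
    train = edges[n_val + n_test:]

    existing = set(map(tuple, edges))

    def sample_negatives(k):
        # Rejection sampling: cheap when the graph is sparse.
        neg = []
        while len(neg) < k:
            u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
            # Skip self-loops and known edges (both directions, for
            # undirected graphs).
            if u != v and (u, v) not in existing and (v, u) not in existing:
                neg.append((u, v))
        return neg

    return {
        'train': {'edge': train},
        'valid': {'edge': val, 'edge_neg': sample_negatives(len(val))},
        'test': {'edge': test, 'edge_neg': sample_negatives(len(test))},
    }
```

The key design choice is trading the exhaustive dense negative mask for random sampling: for link prediction, a sampled set of non-edges is usually sufficient, and it keeps the split feasible on graphs with millions of nodes.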
