
Problems using custom dataset

See original GitHub issue

Hi!

Thank you very much for providing the code for your impressive paper, and for making it possible to test SEAL on custom datasets. I would now like to use the dynamic SEAL variant on my large custom dataset.

Running: seal_link_pred.py --dataset MyGraph --num_hops 1 --dynamic_train --dynamic_val --dynamic_test

fails with an out-of-memory error:

../anaconda3/envs/pytorch_geometric/lib/python3.8/site-packages/torch_geometric/utils/train_test_split_edges.py", line 50, in train_test_split_edges
    neg_adj_mask = torch.ones(num_nodes, num_nodes, dtype=torch.uint8)
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory
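[Editor's note: the failure is easy to quantify. `train_test_split_edges` allocates a dense `num_nodes x num_nodes` uint8 mask to track negative edges, so memory grows quadratically with the node count regardless of how sparse the graph is. A minimal back-of-the-envelope sketch (the helper name is hypothetical, not part of PyTorch Geometric):]

```python
def dense_mask_bytes(num_nodes):
    """Memory needed by torch.ones(num_nodes, num_nodes, dtype=torch.uint8).

    uint8 stores one byte per cell, so the mask costs num_nodes**2 bytes.
    """
    return num_nodes * num_nodes

# A graph with 1 million nodes would need roughly 1 TB for the mask alone,
# which explains the DefaultCPUAllocator failure on large custom datasets.
print(dense_mask_bytes(1_000_000) / 1e12)  # ~1.0 (terabytes)
```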

When SEAL automatically tries to split the edges, my script crashes:

path = osp.join('dataset', args.dataset)
dataset = Planetoid(path, args.dataset)  # replaced by my dataset
split_edge = do_edge_split(dataset)
data = dataset[0]
data.edge_index = split_edge['train']['edge'].t()

What would be the best way to mitigate this? Since we need split_edge later, what is the most convenient way of handling it?

Thank you!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
SYLin117 commented, Jun 15, 2021

@jqmcginnis thanks for the reply, I would definitely try. @muhanzhang It works fine! Thanks a lot!

0 reactions
muhanzhang commented, Jun 15, 2021

@SYLin117 I added a fast_split function for splitting large custom datasets. Please let me know whether it solves your problem.
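[Editor's note: the actual fast_split implementation is not shown in this thread. As a rough illustration of the idea only (hypothetical names and ratios, not SEAL's code), a split for large graphs can avoid the dense N x N mask entirely by rejection-sampling negative pairs against a hash set of existing edges:]

```python
import random

def fast_split_sketch(edges, num_nodes, val_ratio=0.05, test_ratio=0.10, seed=0):
    """Split positive edges and sample negatives without a dense adjacency mask.

    edges: list of (u, v) tuples; num_nodes: number of nodes in the graph.
    Memory is O(num_edges), not O(num_nodes**2).
    """
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)

    n_val = int(len(edges) * val_ratio)
    n_test = int(len(edges) * test_ratio)
    val = edges[:n_val]
    test = edges[n_val:n_val + n_test]
    train = edges[n_val + n_test:]

    existing = set(map(tuple, edges))

    def sample_negatives(k):
        # Rejection sampling: cheap when the graph is sparse.
        neg = []
        while len(neg) < k:
            u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
            # Skip self-loops and known edges (both directions, for
            # undirected graphs).
            if u != v and (u, v) not in existing and (v, u) not in existing:
                neg.append((u, v))
        return neg

    return {
        'train': {'edge': train},
        'valid': {'edge': val, 'edge_neg': sample_negatives(len(val))},
        'test': {'edge': test, 'edge_neg': sample_negatives(len(test))},
    }
```

The key design choice is trading the exhaustive dense negative mask for random sampling: for link prediction, a sampled set of non-edges is usually sufficient, and it keeps the split feasible on graphs with millions of nodes.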
