Problems using custom dataset
Hi!
Thank you very much for providing the code for your impressive paper and for making it possible to test SEAL on custom datasets. At the moment, I would like to use the dynamic SEAL variant on my large custom dataset.
Running:
python seal_link_pred.py --dataset MyGraph --num_hops 1 --dynamic_train --dynamic_val --dynamic_test
fails with an out-of-memory error:
  File "../anaconda3/envs/pytorch_geometric/lib/python3.8/site-packages/torch_geometric/utils/train_test_split_edges.py", line 50, in train_test_split_edges
    neg_adj_mask = torch.ones(num_nodes, num_nodes, dtype=torch.uint8)
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory
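The allocation fails because train_test_split_edges builds a dense num_nodes x num_nodes mask for negative sampling, so memory grows quadratically with the number of nodes. A rough back-of-the-envelope check (the node count below is hypothetical, purely for illustration):

num_nodes = 1_000_000                 # hypothetical graph size, not from the issue
mask_bytes = num_nodes * num_nodes    # one byte per torch.uint8 entry
print(mask_bytes / 1e9)               # -> 1000.0 GB, i.e. ~1 TB for the mask alone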
When SEAL automatically tries to split the edges, my script crashes:
import os.path as osp
from torch_geometric.datasets import Planetoid

path = osp.join('dataset', args.dataset)
dataset = Planetoid(path, args.dataset)  # replaced by my dataset
split_edge = do_edge_split(dataset)      # crashes here with the OOM error above
data = dataset[0]
data.edge_index = split_edge['train']['edge'].t()
What would be the best way to mitigate this? Since split_edge is needed later on, what would be the most convenient way of handling the split for a large graph?
Thank you!
@jqmcginnis Thanks for the reply, I will definitely try that.
@muhanzhang It works fine! Thanks a lot!
@SYLin117 I added a fast_split function for splitting large custom datasets. Please let me know whether it solves your problem.
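For reference, here is a minimal sketch of what a memory-friendly split can look like. It draws negatives sparsely with torch_geometric.utils.negative_sampling instead of materialising the dense mask; the function name fast_edge_split, the split ratios, and the output format are illustrative assumptions and not necessarily identical to the fast_split added to the repository.

import torch
from torch_geometric.utils import negative_sampling

def fast_edge_split(data, val_ratio=0.05, test_ratio=0.10):
    # Keep each undirected edge once and shuffle.
    row, col = data.edge_index
    mask = row < col
    row, col = row[mask], col[mask]
    perm = torch.randperm(row.size(0))
    row, col = row[perm], col[perm]

    n_val = int(val_ratio * row.size(0))
    n_test = int(test_ratio * row.size(0))
    subsets = {
        'valid': (row[:n_val], col[:n_val]),
        'test': (row[n_val:n_val + n_test], col[n_val:n_val + n_test]),
        'train': (row[n_val + n_test:], col[n_val + n_test:]),
    }

    split_edge = {}
    for key, (r, c) in subsets.items():
        pos = torch.stack([r, c], dim=1)
        # Sample the same number of negatives sparsely; no dense
        # num_nodes x num_nodes mask is ever allocated.
        neg = negative_sampling(data.edge_index, num_nodes=data.num_nodes,
                                num_neg_samples=pos.size(0)).t()
        split_edge[key] = {'edge': pos, 'edge_neg': neg}
    return split_edge

With a split in this format, the snippet from the question can stay as it is, e.g. data.edge_index = split_edge['train']['edge'].t().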