
IndexError in MetaPath2Vec


🐛 Bug

Hi,

I’m getting an IndexError when training MetaPath2Vec on my own dataset. The stack trace is:

    IndexError: Caught IndexError in DataLoader worker process 4.
    Original Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
        return self.collate_fn(data)
      File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch_geometric/nn/models/metapath2vec.py", line 157, in sample
        return self.pos_sample(batch), self.neg_sample(batch)
      File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch_geometric/nn/models/metapath2vec.py", line 123, in pos_sample
        batch = adj.sample(num_neighbors=1, subset=batch).squeeze()
      File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch_sparse/sample.py", line 22, in sample
        return col[rand]
    IndexError: index 1549811 is out of bounds for dimension 0 with size 1549811

From what I understand, the final entry of the rowptr tensor in sample is being used as an index into the col tensor. Because that entry equals the length of col, the index is out of bounds. However, this doesn’t happen on the default AMiner dataset, even though the subset tensor there is also a subset of a larger tensor whose maximum value would index the final entry of rowptr. I therefore think I’m misunderstanding part of the code, so any help would be very much appreciated.
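To make the rowptr/col reasoning above concrete, here is a minimal pure-Python sketch of CSR neighbor sampling. It is not the torch_sparse implementation; `build_csr` and `sample_one_neighbor` are hypothetical helpers, and the "start offset plus floor(rand * degree)" arithmetic is an assumption about how such a sampler typically works. Under that assumption, a node with no outgoing edges only raises when nothing follows it in col, which would be one possible reason the error shows up on some datasets and not others.

```python
import random


def build_csr(num_nodes, edges):
    """Build (rowptr, col) from a list of (src, dst) pairs (hypothetical helper)."""
    edges = sorted(edges)
    rowptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        rowptr[src + 1] += 1
    for i in range(num_nodes):
        rowptr[i + 1] += rowptr[i]
    col = [dst for _, dst in edges]
    return rowptr, col


def sample_one_neighbor(rowptr, col, node):
    """Pick one neighbor as col[start + floor(rand * degree)] (assumed scheme)."""
    start = rowptr[node]
    degree = rowptr[node + 1] - start
    offset = int(random.random() * degree)  # always 0 when degree == 0
    return col[start + offset]


# Case 1: node 3 is the last node and has no outgoing edges.
# rowptr[3] equals len(col), so the lookup falls off the end of col.
rowptr, col = build_csr(4, [(0, 1), (0, 2), (1, 3), (2, 0)])
try:
    sample_one_neighbor(rowptr, col, 3)
except IndexError as err:
    print("trailing node without edges:", err)

# Case 2: node 1 has no outgoing edges but is not last.
# The lookup stays in bounds and silently returns node 2's neighbor.
rowptr2, col2 = build_csr(4, [(0, 1), (2, 3), (3, 0)])
print("middle node without edges sampled:", sample_one_neighbor(rowptr2, col2, 1))
```

If this matches what happens in practice, the second case is arguably the more worrying one, since it produces a walk step along an edge that does not exist without raising anything.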

Reproducing the behaviour is complicated because I can’t get the error to occur on the AMiner dataset, and I’m unable to share the dataset I’m working with. If it would be helpful for me to report back any metrics, or the results of any functions on my dataset, please let me know and I’ll do what I can.

Thank you very much for your time, and for putting together such a fantastic library!

Environment

  • OS: Ubuntu 18.04.5
  • Python version: 3.7.10
  • PyTorch version: 1.7.1+cu101
  • CUDA/cuDNN version: 10.1, V10.1.243
  • GCC version: 7.5.0

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
rusty1s commented, Oct 21, 2021

Sadly not yet, and it does not really resolve this issue, as there might be nodes that are only isolated for a few edge types, while they are connected to some nodes for other edge types. I’m trying to fix this directly in MetaPath2Vec.
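As a way to check whether a dataset has nodes that are isolated for only some edge types, a small diagnostic along these lines may help. This is a sketch, not part of PyTorch Geometric: `dangling_nodes_by_edge_type` is a hypothetical helper, the `num_nodes_dict` argument is an assumption, and the edge lists are plain Python lists laid out like the two rows of an edge index. The node and relation names in the example are made up.

```python
def dangling_nodes_by_edge_type(edge_index_dict, num_nodes_dict):
    """Return, per edge type, the source-type node ids with no outgoing edge.

    edge_index_dict maps (src_type, relation, dst_type) to a pair of
    equal-length lists (source ids, destination ids).
    num_nodes_dict maps each node type to its node count (assumed input).
    """
    report = {}
    for edge_type, (src_ids, _dst_ids) in edge_index_dict.items():
        src_type = edge_type[0]
        has_outgoing = set(src_ids)
        report[edge_type] = [
            node for node in range(num_nodes_dict[src_type])
            if node not in has_outgoing
        ]
    return report


# Toy graph: author 2 has written nothing, so it has no "writes" edge.
edge_index_dict = {
    ("author", "writes", "paper"): ([0, 0, 1], [0, 1, 1]),
    ("paper", "written_by", "author"): ([0, 1, 1], [0, 0, 1]),
}
num_nodes_dict = {"author": 3, "paper": 2}

for edge_type, missing in dangling_nodes_by_edge_type(
        edge_index_dict, num_nodes_dict).items():
    print(edge_type, "nodes without outgoing edges:", missing)
```

With tensor inputs, converting each row with `.tolist()` first keeps the set membership test on plain integers.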

1 reaction
Amayama commented, Oct 20, 2021

Hi @rusty1s, a similar problem appeared when I tested with my dataset. I built a toy project to help you reproduce and understand the problem: https://github.com/Amayama/pyg_error_toy
Thanks for your help!

