
torch_sparse.SparseTensor.size causing problems in Data and graphSAINT

See original GitHub issue

šŸ› Bug

It appears that torch_sparse.SparseTensor causes problems when calling torch_geometric.data.Data.num_nodes. I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-7db7380be762> in <module>
----> 1 data.num_nodes

~/local/miniconda3/envs/gnn/lib/python3.8/site-packages/torch_geometric/data/data.py in num_nodes(self)
    201             return self.__num_nodes__
    202         for key, item in self('x', 'pos', 'norm', 'batch'):
--> 203             return item.size(self.__cat_dim__(key, item))
    204         if hasattr(self, 'adj'):
    205             return self.adj.size(0)

~/local/miniconda3/envs/gnn/lib/python3.8/site-packages/torch_sparse/tensor.py in size(self, dim)
    212 
    213     def size(self, dim: int) -> int:
--> 214         return self.sizes()[dim]
    215 
    216     def dim(self) -> int:

TypeError: list indices must be integers or slices, not tuple
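The final frame of the traceback hints at the cause: `SparseTensor.sizes()` returns a plain Python list, and `Data.__cat_dim__` apparently hands back a tuple for sparse inputs (an assumption based on the traceback, not verified against the PyG source). Indexing a list with a tuple reproduces the exact error, independent of torch:

```python
# sizes: what SparseTensor.sizes() returns for a 4x4 matrix
sizes = [4, 4]
# dim: a tuple, as __cat_dim__ appears to return for sparse inputs (assumption)
dim = (0, 1)

try:
    sizes[dim]  # lists only accept integer or slice indices
except TypeError as e:
    print(e)  # -> list indices must be integers or slices, not tuple
```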

I would like to use this because my graph nodes do not have features, so I did the standard thing and put in an identity matrix. The graphs are pretty big, so I want to use sparse matrices here; otherwise, I'll run out of GPU memory pretty quickly. The same error occurs in GraphSAINT, because it tries to access data.num_nodes.

I'm pretty new to this field and to torch_geometric in general, so I was surprised this wasn't working and that this issue doesn't seem to have been reported before. Am I using this incorrectly, or is this something that just isn't supported yet?

To Reproduce

import torch
import torch_sparse
import torch_geometric as pyg

edge_index = torch.tensor([
    [1, 0, 3, 1, 2, 0],
    [0, 1, 1, 3, 0, 2],
])

num_nodes = len(edge_index.unique())

x = torch_sparse.SparseTensor.eye(num_nodes)

data = pyg.data.Data(edge_index=edge_index, x=x)

data.num_nodes  # causes error

Environment

  • OS: Ubuntu 16.04
  • Python version: 3.8.3
  • PyTorch version: 1.6.0
  • PyTorch geometric: 1.6.1
  • PyTorch sparse: 0.6.7

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
rusty1s commented, Aug 31, 2020

Hi and thanks for this issue. Using sparse node features is an interesting idea, but it currently isn't officially supported in PyG, and most GNN operators require dense feature representations.

Furthermore, I do not think that using sparse identity matrices as input features helps reduce the memory complexity of your model, since the weight matrix of the first GNN layer will have a memory complexity of O(N) nonetheless.

An alternative solution is to simply use low-dimensional random node features, e.g.,

data.x = torch.randn(data.num_nodes, num_features)

or to use a trainable embedding layer, e.g.:

data.n_id = torch.arange(data.num_nodes)
loader = GraphSAINT(...)

embedding = torch.nn.Embedding(data.num_nodes, num_features)

for data in loader:
    x = embedding(data.n_id)

0 reactions
pavlin-policar commented, Aug 31, 2020

Yeah, I don't know how not having node features would work in an inductive learning scenario.

it's just that an embedding matrix is equal to performing I @ weight

That makes a lot of sense. So technically, weā€™re adding a linear transformation to the identity matrix before passing it through any convolutions. This does increase the number of parameters I guess, but should also increase model capacity?
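The I @ weight equivalence is easy to check with a tiny pure-Python sketch (no torch; the weight values below are hypothetical): multiplying the identity matrix by a weight matrix W just returns W, so feeding a one-hot/identity input through a linear layer selects rows of W exactly as an embedding lookup does.

```python
def matmul(a, b):
    # naive dense matrix multiply over nested lists
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

n = 3
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # hypothetical embedding weights

assert matmul(identity, W) == W  # I @ W == W: lookup of rows 0..n-1
```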

Using random node features isn't so common, but it is sufficient to learn structural features.

I tried this, and it seems to be working somewhat well.

Thanks a bunch, you've been very helpful!
