# torch_sparse.SparseTensor.size causing problems in Data and graphSAINT

## 🐛 Bug

It appears that `torch_sparse.SparseTensor` causes problems when calling `torch_geometric.data.Data.num_nodes`. I get the following error:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-7db7380be762> in <module>
----> 1 data.num_nodes

~/local/miniconda3/envs/gnn/lib/python3.8/site-packages/torch_geometric/data/data.py in num_nodes(self)
    201                 return self.__num_nodes__
    202             for key, item in self('x', 'pos', 'norm', 'batch'):
--> 203                 return item.size(self.__cat_dim__(key, item))
    204             if hasattr(self, 'adj'):
    205                 return self.adj.size(0)

~/local/miniconda3/envs/gnn/lib/python3.8/site-packages/torch_sparse/tensor.py in size(self, dim)
    212
    213     def size(self, dim: int) -> int:
--> 214         return self.sizes()[dim]
    215
    216     def dim(self) -> int:

TypeError: list indices must be integers or slices, not tuple
```
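The final frame shows the immediate cause: `torch_sparse` stores its shape as a plain Python list, while `Data.__cat_dim__` apparently returns a tuple for `SparseTensor` attributes (it would concatenate sparse matrices along both dimensions). Stripped of the libraries, the failing step is just tuple-indexing a list; the `(0, 1)` value below is an assumption about what `__cat_dim__` returns here:

```python
# sizes() on a 4x4 SparseTensor returns a plain Python list like this:
sizes = [4, 4]

# size(dim) does sizes[dim]; if dim is a tuple such as (0, 1),
# list indexing fails exactly as in the traceback above.
try:
    sizes[(0, 1)]
except TypeError as e:
    print(e)  # list indices must be integers or slices, not tuple
```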

I would like to use this because my graph nodes do not have features, so I did the standard thing and put in an identity matrix. The graphs are pretty big, so I want to use sparse matrices here; otherwise, I'll run out of GPU memory pretty quickly. The same error occurs in GraphSAINT, because it tries to access `data.num_nodes`.

I'm pretty new to this field and torch_geometric in general, so I was surprised this wasn't working and that this issue doesn't seem to have been reported before. Am I using this incorrectly, or is this something that just isn't supported yet?

## To Reproduce

```
import torch
import torch_sparse
import torch_geometric as pyg

edge_index = torch.tensor([
    [1, 0, 3, 1, 2, 0],
    [0, 1, 1, 3, 0, 2],
])
num_nodes = len(edge_index.unique())
x = torch_sparse.SparseTensor.eye(num_nodes)
data = pyg.data.Data(edge_index=edge_index, x=x)
data.num_nodes  # causes error
```

## Environment

- OS: Ubuntu 16.04
- Python version: 3.8.3
- PyTorch version: 1.6.0
- PyTorch geometric: 1.6.1
- PyTorch sparse: 0.6.7

### Issue Analytics

- State:
- Created: 3 years ago
- Comments: 5 (2 by maintainers)


## Top GitHub Comments

Hi and thanks for this issue. Using sparse node features is an interesting idea, but it currently isn't officially supported in PyG, and most GNN operators require dense feature representations.

Furthermore, I do not think that using sparse identity matrices as input features helps reduce the memory complexity of your model, since the weight matrix of the first GNN layer will have a memory complexity of O(N) nonetheless.
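The O(N) point can be made concrete with a quick parameter count (the numbers here are illustrative): with identity features, the first layer computes `I_N @ W`, so `W` must have one row per node, which is exactly the size of an embedding table.

```python
num_nodes, hidden = 100_000, 64

# x = I_N forces the first GNN layer's weight matrix to be N x hidden:
first_layer_weights = num_nodes * hidden

# ...the same parameter count as a trainable nn.Embedding(N, hidden):
embedding_table = num_nodes * hidden

assert first_layer_weights == embedding_table == 6_400_000
print(first_layer_weights)  # 6400000
```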

An alternative solution is to simply use random node features of low dimensionality, or to use a trainable embedding layer.

Yeah, I don't know how not having node features would work in an inductive learning scenario.
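Both suggestions can be sketched in a few lines of plain PyTorch; the dimension of 8 and the variable names are illustrative, not the maintainer's original snippets:

```python
import torch

num_nodes, dim = 4, 8

# Option 1: low-dimensional random node features, fixed at init time.
x_random = torch.randn(num_nodes, dim)

# Option 2: a trainable embedding, looked up by node index. This is
# transductive only -- node ids must be stable, which is why it may not
# suit the inductive setting mentioned above.
emb = torch.nn.Embedding(num_nodes, dim)
x_learned = emb(torch.arange(num_nodes))

print(x_random.shape, x_learned.shape)
```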

That makes a lot of sense. So technically, we're adding a linear transformation to the identity matrix before passing it through any convolutions. This does increase the number of parameters, I guess, but should also increase model capacity?

I tried this, and it seems to be working somewhat well.

Thanks a bunch, you've been very helpful!