Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

`return out.scatter_add_(dim, index, src)` RuntimeError: index 765 is out of bounds for dimension 0 with size 765

See original GitHub issue

🐛 Describe the bug

I was doing my graduation project relevant to an NLP task and I encountered this bug when I ran my GNN model training code. This is my first time learning and using torch_geometric, but I have encountered a lot of problems, please help me to solve this problem, I will be grateful！ Simply put, it is a text classification task**(it is a 5 classification task).** In the first step, I have converted the text data into a graph data structure that torch_geometric can identify.

# e.g.
print(dataset[0])
print(dataset[0].num_node_features)
print(dataset[0].num_nodes)
# My word vector dimension is 300, so my num_node_features = 300
> Data(x=[271, 300], edge_index=[2, 1614], y=[1])
> 300
> 271

In the second step, I created my own train_loader with DataLoader, instantiated the GNN model, and defined the device, optimizer, loss function, etc.

dataset.shuffle()
data_size=len(dataset)
train_loader = DataLoader(dataset[:int(data_size * 0.9)], batch_size=3, shuffle=True)

gnn_model = GCN(num_node_feature=300,num_classes=5,hidden_channels=3)
device = torch.device('cuda:0' if torch.cuda.is_available() else "cpu")
gnn_model = gnn_model.to(device)

optimizer = torch.optim.Adam(gnn_model.parameters(), lr=0.005)
criterion = torch.nn.CrossEntropyLoss()

and my GNN model is defined as below.

class GCN(torch.nn.Module):
    def __init__(self, num_node_features, num_classes, hidden_channels):
        super(GCN, self).__init__()
        torch.manual_seed(666)
        self.conv1 = GCNConv(num_node_features, hidden_channels * 2)
        self.conv2 = GCNConv(hidden_channels * 2, hidden_channels)
        self.lin = Linear(hidden_channels, num_classes)

    def forward(self, x, edge_index, batch):

        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        x = x.relu()
        x = global_mean_pool(x, batch)  # [batch_size, hidden_channels]

        # 3. Apply a final classifier
        x = F.dropout(x, p=0.2, training=self.training)
        x = self.lin(x)
        return x

In the third step, I wrote the train() function and then tried to train my model. Unfortunately, I encountered the above bug.

def train():
  total_loss= 0
  best_acc = -1
  for d in tqdm(train_loader):
    print(d.x.shape)
    correct = 0
    optimizer.zero_grad()
    out = gnn_model(d.x, d.edge_index, d.batch)
    loss = criterion(out, d.y) 
    loss.backward()
    total_loss+=loss.item()      
    optimizer.step()
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    correct += int((pred == d.y).sum())  # Check against 
    print(f"Train Loss：{loss.item()}")

  total_loss/=len(train_loader)
  acc = correct/len(train_loader)
  return total_loss, acc

for epoch in range(2):
  train()

[<ipython-input-31-50b9ee8e84d4>](https://localhost:8080/#) in train()
      6     correct = 0
      7     optimizer.zero_grad()
----> 8     out = gnn_model(d.x, d.edge_index, d.batch)
      9     loss = criterion(out, d.y)
     10     loss.backward()

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[<ipython-input-27-3ada20380b27>](https://localhost:8080/#) in forward(self, x, edge_index, batch)
     22     def forward(self, x, edge_index, batch):
     23 
---> 24         x = self.conv1(x, edge_index)
     25         x = x.relu()
     26         x = self.conv2(x, edge_index)

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torch_geometric/nn/conv/gcn_conv.py](https://localhost:8080/#) in forward(self, x, edge_index, edge_weight)
    172                     edge_index, edge_weight = gcn_norm(  # yapf: disable
    173                         edge_index, edge_weight, x.size(self.node_dim),
--> 174                         self.improved, self.add_self_loops)
    175                     if self.cached:
    176                         self._cached_edge_index = (edge_index, edge_weight)

[/usr/local/lib/python3.7/dist-packages/torch_geometric/nn/conv/gcn_conv.py](https://localhost:8080/#) in gcn_norm(edge_index, edge_weight, num_nodes, improved, add_self_loops, dtype)
     62 
     63         row, col = edge_index[0], edge_index[1]
---> 64         deg = scatter_add(edge_weight, col, dim=0, dim_size=num_nodes)
     65         deg_inv_sqrt = deg.pow_(-0.5)
     66         deg_inv_sqrt.masked_fill_(deg_inv_sqrt == float('inf'), 0)

[/usr/local/lib/python3.7/dist-packages/torch_scatter/scatter.py](https://localhost:8080/#) in scatter_add(src, index, dim, out, dim_size)
     27                 out: Optional[torch.Tensor] = None,
     28                 dim_size: Optional[int] = None) -> torch.Tensor:
---> 29     return scatter_sum(src, index, dim, out, dim_size)
     30 
     31 

[/usr/local/lib/python3.7/dist-packages/torch_scatter/scatter.py](https://localhost:8080/#) in scatter_sum(src, index, dim, out, dim_size)
     19             size[dim] = int(index.max()) + 1
     20         out = torch.zeros(size, dtype=src.dtype, device=src.device)
---> 21         return out.scatter_add_(dim, index, src)
     22     else:
     23         return out.scatter_add_(dim, index, src)

RuntimeError: index 765 is out of bounds for dimension 0 with size 765

Environment

PyG version:(I installed it in Colab)

!pip install torch-cluster -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
!pip install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
!pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
!pip install torch-geometric

PyTorch version:1.11.0+cu113
How you installed PyTorch and PyG (conda, pip, source): pip
torch-geometric-2.0.4
torch-scatter-2.0.9
torch-sparse-0.6.13
torch-cluster-1.6.0

Issue Analytics

State:
Created a year ago
Comments:7 (3 by maintainers)

Top GitHub Comments

3reactions

rusty1scommented, May 3, 2022

It looks like your edge_index may be wrongly encoded. Can you confirm that

for data in dataset:
    assert data.edge_index.max() < data.num_nodes

runs through for you?

2reactions

rusty1scommented, Sep 16, 2022

Yes, this is a requirement since the indices in edge_index are used to index select entries from the feature matrix of shape [num_nodes, num_features]. In your case, you need to map the indices in edge_index to their corresponding indices in the feature matrix, e.g., via

mapping = {}
mapped_edge_index = []
for (src, dst) in edge_index.t().tolist():
    if src not in mapping:
        mapping[src] = len(mapping)
    if dst not in mapping:
        mapping[dst] = len(mapping)
    mapped_edge_index.append([mapping[src], mapping[dst]])
edge_index = torch.tensor(mapped_edge_index).t().contiguous()