
DataLoader doesn't support batches composed of graphs with different edge_index


Hello,

I would like to build a DataLoader from my custom dataset. Each sample has its own adjacency matrix, which varies in the number of nodes and in connectivity from one sample to the next. As a consequence, each sample has its own edge_index and edge_weight.

When I use DataLoader with batch_size=1 it works. However, when batch_size > 1 I get the following error:

from torch_geometric.data import DataLoader

loader = DataLoader(train_folder, batch_size=2, shuffle=True)
batch = next(iter(loader))

RuntimeError: stack expects each tensor to be equal size, but got [2, 107050] at entry 0 and [2, 106190] at entry 1

Does DataLoader not support a varying edge_index?
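
For reference, a minimal sketch of the behaviour being asked about (toy shapes and names, assuming torch and torch_geometric are installed): PyG's DataLoader concatenates edge_index tensors of different widths rather than stacking them.

import torch
from torch_geometric.data import Data, DataLoader

# Two graphs with different node counts and different numbers of edges.
g1 = Data(x=torch.randn(3, 4), edge_index=torch.tensor([[0, 1, 2], [1, 2, 0]]))
g2 = Data(x=torch.randn(5, 4), edge_index=torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]]))

loader = DataLoader([g1, g2], batch_size=2, shuffle=False)
batch = next(iter(loader))

print(batch.edge_index.shape)  # torch.Size([2, 7]) - edges concatenated, not stacked
print(batch.batch)             # tensor([0, 0, 0, 1, 1, 1, 1, 1]) - node-to-graph map

The "stack expects each tensor to be equal size" message above comes from torch.stack in the default collate path; it typically appears when the samples reaching the loader are plain tensors or tuples rather than Data objects.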

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
rusty1s commented, Mar 26, 2021
  1. No, it should only be applied to the indices, not to edge_weight.
  2. Yes, that's what DataLoader does out of the box (your third point results in wrong behaviour).
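
A small sketch of those two answers in action (illustrative shapes, assuming torch_geometric is installed): the node-count offsets are applied to edge_index only, while edge_attr is concatenated untouched.

import torch
from torch_geometric.data import Data, Batch

g1 = Data(x=torch.randn(2, 3),
          edge_index=torch.tensor([[0], [1]]),
          edge_attr=torch.tensor([0.5]))
g2 = Data(x=torch.randn(2, 3),
          edge_index=torch.tensor([[0], [1]]),
          edge_attr=torch.tensor([0.9]))

batch = Batch.from_data_list([g1, g2])  # what DataLoader uses under the hood
print(batch.edge_index)  # tensor([[0, 2], [1, 3]]) - second graph shifted by 2 nodes
print(batch.edge_attr)   # tensor([0.5000, 0.9000]) - weights concatenated, not shifted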
0 reactions
pinkfloyd06 commented, Mar 26, 2021

Thank you for the links and details.

1/ Is cumsum also applied to edge_weight?

2/

import torch_geometric

train_folder = []

for data_sample in data_list:
    features, coordinates, edge_index, edge_weight, targets = data_sample

    # Store raw, per-sample indices; no manual offsetting.
    train_folder.append(
        torch_geometric.data.Data(
            x=features,
            pos=coordinates,
            edge_index=edge_index,
            edge_attr=edge_weight,
            y=targets,
        )
    )

train_loader = torch_geometric.data.DataLoader(train_folder, batch_size=batch_size_train, shuffle=True)

Do you think my edge_index tensors are accumulated correctly in 2/, or do I need to apply cumsum myself, as in 3/ below?

3/

train_folder = []
cumsum = 0

for data_sample in data_list:
    features, coordinates, edge_index, edge_weight, targets = data_sample

    # Manually offset each sample's indices by the running node count.
    num_nodes = coordinates.shape[0]
    cumsum += num_nodes
    edge_index = edge_index + cumsum

    train_folder.append(
        torch_geometric.data.Data(
            x=features,
            pos=coordinates,
            edge_index=edge_index,
            edge_attr=edge_weight,
            y=targets,
        )
    )

train_loader = torch_geometric.data.DataLoader(train_folder, batch_size=batch_size_train, shuffle=True)
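
To make the comparison concrete, here is a toy sketch (assuming torch_geometric; the manual shift mirrors the cumsum loop in 3/, where cumsum is incremented before it is applied). Because the loader offsets edge_index itself, the pre-shifted indices of 3/ get shifted a second time:

import torch
from torch_geometric.data import Data, Batch

edge_index = torch.tensor([[0, 1], [1, 0]])  # the same 2-node graph, twice

# 2/: store raw per-graph indices and let PyG do the shifting.
raw = [Data(x=torch.randn(2, 1), edge_index=edge_index) for _ in range(2)]
print(Batch.from_data_list(raw).edge_index)
# tensor([[0, 1, 2, 3], [1, 0, 3, 2]]) - correct

# 3/: indices pre-shifted by hand get shifted again when batched.
shifted = [Data(x=torch.randn(2, 1), edge_index=edge_index + 2 * (i + 1))
           for i in range(2)]
print(Batch.from_data_list(shifted).edge_index)
# tensor([[2, 3, 6, 7], [3, 2, 7, 6]]) - points past the 4 nodes that exist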

4/ It seems to me that cumsum plays the role of a mask: it controls which nodes belong to which sample during matrix multiplications, so that nodes from different samples don't get mixed together.
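
For reference, the bookkeeping that keeps samples separate is the batch vector PyG attaches to each mini-batch: every node carries its graph id, so graph-level operations can aggregate per sample. A minimal sketch (toy shapes, assuming torch_geometric is installed):

import torch
from torch_geometric.data import Data, Batch
from torch_geometric.nn import global_mean_pool

g1 = Data(x=torch.randn(3, 8), edge_index=torch.tensor([[0, 1], [1, 2]]))
g2 = Data(x=torch.randn(5, 8), edge_index=torch.tensor([[0, 4], [1, 3]]))
batch = Batch.from_data_list([g1, g2])

print(batch.batch)  # tensor([0, 0, 0, 1, 1, 1, 1, 1]) - node-to-graph assignment

# Pooling respects the assignment, so samples never mix.
out = global_mean_pool(batch.x, batch.batch)
print(out.shape)  # torch.Size([2, 8]) - one vector per graph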

Thank you
