
DataLoader doesn't support batches composed of graphs with different edge_index


Hello,

I would like to build a DataLoader from my custom dataset. Each sample has its own adjacency matrix, which varies in the number of nodes and in connectivity from one sample to the next. As a consequence, each sample has its own edge_index and edge_weight.

When I use DataLoader with batch_size=1 it works. However, when batch_size > 1 I get the following error:

from torch_geometric.data import DataLoader

loader = DataLoader(train_folder, batch_size=2, shuffle=True)
batch = next(iter(loader))

RuntimeError: stack expects each tensor to be equal size, but got [2, 107050] at entry 0 and [2, 106190] at entry 1

Does DataLoader not support a varying edge_index?
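
For reference, a minimal sketch of the behaviour being asked about (toy shapes and names, assuming torch and torch_geometric are installed): PyG's DataLoader concatenates edge_index tensors of different widths rather than stacking them.

import torch
from torch_geometric.data import Data, DataLoader

# Two graphs with different node counts and different numbers of edges.
g1 = Data(x=torch.randn(3, 4), edge_index=torch.tensor([[0, 1, 2], [1, 2, 0]]))
g2 = Data(x=torch.randn(5, 4), edge_index=torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]]))

loader = DataLoader([g1, g2], batch_size=2, shuffle=False)
batch = next(iter(loader))

print(batch.edge_index.shape)  # torch.Size([2, 7]) - edges concatenated, not stacked
print(batch.batch)             # tensor([0, 0, 0, 1, 1, 1, 1, 1]) - node-to-graph map

The "stack expects each tensor to be equal size" message above comes from torch.stack in the default collate path; it typically appears when the samples reaching the loader are plain tensors or tuples rather than Data objects.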

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
rusty1s commented, Mar 26, 2021
  1. No, it should only be applied to the indices, not to edge_weight.
  2. Yes, that's what DataLoader does out of the box (your third point results in wrong behaviour).
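
A small sketch of those two answers in action (illustrative shapes, assuming torch_geometric is installed): the node-count offsets are applied to edge_index only, while edge_attr is concatenated untouched.

import torch
from torch_geometric.data import Data, Batch

g1 = Data(x=torch.randn(2, 3),
          edge_index=torch.tensor([[0], [1]]),
          edge_attr=torch.tensor([0.5]))
g2 = Data(x=torch.randn(2, 3),
          edge_index=torch.tensor([[0], [1]]),
          edge_attr=torch.tensor([0.9]))

batch = Batch.from_data_list([g1, g2])  # what DataLoader uses under the hood
print(batch.edge_index)  # tensor([[0, 2], [1, 3]]) - second graph shifted by 2 nodes
print(batch.edge_attr)   # tensor([0.5000, 0.9000]) - weights concatenated, not shifted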
0 reactions
pinkfloyd06 commented, Mar 26, 2021

Thank you for the links and details.

1/ Is cumsum also applied to edge_weight?

2/

import torch_geometric

train_folder = []

for data_sample in data_list:
    features, coordinates, edge_index, edge_weight, targets = data_sample

    # Store raw, per-sample indices; no manual offsetting.
    train_folder.append(
        torch_geometric.data.Data(
            x=features,
            pos=coordinates,
            edge_index=edge_index,
            edge_attr=edge_weight,
            y=targets,
        )
    )

train_loader = torch_geometric.data.DataLoader(train_folder, batch_size=batch_size_train, shuffle=True)

Do you think my edge_index tensors are accumulated correctly in 2/, or do I need to apply cumsum myself, as in 3/ below?

3/

train_folder = []
cumsum = 0

for data_sample in data_list:
    features, coordinates, edge_index, edge_weight, targets = data_sample

    # Manually offset each sample's indices by the running node count.
    num_nodes = coordinates.shape[0]
    cumsum += num_nodes
    edge_index = edge_index + cumsum

    train_folder.append(
        torch_geometric.data.Data(
            x=features,
            pos=coordinates,
            edge_index=edge_index,
            edge_attr=edge_weight,
            y=targets,
        )
    )

train_loader = torch_geometric.data.DataLoader(train_folder, batch_size=batch_size_train, shuffle=True)
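
To make the comparison concrete, here is a toy sketch (assuming torch_geometric; the manual shift mirrors the cumsum loop in 3/, where cumsum is incremented before it is applied). Because the loader offsets edge_index itself, the pre-shifted indices of 3/ get shifted a second time:

import torch
from torch_geometric.data import Data, Batch

edge_index = torch.tensor([[0, 1], [1, 0]])  # the same 2-node graph, twice

# 2/: store raw per-graph indices and let PyG do the shifting.
raw = [Data(x=torch.randn(2, 1), edge_index=edge_index) for _ in range(2)]
print(Batch.from_data_list(raw).edge_index)
# tensor([[0, 1, 2, 3], [1, 0, 3, 2]]) - correct

# 3/: indices pre-shifted by hand get shifted again when batched.
shifted = [Data(x=torch.randn(2, 1), edge_index=edge_index + 2 * (i + 1))
           for i in range(2)]
print(Batch.from_data_list(shifted).edge_index)
# tensor([[2, 3, 6, 7], [3, 2, 7, 6]]) - points past the 4 nodes that exist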

4/ It seems to me that cumsum plays the role of a mask: it controls which nodes belong to which sample during matrix multiplications, so that nodes from different samples don't get mixed together.
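
For reference, the bookkeeping that keeps samples separate is the batch vector PyG attaches to each mini-batch: every node carries its graph id, so graph-level operations can aggregate per sample. A minimal sketch (toy shapes, assuming torch_geometric is installed):

import torch
from torch_geometric.data import Data, Batch
from torch_geometric.nn import global_mean_pool

g1 = Data(x=torch.randn(3, 8), edge_index=torch.tensor([[0, 1], [1, 2]]))
g2 = Data(x=torch.randn(5, 8), edge_index=torch.tensor([[0, 4], [1, 3]]))
batch = Batch.from_data_list([g1, g2])

print(batch.batch)  # tensor([0, 0, 0, 1, 1, 1, 1, 1]) - node-to-graph assignment

# Pooling respects the assignment, so samples never mix.
out = global_mean_pool(batch.x, batch.batch)
print(out.shape)  # torch.Size([2, 8]) - one vector per graph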

Thank you
