DataLoader: Dynamic Batch-size based on num_nodes/num_edges
❓ Questions & Help
I was using the RGCNConv layer with num_relations=4, in_channels=out_channels=512. In the forward step, I pass in a graph with 300 nodes and 22768 edges, which caused a CUDA OOM error saying it needs 22.23GB of memory while I only have 11.17GB.
The line causing the error is
w = torch.index_select(w, 0, edge_type)
in the message function of the RGCNConv class, which makes sense, as it was trying to create a float tensor of size [22768 x 512 x 512].
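Assuming float32 weights (4 bytes per element), that intermediate tensor alone accounts for the reported 22.23GB:

num_edges, in_channels, out_channels = 22768, 512, 512
bytes_needed = num_edges * in_channels * out_channels * 4   # 4 bytes per float32 element
print(bytes_needed / 2**30)   # ~22.23 GiB, more than the 11.17 GiB available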
But copying the weights num_edges times seems inefficient and unnecessary. Is there another way to implement RGCN without copying the weights this many times?
Issue Analytics
- Created: 4 years ago
- Comments: 13 (9 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Here’s a first hack of the above with all the caveats included:
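A minimal sketch of what such a hack might look like, assuming a custom torch.utils.data.Sampler that yields lists of dataset indices until a node or edge budget is exceeded (the class name BudgetBatchSampler, its arguments, and its logic are illustrative, not PyG's API or the exact code from this comment):

import torch
from torch.utils.data import Sampler

class BudgetBatchSampler(Sampler):
    # Yields lists of dataset indices; each list stays under the given
    # node and/or edge budget instead of a fixed number of graphs.
    def __init__(self, dataset, max_num_nodes=None, max_num_edges=None, shuffle=True):
        self.dataset = dataset
        self.max_num_nodes = max_num_nodes or float('inf')
        self.max_num_edges = max_num_edges or float('inf')
        self.shuffle = shuffle

    def __iter__(self):
        order = (torch.randperm(len(self.dataset)) if self.shuffle
                 else torch.arange(len(self.dataset)))
        batch, num_nodes, num_edges = [], 0, 0
        for idx in order.tolist():
            data = self.dataset[idx]
            over_budget = (num_nodes + data.num_nodes > self.max_num_nodes or
                           num_edges + data.num_edges > self.max_num_edges)
            if batch and over_budget:
                yield batch
                batch, num_nodes, num_edges = [], 0, 0
            batch.append(idx)
            num_nodes += data.num_nodes
            num_edges += data.num_edges
        if batch:
            yield batch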
This is a good request and we currently do not support this. We could add an argument max_num_nodes or max_num_edges in addition to batch_size to the DataLoader. However, this requires us to implement our own data loading routine without relying on PyTorch to do this task for us, so it can be a bit tricky to get right, especially in combination with num_workers.
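Until something like that lands in the library, one workaround in the spirit of the above is to drive a plain torch.utils.data.DataLoader with such a batch sampler and PyG's Batch.from_data_list as the collate function (again only a sketch, assuming dataset is a PyG dataset of Data objects and reusing the illustrative BudgetBatchSampler from the previous comment):

from torch.utils.data import DataLoader
from torch_geometric.data import Batch

# batch_sampler yields lists of indices; Batch.from_data_list merges the
# corresponding Data objects into a single disconnected graph per batch.
loader = DataLoader(
    dataset,
    batch_sampler=BudgetBatchSampler(dataset, max_num_edges=20000),
    collate_fn=Batch.from_data_list,
    num_workers=0,  # multi-process loading is exactly the tricky part mentioned above
)

for batch in loader:
    print(batch.num_graphs, batch.num_nodes, batch.num_edges)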