Stack error when using mini-batch size > 1
Describe the bug
On running the code below:
cl_strategy = EWC(
    model_ft, optimizer_ft, criterion, ewc_lambda=5,
    train_mb_size=32, train_epochs=4, eval_mb_size=32
)
we get the error shown below. Training works fine with train_mb_size=1 and eval_mb_size=1, but fails for any other value.
/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
53 storage = elem.storage()._new_shared(numel)
54 out = elem.new(storage)
---> 55 return torch.stack(batch, 0, out=out)
56 elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
57 and elem_type.__name__ != 'string_':
RuntimeError: stack expects each tensor to be equal size, but got [3, 341, 500] at entry 0 and [3, 313, 500] at entry 1
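The failure is easy to reproduce in isolation: default_collate builds a mini-batch by calling torch.stack on the sample tensors, and torch.stack requires all of them to have identical shapes. A minimal demonstration using the two shapes from the traceback:

import torch

# Two CUB200-like patterns: 3-channel images with different heights.
a = torch.zeros(3, 341, 500)
b = torch.zeros(3, 313, 500)

# default_collate does the equivalent of this for every mini-batch;
# it raises "stack expects each tensor to be equal size".
batch = torch.stack([a, b], 0)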
To Reproduce
cl_strategy = EWC(
    model_ft, optimizer_ft, criterion, ewc_lambda=5,
    train_mb_size=32, train_epochs=4, eval_mb_size=32
)
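For context, here is a fuller reproduction sketch. Everything outside the EWC(...) call is an assumption, since the issue does not show how model_ft, optimizer_ft, and criterion were created; the benchmark choice and the EWC import path (which varies across Avalanche versions) are hypothetical as well.

# Assumed setup around the reported EWC call; only the EWC(...) arguments
# come from the issue itself.
import torch.nn as nn
import torch.optim as optim
from torchvision import models
from avalanche.benchmarks.classic import SplitCUB200
from avalanche.training import EWC  # import path varies by Avalanche version

benchmark = SplitCUB200()  # assumed scenario; expects the CUB-200-2011 data

model_ft = models.resnet18(num_classes=200)  # CUB200 has 200 bird classes
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

cl_strategy = EWC(
    model_ft, optimizer_ft, criterion, ewc_lambda=5,
    train_mb_size=32, train_epochs=4, eval_mb_size=32
)

for experience in benchmark.train_stream:
    cl_strategy.train(experience)  # raises the stack error for mb_size > 1
    cl_strategy.eval(benchmark.test_stream)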
Expected behavior
Training should proceed with the specified mini-batch size.
Issue Analytics
- Created 2 years ago
- Comments: 9 (4 by maintainers)
Top GitHub Comments
I inspected the CUB200 dataset and the problem is related to the fact that each pattern is a 3-channel image with a different height and width, so each pattern tensor has shape (3, H, W). The dataloader tries to build the mini-batch with torch.stack and fails since the dimensions differ. I think this could be fixed by forcing a center crop of the same dimensions on all patterns, or by padding. However, I don’t know if there are standard practices for working with this dataset. @lrzpellegrini, @vlomonaco do you have any hints on this?
Ok, I’ll try to check if a proper implementation of the dataset exists in the wild 😉
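For reference, one way to apply the fix suggested in the first comment is to force every image to a common size through the benchmark transforms. This is a sketch, assuming SplitCUB200 accepts torchvision-style train_transform/eval_transform arguments, as most classic Avalanche benchmarks do:

from torchvision import transforms
from avalanche.benchmarks.classic import SplitCUB200

# Resize + center crop gives every pattern the shape (3, 224, 224), so
# torch.stack in default_collate succeeds for any mini-batch size.
uniform_transform = transforms.Compose([
    transforms.Resize(256),       # scale the shorter side to 256 px
    transforms.CenterCrop(224),   # crop to a common (H, W)
    transforms.ToTensor(),
])

# Assumption: these keyword arguments exist on SplitCUB200, as they do on
# most classic Avalanche benchmarks.
benchmark = SplitCUB200(
    train_transform=uniform_transform,
    eval_transform=uniform_transform,
)

Padding each mini-batch to its largest (H, W) via a custom collate_fn would avoid cropping away image content, at the cost of batch shapes that vary across iterations.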