DataLoader does not work with inputs of different sizes
I am not sure if this is a bug or if it is built this way on purpose, but I noticed that if I have data of size B×C×A×W×H, where A can differ from sample to sample, the DataLoader throws an error.
Example:
```python
Training = MyDataset(VideosPath)
for i in range(3):
    sample = Training[i]
    print(i, sample['frames'].size())
# 0 torch.Size([1, 3, 10, 10, 10])
# 1 torch.Size([1, 3, 10, 10, 10])
# 2 torch.Size([1, 3, 10, 10, 10])

dataloader = DataLoader(Training, batch_size=2, shuffle=False, num_workers=4)
for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['frames'].size())
```

This works fine.
But if I have:
```python
Training = MyDataset(VideosPath)
for i in range(3):
    sample = Training[i]
    print(i, sample['frames'].size())
# 0 torch.Size([1, 3, 90, 10, 10])
# 1 torch.Size([1, 3, 211, 10, 10])
# 2 torch.Size([1, 3, 370, 10, 10])

dataloader = DataLoader(Training, batch_size=2, shuffle=False, num_workers=4)
for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['frames'].size())
```
it does not work and throws an error:
```
RuntimeError                              Traceback (most recent call last)
<ipython-input-69-87a6d0a0df75> in <module>()
----> 1 for i_batch, sample_batched in enumerate(dataloader):
      2     print(i_batch, sample_batched['frames'].size())

~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    284                 self.reorder_dict[idx] = batch
    285                 continue
--> 286             return self._process_next_batch(batch)
    287
    288     next = __next__  # Python 2 compatibility

~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    305         self._put_indices()
    306         if isinstance(batch, ExceptionWrapper):
--> 307             raise batch.exc_type(batch.exc_msg)
    308         return batch
    309

RuntimeError: Traceback (most recent call last):
  File "/home/alireza/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/alireza/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 135, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "/home/alireza/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 135, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "/home/alireza/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 115, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 90 and 211 in dimension 3 at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/TH/generic/THTensorMath.c:3586
```
The error itself says that "Sizes of tensors must match except in dimension 0", which suggests I could permute the dimensions to bring A to dimension 0 and keep the rest the same:

```
0 torch.Size([90, 1, 3, 10, 10])
1 torch.Size([211, 1, 3, 10, 10])
```

but even doing this gives me an error.
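Permuting cannot fix this: `default_collate` stacks the samples with `torch.stack`, which (unlike `torch.cat`) requires every tensor to have exactly the same shape, so the mismatch simply moves to another dimension. A minimal sketch reproducing the failure with the shapes from the example above:

```python
import torch

# Two 'frames' tensors whose sizes differ in dimension 0
# (i.e. after permuting A to the front, as suggested above).
a = torch.zeros(90, 1, 3, 10, 10)
b = torch.zeros(211, 1, 3, 10, 10)

try:
    # This is essentially what default_collate does to build a batch.
    torch.stack([a, b], 0)
except RuntimeError as e:
    print('stack failed:', e)
```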
Issue Analytics: created 5 years ago · 7 comments (3 by maintainers)
That's right. You need to write your own `collate_fn` and pass it to `DataLoader` so that you can handle batches of samples with different sizes (for example, by zero-padding the samples so that they all have the same size and can be concatenated). It should be fairly easy to write your own `collate_fn` for your use case. Let me know if that isn't the case.
I have modified my `collate_fn`, and my DataLoader can now handle data of different sizes. But I ran into another problem.
Do you have any idea how to deal with this problem? I really appreciate any help you can provide.