
Dataloader does not work with inputs of different size

See original GitHub issue

I am not sure if this is a bug or if it is built this way on purpose, but I noticed that if I have data of size BxCxAxWxH, where A can differ from sample to sample, the DataLoader throws an error.

Example:


Training = MyDataset(VideosPath)
for i in range(3):
    sample = Training[i]
    print(i, sample['frames'].size())

0 torch.Size([1, 3, 10, 10, 10])
1 torch.Size([1, 3, 10, 10, 10])
2 torch.Size([1, 3, 10, 10, 10])

dataloader = DataLoader(Training, batch_size=2, shuffle=False, num_workers=4)
for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['frames'].size())

This works fine.

But if I have:

Training = MyDataset(VideosPath)
for i in range(3):
    sample = Training[i]
    print(i, sample['frames'].size())

0 torch.Size([1, 3, 90, 10, 10])
1 torch.Size([1, 3, 211, 10, 10])
2 torch.Size([1, 3, 370, 10, 10])

dataloader = DataLoader(Training, batch_size=2, shuffle=False, num_workers=4)
for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['frames'].size())

it does not work and throws an error:


RuntimeError                              Traceback (most recent call last)
<ipython-input-69-87a6d0a0df75> in <module>()
----> 1 for i_batch, sample_batched in enumerate(dataloader):
      2     print(i_batch, sample_batched['frames'].size())

~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    284                 self.reorder_dict[idx] = batch
    285                 continue
--> 286             return self._process_next_batch(batch)
    287
    288     next = __next__  # Python 2 compatibility

~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    305         self._put_indices()
    306         if isinstance(batch, ExceptionWrapper):
--> 307             raise batch.exc_type(batch.exc_msg)
    308         return batch
    309

RuntimeError: Traceback (most recent call last):
  File "/home/alireza/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/alireza/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 135, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "/home/alireza/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 135, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "/home/alireza/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 115, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 90 and 211 in dimension 3 at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/TH/generic/THTensorMath.c:3586

The error itself says: Sizes of tensors must match except in dimension 0. That suggested I could permute the dimensions to bring A to dimension 0 and keep the rest the same, i.e. 0 torch.Size([90, 1, 3, 10, 10]), 1 torch.Size([211, 1, 3, 10, 10]).

But even doing this gives me an error.
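This is expected, and a small sketch shows why: the default collate function calls torch.stack(batch, 0), which creates a new batch dimension, so "dimension 0" in the error message refers to that new stacking axis, not to dimension 0 of each sample. Permuting A to the front therefore cannot help; every dimension of every sample must still match.

```python
# Sketch of what default_collate effectively does: torch.stack(batch, 0)
# adds a NEW leading batch dimension, so all sample dimensions must match.
import torch

a = torch.zeros(90, 1, 3, 10, 10)   # sample with A=90 permuted to the front
b = torch.zeros(211, 1, 3, 10, 10)  # sample with A=211 permuted to the front

try:
    torch.stack([a, b], 0)          # still fails: 90 != 211
except RuntimeError as e:
    print("stack failed:", e)
```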

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

10 reactions
fmassa commented, Jul 23, 2018

That’s right. You need to write your own collate_fn and pass it to DataLoader so that you can have batches of different sizes (for example, by padding the images with zero so that they have the same size and can be concatenated).

It should be fairly easy to write your own collate_fn for handling your use-case. Let me know if it isn’t the case.
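A minimal sketch of such a collate_fn, assuming the dict-of-'frames' samples from the issue and zero-padding along the A axis (the padding strategy is one illustration of the suggestion above, not code from the thread):

```python
import torch

def pad_collate(batch):
    """Zero-pad each sample's variable A dimension (dim 2) to the batch
    maximum, then stack into a single B x 1 x 3 x max_A x W x H tensor."""
    frames = [sample['frames'] for sample in batch]   # each: 1 x 3 x A x 10 x 10
    max_a = max(f.size(2) for f in frames)
    padded = []
    for f in frames:
        pad_len = max_a - f.size(2)
        if pad_len > 0:
            zeros = f.new_zeros(f.size(0), f.size(1), pad_len, f.size(3), f.size(4))
            f = torch.cat([f, zeros], dim=2)          # append zero frames
        padded.append(f)
    return {'frames': torch.stack(padded, 0)}
```

It would be passed to the loader as DataLoader(Training, batch_size=2, collate_fn=pad_collate).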

0 reactions
lurenyi233 commented, Feb 17, 2021

> That’s right. You need to write your own collate_fn and pass it to DataLoader so that you can have batches of different sizes (for example, by padding the images with zero so that they have the same size and can be concatenated).
>
> It should be fairly easy to write your own collate_fn for handling your use-case. Let me know if it isn’t the case.

I have modified my collate_fn, and the data can now have different sizes within my DataLoader. But I ran into another problem.

  1. A batch of training samples is now a large list containing several tensors (the number of tensors is determined by the batch size we set). It has to be a list because the tensors inside have different sizes.
  2. The CNN model’s input (i.e., this batch of training samples) must be a tensor; it cannot be a list.
  3. I failed to convert this list into a tensor, because the tensors in the list have different sizes.

Do you have an idea how to deal with this problem? I really appreciate any help you can provide.
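One possible answer, not given in the thread: if the varying dimension is first, torch.nn.utils.rnn.pad_sequence can zero-pad a list of different-length tensors into one batch tensor that a model can consume (the shapes below mirror the issue's examples and are illustrative):

```python
# Turn a list of tensors with a varying first dimension into one tensor
# by zero-padding to the longest length in the list.
import torch
from torch.nn.utils.rnn import pad_sequence

tensors = [torch.ones(90, 3, 10, 10),    # A=90
           torch.ones(211, 3, 10, 10)]   # A=211

batch = pad_sequence(tensors, batch_first=True)  # B x max_A x 3 x 10 x 10
print(batch.shape)  # torch.Size([2, 211, 3, 10, 10])
```

If the varying axis is not first, the tensors can be permuted before padding and permuted back afterwards.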

Read more comments on GitHub >

Top Results From Across the Web

Dataloader does not work with inputs of different size - vision
The default collate (read “batching”) function tries to torch.stack your data into one Bx… tensor. You're getting this error because your dimensions A...
Read more >
How does Pytorch Dataloader handle variable size data?
You can write your own collate_fn, which for instance 0-pads the input, truncates it to some predefined length, or applies any...
Read more >
Can data loader work with different input shape - Gluon
I have been trying to load data using dataloader. However, if I pass different shaped input for each individual record than it returns...
Read more >
Complete Guide to the DataLoader Class in PyTorch
This post covers the PyTorch dataloader class. We'll show how to load built-in and custom datasets in PyTorch, plus how to transform and...
Read more >
Data — MONAI 1.0.1 Documentation
DataLoader do not seed this class (as a subclass of IterableDataset ) at run time. ... Note that as a stream input, it...
Read more >
