UCF101: Dataloader Fail on assertion
🐛 Bug
When loading UCF101 with different values of frames_per_clip and step_between_clips, the loader often yields frames_per_clip + 1 frames, which causes the following assertion to fail:
assert len(video) == self.num_frames, "{} x {}".format(
    video.shape, self.num_frames
)
Code To Reproduce
import torch
from torchvision.datasets import UCF101

ucf_loc = "/dataset/ucf"
ucf_data_dir = f"{ucf_loc}/UCF101/UCF-101"
ucf_label_dir = f"{ucf_loc}/ucfTrainTestlist"
frames_per_clip = 5
step_between_clips = 1
num_workers = 4

def custom_collate(batch):
    # Drop the audio element so default_collate only sees (video, label) pairs
    filtered_batch = []
    for video, _, label in batch:
        filtered_batch.append((video, label))
    return torch.utils.data.dataloader.default_collate(filtered_batch)

if __name__ == '__main__':
    # create test loader (allowing batches and other extras)
    test_dataset = UCF101(ucf_data_dir, ucf_label_dir, frames_per_clip=frames_per_clip,
                          step_between_clips=step_between_clips, train=False, transform=None, num_workers=num_workers)
    test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=8, shuffle=True,
                                              collate_fn=custom_collate, num_workers=num_workers)
    for i, (video, label) in enumerate(test_loader):
        print(video.size())
        print(label)
Steps to reproduce the behavior:
- Download UCF101
- Modify ucf_loc and run the code included above
Stack trace
Original Traceback (most recent call last):
  File "/opt/miniconda3/envs/p38_ucf/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/miniconda3/envs/p38_ucf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/miniconda3/envs/p38_ucf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/miniconda3/envs/p38_ucf/lib/python3.8/site-packages/torchvision/datasets/ucf101.py", line 102, in __getitem__
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
  File "/opt/miniconda3/envs/p38_ucf/lib/python3.8/site-packages/torchvision/datasets/video_utils.py", line 382, in get_clip
    assert len(video) == self.num_frames, "{} x {}".format(
AssertionError: torch.Size([6, 240, 320, 3]) x 5
Expected behavior
The returned tensor should consistently contain exactly frames_per_clip frames, regardless of the properties of the input video or the parameter values passed to the UCF101 class.
Environment
- PyTorch 1.9.0 and TorchVision 0.10.0
- OS (e.g., Linux): macOS Big Sur
- How you installed PyTorch / torchvision (conda, pip, source): conda
- Build command you used (if compiling from source): NA
- Python version: 3.8.10
- CUDA/cuDNN version: NA
- GPU models and configuration: CPU
- Any other relevant information:
Additional context
Preliminary Investigation:
My guess is that, with the provided arguments, the _read_from_stream function used by read_video() in video.py returns a frame count that is off by one (+/- 1). I did not dig deeper to understand why.
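To narrow down which clips trigger the assertion, one option is to index the dataset directly in the main process instead of going through worker processes. The helper below is only a sketch: find_failing_clips is a hypothetical name, not part of torchvision, and it assumes nothing about the dataset beyond len() and integer indexing.

```python
def find_failing_clips(dataset, indices=None):
    """Return (index, message) pairs for clips whose loading raises AssertionError.

    Hypothetical diagnostic helper; `dataset` is any object that supports
    len() and integer indexing, such as the UCF101 dataset from the repro.
    """
    if indices is None:
        indices = range(len(dataset))
    failures = []
    for idx in indices:
        try:
            dataset[idx]  # triggers decoding of the clip at this index
        except AssertionError as exc:
            failures.append((idx, str(exc)))
    return failures


# Example (assumes test_dataset was built as in the reproduction code):
# for idx, message in find_failing_clips(test_dataset, range(200)):
#     print(idx, message)
```

Scanning only a range of indices keeps the check quick, since every lookup decodes video frames.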
Top GitHub Comments
This is probably related to https://github.com/pytorch/vision/pull/3791: now that sec is used to index into the video, there are rounding errors that lead to this failure.
cc @prabhat00155 @bjuncek I brought up this potential problem during our call a few weeks ago; we should fix it.
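As a generic illustration of how indexing by seconds can shift a clip boundary by one frame, consider the sketch below. It is not torchvision's code: the 10 fps frame rate, the clip_length helper, and the way the start time is computed are all assumptions chosen to make the floating-point effect visible with small numbers.

```python
import math
from fractions import Fraction

FPS = 10             # hypothetical frame rate, chosen to keep the numbers small
FRAMES_PER_CLIP = 5
INTENDED_START = 8   # the clip is meant to cover frames 8..12


def clip_length(start_sec, fps, end_frame):
    """Count frames from the frame containing start_sec up to end_frame, inclusive."""
    start_frame = math.floor(start_sec * fps)
    return end_frame - start_frame + 1


end_frame = INTENDED_START + FRAMES_PER_CLIP - 1

# Start time built from float seconds: 0.1 + 0.7 is slightly below 0.8,
# so flooring lands on frame 7 instead of frame 8.
float_len = clip_length(0.1 + 0.7, FPS, end_frame)

# The same start time in exact rational arithmetic lands on frame 8.
exact_len = clip_length(Fraction(1, 10) + Fraction(7, 10), FPS, end_frame)

print(float_len, exact_len)  # prints: 6 5
```

The float path yields frames_per_clip + 1 frames, the same kind of mismatch as in the assertion message above, while integer or rational timestamps avoid it.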
How can I solve this problem? I am using PyTorch 1.10.1 and TorchVision 0.11.2.
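Until the underlying bug is fixed, one possible stopgap is to wrap the dataset so that a clip that fails the assertion is replaced by a neighbouring clip. This is not a fix proposed in the thread: SkipBadClips and max_tries are hypothetical names, and the sketch assumes the wrapped dataset raises AssertionError for bad clips, as in the stack trace above.

```python
class SkipBadClips:
    """Map-style wrapper that falls back to a neighbouring clip when loading fails.

    Hypothetical workaround, not part of torchvision.
    """

    def __init__(self, dataset, max_tries=10):
        self.dataset = dataset
        self.max_tries = max_tries

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        n = len(self.dataset)
        for offset in range(self.max_tries):
            try:
                # Try the requested clip first, then the following ones
                return self.dataset[(idx + offset) % n]
            except AssertionError:
                continue
        raise RuntimeError(
            f"no loadable clip within {self.max_tries} tries of index {idx}"
        )


# Example (assumes test_dataset and custom_collate from the reproduction code):
# safe_dataset = SkipBadClips(test_dataset)
# test_loader = torch.utils.data.DataLoader(safe_dataset, batch_size=8,
#                                           collate_fn=custom_collate)
```

Because the substituted sample comes back with its own label, batches stay consistent, but clips next to failing ones are sampled slightly more often, so this only masks the symptom.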