Assertion error during kinetics400 validation

🐛 Describe the bug

Running on main:

torchrun --nproc_per_node=8 train.py --data-path /datasets01/kinetics/070618/400/ --train-dir=val --val-dir=val --batch-size=16 --sync-bn --test-only --pretrained --cache-dataset

throws the following error:

Test:  [2200/3008]  eta: 0:11:24  loss: 2.6703 (2.1475)  acc1: 43.7500 (57.4938)  acc5: 68.7500 (77.8623)  time: 0.9043  data: 0.6405  max mem: 5888
Traceback (most recent call last):
  File "train.py", line 392, in <module>
    main(args)
  File "train.py", line 273, in main
    evaluate(model, criterion, data_loader_test, device=device)
  File "train.py", line 62, in evaluate
    for video, target in metric_logger.log_every(data_loader, 100, header):
  File "/private/home/vvryniotis/vision/references/video_classification/utils.py", line 128, in log_every
    for obj in iterable:
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1183, in _next_data
    return self._process_data(data)
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/_utils.py", line 438, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/private/home/vvryniotis/.conda/envs/datumbox/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/private/home/vvryniotis/vision/torchvision/datasets/kinetics.py", line 233, in __getitem__
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
  File "/private/home/vvryniotis/vision/torchvision/datasets/video_utils.py", line 362, in get_clip
    assert len(video) == self.num_frames, f"{video.shape} x {self.num_frames}"
AssertionError: torch.Size([17, 288, 352, 3]) x 16
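
To narrow down which clips trip the assertion, one option is to index the dataset sequentially in the main process and record the failing indices, which avoids the DataLoader worker re-raise seen above. The sketch below is only an illustration: `find_bad_clip_indices` and `FakeClips` are hypothetical names, not part of torchvision, and the stand-in dataset exists so the example runs without Kinetics. With the real data you would presumably pass the dataset instance instead.

```python
from typing import List, Optional, Tuple


def find_bad_clip_indices(dataset, limit: Optional[int] = None) -> List[Tuple[int, str]]:
    """Index the dataset one item at a time and collect (index, message)
    for every item whose __getitem__ raises AssertionError."""
    n = len(dataset) if limit is None else min(limit, len(dataset))
    failures = []
    for idx in range(n):
        try:
            dataset[idx]
        except AssertionError as exc:
            failures.append((idx, str(exc)))
    return failures


class FakeClips:
    """Hypothetical stand-in for a video dataset: asserts that every
    clip has exactly num_frames frames, like the check in get_clip."""

    def __init__(self, clip_lengths, num_frames=16):
        self.clip_lengths = list(clip_lengths)
        self.num_frames = num_frames

    def __len__(self):
        return len(self.clip_lengths)

    def __getitem__(self, idx):
        length = self.clip_lengths[idx]
        assert length == self.num_frames, f"{length} x {self.num_frames}"
        return [0] * length  # placeholder frames


bad = find_bad_clip_indices(FakeClips([16, 16, 17, 16]))
print(bad)  # [(2, '17 x 16')]
```

Knowing the failing indices makes it possible to inspect individual videos rather than rerunning the full 8-process evaluation.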

If we apply the following patch:

$ git diff
diff --git a/torchvision/datasets/video_utils.py b/torchvision/datasets/video_utils.py
index f0f19e33..2254f8c5 100644
--- a/torchvision/datasets/video_utils.py
+++ b/torchvision/datasets/video_utils.py
@@ -359,8 +359,8 @@ class VideoClips:
                 resampling_idx = resampling_idx - resampling_idx[0]
             video = video[resampling_idx]
             info["video_fps"] = self.frame_rate
-        assert len(video) == self.num_frames, f"{video.shape} x {self.num_frames}"
-        return video, audio, info, video_idx
+        #assert len(video) == self.num_frames, f"{video.shape} x {self.num_frames}"
+        return video[:self.num_frames], audio[:self.num_frames], info, video_idx
 
     def __getstate__(self):
         video_pts_sizes = [len(v) for v in self.video_pts]
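
The effect of the slicing in this patch can be illustrated without torchvision. This is a minimal sketch that uses a plain list as a stand-in for the video tensor; `truncate_clip` is a hypothetical name. It shows that slicing silently drops the trailing frame of a 17-frame clip, and that it does nothing for a clip that is shorter than `num_frames`.

```python
def truncate_clip(frames, num_frames):
    """Keep at most the first num_frames frames, as video[:self.num_frames] does."""
    return frames[:num_frames]


# A clip one frame too long, as in the reported failure (17 frames vs. 16).
too_long = truncate_clip(list(range(17)), 16)
print(len(too_long))  # 16

# Slicing cannot lengthen a clip: a short clip stays short.
too_short = truncate_clip(list(range(15)), 16)
print(len(too_short))  # 15
```

So the patch hides the assertion for over-long clips but does not guarantee that every returned clip has exactly `num_frames` frames.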

The run then completes, but the accuracy falls short of the expected values:

Result:
 * Clip Acc@1 56.488 Clip Acc@5 77.773

Expected:
 * Clip Acc@1 57.50 Clip Acc@5 78.81
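
For reference, the gap between the two sets of numbers works out to roughly one percentage point on each metric. The snippet below only does that subtraction on the values quoted above; the variable names are illustrative.

```python
# Values copied from the Result and Expected lines above.
result = {"acc1": 56.488, "acc5": 77.773}
expected = {"acc1": 57.50, "acc5": 78.81}

# Difference in percentage points, rounded to avoid float noise.
gap = {k: round(expected[k] - result[k], 3) for k in expected}
print(gap)  # {'acc1': 1.012, 'acc5': 1.037}
```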

Questions:

cc @pmeier @fmassa @bjuncek

Versions

Latest main 0817f7f

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

bjuncek commented, Apr 7, 2022 (1 reaction)

@datumbox I’ve been able to confirm that I don’t run into an error anymore. Could you double check on your end (I just used kinetics400 val set like in the example above)?

datumbox commented, Apr 7, 2022 (0 reactions)

@bjuncek I confirm that this is solved on the latest main. Thanks!
