
VideoDataset has a data normalization bug

See original GitHub issue

There is a bug in the normalize(self, buffer) function in dataset.py: it does not normalize the data to [0, 1], which we usually do in a deep-learning training pipeline with PyTorch. I also tested it. Without this normalization, training fails completely on the official train/test split of UCF101: after 54 epochs, the testing accuracy was only around 5%. With it, training works fine: after just 5 epochs it already reached 8.2% testing accuracy. https://github.com/jfzhang95/pytorch-video-recognition/blob/ca37de9f69a961f22a821c157e9ccf47a601904d/dataloaders/dataset.py#L204
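For context, here is a minimal sketch of the change the reporter is describing. The normalize(self, buffer) signature comes from the issue; the loop structure and the assumption that buffer is a NumPy array of raw pixel values in [0, 255] are illustrative, not the repository's exact code.

```python
import numpy as np

def normalize(self, buffer):
    # Sketch only: assumes `buffer` is a NumPy array of shape (frames, H, W, C)
    # holding raw pixel values in [0, 255]; rescale each frame into [0, 1].
    buffer = buffer.astype(np.float32)
    for i, frame in enumerate(buffer):
        buffer[i] = frame / 255.0
    return buffer
```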

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 6

Top GitHub Comments

2 reactions
wave-transmitter commented, Apr 5, 2019

Why do you think this is a bug? Normalizing data to [0, 1] is not always required. Subtracting the mean RGB values of the dataset used for the backbone's pre-training (usually ImageNet) is also common, and the normalize() function follows this approach.

If you want to show that normalizing data to [0, 1] leads to higher performance, you have to elaborate more on this. The results you provided are not comparable to each other. You could validate the claim by training your model with each of the two normalization approaches in turn and reporting the results for the same number of epochs.
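For reference, a minimal sketch of the two normalization schemes being compared, assuming a (frames, H, W, C) NumPy clip of raw pixel values in [0, 255]; the mean values below are the commonly used ImageNet RGB means, shown only as an example and not the repository's actual constants.

```python
import numpy as np

# Example ImageNet RGB means, used here purely for illustration.
IMAGENET_RGB_MEAN = np.array([123.68, 116.78, 103.94], dtype=np.float32)

def normalize_mean_subtraction(buffer: np.ndarray) -> np.ndarray:
    # Centre each channel on the (assumed) dataset mean; values stay roughly in [-128, 132].
    return buffer.astype(np.float32) - IMAGENET_RGB_MEAN

def normalize_unit_range(buffer: np.ndarray) -> np.ndarray:
    # Rescale raw pixel values into [0, 1].
    return buffer.astype(np.float32) / 255.0
```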

1 reaction
leftthomas commented, Apr 8, 2019

@wave-transmitter I trained C3D from scratch on the official split1, without the pre-trained model. You can test C3D from scratch yourself by changing one line of code in the normalize function to frame = frame / 255.0, and you will see the result. In this repo, the input tensor holds large values such as 233.7, -45.2, etc., which is unusual in deep-learning training and easily causes value-overflow problems, because the convolution ops are essentially matrix multiplications. This is why people have reported issues such as NaN loss values, as mentioned in #17. If you normalize the data to [0, 1], the NaN problem goes away.
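To make the scale argument concrete, here is a small illustrative sketch (not a reproduction of issue #17 and not the repository's code): the same untrained Conv3d layer produces far larger activations on raw pixel values than on values scaled to [0, 1], which is what makes the loss easier to destabilise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

# One clip of 16 RGB frames at 112x112: raw pixel values vs. the same clip scaled to [0, 1].
raw = torch.randint(0, 256, (1, 3, 16, 112, 112)).float()
scaled = raw / 255.0

print(conv(raw).abs().mean())     # large activations from unnormalized input
print(conv(scaled).abs().mean())  # far smaller activations after scaling to [0, 1]
```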

Read more comments on GitHub >

Top Results From Across the Web

Data Cleaning and Normalization - YouTube
Full course: https://www.udemy.com/course/data-science-and-... Data Cleaning and Normalization.
Read more >
Data Modeling, Normalization and Denormalization - YouTube
By Dimitri Fontaine. At: FOSDEM 2019, https://video.fosdem.org/2019/UA2.220/ ... tasks you have to deal with is modeling the database schema for ...
Read more >
Why models performs better If normalize test data and train ...
The authors said "There is no duplicate records in the proposed test sets" and that they have removed any redundant values. So I...
Read more >
Data Denormalization - A New Way to Optimize Databases
In normalization of data, we store data in separate tables to avoid redundancy due to which we have only one copy of each...
Read more >
How to Normalize and Standardize Time Series Data in Python
Below is a plot of the entire dataset. Minimum Daily Temperatures. The dataset shows a strong seasonality component and has a nice, fine-grained ...
Read more >
