VideoDataset has a bug in data normalization
There is a bug in the `normalize(self, buffer)` function in `dataset.py`: it does not normalize the data to [0, 1], which is the usual practice when training deep learning models with PyTorch.
I also tested it. Without normalization, training fails completely: using the official train/test split of UCF101, the test accuracy was only around 5% after 54 epochs. With normalization, training proceeds normally and reaches 8.2% test accuracy after just 5 epochs.
https://github.com/jfzhang95/pytorch-video-recognition/blob/ca37de9f69a961f22a821c157e9ccf47a601904d/dataloaders/dataset.py#L204
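For context, `normalize` at the linked line subtracts fixed per-channel means rather than scaling into [0, 1] (see the first comment below). A minimal sketch of the proposed fix, assuming `buffer` is a float array of shape (frames, height, width, 3) as produced by the repo's frame loader:

```python
def normalize(self, buffer):
    # buffer: float array of shape (frames, height, width, 3) holding
    # raw pixel values in [0, 255].
    for i, frame in enumerate(buffer):
        # Scale each frame into [0, 1] instead of only subtracting
        # fixed per-channel means as the current implementation does.
        buffer[i] = frame / 255.0
    return buffer
```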
Top GitHub Comments
Why do you think this is a bug? Normalizing data to [0, 1] is not the only valid approach. Subtracting the mean RGB values of the dataset used for the backbone's pre-training (usually ImageNet) is also common, and the `normalize()` function follows this approach.
If you want to show that normalizing data to [0, 1] leads to higher performance, you have to elaborate further: the results you provided are not comparable to each other. You could validate the claim by training your model with each of the two normalization approaches in turn and reporting the results for the same number of epochs.
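To make the two approaches concrete, here is a sketch for a single RGB frame; the ImageNet mean values are the commonly quoted ones and are an assumption here, not values taken from this repo:

```python
import numpy as np

# Approach 1: subtract the per-channel means of the pre-training dataset
# (the ImageNet RGB means below are the commonly quoted values).
def normalize_mean_subtraction(frame):
    imagenet_mean = np.array([123.68, 116.78, 103.94]).reshape(1, 1, 3)
    return frame - imagenet_mean

# Approach 2: scale raw 8-bit pixel values into [0, 1].
def normalize_unit_range(frame):
    return frame / 255.0
```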
@wave-transmitter I trained C3D with the official split1 from scratch, without the pre-trained model. You can test the from-scratch C3D model yourself by changing one line of code in the `normalize` function to `frame = frame / 255.0`; you will see the result. In this repo, the input tensor contains large values such as 233.7, -45.2, etc., which is unusual in deep learning training and easily causes value-overflow problems, because the core operations are essentially matrix multiplications. This is why people have reported issues like the `NaN` loss value mentioned in #17. If you normalize the data to [0, 1], the `NaN` problem goes away.
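For what it's worth, the common PyTorch recipe combines both ideas: scale into [0, 1] first, then standardize with per-channel statistics. A sketch using the usual ImageNet means and stds (an assumption, not values from this repo):

```python
import torch

def normalize_clip(buffer: torch.Tensor) -> torch.Tensor:
    # buffer: float tensor of shape (C, T, H, W) with raw values in [0, 255].
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1, 1)
    buffer = buffer / 255.0        # bring values into [0, 1] first
    return (buffer - mean) / std   # then standardize per channel
```

Either way, keeping the input magnitudes small avoids the exploding activations that show up as `NaN` losses.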