VideoDataset has a bug in data normalization
There is a bug in the `normalize(self, buffer)` function in `dataset.py`: it does not normalize the data to [0, 1], which is the usual practice when training deep learning models with PyTorch.
I also tested it. Without normalization, training fails completely: using the official train/test split of UCF101, the test accuracy was only around 5% after 54 epochs. With normalization, training proceeds normally and reaches 8.2% test accuracy after just 5 epochs.
https://github.com/jfzhang95/pytorch-video-recognition/blob/ca37de9f69a961f22a821c157e9ccf47a601904d/dataloaders/dataset.py#L204
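For context, `normalize` at the linked line subtracts fixed per-channel means rather than scaling into [0, 1] (see the first comment below). A minimal sketch of the proposed fix, assuming `buffer` is a float array of shape (frames, height, width, 3) as produced by the repo's frame loader:

```python
def normalize(self, buffer):
    # buffer: float array of shape (frames, height, width, 3) holding
    # raw pixel values in [0, 255].
    for i, frame in enumerate(buffer):
        # Scale each frame into [0, 1] instead of only subtracting
        # fixed per-channel means as the current implementation does.
        buffer[i] = frame / 255.0
    return buffer
```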
Top GitHub Comments
Why do you think this is a bug? Normalizing data to [0, 1] is not the only valid approach. Subtracting the mean RGB values of the dataset used for the backbone's pre-training (usually ImageNet) is also common, and the `normalize()` function follows this approach.
If you want to show that normalizing data to [0, 1] leads to higher performance, you have to elaborate further: the results you provided are not comparable to each other. You could validate the claim by training your model with each of the two normalization approaches in turn and reporting the results for the same number of epochs.
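To make the two approaches concrete, here is a sketch for a single RGB frame; the ImageNet mean values are the commonly quoted ones and are an assumption here, not values taken from this repo:

```python
import numpy as np

# Approach 1: subtract the per-channel means of the pre-training dataset
# (the ImageNet RGB means below are the commonly quoted values).
def normalize_mean_subtraction(frame):
    imagenet_mean = np.array([123.68, 116.78, 103.94]).reshape(1, 1, 3)
    return frame - imagenet_mean

# Approach 2: scale raw 8-bit pixel values into [0, 1].
def normalize_unit_range(frame):
    return frame / 255.0
```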
@wave-transmitter I trained C3D with the official split1 from scratch, without the pre-trained model. You can test the from-scratch C3D model yourself by changing one line of code in the `normalize` function to `frame = frame / 255.0`; you will see the result. In this repo, the input tensor contains large values such as 233.7, -45.2, etc., which is unusual in deep learning training and easily causes value-overflow problems, because the core operations are essentially matrix multiplications. This is why people have reported issues like the `NaN` loss value mentioned in #17. If you normalize the data to [0, 1], the `NaN` problem goes away.
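For what it's worth, the common PyTorch recipe combines both ideas: scale into [0, 1] first, then standardize with per-channel statistics. A sketch using the usual ImageNet means and stds (an assumption, not values from this repo):

```python
import torch

def normalize_clip(buffer: torch.Tensor) -> torch.Tensor:
    # buffer: float tensor of shape (C, T, H, W) with raw values in [0, 255].
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1, 1)
    buffer = buffer / 255.0        # bring values into [0, 1] first
    return (buffer - mean) / std   # then standardize per channel
```

Either way, keeping the input magnitudes small avoids the exploding activations that show up as `NaN` losses.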