Initializing Conv Weights with Zero
Where and What?
In the code for default_classification_model (retinanet.py#L66), the kernel weights are initialized to zero. Looking at the Keras implementation (source code here), this initializer really does set every weight to exactly 0. Replacing it with a small random initialization might improve model performance.
Why is Zero Initialization a Big Deal?
As explained here: "Why random? Why not initialize them all to 0? An important concept here is called symmetry breaking. If all the neurons have the same weights, they will produce the same outputs and we won't be learning different features. We won't learn different features because during the backpropagation step, all the weight updates will be exactly the same. So starting with a randomized distribution allows us to initialize the neurons to be different (with very high probability) and allows us to learn a rich and diverse feature hierarchy.

"Why mean zero? A common practice in machine learning is to zero-center or normalize the input data, such that the raw input features (for image data these would be pixels) average to zero."
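To make the symmetry-breaking point concrete, here is a minimal NumPy sketch (a toy two-layer regression network, not code from this repository) showing that when every hidden unit starts with identical weights, gradient descent keeps them identical forever:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))  # 4 samples, 3 input features
y = rng.normal(size=(4, 1))  # regression targets

def train_step(W1, W2, lr=0.1):
    # Forward pass: linear -> tanh -> linear, squared-error loss.
    h = np.tanh(x @ W1)
    pred = h @ W2
    # Backward pass (manual gradients).
    d_pred = 2 * (pred - y) / len(x)
    dW2 = h.T @ d_pred
    dh = (d_pred @ W2.T) * (1 - h ** 2)
    dW1 = x.T @ dh
    return W1 - lr * dW1, W2 - lr * dW2

# Symmetric init: every hidden unit gets the same incoming weights.
W1 = np.full((3, 5), 0.5)
W2 = np.full((5, 1), 0.5)
for _ in range(10):
    W1, W2 = train_step(W1, W2)

print(W1)  # all 5 columns (hidden units) are still identical
```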
Suggested Solution
Use a normal initializer such as:

```python
keras.initializers.normal(mean=0.0, stddev=0.01, seed=None)
```
So the whole Conv2D statement should be as follows:
```python
outputs = keras.layers.Conv2D(
    filters=num_classes * num_anchors,
    kernel_initializer=keras.initializers.normal(mean=0.0, stddev=0.01, seed=None),
    bias_initializer=initializers.PriorProbability(probability=prior_probability),
    name='pyramid_classification',
    **options
)(outputs)
```
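As a quick sanity check, here is a minimal sketch (assuming Keras 2 with a TensorFlow 1.x backend, which is what keras-retinanet targeted; the kernel shape below is illustrative) that draws a sample from the suggested initializer and confirms its statistics:

```python
import keras
import keras.backend as K

init = keras.initializers.normal(mean=0.0, stddev=0.01, seed=42)
# Illustrative kernel shape: 3x3 kernel, 256 input channels,
# num_classes * num_anchors = 80 * 9 = 720 output filters (COCO-style head).
sample = K.eval(init(shape=(3, 3, 256, 720)))
print(sample.mean(), sample.std())  # expect values near 0.0 and 0.01
```

Note that in more recent Keras versions the same initializer is spelled keras.initializers.RandomNormal.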
Did I Test my Solution?
No, I do not have the resources to test it at the moment.
Top GitHub Comments
Yep, I just also read the Detectron code.
Ugh I hate it when you have to read papers 10 times and still see small details >.>
Looking at Detectron (which I will assume is the correct implementation), it seems they do indeed initialize the weights to non-zero values (see the paraphrased sketch below):
https://github.com/facebookresearch/Detectron/blob/master/detectron/modeling/retinanet_heads.py#L136-L139
Would be nice to have a comparison to see if this makes any difference.
ps. @penguinmenac3 nice find!
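From memory, the Detectron lines linked above look roughly like the following (Detectron is built on Caffe2; blob names and surrounding arguments are paraphrased, not quoted verbatim):

```python
# Paraphrase of Detectron's RetinaNet classification head init (not verbatim):
retnet_cls_pred = model.Conv(
    bl_in,                                 # input FPN feature blob
    'retnet_cls_pred_fpn{}'.format(lvl),   # output blob name
    dim_in,
    num_classes * num_anchors,
    3,                                     # 3x3 kernel
    pad=1,
    stride=1,
    weight_init=('GaussianFill', {'std': 0.01}),  # non-zero Gaussian init
    bias_init=bias_init,                   # constant derived from the prior probability
)
```

This matches the RetinaNet paper, which initializes the new conv layers in the subnets with a Gaussian weight fill of sigma = 0.01.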