Initializing Conv Weights with Zero
Where and What?
In the code for default_classification_model (retinanet.py#L66), the kernel weights are initialized to zero. Looking at the Keras implementation (source code here), this initializer really does set every weight to exactly 0. Replacing it with a small random initialization might improve model performance.
Why is Zero Initialization a Big Deal?
As explained here: "Why random? Why not initialize them all to 0? An important concept here is called symmetry breaking. If all the neurons have the same weights, they will produce the same outputs and we won't be learning different features. We won't learn different features because during the backpropagation step, all the weight updates will be exactly the same. So starting with a randomized distribution allows us to initialize the neurons to be different (with very high probability) and allows us to learn a rich and diverse feature hierarchy.

"Why mean zero? A common practice in machine learning is to zero-center or normalize the input data, such that the raw input features (for image data these would be pixels) average to zero."
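To make the symmetry-breaking point concrete, here is a minimal NumPy sketch (a toy two-layer regression network, not code from this repository) showing that when every hidden unit starts with identical weights, gradient descent keeps them identical forever:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))  # 4 samples, 3 input features
y = rng.normal(size=(4, 1))  # regression targets

def train_step(W1, W2, lr=0.1):
    # Forward pass: linear -> tanh -> linear, squared-error loss.
    h = np.tanh(x @ W1)
    pred = h @ W2
    # Backward pass (manual gradients).
    d_pred = 2 * (pred - y) / len(x)
    dW2 = h.T @ d_pred
    dh = (d_pred @ W2.T) * (1 - h ** 2)
    dW1 = x.T @ dh
    return W1 - lr * dW1, W2 - lr * dW2

# Symmetric init: every hidden unit gets the same incoming weights.
W1 = np.full((3, 5), 0.5)
W2 = np.full((5, 1), 0.5)
for _ in range(10):
    W1, W2 = train_step(W1, W2)

print(W1)  # all 5 columns (hidden units) are still identical
```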
Suggested Solution
Use a normal initializer such as:

```python
keras.initializers.normal(mean=0.0, stddev=0.01, seed=None)
```
So the whole Conv2D statement should be as follows:
```python
outputs = keras.layers.Conv2D(
    filters=num_classes * num_anchors,
    kernel_initializer=keras.initializers.normal(mean=0.0, stddev=0.01, seed=None),
    bias_initializer=initializers.PriorProbability(probability=prior_probability),
    name='pyramid_classification',
    **options
)(outputs)
```
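As a quick sanity check, here is a minimal sketch (assuming Keras 2 with a TensorFlow 1.x backend, which is what keras-retinanet targeted; the kernel shape below is illustrative) that draws a sample from the suggested initializer and confirms its statistics:

```python
import keras
import keras.backend as K

init = keras.initializers.normal(mean=0.0, stddev=0.01, seed=42)
# Illustrative kernel shape: 3x3 kernel, 256 input channels,
# num_classes * num_anchors = 80 * 9 = 720 output filters (COCO-style head).
sample = K.eval(init(shape=(3, 3, 256, 720)))
print(sample.mean(), sample.std())  # expect values near 0.0 and 0.01
```

Note that in more recent Keras versions the same initializer is spelled keras.initializers.RandomNormal.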
Did I Test my Solution?
No, I do not have the resources to test it at the moment.
Top GitHub Comments
Yep, I just also read the Detectron code.
Ugh I hate it when you have to read papers 10 times and still see small details >.>
Looking at Detectron (which I will assume is the correct implementation), it seems they do indeed initialize the weights to non-zero values (see the paraphrased sketch below):
https://github.com/facebookresearch/Detectron/blob/master/detectron/modeling/retinanet_heads.py#L136-L139
Would be nice to have a comparison to see if this makes any difference.
ps. @penguinmenac3 nice find!
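From memory, the Detectron lines linked above look roughly like the following (Detectron is built on Caffe2; blob names and surrounding arguments are paraphrased, not quoted verbatim):

```python
# Paraphrase of Detectron's RetinaNet classification head init (not verbatim):
retnet_cls_pred = model.Conv(
    bl_in,                                 # input FPN feature blob
    'retnet_cls_pred_fpn{}'.format(lvl),   # output blob name
    dim_in,
    num_classes * num_anchors,
    3,                                     # 3x3 kernel
    pad=1,
    stride=1,
    weight_init=('GaussianFill', {'std': 0.01}),  # non-zero Gaussian init
    bias_init=bias_init,                   # constant derived from the prior probability
)
```

This matches the RetinaNet paper, which initializes the new conv layers in the subnets with a Gaussian weight fill of sigma = 0.01.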