
Initializing Conv Weights with Zero


Where and What?

In the code for the default_classification_model (retinanet.py#L66), the kernel weights are initialized with 'zeros'. According to the Keras implementation (source code here), this initializer sets every weight to exactly 0.

Changing this might improve model performance.
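
For reference, here is a minimal sketch (written against tf.keras; the repository itself uses standalone Keras) showing that the 'zeros' initializer really does produce an all-zero kernel:

import numpy as np
from tensorflow import keras

# Build a small Conv2D layer with the same initializer the issue points at.
layer = keras.layers.Conv2D(filters=4, kernel_size=3, kernel_initializer='zeros')
layer.build(input_shape=(None, 32, 32, 3))

# Every kernel weight is exactly 0.
print(np.all(layer.get_weights()[0] == 0))  # True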

Why is Zero Initialization a Big Deal?

As explained here: "Why random? Why not initialize them all to 0? An important concept here is called symmetry breaking. If all the neurons have the same weights, they will produce the same outputs and we won’t be learning different features. We won’t learn different features because during the backpropagation step, all the weight updates will be exactly the same. So starting with a randomized distribution allows us to initialize the neurons to be different (with very high probability) and allows us to learn a rich and diverse feature hierarchy.

Why mean zero? A common practice in machine learning is to zero-center or normalize the input data, such that the raw input features (for image data these would be pixels) average to zero."
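
A toy numpy example makes the symmetry problem concrete (a sketch with an assumed two-unit linear layer and MSE loss, not the RetinaNet head itself): when both hidden units start from the same constant weights, they receive identical gradient updates and can never learn different features.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))   # 8 samples, 3 input features
y = rng.normal(size=(8, 1))   # regression targets

W1 = np.full((3, 2), 0.5)     # two hidden units with identical constant weights
W2 = np.full((2, 1), 0.5)

h = x @ W1                    # hidden activations (linear for simplicity)
pred = h @ W2
grad_pred = 2 * (pred - y) / len(x)   # dL/dpred for MSE loss

grad_W1 = x.T @ (grad_pred @ W2.T)    # backprop to the first layer

# Both hidden units receive exactly the same update, so they stay identical.
print(np.allclose(grad_W1[:, 0], grad_W1[:, 1]))  # True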

Suggested Solution

Use a normal initializer such as:

keras.initializers.normal(mean=0.0, stddev=0.01, seed=None)

So the whole Conv2D statement should be as follows:

outputs = keras.layers.Conv2D(
    filters=num_classes * num_anchors,
    # kernel weights drawn from N(0, 0.01) instead of all zeros
    kernel_initializer=keras.initializers.normal(mean=0.0, stddev=0.01, seed=None),
    # bias set so the initial classification output matches the prior probability
    bias_initializer=initializers.PriorProbability(probability=prior_probability),
    name='pyramid_classification',
    **options
)(outputs)
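
As a quick sanity check (a sketch; RandomNormal is the modern Keras spelling of the normal initializer above), sampling the proposed initializer yields small, distinct values around zero rather than an all-zero kernel, which matches the Gaussian weight fill with σ = 0.01 described in the RetinaNet paper:

import numpy as np
from tensorflow import keras

init = keras.initializers.RandomNormal(mean=0.0, stddev=0.01, seed=42)
sample = np.asarray(init(shape=(3, 3, 256, 9)))

print(np.all(sample == 0))            # False: symmetry is broken
print(round(float(sample.std()), 3))  # ~0.01, matching the requested stddev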

Did I Test my Solution?

No, I do not have the resources to test it at the moment.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
penguinmenac3 commented, Oct 4, 2018

Yep, I also just read the Detectron code.

retnet_cls_pred = model.Conv(
    bl_feat,
    'retnet_cls_pred_fpn{}'.format(lvl),
    dim_in,
    cls_pred_dim * A,
    3,
    pad=1,
    stride=1,
    # Detectron fills the weights from a Gaussian with std 0.01, not zeros
    weight_init=('GaussianFill', {
        'std': 0.01
    }),
    bias_init=bias_init
)
1 reaction
hgaiser commented, Oct 4, 2018

Ugh, I hate it when you have to read papers 10 times and still miss small details >.>

Looking at Detectron (which I will assume is the correct implementation), it seems they do indeed initialize the weights to non-zero values:

https://github.com/facebookresearch/Detectron/blob/master/detectron/modeling/retinanet_heads.py#L136-L139

Would be nice to have a comparison to see if this makes any difference.

ps. @penguinmenac3 nice find!
