Does LambdaLayer need BatchNorm and activation after it?
Hello,
I'm trying to reproduce this. I'm building a LambdaResnet, and a small question: are BatchNorm and an activation needed after the LambdaLayer?
Thanks.
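For context, below is a minimal sketch of the variant the question is asking about, assuming the LambdaLayer constructor from lucidrains/lambda-networks (dim, dim_out, r, dim_k, heads, dim_u): a bottleneck block where the lambda layer replaces the 3x3 conv and is followed by the same BatchNorm + ReLU that would normally follow that conv. The block layout and hyperparameters are illustrative, not the repo's reference LambdaResnet.

```python
import torch
import torch.nn as nn
from lambda_networks import LambdaLayer  # lucidrains/lambda-networks

class LambdaBottleneck(nn.Module):
    """Bottleneck block with the 3x3 conv swapped for a LambdaLayer (illustrative sketch)."""
    def __init__(self, channels, mid_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, mid_channels, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        # Lambda layer in place of the usual 3x3 conv; hyperparameters are illustrative.
        self.lam = LambdaLayer(dim=mid_channels, dim_out=mid_channels,
                               r=23, dim_k=16, heads=4, dim_u=1)
        self.bn2 = nn.BatchNorm2d(mid_channels)   # <- the BatchNorm in question
        self.conv3 = nn.Conv2d(mid_channels, channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.lam(out)))  # <- the activation in question
        out = self.bn3(self.conv3(out))
        return self.relu(out + x)                 # identity shortcut (stride 1, same channel count)

# e.g. LambdaBottleneck(256, 64)(torch.randn(2, 256, 32, 32)) -> shape (2, 256, 32, 32)
```

The alternative the thread weighs is simply dropping bn2 and the ReLU applied after self.lam.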
Issue Analytics
- Created: 3 years ago
- Reactions: 3
- Comments: 7 (3 by maintainers)
Top Results From Across the Web
- Batch Norm Explained Visually — How it works, and why ...: "Calculate the normalized values for each activation feature vector using the corresponding mean and variance. These normalized values now have ..."
- Ordering of batch normalization and dropout? - Stack Overflow: "So the Batch Normalization Layer is actually inserted right after a Conv ... As far as dropout goes, I believe dropout is applied ..."
- Batch normalization and the need for bias in neural networks: "i.e. each activation is shifted by its own shift parameter (beta). So yes, the batch normalization eliminates the need for a bias vector ..."
- [D] Batch Normalization before or after ReLU? - Reddit: "BN after activation will normalize the positive features without ... It seems a lot of folks have false notions about BatchNorm."
- Where should I place the batch normalization layer(s)?: "thinking: Just before or after the activation function layer? ... @shirui-japina In general, Batch Norm layer is usually added before ..."
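The snippets above all point to the same conventional ordering; as a quick illustration (not part of the original issue), here is that pattern in PyTorch: convolution, then BatchNorm, then the activation, with the convolution's bias disabled because BatchNorm's learnable shift (beta) makes it redundant.

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, kernel_size=3):
    """Conventional Conv -> BatchNorm -> ReLU ordering."""
    return nn.Sequential(
        # bias=False: BatchNorm's beta parameter already provides a per-channel shift
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```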
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @lucidrains, thanks for the reply. I tested them all, and I think you're right: when I apply bn+relu, the val accuracy doesn't grow. This is my final implementation. Now I'm training LambdaResnet50 and it's looking good. I use the same standard training steps in my project as with Resnet50, except batch_size is set to 64. Epoch 28 result: Train Acc: 0.44, Loss: 2.54; Val Acc: 0.48, Loss: 3.1748e+08. Some observations:
- Parameters and GFLOPs are small, but training speed and GPU memory cost are still high.
- torch.cuda.amp makes the train loss nan, so I can only train with a batch_size of 64 (see the sketch below).
- Resnet50.
@lucidrains Unfortunately, I only got a best top-1 of 76.1 on the val set (79.2 on the train set). I'd better wait for the authors to release their code.