ELU instead of ReLU in conv_dw_no_bn
Hi, nice work on this repo!
I’m wondering why you use `ELU` instead of `ReLU` in `conv_dw_no_bn`, while the `conv_dw` counterpart uses the regular `ReLU`:
https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch/blob/a6e41cf56e0b5e2d23686de9ef15671833bdb72e/modules/conv.py#L25-L32
Is there a particular reason to use ELU
? I didn’t see any mentioning of activation function in your paper or the original openpose paper.
Thank you!
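For context, the two blocks being compared look roughly like this (a sketch paraphrasing `modules/conv.py` at the linked commit; the argument names and defaults here are assumptions, see the link above for the exact code):

```python
import torch.nn as nn

def conv_dw(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    # Depthwise separable convolution: BatchNorm + ReLU after both the
    # depthwise 3x3 and the pointwise 1x1 convolution.
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding,
                  dilation=dilation, groups=in_channels, bias=False),
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )

def conv_dw_no_bn(in_channels, out_channels, kernel_size=3, padding=1, stride=1, dilation=1):
    # Same structure, but with no BatchNorm and ELU activations instead of ReLU.
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding,
                  dilation=dilation, groups=in_channels, bias=False),
        nn.ELU(inplace=True),
        nn.Conv2d(in_channels, out_channels, 1, 1, 0, bias=False),
        nn.ELU(inplace=True),
    )
```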
Top GitHub Comments
Good question! For this particular network architecture there is no difference; it comes from an unfinished experiment. Nets with `ReLU` activations exposed the “dead” neurons problem, where some percentage of neurons simply never fires. The purpose of this work is to build a lightweight net yet maintain the baseline accuracy. One possible way to reduce network complexity while preserving the original capacity is to get rid of “dead” neurons with a different activation function, so the network can be narrower (no “dead” neurons) but have the same capacity (and accuracy). So here we’ve used `ELU`, and the next step was to reduce the number of channels in these layers, but there was no time left for that. I’ve taken this idea from the RMNet paper: “Fast and Accurate Person Re-Identification with RMNet” by E. Izutov.

We do not. I think the best way is to mask out occluded points too, so the network may predict them (if it is smart enough), but we will not penalize it if it cannot.
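The “dead” neuron issue follows from the activations’ gradients: `ReLU` has zero output and zero gradient for negative inputs, so a unit whose pre-activation goes negative for all inputs stops receiving updates, while `ELU(x) = α(exp(x) − 1)` for `x < 0` keeps a nonzero gradient, letting such a unit recover. A minimal PyTorch check (my own illustration, not code from the repo):

```python
import torch

x = torch.linspace(-3, 3, 7, requires_grad=True)

# ReLU: zero output AND zero gradient for all negative inputs,
# so a neuron stuck in the negative regime stops learning ("dies").
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 0., 0., 1., 1., 1.])

x.grad = None

# ELU: alpha * (exp(x) - 1) for x < 0, so the gradient is exp(x) there:
# small but nonzero, and the neuron can still receive updates.
torch.nn.functional.elu(x).sum().backward()
print(x.grad)  # approx. tensor([0.0498, 0.1353, 0.3679, 1., 1., 1., 1.])
```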