Reimplementation of split_attention_conv2d, and why not add BN2/ReLU in Bottleneck?
See original GitHub issue
Hi @zhanghang1989, first of all, thank you very much for providing such an imaginative model.
I referred to the source code of ResNeSt and wrote a new implementation of SplitAttentionConv2d; the structure may be a bit clearer:
# -*- coding: utf-8 -*-
"""
@date: 2021/1/4 11:32 AM
@file: split_attention_conv2d.py
@author: zj
@description:
"""
from abc import ABC

import torch
import torch.nn as nn

from ..init_helper import init_weights


class SplitAttentionConv2d(nn.Module, ABC):
    """
    Split-Attention implementation for ResNeSt. References:
    1. https://github.com/open-mmlab/mmdetection/blob/master/mmdet/models/backbones/resnest.py
    2. https://github.com/zhanghang1989/ResNeSt/blob/73b43ba63d1034dbf3e96b3010a8f2eb4cc3854f/resnest/torch/splat.py
    Partially based on the implementation in ./selective_kernel_conv2d.py
    """

    def __init__(self,
                 # number of input channels
                 in_channels,
                 # number of output channels
                 out_channels,
                 # number of splits within each group
                 radix=2,
                 # cardinality
                 groups=1,
                 # reduction rate of the intermediate layer
                 reduction_rate=4,
                 # default minimum number of channels in the intermediate layer
                 default_channels: int = 32,
                 # dimension
                 dimension: int = 2
                 ):
        super(SplitAttentionConv2d, self).__init__()
        # split
        self.split = nn.Sequential(
            nn.Conv2d(in_channels, out_channels * radix, kernel_size=3, stride=1, padding=1, bias=False,
                      groups=groups * radix),
            nn.BatchNorm2d(out_channels * radix),
            nn.ReLU(inplace=True)
        )
        # fuse
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        inner_channels = max(out_channels // reduction_rate, default_channels)
        self.compact = nn.Sequential(
            nn.Conv2d(out_channels, inner_channels, kernel_size=1, stride=1, padding=0, bias=False,
                      groups=groups),
            nn.BatchNorm2d(inner_channels),
            nn.ReLU(inplace=True)
        )
        # select
        self.select = nn.Conv2d(inner_channels, out_channels * radix, kernel_size=1, stride=1, bias=False,
                                groups=groups)
        # softmax over the radix dimension (dim=0 after torch.stack in forward)
        self.softmax = nn.Softmax(dim=0)

        self.dimension = dimension
        self.out_channels = out_channels
        self.radix = radix

        init_weights(self.modules())

    def forward(self, x):
        # N, C, H, W = x.shape[:4]
        # split: (N, out_channels * radix, H, W) -> (radix, N, out_channels, H, W)
        out = self.split(x)
        split_out = torch.stack(torch.split(out, self.out_channels, dim=1))
        # fuse: sum the radix splits, then global pooling and channel reduction
        u = torch.sum(split_out, dim=0)
        s = self.pool(u)
        z = self.compact(s)
        # select: per-split attention weights, softmax over the radix dimension
        c = self.select(z)
        split_c = torch.stack(torch.split(c, self.out_channels, dim=1))
        softmax_c = self.softmax(split_c)
        v = torch.sum(split_out.mul(softmax_c), dim=0)
        return v.contiguous()
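For reference, a minimal usage sketch of the class above (the shapes and the radix/groups values are assumed purely for illustration; it also assumes the project-local init_weights helper imported by the module is available):

import torch

# assumed values, for illustration only
module = SplitAttentionConv2d(in_channels=64, out_channels=128, radix=2, groups=1)
x = torch.randn(4, 64, 56, 56)    # (N, C, H, W)
out = module(x)
print(out.shape)                  # expected: torch.Size([4, 128, 56, 56])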
One question I have: why is there no need to add bn2/relu in the Bottleneck when radix > 0? Was this determined through experiments?
Issue Analytics
- Created 3 years ago
- Comments: 5 (1 by maintainers)
Hi @FrancescoSaverioZuppichini, the bn+relu is applied to the first conv because it adds non-linearity between the two convs (otherwise they would be equivalent to a single conv). There is no bn+relu for the second conv because the softmax is itself a kind of non-linearity or activation function.
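To illustrate the "equivalent to a single one" point, here is a small self-contained sketch (not from the thread) showing that two 1x1 convolutions with no non-linearity in between collapse into one 1x1 convolution:

import torch
import torch.nn as nn

conv1 = nn.Conv2d(8, 16, kernel_size=1, bias=False)
conv2 = nn.Conv2d(16, 4, kernel_size=1, bias=False)

# the composed map conv2(conv1(x)) has weight W2 @ W1
w_combined = torch.einsum('oihw,imhw->omhw', conv2.weight, conv1.weight)
single = nn.Conv2d(8, 4, kernel_size=1, bias=False)
with torch.no_grad():
    single.weight.copy_(w_combined)

x = torch.randn(2, 8, 7, 7)
print(torch.allclose(conv2(conv1(x)), single(x), atol=1e-5))  # True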
Same question, posting my implementation for completeness:
By the way, the bias in the first conv is useless, but it is present in the original implementation; I guess it is an error.
[Edit] After thinking about it, I think it makes sense: when radix > 1, softmax is applied (in rSoftmax), while when radix = 0, sigmoid is used, making it the same as SE. But there shouldn't be a batch norm and a ReLU.
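For context, the radix branch being referred to is roughly the following (a paraphrase of rSoftMax from the original splat.py, so treat it as a sketch rather than the exact code):

import torch
import torch.nn as nn
import torch.nn.functional as F


class RSoftMaxSketch(nn.Module):
    """Softmax over the radix dimension when radix > 1, plain sigmoid otherwise (SE-style gating)."""

    def __init__(self, radix, cardinality):
        super().__init__()
        self.radix = radix
        self.cardinality = cardinality

    def forward(self, x):
        # x: (N, cardinality * radix * channels, 1, 1) attention logits
        batch = x.size(0)
        if self.radix > 1:
            x = x.view(batch, self.cardinality, self.radix, -1).transpose(1, 2)
            x = F.softmax(x, dim=1)   # normalize across the radix splits
            x = x.reshape(batch, -1)
        else:
            x = torch.sigmoid(x)      # otherwise: plain sigmoid, SE-style gating
        return x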