Add parameter for Deformable Convolution offset group scalar value
🚀 Feature
Currently, the scalar used to calculate the number of deformable groups is hardcoded to 2. I would like a parameter that allows this value to be changed, for compatibility with repositories such as EDVR, which use 3.
I have already added it myself and was going to submit a PR before reading that I should submit an issue first.
Motivation
I am currently trying to replace the MMDetection Deformable Convolution v2 with the Torchvision one for the EDVR repository. However, for its offsets, it calculates the out_nc size using this formula: `self.deformable_groups * 3 * self.kernel_size[0] * self.kernel_size[1]`. The usual formula, which the current Torchvision implementation expects, is `self.deformable_groups * 2 * self.kernel_size[0] * self.kernel_size[1]`. As you can see, they use a 3 in this calculation instead of a 2. I’m not entirely sure why, but it doesn’t work unless it uses 3.
This causes an issue with the Torchvision implementation, which requires that scalar to be 2 in order to calculate the number of offset groups (called deformable groups in the formula above).
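To make the mismatch concrete, here is a small sketch of the channel arithmetic. It assumes, as described above, that the number of offset groups is inferred by dividing the offset channel count by `2 * kh * kw`. The helper names are hypothetical and this is not Torchvision's actual code.

```python
def offset_channels(deformable_groups, kernel_size, scalar):
    """Channel count of the offset tensor for a given scalar (2 or 3)."""
    kh, kw = kernel_size
    return deformable_groups * scalar * kh * kw


def infer_offset_groups(n_channels, kernel_size):
    """Assumed inference rule: channels // (2 * kh * kw)."""
    kh, kw = kernel_size
    return n_channels // (2 * kh * kw)


kernel_size = (3, 3)
deformable_groups = 8

usual = offset_channels(deformable_groups, kernel_size, scalar=2)  # 144 channels
edvr = offset_channels(deformable_groups, kernel_size, scalar=3)   # 216 channels

print(infer_offset_groups(usual, kernel_size))  # 8, matches deformable_groups
print(infer_offset_groups(edvr, kernel_size))   # 12, does not match
```

With the factor of 3, the inferred group count no longer equals the intended number of deformable groups, which is the incompatibility reported here.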
Pitch
I would like for a parameter to be added that would allow me to change this value, like so.
Alternatives
Another alternative could be to allow the number of offset groups to be passed in instead of being auto-calculated, as that is what the MMDetection version does.
Additional context
None.
Issue Analytics
- Created 3 years ago
- Reactions: 1
- Comments: 8 (4 by maintainers)
I agree with @fmassa that the `2` in torchvision’s implementation refers to the h and w dimensions.
From section 3.2 of the DeformConv v2 paper https://arxiv.org/abs/1811.11168:
where K is `self.kernel_size[0] * self.kernel_size[1]`. So the difference between 2 and 3 seems to come from the modulation scalars.
I could be wrong, as I’m not super familiar with either the paper or the implementation, but I believe those modulation scalars actually correspond to the `mask` parameter.
I’ll close the issue; please feel free to re-open if there are still some doubts.
@JoeyBallentine OK, so from my understanding this was a user error: the shapes of the offsets and masks were not correct, so we couldn’t properly infer the number of offset groups.
BTW, I would not recommend calling `torch.ops.torchvision.deform_conv2d` directly, as it is an implementation detail and can change at any time without notice. It would be preferable to fix the code upstream rather than rely on internal implementations.