Clarify keypoints behaviour and their limitations
Describe the bug
TL;DR: document the kornia convention for keypoints and their current limitations, or fix their calculation. This affects the bbox_v2 PR (#1304).
Although kornia uses floats for keypoints, it assumes that they come from integers that index a pixel (index-based convention). They therefore really denote the pixel center, i + 0.5: the point (3, 4) refers to the center of pixel (3, 4), which is (3.5, 4.5). Because they are floats, however, users expect keypoints to represent exact coordinates, for example (0.04, 6.4). Below, you can see an example.
Why does this happen?
- Pixels in images are index based, so the matrix transforms applied to them assume that convention.
- Augmentation containers share the same transform matrix between images, keypoints and bboxes.
- Keypoints under exact coordinates use slightly different formulas. See here and here
- Bboxes using the “width = xmax - xmin” convention assume that the underlying vertices (box keypoints) are expressed as exact coordinates, while the “width = xmax - xmin + 1” convention assumes they are index based.
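To make the "slightly different formulas" concrete, here is a minimal sketch of a horizontal flip under each convention. The function names are hypothetical and are not part of kornia; the only assumption is the usual definition of a flip about the image's vertical center line.

```python
def hflip_index(x, width):
    """Flip an x coordinate that names a pixel index (valid range [0, width - 1])."""
    return (width - 1) - x


def hflip_exact(x, width):
    """Flip an x coordinate measured from the left image edge (valid range [0, width])."""
    return width - x


W = 4
print(hflip_index(3, W))    # last pixel column maps to column 0
print(hflip_exact(4.0, W))  # right image edge maps to the left edge, 0.0
print(hflip_index(4.0, W))  # exact coordinate pushed through the index formula: -1.0
```

The two formulas differ by exactly one pixel, which is the offset discussed in this issue.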
This leads to:
- Kornia can only handle keypoints coming from integers.
- Keypoints that use exact coordinate values are inaccurate after transforms. The real value may be 1 pixel off. I don’t think that zoom affects this (not 100% sure here).
- This affects model accuracy.
- It could lead to subtle bugs when manually inverting a transform made by kornia. When detectron fixed a similar error, box AP increased by between 0.4 and 0.7, see here.
- Box/bounding box computations must use the “width = xmax - xmin + 1” convention, since it compensates for the offset to exact coordinates.
- This affects the Bbox_v2 PR. The internal box representation must be changed from the “width = xmax - xmin” convention to the “width = xmax - xmin + 1” convention.
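The effect on boxes can be sketched in plain Python. This is an illustration only: `hflip_index` and `flip_box` are hypothetical helpers, and the flip is assumed to use the index-based formula described in this issue.

```python
def hflip_index(x, width):
    """Horizontal flip that treats x as a pixel index."""
    return (width - 1) - x


def flip_box(xmin, xmax, width):
    """Flip both box edges with the index-based formula and re-sort them."""
    a, b = hflip_index(xmin, width), hflip_index(xmax, width)
    return (min(a, b), max(a, b))


W = 4
index_box = (0, 3)  # full image width under "width = xmax - xmin + 1"
exact_box = (0, 4)  # full image width under "width = xmax - xmin"

print(flip_box(*index_box, W))  # (0, 3): still covers the whole image
print(flip_box(*exact_box, W))  # (-1, 3): shifted one pixel outside the image
```

A box that spans the full image should be unchanged by a horizontal flip. Only the “width = xmax - xmin + 1” box satisfies that here.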
So, a decision should be made:
- Refactor kornia to be accurate with keypoints by using exact coordinates.
- Leave as is + document in augmentation containers + update bbox_v2 internal format.
- Leave as is + update bbox_v2 internal format. Keypoints in exact coordinates (float) will be slightly inaccurate.
Reproduction steps
Note: keypoints valid range depends on their convention.
- Pixel indexes (integers): [0, W-1 or H-1]. For example (3, 5).
- Exact coordinates (must be floats): [0, W or H].
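As a quick illustration of these ranges, a hypothetical helper (not a kornia function) could check whether a keypoint is valid under a given convention:

```python
def is_valid_keypoint(x, y, width, height, index_based):
    """Return True if (x, y) lies in the valid range of the chosen convention."""
    # Pixel indexes stop at W-1 / H-1; exact coordinates may reach W / H.
    max_x = width - 1 if index_based else width
    max_y = height - 1 if index_based else height
    return 0 <= x <= max_x and 0 <= y <= max_y


print(is_valid_keypoint(3, 3, 4, 4, index_based=True))       # True
print(is_valid_keypoint(4, 4, 4, 4, index_based=True))       # False
print(is_valid_keypoint(4.0, 4.0, 4, 4, index_based=False))  # True
```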
Here, you can see a graphical example
from kornia import augmentation as K
import torch
def to_xyxy(bbox):
    return torch.stack([bbox.amin(dim=-2), bbox.amax(dim=-2)], dim=-2).view(
        bbox.shape[0], bbox.shape[1], 4
    )
img = torch.eye(4).expand(1,3, -1, -1) # torch.rand(1, 3, 4, 4)
h, w = img.shape[-2:]
pts = torch.tensor([[0, 0],
                    [w-1, h-1],  # Max value for pixel indexes convention
                    [w, h]])     # Max value for exact coordinate convention
bbox = torch.tensor([[
    [[0,h//4],[w-1,0],[w-1,h//2],[0, h//2]], # convention width = xmax - xmin + 1
    [[0,h//4],[w,0],[w,h//2],[0, h//2]], # convention width = xmax - xmin
]])
print('Original values')
print(img[0,0,...], pts, bbox, to_xyxy(bbox))
(tensor([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]]),
tensor([[0, 0],
[3, 3],
[4, 4]]),
tensor([[[[0, 1],
[3, 0],
[3, 2],
[0, 2]],
[[0, 1],
[4, 0],
[4, 2],
[0, 2]]]]),
tensor([[[0, 0, 3, 2],
[0, 0, 4, 2]]]))
Here is what happens for horizontal flips:
tfm= K.AugmentationSequential(
K.RandomHorizontalFlip(p=1.),
data_keys=["input", "bbox", "keypoints"],
return_transform=True,
same_on_batch=False
)
(img_out, M), pts_out, bbox_out = tfm(img, pts.float(), bbox.float())
print(img_out[0, 0, ...], pts_out, bbox_out, to_xyxy(bbox_out))
(tensor([[0., 0., 0., 1.],
[0., 0., 1., 0.],
[0., 1., 0., 0.],
[1., 0., 0., 0.]]),
tensor([[ 3., 0.],
[ 0., 3.],
[-1., 4.]]),
tensor([[[[ 3., 1.],
[ 0., 0.],
[ 0., 2.],
[ 3., 2.]],
[[ 3., 1.],
[-1., 0.],
[-1., 2.],
[ 3., 2.]]]]),
tensor([[[ 0., 0., 3., 2.],
[-1., 0., 3., 2.]]]))
As you can see, keypoints and bboxes are only accurate under the pixel index convention (integers). The point at the top-left (origin) is flipped correctly assuming the pixel index convention. The bottom-right point under the pixel index convention, (3, 3), is flipped correctly to the left edge, (0, 3), but the one under exact coordinates, (4, 4), is not: its x coordinate becomes negative, (-1, 4). Finally, you can see that only the boxes under the “width = xmax - xmin + 1” convention are right.
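One possible user-side workaround, sketched here under stated assumptions rather than as anything kornia provides, is to shift exact coordinates by half a pixel before applying an index-based matrix and to shift them back afterwards. The helpers `apply_affine` and `transform_exact_keypoint` are hypothetical, and the flip matrix below is assumed from the index-based formula x' = (W - 1) - x, which is consistent with the output above.

```python
def apply_affine(matrix, point):
    """Apply a 3x3 homogeneous matrix (nested lists) to an (x, y) point."""
    x, y = point
    new_x = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    new_y = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return (new_x, new_y)


def transform_exact_keypoint(matrix, point):
    """Map an exact-coordinate point through a matrix built for pixel indexes."""
    x, y = point
    shifted = (x - 0.5, y - 0.5)       # exact coordinate -> index-based frame
    out_x, out_y = apply_affine(matrix, shifted)
    return (out_x + 0.5, out_y + 0.5)  # back to exact coordinates


W = 4
# Assumed index-based horizontal flip matrix: x' = (W - 1) - x, y' = y.
hflip_index_matrix = [[-1.0, 0.0, W - 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(apply_affine(hflip_index_matrix, (4.0, 4.0)))              # (-1.0, 4.0)
print(transform_exact_keypoint(hflip_index_matrix, (4.0, 4.0)))  # (0.0, 4.0)
```

The shift works because an index i denotes the pixel center at i + 0.5, so subtracting 0.5 expresses an exact coordinate in the frame the matrix expects.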
The same happens with random rotations:
tfm1= K.AugmentationSequential(
K.RandomAffine([180,180], p=1.),
data_keys=["input", "bbox", "keypoints"],
return_transform=True,
same_on_batch=False
)
(img_out, M1), pts_out, bbox_out = tfm1(img, pts.float(), bbox.float())
print(img_out[0, 0, ...].round(), pts_out.round(), bbox_out.round(), to_xyxy(bbox_out))
(tensor([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]]),
tensor([[ 3., 3.],
[ 0., -0.],
[-1., -1.]]),
tensor([[[[ 3., 2.],
[ 0., 3.],
[ 0., 1.],
[ 3., 1.]],
[[ 3., 2.],
[-1., 3.],
[-1., 1.],
[ 3., 1.]]]]),
tensor([[[ 0.0000, 1.0000, 3.0000, 3.0000],
[-1.0000, 1.0000, 3.0000, 3.0000]]]))
Top GitHub Comments
@shijianjian The boxes workaround is implemented in a single commit (07795f9c78468b42ecd96f4d5d2b8df562deecf7) on #1304 so that it is easier to revert in the future.
Great! Let’s fix the boxes first.
Not sure about this either, since they are functions that can be used for other data modalities than indices.
@ducha-aiki @edgarriba Any insights on this?
@sshaoshuai I am wondering if this numerical error would affect OpenPCDet?