
Clarify keypoints behaviour and their limitations


Describe the bug

TL;DR: document the kornia convention for keypoints and its current limitations, or fix the keypoint calculation. This affects the bbox_v2 PR (#1304).

Although kornia uses floats for keypoints, it assumes that they come from integers denoting pixel indexes (index-based convention). A keypoint therefore really means a pixel center, i + 0.5: the point (3, 4) refers to the center of pixel (3, 4), which is (3.5, 4.5) in exact coordinates. Because keypoints are floats, users expect them to represent exact coordinates, for example (0.04, 6.4). See the reproduction steps below for an example.
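The relation between the two conventions can be sketched in plain Python. The helper names `index_to_exact` and `exact_to_index` are hypothetical and only illustrate the half-pixel offset described above; they are not part of kornia.

```python
def index_to_exact(pt):
    """Map a pixel index (x, y) to the exact coordinate of that pixel's center."""
    x, y = pt
    return (x + 0.5, y + 0.5)


def exact_to_index(pt):
    """Inverse mapping: exact coordinate -> (possibly fractional) pixel index."""
    x, y = pt
    return (x - 0.5, y - 0.5)


center = index_to_exact((3, 4))
print(center)                   # (3.5, 4.5)
print(exact_to_index(center))   # (3.0, 4.0)
```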

Why does this happen?

Pixels in images are index based, so the matrix transforms applied to images assume that convention.
Augmentation containers share the same transform matrix between images, keypoints and bboxes.
Keypoints under exact coordinates use slightly different formulas. See here and here
Bboxes using the “width = xmax - xmin” convention assume that the underlying vertices (the boxes' keypoints) are expressed as exact coordinates, while the “width = xmax - xmin + 1” convention assumes they are index based.
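To make the difference between the formulas concrete, here is a minimal sketch of a horizontal flip matrix under each convention, written in plain Python. The functions `hflip_matrix` and `apply` are hypothetical helpers for illustration and do not reproduce kornia's internals; the only assumption is that an index-based flip maps x to (W - 1) - x while an exact-coordinate flip maps x to W - x.

```python
def hflip_matrix(width, index_based=True):
    """3x3 homogeneous matrix for a horizontal flip of an image `width` pixels wide."""
    # Index-based: x' = (W - 1) - x.  Exact coordinates: x' = W - x.
    offset = (width - 1) if index_based else width
    return [[-1.0, 0.0, float(offset)],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]


def apply(matrix, pt):
    """Apply the affine part of a 3x3 homogeneous matrix to a point (x, y)."""
    x, y = pt
    new_x = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    new_y = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return (new_x, new_y)


W = 4
print(apply(hflip_matrix(W, index_based=True), (3, 3)))    # (0.0, 3.0)
print(apply(hflip_matrix(W, index_based=False), (4, 4)))   # (0.0, 4.0)
# Sharing the index-based matrix with an exact-coordinate keypoint is off by one:
print(apply(hflip_matrix(W, index_based=True), (4, 4)))    # (-1.0, 4.0)
```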

This leads to:

Kornia can only handle keypoints that come from integers.
Keypoints that use exact coordinate values are inaccurate after transforms. The real value may be 1 pixel off. I don’t think zoom is affected (not 100% sure here).
- This affects model accuracy.
- It could lead to subtle bugs when manually inverting a transform made by kornia. When detectron fixed a similar error, box AP increased by 0.4 to 0.7, see here.
Box/bounding box computations must use the “width = xmax - xmin + 1” convention, because it compensates for the offset to exact coordinates.
- This affects the bbox_v2 PR. The internal box representation must be changed from the “width = xmax - xmin” convention to the “width = xmax - xmin + 1” convention.
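As a small illustration of the two box conventions, the sketch below computes the width of a full-width box in a 4-pixel-wide image and then flips both boxes with the index-based formula x' = (W - 1) - x. `box_width` and `hflip_box_index` are hypothetical names used only here, in plain Python.

```python
def box_width(xmin, xmax, index_based):
    """Width under 'xmax - xmin + 1' (index based) or 'xmax - xmin' (exact)."""
    return xmax - xmin + (1 if index_based else 0)


def hflip_box_index(xmin, xmax, width):
    """Flip a box horizontally with the index-based formula; min and max swap."""
    return (width - 1 - xmax, width - 1 - xmin)


W = 4
# Both boxes span the whole image width in their own convention.
print(box_width(0, W - 1, index_based=True))    # 4
print(box_width(0, W, index_based=False))       # 4

# Flipping both with the index-based formula only keeps the first one in place.
print(hflip_box_index(0, W - 1, W))   # (0, 3)
print(hflip_box_index(0, W, W))       # (-1, 3)
```

The second flipped box keeps its width but is shifted one pixel to the left, which is the offset the “+1” convention absorbs.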

So, a decision should be made:

Refactor kornia to be accurate with keypoints by using exact coordinates.
Leave as is + document in augmentation containers + update bbox_v2 internal format.
Leave as is + update bbox_v2 internal format. Keypoints in exact coordinates (float) will be slightly inaccurate.

Reproduction steps

Note: the valid range of keypoints depends on their convention.

Pixel indexes (integers): [0, W-1] for x and [0, H-1] for y. For example (3, 5).

Exact coordinates (must be floats): [0, W] for x and [0, H] for y.
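These ranges can be written as a quick check. The function `in_valid_range` is a hypothetical helper in plain Python, shown only to spell out the two ranges; it is not a kornia API.

```python
def in_valid_range(pt, width, height, index_based):
    """Return True if point (x, y) lies inside the image under the given convention."""
    x, y = pt
    if index_based:
        # Pixel indexes: x in [0, W-1], y in [0, H-1]
        return 0 <= x <= width - 1 and 0 <= y <= height - 1
    # Exact coordinates: x in [0, W], y in [0, H]
    return 0 <= x <= width and 0 <= y <= height


print(in_valid_range((3, 3), 4, 4, index_based=True))        # True
print(in_valid_range((4, 4), 4, 4, index_based=True))        # False
print(in_valid_range((4.0, 4.0), 4, 4, index_based=False))   # True
```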

Here, you can see a graphical example

from kornia import augmentation as K
import torch

def to_xyxy(bbox):
    return torch.stack([bbox.amin(dim=-2), bbox.amax(dim=-2)], dim=-2).view(
            bbox.shape[0], bbox.shape[1], 4
        )

img = torch.eye(4).expand(1,3, -1, -1) # torch.rand(1, 3, 4, 4)
h, w = img.shape[-2:]
pts = torch.tensor([[0, 0],
                    [w-1, h-1], # Max value for pixel indexes convention
                    [w, h]])  #  Max value for exact coordinate convention
bbox = torch.tensor([[
    [[0,h//4],[w-1,0],[w-1,h//2],[0, h//2]],  # convention width = xmax - xmin + 1
    [[0,h//4],[w,0],[w,h//2],[0, h//2]], # convention width = xmax - xmin
]])

print('Original values')
print(img[0,0,...], pts, bbox, to_xyxy(bbox))

(tensor([[1., 0., 0., 0.],
         [0., 1., 0., 0.],
         [0., 0., 1., 0.],
         [0., 0., 0., 1.]]),
 tensor([[0, 0],
         [3, 3],
         [4, 4]]),
 tensor([[[[0, 1],
           [3, 0],
           [3, 2],
           [0, 2]],
 
          [[0, 1],
           [4, 0],
           [4, 2],
           [0, 2]]]]),
 tensor([[[0, 0, 3, 2],
          [0, 0, 4, 2]]]))

Here is what happens for horizontal flips:

tfm= K.AugmentationSequential(
    K.RandomHorizontalFlip(p=1.),
    data_keys=["input", "bbox", "keypoints"],
    return_transform=True,
    same_on_batch=False
)

(img_out, M), pts_out, bbox_out = tfm(img, pts.float(), bbox.float())
print(img_out[0, 0, ...], pts_out, bbox_out, to_xyxy(bbox_out))

(tensor([[0., 0., 0., 1.],
         [0., 0., 1., 0.],
         [0., 1., 0., 0.],
         [1., 0., 0., 0.]]),
 tensor([[ 3.,  0.],
         [ 0.,  3.],
         [-1.,  4.]]),
 tensor([[[[ 3.,  1.],
           [ 0.,  0.],
           [ 0.,  2.],
           [ 3.,  2.]],
 
          [[ 3.,  1.],
           [-1.,  0.],
           [-1.,  2.],
           [ 3.,  2.]]]]),
 tensor([[[ 0.,  0.,  3.,  2.],
          [-1.,  0.,  3.,  2.]]]))

As you can see, keypoints and bboxes are only accurate under the pixel index convention (integers). The point at the top-left (origin) is flipped correctly assuming the pixel index convention. The bottom-right point under the pixel index convention, (3, 3), is flipped correctly to (0, 3), but the one under exact coordinates, (4, 4), is not: it becomes (-1, 4), with a negative x. Finally, you can see that only the box under the “width = … + 1” convention is right.

The same happens with random rotations:

tfm1= K.AugmentationSequential(
    K.RandomAffine([180,180], p=1.),
    data_keys=["input", "bbox", "keypoints"],
    return_transform=True,
    same_on_batch=False
)

(img_out, M1), pts_out, bbox_out = tfm1(img, pts.float(), bbox.float())
print(img_out[0, 0, ...].round(), pts_out.round(), bbox_out.round(), to_xyxy(bbox_out))

(tensor([[1., 0., 0., 0.],
         [0., 1., 0., 0.],
         [0., 0., 1., 0.],
         [0., 0., 0., 1.]]),
 tensor([[ 3.,  3.],
         [ 0., -0.],
         [-1., -1.]]),
 tensor([[[[ 3.,  2.],
           [ 0.,  3.],
           [ 0.,  1.],
           [ 3.,  1.]],
 
          [[ 3.,  2.],
           [-1.,  3.],
           [-1.,  1.],
           [ 3.,  1.]]]]),
 tensor([[[ 0.0000,  1.0000,  3.0000,  3.0000],
          [-1.0000,  1.0000,  3.0000,  3.0000]]]))
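For the horizontal flip case, one way to reuse an index-based formula on exact coordinates is to shift by half a pixel before and after the transform. The sketch below is plain Python arithmetic under that assumption, with hypothetical helper names; it does not call kornia and has not been checked against rotations, scaling or perspective transforms.

```python
def hflip_index(x, width):
    """Index-based horizontal flip: x' = (W - 1) - x."""
    return (width - 1) - x


def hflip_exact_via_index(x, width):
    """Flip an exact coordinate: exact -> index (-0.5), flip, index -> exact (+0.5)."""
    return hflip_index(x - 0.5, width) + 0.5


W = 4
print(hflip_exact_via_index(4.0, W))   # 0.0
print(hflip_exact_via_index(0.0, W))   # 4.0
```

Algebraically this reduces to x' = W - x, so the right edge of the image (x = 4) maps to the left edge (x = 0) instead of to -1.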


Top GitHub Comments

hal-314 commented, Oct 23, 2021

@shijianjian The boxes workaround is implemented in a single commit (07795f9c78468b42ecd96f4d5d2b8df562deecf7) on #1304 so that it is easier to revert in the future.

shijianjian commented, Oct 18, 2021

For boxes, yes, we can handle internally in boxes by using “+1” convention. No need to worry the end user 😃 . As in bbox_v2 the internal representation is almost not exposed to the end user (only through constructor), we can make the change transparent to users after the refactoring.

Great! Let’s fix the boxes firstly.

In case of keypoints, I don’t think it’s possible. Adding +0.5 doesn’t fix it (I tried before posting with flips). Also, I’m not sure what would happen with perspective or scaling transforms.

Not sure about this either, since they are functions that can be used for other data modalities than indices.

@ducha-aiki @edgarriba Any insights in this?

@sshaoshuai I am wondering if this numerical error would affect OpenPCDet?
