W and H computation in `crop` method of several target structures seems wrong
🐛 Bug
It seems that the `crop()` methods present in many target structure classes, e.g. `BoxList`, `BinaryMaskList`, and `Polygons`, calculate the height and width wrongly.
`BoxList` class
In its `crop(self, box)` method, L173-L178:
w, h = box[2] - box[0], box[3] - box[1]
cropped_xmin = (xmin - box[0]).clamp(min=0, max=w)
cropped_ymin = (ymin - box[1]).clamp(min=0, max=h)
cropped_xmax = (xmax - box[0]).clamp(min=0, max=w)
cropped_ymax = (ymax - box[1]).clamp(min=0, max=h)
Since `box` defines the corner coordinates of the desired crop region, its width and height should be `box[2] - box[0] + 1` and `box[3] - box[1] + 1` respectively. The correct `clamp` max bound in the following four lines of code should be `w - 1` and `h - 1`.
Therefore, the above code block should be:
w, h = box[2] - box[0] + 1, box[3] - box[1] + 1
cropped_xmin = (xmin - box[0]).clamp(min=0, max=w - 1)
cropped_ymin = (ymin - box[1]).clamp(min=0, max=h - 1)
cropped_xmax = (xmax - box[0]).clamp(min=0, max=w - 1)
cropped_ymax = (ymax - box[1]).clamp(min=0, max=h - 1)
While at first glance it seems that the wrong code cancels itself out and doesn't impact the resultant `cropped_x`s and `cropped_y`s, it does cause a wrong bounding box size later, in L187 of the same method:
bbox = BoxList(cropped_box, (w, h), mode="xyxy")
The cropped `BoxList` would be smaller than the desired crop `box` by 1px in both the `x` and `y` directions.
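To make the proposed arithmetic concrete, here is a minimal sketch in plain Python rather than PyTorch. The helpers `clamp` and `crop_boxes_inclusive` are hypothetical names, not part of the code base, and the sketch assumes the inclusive-pixel convention argued for above, in which both corners of `box` are pixels that belong to the crop.

```python
# Hypothetical helpers for illustration only; the real method works on tensors.

def clamp(value, lo, hi):
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(value, hi))


def crop_boxes_inclusive(boxes, crop_box):
    """Crop "xyxy" boxes, treating every corner as an inclusive pixel index."""
    x0, y0, x1, y1 = crop_box
    # Inclusive corners x0..x1 cover x1 - x0 + 1 pixels.
    w, h = x1 - x0 + 1, y1 - y0 + 1
    cropped = [
        (
            clamp(xmin - x0, 0, w - 1),
            clamp(ymin - y0, 0, h - 1),
            clamp(xmax - x0, 0, w - 1),
            clamp(ymax - y0, 0, h - 1),
        )
        for xmin, ymin, xmax, ymax in boxes
    ]
    return cropped, (w, h)


boxes = [(5, 5, 30, 40), (12, 15, 17, 25)]
cropped, size = crop_boxes_inclusive(boxes, (10, 10, 19, 29))
print(cropped)  # [(0, 0, 9, 19), (2, 5, 7, 15)]
print(size)     # (10, 20)
```

Under this assumption a crop whose corners are 10 and 19 is 10px wide, and the largest valid coordinate inside it is 9, which is why the clamp bound is `w - 1` rather than `w`.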
`BinaryMaskList`, `PolygonInstance`, `PolygonList` classes
Similarly, in the `crop()` method of these classes, the computed `w` and `h` are 1px smaller than the correct values.
`BinaryMaskList.crop(self, box)`, L92-L111:
def crop(self, box):
    ...
    width, height = xmax - xmin, ymax - ymin
    cropped_masks = self.masks[:, ymin:ymax, xmin:xmax]
    cropped_size = width, height
    return BinaryMaskList(cropped_masks, cropped_size)
`PolygonInstance.crop(self, box)`, L242-L268:
def crop(self, box):
    ...
    w, h = xmax - xmin, ymax - ymin
    cropped_polygons = []
    for poly in self.polygons:
        p = poly.clone()
        p[0::2] = p[0::2] - xmin  # .clamp(min=0, max=w)
        p[1::2] = p[1::2] - ymin  # .clamp(min=0, max=h)
        cropped_polygons.append(p)
    return PolygonInstance(cropped_polygons, size=(w, h))
`PolygonList.crop(self, box)`, L381-L388:
def crop(self, box):
    w, h = box[2] - box[0], box[3] - box[1]
    cropped_polygons = []
    for polygon in self.polygons:
        cropped_polygons.append(polygon.crop(box))
    cropped_size = w, h
    return PolygonList(cropped_polygons, cropped_size)
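The mask case can be illustrated with nested lists standing in for the mask tensor. This is only a sketch: `crop_masks` and its `inclusive` flag are hypothetical, introduced here to contrast the two readings of `xmax` and `ymax`. It shows that the slice `ymin:ymax, xmin:xmax` agrees with the size `xmax - xmin` by `ymax - ymin`, and that both lose one row and one column if the max corner is meant to be an inclusive pixel.

```python
# One 6 x 8 mask (height x width) as nested lists; a stand-in for self.masks.
height, width = 6, 8
masks = [[[1] * width for _ in range(height)]]


def crop_masks(masks, box, inclusive):
    """Slice every mask to box; `inclusive` decides whether xmax/ymax are kept."""
    xmin, ymin, xmax, ymax = box
    extra = 1 if inclusive else 0  # Python slices exclude the stop index
    return [
        [row[xmin:xmax + extra] for row in mask[ymin:ymax + extra]]
        for mask in masks
    ]


box = (2, 1, 5, 4)
exclusive = crop_masks(masks, box, inclusive=False)
inclusive = crop_masks(masks, box, inclusive=True)

# (width, height) of the cropped mask under each reading of the box
print(len(exclusive[0][0]), len(exclusive[0]))  # 3 3
print(len(inclusive[0][0]), len(inclusive[0]))  # 4 4
```

The exclusive result matches what the quoted `crop()` computes, which is why the existing code looks self-consistent when run as is.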
Impact
The impact is usually unfelt if one just runs the code base as is, since these mistakes seem quite self-consistent. However, besides the implementation being fundamentally incorrect, it also adds unnecessary annoyances to people developing new functionalities based on this code base.
Issue Analytics
- Created 4 years ago
- Reactions:5
- Comments:12 (12 by maintainers)
Top GitHub Comments
It is a matter of convention, but there are good conventions and bad conventions.
“+1” is a bad convention because when x2 = x1 + w - 1, x2 is not scale-invariant, meaning that any code written to scale the boxes is not precise. Without the “+1”, a box that’s half of the image is simply (0.0, 0.0, 128.0, 128.0).
You’re right and there should not be. This is just a historical issue and we’re slowly correcting it (e.g., https://github.com/pytorch/pytorch/commit/373e6a78bf6afaf66d2749740036b077b70312c8 )
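The scale-invariance argument in the comments can be checked numerically. The sketch below uses hypothetical helper names and assumes a 256px-wide image, so that "half of the image" is 128px wide. It scales the corner coordinates by 0.5 under both conventions and then measures the resulting width.

```python
def width_exclusive(x1, x2):
    """Width when x2 = x1 + w (no "+1")."""
    return x2 - x1


def width_inclusive(x1, x2):
    """Width when x2 = x1 + w - 1 (the "+1" convention)."""
    return x2 - x1 + 1


def scale(box, factor):
    """Scale every coordinate of an xyxy box by the same factor."""
    return tuple(c * factor for c in box)


# The same 128px-wide box expressed in each convention.
box_excl = (0.0, 0.0, 128.0, 128.0)
box_incl = (0.0, 0.0, 127.0, 127.0)

half_excl = scale(box_excl, 0.5)
half_incl = scale(box_incl, 0.5)

print(width_exclusive(half_excl[0], half_excl[2]))  # 64.0
print(width_inclusive(half_incl[0], half_incl[2]))  # 64.5
```

Halving a 128px box should give 64px. Without the "+1" the scaled coordinates give exactly that, while the "+1" convention yields 64.5, so scaling code would need an extra correction term to stay exact.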