The prototype `FiveCrop`/`TenCrop`/`BatchMultiCrop` produce unexpected results
🐛 Describe the bug
The current implementation of these transforms produces unexpected results for classification:
- If a single image comes in (not batched and collated), a batched result comes out with a single label.
- If a batch of images comes in, we get a new image batch 5x the original size and a label vector whose length equals the original batch size.
In both cases, the lengths of X and Y don't match, which causes issues in validation pipelines.
One approach would be to duplicate the labels so that the lengths match. It's quite likely that the same approach should be considered for metadata (like ids) and any other content included in the record. Things get even more complex if we consider extending the transform to detection, as we would need to crop the masks/bboxes, clean up the bboxes, and sync the labels for each of the new crops.
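A minimal sketch of the label-duplication approach for the batched case, in plain Python so it runs without torchvision. It assumes a batch of N images has been turned into N*K crops; `expand_labels` is a hypothetical helper, not part of any library. The crop ordering is also an assumption: if the crops are laid out image by image, each label is repeated in place (what `torch.repeat_interleave` does on a 1D tensor); if they are laid out crop by crop, the whole label vector is tiled (what `Tensor.repeat` does).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expand_labels(labels: Sequence[T], num_crops: int, image_major: bool = True) -> List[T]:
    """Repeat labels so there is exactly one label per crop (hypothetical helper).

    image_major=True assumes crops ordered [img0_crop0, ..., img0_cropK-1, img1_crop0, ...].
    image_major=False assumes crops ordered [img0_crop0, img1_crop0, ..., img0_crop1, ...].
    """
    if num_crops < 1:
        raise ValueError("num_crops must be positive")
    if image_major:
        # Each label repeated num_crops times in place, like repeat_interleave.
        return [label for label in labels for _ in range(num_crops)]
    # The whole label vector tiled num_crops times, like Tensor.repeat.
    return list(labels) * num_crops


# A batch of 2 images turned into 10 crops by a five-crop transform.
labels = [3, 7]
print(expand_labels(labels, 5))                     # [3, 3, 3, 3, 3, 7, 7, 7, 7, 7]
print(expand_labels(labels, 5, image_major=False))  # [3, 7, 3, 7, 3, 7, 3, 7, 3, 7]
```

The same helper applies unchanged to per-sample metadata such as ids, which is why getting the ordering assumption wrong would silently misalign every field, not just the labels.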
The complexity of keeping images, labels, and metadata in sync is possibly the reason why the existing `FiveCrop` implementation on stable doesn't offer a way to stack the result for the user, but instead just provides an example in the documentation of how it can be done. Given the above, I believe the current implementation of `BatchMultiCrop` is flawed and should be either removed or redeveloped.
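For reference, the stable `FiveCrop` documentation suggests stacking the crops into a 5D tensor, flattening batch and crops with `input.view(-1, c, h, w)` before the model, and averaging with `result.view(bs, ncrops, -1).mean(1)` afterwards, so the labels never need to be duplicated. The fold-back step can be sketched in plain Python as below; `average_over_crops` is a hypothetical name, and image-major ordering (what flattening a `(bs, ncrops, ...)` batch produces) is assumed.

```python
from typing import List, Sequence


def average_over_crops(flat_scores: Sequence[Sequence[float]], num_crops: int) -> List[List[float]]:
    """Fold per-crop scores back to one row per image by averaging (hypothetical helper)."""
    if num_crops < 1 or len(flat_scores) % num_crops != 0:
        raise ValueError("number of rows must be a positive multiple of num_crops")
    averaged = []
    for start in range(0, len(flat_scores), num_crops):
        # All crops of one image are contiguous under the image-major assumption.
        group = flat_scores[start:start + num_crops]
        # Column-wise mean across the crops of this image.
        averaged.append([sum(col) / num_crops for col in zip(*group)])
    return averaged


# 2 images x 2 crops, 2 classes each: 4 score rows in, 2 rows out (one per label).
scores = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
print(average_over_crops(scores, 2))  # [[0.5, 0.5], [0.75, 0.25]]
```

Because the output has one row per original image, it lines up with the original, unduplicated label vector.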
Versions
Latest main branch
Looks OK to me.
Another thing I would avoid, if you agree, is the use of `transforms.MultiCropResult`. Why can't we put everything in a simple List that clearly shows in the signature what the expected input is (`List[features.Image]`)? I understand that in the future, if we want to provide such a transform, we would need to be able to identify that this is a special type of list. But if this list is not used at the moment, it only gets in the user's way. I think such a future change can happen in a BC manner. Let me know if I'm missing something important here.
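The backward-compatibility argument can be illustrated with a small sketch, assuming the special type would be introduced later as a subclass of `list`. `CropList`, `batch_crops`, and `is_multi_crop` are hypothetical names, not the torchvision prototype API, and strings stand in for images. Because every subclass instance is still a `list`, code written today against a plain `List[...]` signature keeps working, while a future transform can still tell the two apart with `isinstance`.

```python
from typing import List


class CropList(list):
    """Hypothetical marker type: a plain list that a future transform could recognise."""


def batch_crops(crops: List[str]) -> List[str]:
    # Written against a plain List; it neither knows nor cares about the marker type.
    return list(crops)


def is_multi_crop(obj: object) -> bool:
    # A future transform could dispatch on the marker without changing batch_crops.
    return isinstance(obj, CropList)


plain = ["c0", "c1", "c2", "c3", "c4"]
marked = CropList(plain)
print(batch_crops(plain) == batch_crops(marked))    # True
print(is_multi_crop(plain), is_multi_crop(marked))  # False True
```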
That would look like
Note that we cannot have a `forward(self, images: transforms.MultiCropResult, labels: features.Label)` signature, because even if you use `transform = transforms.Compose([transforms.FiveCrop(), BatchMultiCrop()])` with `transform(image, label)`, the output of `FiveCrop` will always be a tuple. The only way to avoid this would be to remove `FiveCrop` from the example, but I'm not sure we wouldn't then be wading into a "useless" example, since we would be showcasing something that will not happen in practice and moving the burden entirely onto the user.
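A plain-Python sketch of why the second transform cannot take `(images, labels)` as separate parameters. It assumes a `Compose` that packs multiple inputs into one tuple and hands each transform's output to the next as a single argument; `Compose`, `five_crop`, and `batch_multi_crop` below are simplified, hypothetical stand-ins, not the torchvision prototype classes, and strings stand in for images.

```python
from typing import Any, Callable, List, Sequence, Tuple


class Compose:
    """Simplified stand-in for a Compose that accepts several inputs (assumed behaviour)."""

    def __init__(self, transforms: Sequence[Callable[[Any], Any]]) -> None:
        self.transforms = list(transforms)

    def __call__(self, *inputs: Any) -> Any:
        # Several inputs are packed into one tuple; every transform gets a single argument.
        sample = inputs if len(inputs) > 1 else inputs[0]
        for transform in self.transforms:
            sample = transform(sample)
        return sample


def five_crop(sample: Tuple[str, int]) -> Tuple[Tuple[str, ...], int]:
    # Hypothetical FiveCrop: crops the image and passes the label through untouched.
    image, label = sample
    return tuple(f"{image}/crop{i}" for i in range(5)), label


def batch_multi_crop(sample: Tuple[Tuple[str, ...], int]) -> Tuple[List[str], List[int]]:
    # Receives ONE tuple, not (images, labels), so it has to unpack it itself.
    crops, label = sample
    # Duplicate the label so X and Y end up with the same length.
    return list(crops), [label] * len(crops)


images, labels = Compose([five_crop, batch_multi_crop])("img", 3)
print(len(images), labels)  # 5 [3, 3, 3, 3, 3]
```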