[docs] PIL image/enhance, OpenCV, scikit-image ops <-> torchvision transforms: migration advice / summary table / test + diff images / comments in individual functions
Original title: [docs] F.gaussian_blur should specify relation of kernel_size / sigma to PIL’s sigma
📚 The doc issue
A lot of legacy GaussianBlur augmentations, such as those in https://github.com/facebookresearch/dino/blob/d2f3156/utils.py#L36 and in https://github.com/facebookresearch/unbiased-teacher/blob/main/ubteacher/data/transforms/augmentation_impl.py#L20, use PIL’s GaussianBlur, which has only a single radius parameter. The new torchvision GaussianBlur has two parameters: a hard kernel_size and a soft sigma. It would be very useful if their semantics were explained in relation to the existing/popular/legacy PIL arguments. This would help in porting legacy code to torchvision’s native GaussianBlur.
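For porting, a minimal sketch of what such a migration might look like, assuming (per the SO question referenced below) that PIL’s radius roughly corresponds to sigma; the class name and the 3-sigma kernel-size rule are illustrative choices, not anything torchvision prescribes:

```python
import math
import random

from PIL import ImageFilter
import torchvision.transforms as T

# Legacy PIL-style augmentation (as in dino / unbiased-teacher):
# a single radius, sampled per call.
class PILGaussianBlur:
    def __init__(self, radius_min=0.1, radius_max=2.0):
        self.radius_min = radius_min
        self.radius_max = radius_max

    def __call__(self, img):
        radius = random.uniform(self.radius_min, self.radius_max)
        return img.filter(ImageFilter.GaussianBlur(radius=radius))

# Rough torchvision equivalent, under the assumption radius ~= sigma.
# kernel_size must be odd; 2 * ceil(3 * sigma_max) + 1 covers ~99.7% of the
# Gaussian mass at the largest sigma (the 3-sigma rule is a choice, not a rule
# torchvision imposes).
sigma_min, sigma_max = 0.1, 2.0
kernel_size = 2 * math.ceil(3 * sigma_max) + 1  # -> 13 for sigma_max = 2.0
torchvision_blur = T.GaussianBlur(kernel_size=kernel_size, sigma=(sigma_min, sigma_max))
```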
For reference, the native Pillow implementation: https://github.com/python-pillow/Pillow/blob/95cff6e959bb3c37848158ed2145d49d49806a31/src/libImaging/BoxBlur.c#L286. It also seems that Pillow’s implementation is not a true Gaussian convolution but repeated box (uniform-weight) blurs that approximate one.
A related question on SO: https://stackoverflow.com/questions/62968174/for-pil-imagefilter-gaussianblur-how-what-kernel-is-used-and-does-the-radius-par, which suggests that radius ~= sigma.
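A quick numeric check of that radius ~= sigma suggestion might look like this sketch (the test image path is a placeholder; kernel size 13 follows the 3-sigma rule from the sketch above):

```python
import numpy as np
import torch
import torchvision.transforms.functional as F
from PIL import Image, ImageFilter

img = Image.open("test.jpg").convert("L")  # placeholder test image
sigma = 2.0

# PIL: single radius parameter, set equal to sigma.
pil_out = np.asarray(img.filter(ImageFilter.GaussianBlur(radius=sigma)), dtype=np.float32)

# torchvision: explicit (odd) kernel size plus sigma, on a 1xHxW float tensor.
tensor = torch.from_numpy(np.asarray(img, dtype=np.float32))[None]
tv_out = F.gaussian_blur(tensor, kernel_size=[13, 13], sigma=[sigma, sigma])[0].numpy()

diff = np.abs(pil_out - tv_out)
print(f"max abs diff: {diff.max():.2f}, mean abs diff: {diff.mean():.4f} (0-255 scale)")
# Some difference is expected even with radius == sigma: Pillow approximates
# the Gaussian with repeated box blurs, while torchvision convolves with a
# truncated Gaussian kernel.
```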
Top GitHub Comments
Well, that’s the problem 😃 It would be nice to start collecting a battery of test examples and small code snippets that evaluate these functions across libraries with different options (or at least the default options): publish the resulting images, an HTML page collecting them, and diff images. Especially where the ops are implemented slightly differently, these differences in results should be presented even if the root cause of the difference hasn’t been diagnosed / researched yet; a sketch of such a battery is below.
Otherwise, users either duplicate this effort or abandon it and end up with hard-to-detect bugs in all these teacher-student self-supervised models, which rely on very particular augmentation hyper-parameters being set right.
My point is that even incomplete testing results, published in the docs, are better than no testing at all, with the findings discussed in forums and passed from one PhD student to the next as dark knowledge 😃
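One possible shape for such a test battery, as a sketch only (the file names, sigma values, and kernel-size rule are arbitrary choices):

```python
import cv2
import numpy as np
import torch
import torchvision.transforms.functional as F
from PIL import Image, ImageFilter

img = Image.open("test.jpg").convert("RGB")  # placeholder test image
arr = np.asarray(img)

for sigma in (0.5, 1.0, 2.0):
    k = 2 * int(np.ceil(3 * sigma)) + 1  # odd kernel covering ~3 sigma

    pil_out = np.asarray(img.filter(ImageFilter.GaussianBlur(radius=sigma)))
    cv_out = cv2.GaussianBlur(arr, (k, k), sigma)
    t = torch.from_numpy(arr).permute(2, 0, 1).float()
    tv_out = (
        F.gaussian_blur(t, kernel_size=[k, k], sigma=[sigma, sigma])
        .round().clamp(0, 255).byte().permute(1, 2, 0).numpy()
    )

    # Save each library's output plus a diff image for visual inspection.
    Image.fromarray(pil_out).save(f"pil_sigma{sigma}.png")
    Image.fromarray(cv_out).save(f"opencv_sigma{sigma}.png")
    Image.fromarray(tv_out).save(f"torchvision_sigma{sigma}.png")
    diff = np.abs(pil_out.astype(np.int16) - tv_out.astype(np.int16)).astype(np.uint8)
    Image.fromarray(diff).save(f"diff_pil_vs_torchvision_sigma{sigma}.png")
```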
@vadimkantorov researchers are free to use any kind of implementation and coin a name for it. When GaussianBlur was introduced to torchvision (https://github.com/pytorch/vision/issues/2635), the reference was the swav project, https://github.com/facebookresearch/swav/blob/d4970c83791c8bd9928ec4efcc25d4fd788c405f/src/multicropdataset.py#L65, where they use OpenCV with a fixed kernel size and sigma. swav historically predates dino and the others; I do not know why they switched from one implementation to another (maybe to drop the OpenCV dependency?). Also, PIL’s blur kernel size is limited, as far as I remember.
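For the record, mapping a swav-style OpenCV call onto the torchvision transform would look roughly like this (the stand-in image and the sigma range are illustrative):

```python
import cv2
import numpy as np
import torchvision.transforms as T

np_img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in image
sigma = 1.0

# OpenCV-style call with a fixed 23x23 kernel and an explicit sigma.
blurred = cv2.GaussianBlur(np_img, (23, 23), sigma)

# Closest torchvision transform: same fixed kernel size; sigma is sampled
# uniformly from the given range every time the transform is applied.
tv_blur = T.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))
```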
As for https://github.com/pytorch/vision/issues/5194#issuecomment-1015316258, it is not about PIL vs torchvision (PIL does not provide a shear argument, only an affine matrix), but about how the parametrization (rotation, scale, shear, translation -> affine matrix) is done in torchvision versus scikit-image and the usual computer vision/graphics conventions.
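To make the parametrization point concrete, a small sketch applying the same nominal shear in both libraries (the degree/radian and center/origin conventions are the documented ones; the rest is illustrative):

```python
import math

import torch
import torchvision.transforms.functional as F
from skimage import transform

img = torch.rand(3, 64, 64)  # random stand-in image

# torchvision: shear is given in degrees and applied around the image center.
sheared_tv = F.affine(img, angle=0.0, translate=[0, 0], scale=1.0, shear=[15.0, 0.0])

# scikit-image: shear is given in radians and applied around the origin (0, 0),
# so the same nominal shear generally produces a different result.
tf = transform.AffineTransform(shear=math.radians(15.0))
sheared_sk = transform.warp(img.permute(1, 2, 0).numpy(), tf.inverse)
```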