Working with keras-ocr: float32 errors, non-square bounding boxes, and speed
I am trying to extend keras-ocr with albumentations by wrapping its native image generator with a custom one:
import albumentations as A
import keras_ocr


def get_image_generator(**args):
    p = 0.9
    pipe = A.Compose([
        # Weather
        A.OneOf([
            A.RandomRain(p=p),
            A.RandomSunFlare(flare_roi=(0, 0, 1, 0.5), angle_lower=0.5, p=p),
            A.RandomShadow(num_shadows_lower=1, num_shadows_upper=3, shadow_dimension=7, shadow_roi=(0, 0.5, 1, 1), p=p),
        ], p=p),
        # Colors, channels
        A.OneOf([
            A.RGBShift(r_shift_limit=15, g_shift_limit=15, b_shift_limit=15, p=p),
            A.ToGray(p=p),
            A.ToSepia(p=p),
            A.RandomBrightnessContrast(p=p),
            A.RandomGamma(p=p),
            A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=50, val_shift_limit=50, p=p),
            A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), p=p),
            A.Equalize(p=p),
            A.ChannelShuffle(p=p),
            A.ChannelDropout(p=p),
        ], p=p),
        # Blurring/sharpening
        A.OneOf([
            A.Blur(p=p, blur_limit=30),
            A.GlassBlur(p=p),
            A.MotionBlur(p=p, blur_limit=20),
            A.MedianBlur(blur_limit=9, p=p),
            A.IAASharpen(p=p),
        ], p=p),
        # Noise
        A.OneOf([
            A.IAAAdditiveGaussianNoise(p=p),
            A.GaussNoise(p=p),
            A.CoarseDropout(p=p, max_height=40, max_width=40),
            A.Downscale(p=p),
            A.MultiplicativeNoise(p=p, multiplier=(1.1, 8)),
            A.IAASuperpixels(p=p),
        ], p=p),
    ], p=0.5)
    inner_gen = keras_ocr.data_generation.get_image_generator(**args)
    for image, lines in inner_gen:
        # Only the image is augmented; the text boxes pass through unchanged.
        new_image = pipe(image=image)['image']
        new_lines = lines
        yield new_image, new_lines
1) I am randomly getting several kinds of errors when running this pipeline:
~/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/imgaug/dtypes.py in gate_dtypes(dtypes, allowed, disallowed, augmenter)
    330                 augmenter.name,
    331                 augmenter.__class__.__name__,
--> 332                 ", ".join(disallowed)
    333             ))
    334         else:

ValueError: Got dtype 'float32' in augmenter 'UnnamedSuperpixels' (class 'Superpixels'), which is a forbidden dtype (uint128, uint256, int128, int256, float16, float32, float64, float96, float128, float256).
and
File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/albumentations/augmentations/functional.py", line 759, in median_blur
    "Invalid ksize value {}. For a float32 image the only valid ksize values are 3 and 5".format(ksize)
ValueError: Invalid ksize value 7. For a float32 image the only valid ksize values are 3 and 5
Is this a known behaviour of the IAASuperpixels and MedianBlur transforms in a pipeline? How can I ensure they always receive uint8 input, given that it is evidently earlier transforms in my pipeline that return float32 (keras-ocr itself returns image = (alpha * text_image[..., :3] + (1 - alpha) * current_background).astype('uint8'))? Also, as I'm feeding the outputs to a neural network, is there a way to guarantee that the pipeline always returns rescaled images?
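One possible way to address both points, sketched under assumptions rather than taken from the albumentations maintainers: in the pipeline above, A.Normalize is the most likely source of float32 images, because it sits inside a random OneOf and converts its output to float. A common workaround is to remove it from the random pipeline, keep every augmentation on uint8, and normalize exactly once as an always-on final step. The helper names below (ensure_uint8, normalize, wrap_generator) are hypothetical and use only NumPy; `augment` stands for any callable that maps an image to an image.

```python
import numpy as np

# ImageNet statistics, copied from the A.Normalize call in the question.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def ensure_uint8(image):
    """Pass uint8 images through; clip and round anything else.

    Assumption: non-uint8 input is still on the 0..255 scale. An image
    that was already normalized cannot be recovered this way.
    """
    if image.dtype == np.uint8:
        return image
    return np.clip(np.rint(image), 0, 255).astype(np.uint8)


def normalize(image, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Always-on final step: uint8 HxWx3 image -> float32 network input."""
    scaled = image.astype(np.float32) / 255.0
    return (scaled - mean) / std


def wrap_generator(inner_gen, augment):
    """Augment on uint8 only, then normalize every image exactly once."""
    for image, lines in inner_gen:
        augmented = augment(ensure_uint8(image))
        yield normalize(ensure_uint8(augmented)), lines
```

With the pipeline from the question, `augment` could be `lambda img: pipe(image=img)['image']`, provided A.Normalize has been removed from the OneOf group. Every image then leaves the generator rescaled, regardless of which random transforms fired.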
2) Speed consideration:
The pipeline above generates 3.81 im/sec on average on 800x600 images, compared with 4.26 im/sec for the native keras-ocr generator. The CPU is 100% busy. Is a drop of ~0.5 im/sec (roughly 28 ms of extra time per image) a normal albumentations overhead to expect?
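For reference, the two throughput figures convert to per-image overhead as 1/3.81 - 1/4.26, which is about 0.028 s. A minimal sketch of how such numbers might be measured and compared follows; the function names are hypothetical and nothing here is specific to keras-ocr or albumentations.

```python
import time
from itertools import islice


def images_per_second(generator, n_images=50):
    """Pull up to n_images items from a generator and return items per second."""
    start = time.perf_counter()
    count = sum(1 for _ in islice(generator, n_images))
    elapsed = time.perf_counter() - start
    if count == 0:
        raise ValueError("generator produced no items")
    return count / elapsed


def overhead_seconds_per_image(base_rate, wrapped_rate):
    """Extra seconds per image when throughput drops from base_rate to wrapped_rate."""
    return 1.0 / wrapped_rate - 1.0 / base_rate


# Figures quoted in the question: 4.26 im/sec native, 3.81 im/sec wrapped.
print(round(overhead_seconds_per_image(4.26, 3.81) * 1000, 1), "ms per image")
```

Measuring both generators with the same `n_images` and the same arguments keeps the comparison fair, since the first few items may include one-off setup cost.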
3) Bounding boxes
I can't use spatial transformations, because albumentations seems to support only the 2-point, 4-coordinate notation for bounding boxes, which are expected to be rectangular and strictly parallel to the x and y axes. That is not the case with keras-ocr, which stores each bounding box as 4 points (8 coordinates). Are there any plans to support such an extended bounding box format in albumentations? Do you have any advice for my use case?
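One workaround worth considering, offered as a sketch and not as an official recommendation: albumentations can transform keypoints, so each 4-point box can be flattened into four (x, y) keypoints, passed through the spatial transforms, and regrouped afterwards. The helpers below are hypothetical and assume that keras-ocr yields `lines` as a list of lines, each a list of (box, character) tuples where box is a 4x2 array.

```python
import numpy as np


def lines_to_keypoints(lines):
    """Flatten every 4x2 box into four (x, y) keypoints, in reading order."""
    keypoints = []
    for line in lines:
        for box, _char in line:
            for x, y in np.asarray(box).reshape(4, 2):
                keypoints.append((float(x), float(y)))
    return keypoints


def keypoints_to_lines(keypoints, lines):
    """Rebuild the line/box structure of `lines` from transformed keypoints."""
    n_boxes = sum(len(line) for line in lines)
    if len(keypoints) != 4 * n_boxes:
        # Happens if the pipeline dropped keypoints that left the image.
        raise ValueError("expected %d keypoints, got %d" % (4 * n_boxes, len(keypoints)))
    points = iter(keypoints)
    new_lines = []
    for line in lines:
        new_line = []
        for _box, char in line:
            # Keep only x and y in case the pipeline returns extra fields.
            box = np.array([next(points)[:2] for _ in range(4)], dtype=np.float32)
            new_line.append((box, char))
        new_lines.append(new_line)
    return new_lines


# Assumed usage with albumentations (not executed here):
# pipe = A.Compose([...], keypoint_params=A.KeypointParams(format='xy', remove_invisible=False))
# out = pipe(image=image, keypoints=lines_to_keypoints(lines))
# new_lines = keypoints_to_lines(out['keypoints'], lines)
```

Setting remove_invisible=False matters because the regrouping relies on the keypoint count staying fixed. Note also that flips mirror the order of the four corners, so code that depends on a particular corner ordering may need to reorder them afterwards.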
My system info:
Albumentations: 0.4.6
Tensorflow: 2.3.0
System Platform: linux
System Version: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
Machine: x86_64
Platform: Linux-5.3.0-1035-aws-x86_64-with-debian-buster-sid
Processor: x86_64
System OS: Linux
Release: 5.3.0-1035-aws
Version: #37-Ubuntu SMP Sun Sep 6 01:17:09 UTC 2020
Number of CPUs: 8
Number of Physical CPUs: 4
Some of your transforms need expensive computations. For example, your pipeline includes GlassBlur; this transform repeats GaussianBlur several times and performs many pixel substitutions on the image. If you remove GlassBlur, you will see a large speed increase.

Can you provide a keras pipeline that is faster than Albumentations? I would be very grateful to see it and would try to improve Albumentations' results. Maybe Albumentations is using a single thread? Many of our transforms use a single thread by default, whereas keras can use more than one thread for some functions.
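To check which transform dominates the cost, as the GlassBlur remark suggests, each transform can be timed in isolation on the same image. The sketch below is a generic timing helper with hypothetical names; it accepts any mapping from a label to a callable that takes an image.

```python
import time


def time_transforms(transforms, image, repeats=5):
    """Return {name: mean seconds per call} for callables that take an image."""
    timings = {}
    for name, fn in transforms.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(image)
        timings[name] = (time.perf_counter() - start) / repeats
    return timings


def slowest_first(timings):
    """Names sorted from the most to the least expensive transform."""
    return sorted(timings, key=timings.get, reverse=True)
```

With albumentations, an entry might look like `{'GlassBlur': lambda img: A.GlassBlur(p=1)(image=img)['image']}`; using p=1 makes sure the transform actually runs on every call, so the timing is not diluted by skipped applications.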
GPU augmentation is a complex problem. In many cases there is not enough GPU memory or compute left over to run augmentations on the GPU as well. For this reason, the current strategy is to delegate this part of the work to the CPU: while the CPU augments one batch, the GPU runs inference on another. For now, I recommend kornia or DALI if you need GPU augmentations.
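The idea of overlapping CPU augmentation with GPU work can be sketched with a background thread that fills a bounded queue while the consumer is busy. This is a generic standard-library pattern, not an albumentations or keras-ocr API, and the name `prefetch` is hypothetical. How much it helps depends on whether the underlying image operations release Python's global interpreter lock.

```python
import queue
import threading


def prefetch(generator, buffer_size=8):
    """Run `generator` in a background thread and yield its items in order."""
    sentinel = object()
    buffer = queue.Queue(maxsize=buffer_size)

    def worker():
        try:
            for item in generator:
                buffer.put(item)  # blocks while the buffer is full
        finally:
            # Always signal the end, even if the generator raised.
            buffer.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item
```

A training loop would then iterate over `prefetch(get_image_generator(...))` in place of the raw generator. For CPU-bound pure-Python work, separate processes are usually the better fit than threads.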