
Working with keras-ocr: float32 errors, four-point bounding boxes, and speed

I am trying to extend keras-ocr with albumentations by wrapping its native image generator with a custom one:

import albumentations as A
import keras_ocr


def get_image_generator(**kwargs):
    """Wrap keras-ocr's synthetic image generator with albumentations pixel-level augmentations."""
    p = 0.9
    pipe = A.Compose([
        ### Weather
        A.OneOf([
            A.RandomRain(p=p),
            A.RandomSunFlare(flare_roi=(0, 0, 1, 0.5), angle_lower=0.5, p=p),
            A.RandomShadow(num_shadows_lower=1, num_shadows_upper=3, shadow_dimension=7, shadow_roi=(0, 0.5, 1, 1), p=p),
        ], p=p),
        ### Colors, channels
        A.OneOf([
            A.RGBShift(r_shift_limit=15, g_shift_limit=15, b_shift_limit=15, p=p),
            A.ToGray(p=p),
            A.ToSepia(p=p),
            A.RandomBrightnessContrast(p=p),
            A.RandomGamma(p=p),
            A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=50, val_shift_limit=50, p=p),
            # Unlike the other transforms here, Normalize returns a float32 image,
            # so every transform applied after it receives float32 input.
            A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), p=p),
            A.Equalize(p=p),
            A.ChannelShuffle(p=p),
            A.ChannelDropout(p=p),
        ], p=p),
        ### Blurring/sharpening
        A.OneOf([
            A.Blur(blur_limit=30, p=p),
            A.GlassBlur(p=p),
            A.MotionBlur(blur_limit=20, p=p),
            A.MedianBlur(blur_limit=9, p=p),
            A.IAASharpen(p=p),
        ], p=p),
        ### Noise
        A.OneOf([
            A.IAAAdditiveGaussianNoise(p=p),
            A.GaussNoise(p=p),
            A.CoarseDropout(max_height=40, max_width=40, p=p),
            A.Downscale(p=p),
            A.MultiplicativeNoise(multiplier=(1.1, 8), p=p),
            A.IAASuperpixels(p=p),
        ], p=p),
    ], p=0.5)
    inner_gen = keras_ocr.data_generation.get_image_generator(**kwargs)
    for image, lines in inner_gen:
        # Only pixel-level transforms are used, so the box coordinates in `lines` stay valid.
        new_image = pipe(image=image)["image"]
        yield new_image, lines

1) I am randomly getting several kinds of errors when running this pipeline:

~/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/imgaug/dtypes.py in gate_dtypes(dtypes, allowed, disallowed, augmenter)
    330     augmenter.name,
    331     augmenter.__class__.__name__,
--> 332     ", ".join(disallowed)
    333 ))
    334 else:

ValueError: Got dtype 'float32' in augmenter 'UnnamedSuperpixels' (class 'Superpixels'), which is a forbidden dtype (uint128, uint256, int128, int256, float16, float32, float64, float96, float128, float256).

and

File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/albumentations/augmentations/functional.py", line 759, in median_blur
    "Invalid ksize value {}. For a float32 image the only valid ksize values are 3 and 5".format(ksize)

ValueError: Invalid ksize value 7. For a float32 image the only valid ksize values are 3 and 5

Is this known behaviour of the IAASuperpixels and MedianBlur transforms in a pipeline? How can I make sure they always receive uint8 input? The float32 images must come from earlier transforms in my pipeline, because keras-ocr itself returns uint8 (image = (alpha * text_image[..., :3] + (1 - alpha) * current_background).astype('uint8')). Also, since I'm feeding the outputs to a neural network, is there a way to guarantee that the pipeline always returns rescaled images?
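In the pipeline above, A.Normalize sits inside the "Colors, channels" OneOf. Whenever it is picked, every later group (blurring, noise) receives a float32 image. That explains both errors. For float32 input, OpenCV's median blur only accepts kernel sizes 3 and 5, and imgaug's Superpixels rejects float32 outright.

One way around this is to keep every random transform on uint8 and normalize exactly once, outside the random Compose. Leaving normalization inside matters here because that Compose has p=0.5, so a final A.Normalize placed there would be skipped half the time.

The numpy-only sketch below replicates the arithmetic albumentations' Normalize performs with its default max_pixel_value=255. The names normalize_like_albumentations and rescaled are hypothetical. In practice you could instead remove A.Normalize from the OneOf and call it with p=1.0 in the wrapper.

```python
import numpy as np

# ImageNet statistics, matching the A.Normalize call in the question.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_like_albumentations(image, mean=MEAN, std=STD, max_pixel_value=255.0):
    """(image - mean * max_pixel_value) / (std * max_pixel_value), returned as float32."""
    image = image.astype(np.float32)
    return (image - mean * max_pixel_value) / (std * max_pixel_value)


def rescaled(generator):
    """Hypothetical wrapper: check the augmented image is still uint8, then normalize once."""
    for image, lines in generator:
        if image.dtype != np.uint8:
            raise TypeError(f"expected uint8 before normalization, got {image.dtype}")
        yield normalize_like_albumentations(image), lines
```

Usage would look like `rescaled(get_image_generator(**kwargs))`, with A.Normalize removed from the OneOf. With these statistics, uint8 inputs map to roughly [-2.12, 2.64]. The network therefore always sees the same scale, whether or not the random pipeline fired.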

2) Speed consideration:

The pipeline above generates 3.81 images/s on average on 800x600 images, compared with 4.26 images/s for the native keras-ocr generator, and the CPU is 100% busy. Is this albumentations overhead (about 0.45 images/s fewer, or roughly 28 ms of extra time per image) normal to expect?
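Throughput converts to per-image time as 1/rate, so going from 4.26 to 3.81 images/s costs about 28 ms per image. To compare the native and wrapped generators under identical conditions, a small timing helper can pull the same number of items from each.

This is a minimal sketch. images_per_second and per_image_overhead are hypothetical names, and the measured time also includes whatever keras-ocr itself spends rendering each image.

```python
import itertools
import time


def images_per_second(generator, n_images=50):
    """Consume up to n_images items from generator and return the measured rate."""
    start = time.perf_counter()
    count = 0
    for _ in itertools.islice(generator, n_images):
        count += 1
    return count / (time.perf_counter() - start)


def per_image_overhead(base_rate, augmented_rate):
    """Extra seconds per image when throughput drops from base_rate to augmented_rate."""
    return 1.0 / augmented_rate - 1.0 / base_rate


# Rates reported in the question: 4.26 images/s native, 3.81 images/s with albumentations.
print(f"{per_image_overhead(4.26, 3.81) * 1000:.1f} ms per image")  # 27.7 ms per image
```

Call images_per_second once on keras_ocr.data_generation.get_image_generator(...) and once on the wrapped generator, building both with the same arguments.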

3) Bounding boxes

I can't use spatial transformations, because albumentations seems to support only the two-point (four-coordinate) bounding-box notation, where boxes are rectangles strictly parallel to the x and y axes. That is not the case with keras-ocr, which stores each bounding box as four points (eight coordinates). Are there any plans to support this extended bounding-box format in albumentations? Do you have any advice for my use case?
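One common workaround, offered here as a suggestion rather than something confirmed in this thread, is to treat each quadrilateral's four corners as keypoints. Albumentations can move keypoints through spatial transforms when a Compose is built with keypoint_params=A.KeypointParams(format='xy', remove_invisible=False); check that your installed version supports these arguments. Setting remove_invisible=False keeps points that leave the frame, so every box still has four corners.

The sketch assumes keras-ocr's lines are lists of (box, character) pairs, with each box shaped (4, 2). lines_to_keypoints, keypoints_to_lines and apply_affine are hypothetical helpers. apply_affine shows the same idea without albumentations: use a single 2x3 matrix both for cv2.warpAffine on the image and for the corner points.

```python
import numpy as np


def lines_to_keypoints(lines):
    """Flatten [[(box, char), ...], ...] into (x, y) keypoints plus the characters per line."""
    keypoints, structure = [], []
    for line in lines:
        structure.append([char for _, char in line])
        for box, _ in line:
            keypoints.extend((float(x), float(y)) for x, y in np.asarray(box).reshape(4, 2))
    return keypoints, structure


def keypoints_to_lines(keypoints, structure):
    """Inverse of lines_to_keypoints: regroup every four points into one (4, 2) box."""
    boxes = iter(np.asarray(keypoints, dtype=np.float32).reshape(-1, 4, 2))
    return [[(next(boxes), char) for char in chars] for chars in structure]


def apply_affine(keypoints, matrix):
    """Map (x, y) points through a 2x3 affine matrix, the layout cv2.warpAffine expects."""
    points = np.asarray(keypoints, dtype=np.float32).reshape(-1, 2)
    homogeneous = np.hstack([points, np.ones((len(points), 1), dtype=np.float32)])
    return homogeneous @ np.asarray(matrix, dtype=np.float32).T
```

The flattened list is what you would pass as keypoints= alongside image=. Afterwards, keypoints_to_lines rebuilds the quadrilaterals in their original order, so the four-point format never has to pass through the axis-aligned bounding-box code.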

My system info:

Albumentations: 0.4.6
Tensorflow: 2.3.0
System Platform: linux
System Version: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
Machine: x86_64
Platform: Linux-5.3.0-1035-aws-x86_64-with-debian-buster-sid
Processor: x86_64
System OS: Linux
Release: 5.3.0-1035-aws
Version: #37-Ubuntu SMP Sun Sep 6 01:17:09 UTC 2020
Number of CPUs: 8
Number of Physical CPUs: 4

Top GitHub Comments

Dipet commented, Oct 9, 2020

Some of your transforms require expensive computations. For example, your pipeline includes GlassBlur, which applies Gaussian blur several times and performs many pixel substitutions on the image. If you remove GlassBlur, you should see a large speed increase.
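To see which transforms dominate the cost, you can time each one in isolation on a representative image. This is a minimal sketch that assumes the albumentations calling convention transform(image=...)["image"]. time_transforms is a hypothetical helper, and each transform should be constructed with p=1 so that it actually runs on every call.

```python
import time


def time_transforms(transforms, image, repeats=10):
    """Return {name: mean seconds per call}, slowest first.

    `transforms` maps a label to a callable used as transform(image=image)["image"].
    """
    timings = {}
    for name, transform in transforms.items():
        start = time.perf_counter()
        for _ in range(repeats):
            transform(image=image)["image"]
        timings[name] = (time.perf_counter() - start) / repeats
    return dict(sorted(timings.items(), key=lambda item: item[1], reverse=True))
```

For example, `time_transforms({"GlassBlur": A.GlassBlur(p=1), "Blur": A.Blur(blur_limit=30, p=1)}, image)` on an 800x600 uint8 frame would show how GlassBlur compares with a plain blur on your own hardware.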

Dipet commented, Oct 9, 2020

Can you provide a Keras pipeline that is faster than Albumentations? I would be very grateful to see it and to try to improve Albumentations' results. Maybe Albumentations was running on a single thread? Many of our transforms use a single thread by default, while Keras can use more than one thread for some functions.
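OpenCV and NumPy release the GIL for much of their work. If most of each transform's time is spent inside those libraries, a thread pool can spread one batch across several cores even from Python.

The sketch below uses only the standard library, and augment_batch_parallel is a hypothetical name. Pure-Python transforms would not speed up this way. For those, a process pool, or the workers/use_multiprocessing options that Keras's fit accepts for Sequence inputs in TF 2.x, would be the alternative.

```python
from concurrent.futures import ThreadPoolExecutor


def augment_batch_parallel(augment, images, workers=4):
    """Apply augment to every image on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(augment, images))
```

With the question's pipeline, augment could be `lambda img: pipe(image=img)["image"]`. Comparing images_per_second-style measurements with workers=1 and workers=4 would show whether the transforms in use actually parallelize.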

GPU augmentation is a complex problem. In many cases there is not enough GPU memory and compute to run augmentations on the GPU as well. For this reason, the current strategy is to delegate part of the work to the CPU: while the CPU augments one batch, the GPU runs inference on another. For now, I recommend kornia or DALI if you need GPU augmentations.
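The CPU/GPU overlap described here amounts to running the augmenting generator ahead of the training loop. tf.data's Dataset.prefetch does this inside TensorFlow.

The sketch below shows the same producer/consumer idea with a background thread and a bounded queue, so the CPU prepares the next images while the current batch is being consumed. prefetch is a hypothetical helper, not part of albumentations or keras-ocr.

```python
import queue
import threading


def prefetch(generator, buffer_size=4):
    """Yield items from generator while a background thread keeps up to buffer_size ready.

    Exceptions raised by the generator are re-raised in the consuming thread.
    """
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # marks the end of the stream

    def producer():
        try:
            for item in generator:
                buffer.put((item, None))
            buffer.put((done, None))
        except Exception as exc:  # forward the error instead of losing it in the thread
            buffer.put((done, exc))

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item, error = buffer.get()
        if error is not None:
            raise error
        if item is done:
            return
        yield item
```

For example: `for image, lines in prefetch(get_image_generator(**kwargs), buffer_size=8): ...`. The bounded queue caps memory use. Because the producer is a daemon thread, abandoning the loop early does not block interpreter exit.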