Working with keras-ocr: float32 errors, non-square bounding boxes, and speed


I am trying to extend keras-ocr with albumentations by wrapping its native image generator with a custom one:

def get_image_generator(**kwargs):
    import albumentations as A
    import keras_ocr

    p = 0.9
    pipe = A.Compose([
        ### Weather
        A.OneOf([
            A.RandomRain(p=p),
            A.RandomSunFlare(flare_roi=(0, 0, 1, 0.5), angle_lower=0.5, p=p),
            A.RandomShadow(num_shadows_lower=1, num_shadows_upper=3, shadow_dimension=7, shadow_roi=(0, 0.5, 1, 1), p=p),
        ], p=p),
        ### Colors, channels
        A.OneOf([
            A.RGBShift(r_shift_limit=15, g_shift_limit=15, b_shift_limit=15, p=p),
            A.ToGray(p=p),
            A.ToSepia(p=p),
            A.RandomBrightnessContrast(p=p),
            A.RandomGamma(p=p),
            A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=50, val_shift_limit=50, p=p),
            A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), p=p),  # outputs float32
            A.Equalize(p=p),
            A.ChannelShuffle(p=p),
            A.ChannelDropout(p=p),
        ], p=p),
        ### Blurring/sharpening
        A.OneOf([
            A.Blur(blur_limit=30, p=p),
            A.GlassBlur(p=p),
            A.MotionBlur(blur_limit=20, p=p),
            A.MedianBlur(blur_limit=9, p=p),
            A.IAASharpen(p=p),
        ], p=p),
        ### Noise
        A.OneOf([
            A.IAAAdditiveGaussianNoise(p=p),
            A.GaussNoise(p=p),
            A.CoarseDropout(max_height=40, max_width=40, p=p),
            A.Downscale(p=p),
            A.MultiplicativeNoise(multiplier=(1.1, 8), p=p),
            A.IAASuperpixels(p=p),
        ], p=p),
    ], p=0.5)
    inner_gen = keras_ocr.data_generation.get_image_generator(**kwargs)
    for image, lines in inner_gen:
        new_image = pipe(image=image)['image']
        new_lines = lines  # boxes are reused as-is, since no spatial transforms are applied
        yield new_image, new_lines

1) I am randomly getting several kinds of errors when running this pipeline:

~/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/imgaug/dtypes.py in gate_dtypes(dtypes, allowed, disallowed, augmenter)
    330                 augmenter.name,
    331                 augmenter.__class__.__name__,
--> 332                 ", ".join(disallowed)
    333             ))
    334     else:

ValueError: Got dtype 'float32' in augmenter 'UnnamedSuperpixels' (class 'Superpixels'), which is a forbidden dtype (uint128, uint256, int128, int256, float16, float32, float64, float96, float128, float256).

and

  File "/home/ubuntu/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/albumentations/augmentations/functional.py", line 759, in median_blur
    "Invalid ksize value {}. For a float32 image the only valid ksize values are 3 and 5".format(ksize)

ValueError: Invalid ksize value 7. For a float32 image the only valid ksize values are 3 and 5

Is this known behaviour of the IAASuperpixels and MedianBlur transforms in a pipeline? How can I ensure they always receive uint8 input? It must be earlier transforms in my pipeline that return float32, because keras-ocr itself returns uint8: image = (alpha * text_image[..., :3] + (1 - alpha) * current_background).astype('uint8'). Also, since I'm feeding the outputs to a neural network, is there a way to guarantee that the pipeline always returns rescaled images?
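A common way to address both parts of question 1 is to keep every pixel-level transform working on uint8 and to normalize once, as the final step, outside the OneOf groups. A.Normalize outputs float32, so whenever it is picked in the colour group, the blur and noise transforms that follow receive a float image. The NumPy sketch below illustrates this under stated assumptions: `ensure_uint8` and `normalize` are hypothetical helpers, float inputs are assumed to already be on the 0-255 scale, and `normalize` reproduces the formula A.Normalize applies with its default `max_pixel_value=255`.

```python
import numpy as np

# ImageNet statistics: the same mean/std passed to A.Normalize in the question.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def ensure_uint8(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return `image` as uint8.

    Assumes float inputs are already on the 0-255 scale; values are rounded
    and clipped so out-of-range results (e.g. from noise) do not wrap around.
    """
    if image.dtype == np.uint8:
        return image
    return np.clip(np.rint(image), 0, 255).astype(np.uint8)


def normalize(image: np.ndarray, max_pixel_value: float = 255.0) -> np.ndarray:
    """Hypothetical helper: (x - mean * max) / (std * max), returned as float32.

    Meant to run once, after all pixel-level augmentations, so no float
    image is ever fed into a uint8-only transform.
    """
    image = ensure_uint8(image).astype(np.float32)
    return (image - MEAN * max_pixel_value) / (STD * max_pixel_value)
```

In the generator this would mean deleting A.Normalize from the colour OneOf and writing `new_image = normalize(pipe(image=image)['image'])`, so every image leaves the generator as float32 with the same scaling.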

2) Speed consideration:

The pipeline above averages 3.81 images/s on 800x600 images, compared with 4.26 images/s for the native keras-ocr generator, and the CPU is 100% busy. Per image, that is about 0.262 s vs 0.235 s, so albumentations adds roughly 28 ms per image. Is that overhead normal to expect?
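Overhead is easier to reason about per image than per second, because per-image cost is the reciprocal of throughput. A small standard-library sketch for measuring it; `throughput` and `per_image_overhead` are hypothetical helper names, and the generators you pass in are whichever ones you want to compare (for example the native keras-ocr generator and the wrapped one).

```python
import itertools
import time
from typing import Iterable


def throughput(gen: Iterable, n: int = 50, warmup: int = 5) -> float:
    """Return items per second produced by `gen`, timed over `n` items after `warmup` items."""
    it = iter(gen)
    for _ in itertools.islice(it, warmup):  # skip one-off start-up cost
        pass
    start = time.perf_counter()
    count = sum(1 for _ in itertools.islice(it, n))
    elapsed = time.perf_counter() - start
    return count / elapsed


def per_image_overhead(base_rate: float, augmented_rate: float) -> float:
    """Extra seconds per image added by augmentation, given two rates in images/s."""
    return 1.0 / augmented_rate - 1.0 / base_rate
```

With the figures from the question, `per_image_overhead(4.26, 3.81)` is about 0.028 s.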

3) Bounding boxes

I can't use spatial transformations, because albumentations seems to support only the two-point (four-coordinate) notation for bounding boxes, and those boxes must be rectangular and strictly parallel to the x and y axes. That is not the case with keras-ocr, which stores each bounding box as four points (eight coordinates). Are there any plans to support such an extended bounding-box format in albumentations? Do you have any advice for my use case?
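One workaround with features albumentations already has is to treat the four corners of every box as keypoints: pass them through `A.Compose([...], keypoint_params=A.KeypointParams(format='xy', remove_invisible=False))` and regroup them afterwards, four per box. The sketch assumes keras-ocr's `lines` is a list of lines, each a list of `(box, character)` tuples where `box` is a 4x2 array of corners; `lines_to_keypoints` and `keypoints_to_lines` are hypothetical helpers. `remove_invisible=False` matters: if keypoints leaving the image were dropped, the four-per-box grouping would no longer line up.

```python
import numpy as np


def lines_to_keypoints(lines):
    """Flatten keras-ocr style lines into a flat list of (x, y) keypoints.

    Assumed structure: lines = [[(box, character), ...], ...] with each box a
    (4, 2) array of corner coordinates. Corners are emitted in box order.
    """
    keypoints = []
    for line in lines:
        for box, _character in line:
            keypoints.extend((float(x), float(y)) for x, y in np.asarray(box))
    return keypoints


def keypoints_to_lines(keypoints, lines):
    """Rebuild the nested structure of `lines`, taking four transformed keypoints per box.

    Only the first two values of each keypoint are kept, in case the
    transform returns extra fields such as angle or scale.
    """
    points = iter(keypoints)
    new_lines = []
    for line in lines:
        new_line = []
        for _box, character in line:
            corners = np.array([tuple(next(points))[:2] for _ in range(4)], dtype=np.float32)
            new_line.append((corners, character))
        new_lines.append(new_line)
    return new_lines
```

Usage would look like `out = pipe(image=image, keypoints=lines_to_keypoints(lines))` followed by `new_lines = keypoints_to_lines(out['keypoints'], lines)`. Boxes pushed partly outside the frame still need clipping or filtering, and mirroring transforms flip the glyphs themselves, which may be undesirable for OCR regardless of box handling.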

My system info:

Albumentations: 0.4.6
Tensorflow: 2.3.0
System Platform: linux
System Version: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
Machine: x86_64
Platform: Linux-5.3.0-1035-aws-x86_64-with-debian-buster-sid
Processor: x86_64
System OS: Linux
Release: 5.3.0-1035-aws
Version: #37-Ubuntu SMP Sun Sep 6 01:17:09 UTC 2020
Number of CPUs: 8
Number of Physical CPUs: 4

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

1 reaction
Dipet commented, Oct 9, 2020

Some of your transforms need expensive computations. For example, your pipeline includes GlassBlur, which repeats GaussianBlur several times and performs many pixel substitutions on the image. If you remove GlassBlur, you will see a large speed increase.
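To confirm which transforms dominate the cost before removing any, each one can be timed in isolation on a representative image. A minimal sketch; `profile_transforms` is a hypothetical helper that accepts plain callables mapping an image to an image. With albumentations, each transform could be wrapped as `lambda img, t=t: t(image=img)['image']`, after constructing it with `p=1.0` so it is actually applied on every call.

```python
import time
from typing import Callable, Dict

import numpy as np


def profile_transforms(
    transforms: Dict[str, Callable[[np.ndarray], np.ndarray]],
    image: np.ndarray,
    repeats: int = 10,
) -> Dict[str, float]:
    """Return mean seconds per call for each named transform, slowest first."""
    timings = {}
    for name, fn in transforms.items():
        fn(image.copy())  # one untimed call to absorb any lazy initialisation
        start = time.perf_counter()
        for _ in range(repeats):
            # Copy so in-place transforms cannot alter the shared input
            # (the copy time is included in the measurement).
            fn(image.copy())
        timings[name] = (time.perf_counter() - start) / repeats
    return dict(sorted(timings.items(), key=lambda kv: kv[1], reverse=True))
```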

1 reaction
Dipet commented, Oct 9, 2020

Can you provide a keras pipeline that is faster than Albumentations? I would be very grateful to see it and would try to improve Albumentations' results. Maybe Albumentations used a single thread? Many of our transforms use a single thread by default, but keras can use more than one thread for some functions.

GPU augmentation is a complex problem. In many cases there is not enough GPU memory or compute to do augmentations on the GPU, so the current strategy is to delegate part of the work to the CPU: while the CPU augments one batch, the GPU runs inference on another. For now, if you need GPU augmentations, I recommend kornia or DALI.
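Following up on the single-thread point, one user-side option is to augment several images concurrently while the keras-ocr generator itself stays sequential. A minimal sketch, assuming the augmentation function is safe to call from several threads; `prefetch_map` is a hypothetical helper built on `concurrent.futures`. Threads only help where the heavy work releases the GIL (most OpenCV calls and many NumPy operations do); for pure-Python transforms a `ProcessPoolExecutor` would be needed instead, at the cost of pickling images between processes. A bounded queue of futures is used because `Executor.map` submits every input item up front, which never finishes for an infinite generator.

```python
import itertools
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def prefetch_map(
    fn: Callable[[T], R],
    source: Iterable[T],
    workers: int = 4,
    prefetch: int = 8,
) -> Iterator[R]:
    """Apply `fn` to items from `source` on a thread pool, yielding results in input order.

    At most `prefetch` items are in flight at once, so an infinite source is safe.
    """
    items = iter(source)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque(pool.submit(fn, item) for item in itertools.islice(items, prefetch))
        while pending:
            result = pending.popleft().result()
            # Top the queue back up with the next source item, if there is one.
            for item in itertools.islice(items, 1):
                pending.append(pool.submit(fn, item))
            yield result
```

For the generator in the question, this could be used as `prefetch_map(lambda pair: (pipe(image=pair[0])['image'], pair[1]), inner_gen)`.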
