[RFC] How do we want to deal with images that include alpha channels?

This discussion started in https://github.com/pytorch/vision/pull/5500#discussion_r816503203 and @vfdev-5 and I continued offline.

PIL as well as our image reading functions support RGBA images

https://github.com/pytorch/vision/blob/95d418970e6dbf2e4d928a204c4e620da7bccdc0/torchvision/io/image.py#L16-L31

but our color transformations currently only support RGB images and silently ignore any extra alpha channel. This leads to wrong results. One thing we agreed on is that these transforms should fail if the input has anything other than 3 channels.
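To illustrate the "fail on anything but 3 channels" rule, here is a minimal NumPy sketch, not torchvision code. The function name `rgb_to_grayscale_strict`, the `(C, H, W)` layout, and the ITU-R 601 luma weights are assumptions made for the example.

```python
import numpy as np


def rgb_to_grayscale_strict(image):
    """Convert an RGB image of shape (3, H, W) to grayscale of shape (1, H, W).

    Hypothetical helper: it rejects any input that does not have exactly
    3 channels instead of silently ignoring an extra alpha channel.
    """
    if image.ndim != 3 or image.shape[0] != 3:
        raise ValueError(f"expected shape (3, H, W), got {image.shape}")
    # ITU-R 601 luma weights (an assumption for this sketch).
    weights = np.array([0.299, 0.587, 0.114]).reshape(3, 1, 1)
    return (weights * image).sum(axis=0, keepdims=True)


rgb = np.ones((3, 2, 2))
gray = rgb_to_grayscale_strict(rgb)

rgba = np.ones((4, 2, 2))
try:
    rgb_to_grayscale_strict(rgba)
    rgba_rejected = False
except ValueError:
    rgba_rejected = True
```

With this guard, an RGBA input raises an error up front rather than producing a result that is only correct when the image happens to be fully opaque.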

Still, some datasets include non-RGB images, so we need to handle them for a smooth UX. Previously, we implicitly converted every image to RGB before returning it from a dataset

https://github.com/pytorch/vision/blob/f9fbc104c02f277f9485d9f8727f3d99a1cf5f0b/torchvision/datasets/folder.py#L245-L249

Since we no longer decode images in the datasets, we need to provide a solution for the users here. I currently see two possible options:

  1. We could deal with this on a per-image basis within the dataset. For example, the train split of ImageNet contains a single RGBA image. We could simply perform an appropriate conversion for irregular image modes in the dataset so this issue is abstracted away from the user. tensorflow-datasets uses this approach: https://github.com/tensorflow/datasets/blob/a1caff379ed3164849fdefd147473f72a22d3fa7/tensorflow_datasets/image_classification/imagenet.py#L105-L131

  2. The most common non-RGB images in datasets are grayscale. For example, the train split of ImageNet contains 19970 grayscale images. Thus, users will need a transforms.ConvertImageColorSpace("rgb") in most cases anyway. If that transform also supported RGBA to RGB conversion, the problem would be solved as well. The conversion uses this formula:

    pixel_new = (1 - alpha) * background + alpha * pixel_old
    

    where pixel_{old|new} is a single value from a color channel. Since we don’t know background, we need to either make assumptions or require the user to provide a value for it. I’d wager that in 99% of cases the background is white, i.e. background == 1, but we can’t be sure about that.

    Another issue with this is that the user has no option to set the background on a per-image basis in the transforms pipeline if that is needed.

    In the special case of alpha == 1 everywhere, the equation above simplifies to

    pixel_new = pixel_old
    

    which is equivalent to stripping the alpha channel. We could check for that and only perform the RGBA to RGB transform if the condition holds or the user supplies a background color.
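The compositing formula from option 2 can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than a proposed torchvision implementation: `composite_over_background` is a hypothetical name, the image is assumed to be a float array of shape `(4, H, W)` with values in `[0, 1]`, and `background` is assumed to be a scalar or a 3-element RGB color.

```python
import numpy as np


def composite_over_background(image, background):
    """Blend a float RGBA image onto a solid background color.

    Assumptions (not from the issue): `image` has shape (4, H, W) with
    values in [0, 1]; `background` is a scalar or a 3-element RGB color.
    """
    if image.ndim != 3 or image.shape[0] != 4:
        raise ValueError(f"expected shape (4, H, W), got {image.shape}")
    # Slicing with 3: keeps alpha at shape (1, H, W) so it broadcasts over RGB.
    rgb, alpha = image[:3], image[3:]
    background = np.broadcast_to(
        np.asarray(background, dtype=image.dtype).reshape(-1, 1, 1), (3, 1, 1)
    )
    # pixel_new = (1 - alpha) * background + alpha * pixel_old
    return (1.0 - alpha) * background + alpha * rgb


# Half-transparent pure red on a white background.
image = np.zeros((4, 2, 2), dtype=np.float32)
image[0] = 1.0  # red channel
image[3] = 0.5  # alpha channel
blended = composite_over_background(image, background=1.0)

# With alpha == 1 everywhere, the result equals the original RGB channels.
opaque = image.copy()
opaque[3] = 1.0
stripped = composite_over_background(opaque, background=1.0)
```

In the first case the red channel stays at 1 while green and blue move halfway toward the white background. In the second case the background drops out of the formula entirely, which is why stripping the alpha channel is safe for fully opaque images.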

cc @pmeier @vfdev-5 @datumbox @bjuncek

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 14 (3 by maintainers)

Top GitHub Comments

pmeier commented, Mar 8, 2022

After some offline discussion, we decided to align with PIL for now. The only difference is that we should fail the transformation if the alpha channel is not at its max value everywhere. This way, we can later implement the correct conversion as detailed in my top comment without worrying about BC.
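A rough NumPy sketch of that interim behavior might look like the following. The name `strip_opaque_alpha`, the `(4, H, W)` layout, and the choice of "max value" (the dtype maximum for integer images, 1.0 for float images) are assumptions for illustration, not the actual torchvision code.

```python
import numpy as np


def strip_opaque_alpha(image):
    """Drop the alpha channel of an RGBA image, failing unless it is fully opaque.

    Assumptions (not from the issue): shape (4, H, W); integer dtypes treat
    their maximum value as opaque, float dtypes treat 1.0 as opaque.
    """
    if image.ndim != 3 or image.shape[0] != 4:
        raise ValueError(f"expected shape (4, H, W), got {image.shape}")
    if np.issubdtype(image.dtype, np.integer):
        max_value = np.iinfo(image.dtype).max
    else:
        max_value = 1.0
    if not np.all(image[3] == max_value):
        raise ValueError("alpha channel is not fully opaque; refusing to drop it")
    return image[:3].copy()


opaque = np.full((4, 2, 2), 255, dtype=np.uint8)
rgb = strip_opaque_alpha(opaque)

translucent = opaque.copy()
translucent[3, 0, 0] = 128
try:
    strip_opaque_alpha(translucent)
    translucent_rejected = False
except ValueError:
    translucent_rejected = True
```

Failing on partially transparent input leaves room to add proper background compositing later without changing the result for any image that is accepted today.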
