torchvision.io.read_image returns a tensor shape different from the documented one.
🐛 Bug
torchvision.io.read_image returns a tensor whose shape differs from the [3, height, width] given in the documentation when reading a grayscale or RGBA image. It returns [1, height, width] or [4, height, width] instead.
https://pytorch.org/docs/stable/torchvision/io.html#torchvision.io.read_image
To Reproduce
Steps to reproduce the behavior:
>>> img = torchvision.io.read_image(<grayscale image>)
>>> img.shape
(1, 123, 123)
>>> img = torchvision.io.read_image(<RGBA image>)
>>> img.shape
(4, 123, 123)
Expected behavior
>>> img = torchvision.io.read_image(<grayscale image>)
>>> img.shape
(3, 123, 123)
>>> img = torchvision.io.read_image(<RGBA image>)
>>> img.shape
(3, 123, 123)
Environment
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.3 LTS (x86_64)
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Clang version: Could not collect
CMake version: version 3.10.2
Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 440.100
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip] numpy==1.19.4
[pip] torch==1.7.1
[pip] torchaudio==0.7.0a0+a853dff
[pip] torchvision==0.8.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] numpy 1.19.4 pypi_0 pypi
[conda] pytorch 1.7.1 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] torchaudio 0.7.2 py37 pytorch
[conda] torchvision 0.8.2 py37_cu102 pytorch
Issue Analytics
- Created 3 years ago
- Comments:13 (5 by maintainers)
This works for grayscale images, but I have RGB images where the actual channels matter, and replicating the information across all channels is not the desired behavior.
It blows the mind that defaulting to single-channel image reading was ever implemented in the first place. I suspect this probably means I cannot use torch for my use case.
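For readers who need to control the channel handling themselves, here is a minimal sketch of one way to coerce whatever read_image returns into three channels after loading. The helper name to_three_channels is hypothetical (it is not part of torchvision), and it assumes the tensor is laid out as [channels, height, width]. Replication is applied only to single-channel input; RGB input is passed through untouched, and RGBA input simply has its alpha channel dropped.

```python
import torch


def to_three_channels(img: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: coerce a [C, H, W] image tensor to 3 channels."""
    channels = img.shape[0]
    if channels == 3:
        # Already RGB: return as-is so the real channel data is preserved.
        return img
    if channels == 1:
        # Grayscale: copy the single channel into R, G and B.
        return img.repeat(3, 1, 1)
    if channels == 4:
        # RGBA: keep the first three channels and discard alpha.
        return img[:3]
    raise ValueError(f"unsupported number of channels: {channels}")


# Random stand-ins for tensors that read_image might return.
gray = torch.randint(0, 256, (1, 6, 8), dtype=torch.uint8)
rgba = torch.randint(0, 256, (4, 6, 8), dtype=torch.uint8)

gray3 = to_three_channels(gray)
rgb = to_three_channels(rgba)
print(tuple(gray3.shape), tuple(rgb.shape))
```

Dropping alpha by slicing ignores transparency entirely; if the alpha values matter for your data, compositing onto a background would be a different design choice.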
Setting the argument ‘mode=ImageReadMode.RGB’ changes the output to [3, height, width]. The ImageReadMode class controls this directly; more information can be found at ‘https://github.com/pytorch/vision/blob/master/torchvision/io/image.py#L234-L248’. I ran into this question today and found this page first, so I think a comment here may be useful for later viewers.