Torchvision decode_jpeg memory leak
🐛 Describe the bug
nvJPEG leaks memory and fails with OOM after ~1-2k images.
import torch
from torchvision.io import read_file, decode_jpeg

for i in range(1000):  # increase to your liking till gpu OOMs (:
    img_u8 = read_file('lena.jpg')
    img_nv = decode_jpeg(img_u8, device='cuda')
Probably related to the first response to https://github.com/pytorch/vision/issues/3848:
RuntimeError: nvjpegDecode failed: 5
is exactly the message you get after OOM.
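To make the link between the error and OOM a little more concrete, here is a minimal sketch that pulls the numeric status out of the error text and looks it up. The table is written from memory of the `nvjpegStatus_t` enum in `nvjpeg.h` and should be treated as an assumption; check it against the header shipped with your CUDA version. The helper name `nvjpeg_status_from_error` is hypothetical and not part of torchvision.

```python
import re
from typing import Optional

# Assumed mapping, recalled from the nvjpegStatus_t enum in nvjpeg.h.
# Verify against the header for your CUDA toolkit before relying on it.
NVJPEG_STATUS = {
    0: "NVJPEG_STATUS_SUCCESS",
    1: "NVJPEG_STATUS_NOT_INITIALIZED",
    2: "NVJPEG_STATUS_INVALID_PARAMETER",
    3: "NVJPEG_STATUS_BAD_JPEG",
    4: "NVJPEG_STATUS_JPEG_NOT_SUPPORTED",
    5: "NVJPEG_STATUS_ALLOCATOR_FAILURE",
    6: "NVJPEG_STATUS_EXECUTION_FAILED",
    7: "NVJPEG_STATUS_ARCH_MISMATCH",
    8: "NVJPEG_STATUS_INTERNAL_ERROR",
    9: "NVJPEG_STATUS_IMPLEMENTATION_NOT_SUPPORTED",
}


def nvjpeg_status_from_error(message: str) -> Optional[str]:
    """Return the status name for a message like 'nvjpegDecode failed: 5'.

    Returns None when the message does not look like an nvJPEG failure.
    """
    match = re.search(r"nvjpeg\w*\s+failed:\s*(\d+)", message)
    if match is None:
        return None
    return NVJPEG_STATUS.get(int(match.group(1)), "unknown status")
```

If the table is right, status 5 is an allocator failure, which would be consistent with the message showing up once GPU memory runs out.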
Versions
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 11.1.0
Clang version: 12.0.1
CMake version: version 3.21.1
Libc version: glibc-2.33

Python version: 3.8.7 (default, Jan 19 2021, 18:48:37) [GCC 10.2.0] (64-bit runtime)
Python platform: Linux-5.13.8-arch1-1-x86_64-with-glibc2.2.5
Is CUDA available: True
CUDA runtime version: 11.4.48
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2080 Ti
GPU 1: NVIDIA GeForce RTX 2080 Ti
GPU 2: NVIDIA GeForce GTX 1080

Nvidia driver version: 470.57.02
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.2.2
/usr/lib/libcudnn_adv_infer.so.8.2.2
/usr/lib/libcudnn_adv_train.so.8.2.2
/usr/lib/libcudnn_cnn_infer.so.8.2.2
/usr/lib/libcudnn_cnn_train.so.8.2.2
/usr/lib/libcudnn_ops_infer.so.8.2.2
/usr/lib/libcudnn_ops_train.so.8.2.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] adabelief-pytorch==0.2.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] pytorch-lightning==1.4.5
[pip3] torch==1.9.0+cu111
[pip3] torchaudio==0.9.0
[pip3] torchfile==0.1.0
[pip3] torchmetrics==0.4.1
[pip3] torchvision==0.10.0+cu111
[conda] Could not collect
Issue Analytics
- Created 2 years ago
- Reactions:2
- Comments:24 (6 by maintainers)
Top GitHub Comments
I just checked whether this was fixed in the PyTorch nightly with CUDA 11.6, but I'm still experiencing a memory leak.
python -m pip install torch torchvision --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu116
Hi,
I am using: PyTorch 1.11.0+cu113, Ubuntu 20.04 LTS, Python 3.9.
I replaced libnvjpeg.90286a3c.so.11 with the .so from CUDA 11.6.2. However, the memory keeps growing indefinitely.
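For anyone who wants to put numbers on "keeps growing", here is a rough sketch that samples device-wide memory use while decoding in a loop. It assumes `torch.cuda.mem_get_info()` is available, which may not be the case on releases as old as 1.9.0. It reads device-wide usage rather than `torch.cuda.memory_allocated()` on the assumption that nvJPEG's allocations may bypass PyTorch's caching allocator. The names `track_growth`, `cuda_used_bytes`, and `run_repro` are made up for this example.

```python
import os
from typing import Callable, List


def track_growth(step: Callable[[], None], probe: Callable[[], int],
                 iters: int, every: int = 100) -> List[int]:
    """Call step() `iters` times and sample probe() every `every` calls.

    Each sample is the probe value minus the value taken before the loop,
    so a steadily increasing list suggests a leak.
    """
    baseline = probe()
    samples = []
    for i in range(1, iters + 1):
        step()
        if i % every == 0:
            samples.append(probe() - baseline)
    return samples


def cuda_used_bytes() -> int:
    """Device-wide used memory in bytes, as seen by the CUDA driver."""
    import torch
    torch.cuda.synchronize()  # let pending decodes finish before sampling
    free, total = torch.cuda.mem_get_info()
    return total - free


def run_repro(path: str = "lena.jpg", iters: int = 1000, every: int = 100) -> None:
    try:
        import torch
        from torchvision.io import read_file, decode_jpeg
    except ImportError:
        print("torch/torchvision not installed; skipping")
        return
    if not torch.cuda.is_available() or not os.path.isfile(path):
        print("needs a CUDA device and a JPEG at", path)
        return

    data = read_file(path)  # read once so only the decode is in the loop

    def step() -> None:
        decode_jpeg(data, device="cuda")

    samples = track_growth(step, cuda_used_bytes, iters, every)
    for n, grown in enumerate(samples, start=1):
        print(f"after {n * every} decodes: {grown / 2**20:+.1f} MiB")


if __name__ == "__main__":
    run_repro()
```

Other processes on the same GPU will also show up in the device-wide figure, so this is best run on an otherwise idle device.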