Torchvision decode_jpeg memory leak

🐛 Describe the bug

nvJPEG leaks memory and fails with OOM after ~1-2k images.

import torch
from torchvision.io import read_file, decode_jpeg

for i in range(1000): # increase to your liking till gpu OOMs (:
    img_u8 = read_file('lena.jpg')
    img_nv = decode_jpeg(img_u8, device='cuda')

Probably related to the first response in https://github.com/pytorch/vision/issues/3848

RuntimeError: nvjpegDecode failed: 5

is exactly the message you get after OOM.
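One way to confirm the growth before the OOM hits is to sample device-level memory while the loop runs. The following is a minimal sketch, not part of the original report: the helper names (`collect_samples`, `grows_steadily`, `measure_decode_jpeg`) are hypothetical, and it assumes a PyTorch build that provides `torch.cuda.mem_get_info` (the 1.9.0 build listed below may predate it). Device-level numbers are used on the assumption that nvJPEG allocations may not show up in PyTorch's own allocator statistics.

```python
from typing import Callable, List


def collect_samples(step: Callable[[], None],
                    read_used_bytes: Callable[[], int],
                    iterations: int,
                    every: int = 100) -> List[int]:
    """Run `step` repeatedly and record memory use every `every` calls."""
    samples = [read_used_bytes()]  # baseline before the first call
    for i in range(1, iterations + 1):
        step()
        if i % every == 0:
            samples.append(read_used_bytes())
    return samples


def grows_steadily(samples: List[int], min_growth_bytes: int) -> bool:
    """True if usage never drops between samples and the total rise
    reaches `min_growth_bytes` (a rough heuristic, not proof of a leak)."""
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] - samples[0] >= min_growth_bytes


def measure_decode_jpeg(path: str = "lena.jpg", iterations: int = 1000) -> List[int]:
    """Requires a CUDA machine with torch and torchvision installed."""
    import torch
    from torchvision.io import read_file, decode_jpeg

    data = read_file(path)  # read once so only the decode is measured

    def step() -> None:
        decode_jpeg(data, device="cuda")

    def used_bytes() -> int:
        torch.cuda.synchronize()
        free, total = torch.cuda.mem_get_info()  # device-wide, in bytes
        return total - free

    return collect_samples(step, used_bytes, iterations)
```

On a CUDA machine, `grows_steadily(measure_decode_jpeg("lena.jpg"), 100 * 2**20)` would flag a rise of 100 MiB or more over the run; the threshold is arbitrary and other processes on the same GPU can distort the numbers.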

Versions

PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 11.1.0
Clang version: 12.0.1
CMake version: version 3.21.1
Libc version: glibc-2.33

Python version: 3.8.7 (default, Jan 19 2021, 18:48:37) [GCC 10.2.0] (64-bit runtime)
Python platform: Linux-5.13.8-arch1-1-x86_64-with-glibc2.2.5
Is CUDA available: True
CUDA runtime version: 11.4.48
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2080 Ti
GPU 1: NVIDIA GeForce RTX 2080 Ti
GPU 2: NVIDIA GeForce GTX 1080

Nvidia driver version: 470.57.02
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.2.2
/usr/lib/libcudnn_adv_infer.so.8.2.2
/usr/lib/libcudnn_adv_train.so.8.2.2
/usr/lib/libcudnn_cnn_infer.so.8.2.2
/usr/lib/libcudnn_cnn_train.so.8.2.2
/usr/lib/libcudnn_ops_infer.so.8.2.2
/usr/lib/libcudnn_ops_train.so.8.2.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] adabelief-pytorch==0.2.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] pytorch-lightning==1.4.5
[pip3] torch==1.9.0+cu111
[pip3] torchaudio==0.9.0
[pip3] torchfile==0.1.0
[pip3] torchmetrics==0.4.1
[pip3] torchvision==0.10.0+cu111
[conda] Could not collect

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 24 (6 by maintainers)

Top GitHub Comments

1 reaction
dschoerk commented, Jun 22, 2022

I just checked whether this was fixed in the PyTorch nightly with CUDA 11.6, but I'm still experiencing a memory leak.

python -m pip install torch torchvision --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu116

1 reaction
Kubci commented, Jun 3, 2022

Hi,

I am using: pytorch 1.11.0+cu113, ubuntu 20.04 LTS, python 3.9

I replaced libnvjpeg.90286a3c.so.11 with the .so from CUDA 11.6.2. However, the memory keeps growing indefinitely.

Top Results From Across the Web

decode_jpeg — Torchvision main documentation - PyTorch
There is a memory leak in the nvjpeg library for CUDA versions < 11.6. Make sure to rely on CUDA 11.6 or above...
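
The documentation note above suggests checking which CUDA version a PyTorch build targets. Here is a minimal sketch, assuming the relevant version is the one reported by `torch.version.cuda`; the helper names are hypothetical. Note that the comments in this thread report continued growth even with CUDA 11.6 builds, so passing this check does not guarantee the leak is gone.

```python
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn a string such as "11.6.2" into (11, 6)."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def cuda_meets_minimum(cuda_version: Optional[str],
                       minimum: Tuple[int, int] = (11, 6)) -> bool:
    """Check a CUDA version string against the documented 11.6 threshold.

    `cuda_version` is None for CPU-only PyTorch builds, which cannot
    decode on the GPU at all, so that case returns False.
    """
    if cuda_version is None:
        return False
    return parse_major_minor(cuda_version) >= minimum
```

With PyTorch installed, the check could be run as `cuda_meets_minimum(torch.version.cuda)`; for the 1.9.0+cu111 build in the report it would return False.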
