new video reading API crash
Describe the bug
I get a `malloc(): memory corruption` abort when running the following code on a video file:
```python
import torchvision

# Path of the file whose metadata is shown below.
path = "/data/home/prabhatroy/data/output.mp4"

reader = torchvision.io.VideoReader(path, num_threads=1)
data = next(reader)
print(data)
data = next(reader)
print(data)
data = next(reader)
print(data)
data = next(reader)
print(data)
```
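Not part of the original report: a slightly extended variant of the repro (a sketch, assuming the same file) that iterates frame by frame and prints each frame's shape and timestamp, which shows how many frames decode cleanly before the corruption. Note that a glibc `malloc()` abort kills the process outside Python's control, so no Python exception will be raised.

```python
import torchvision

path = "/data/home/prabhatroy/data/output.mp4"  # file from the report

reader = torchvision.io.VideoReader(path, num_threads=1)

# VideoReader yields dicts with a "data" tensor and a "pts" timestamp.
# The index printed last before the abort shows how far decoding got.
for i, frame in enumerate(reader):
    print(i, frame["data"].shape, frame["pts"])
    if i >= 10:  # a handful of frames is enough to reproduce
        break
```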
Video metadata (as reported by ffmpeg):

```text
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/data/home/prabhatroy/data/output.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
  Duration: 00:01:02.00, start: 0.000000, bitrate: 838 kb/s
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 360x360 [SAR 1:1 DAR 1:1], 694 kb/s, 60 fps, 60 tbr, 16k tbn, 2k tbc (default)
    Metadata:
      handler_name    : VideoHandler
  Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
```
Debugging points to this line as the culprit: https://github.com/pytorch/vision/blob/0db67d857d612b8b5f196d1b9e1314d07b8a7a29/torchvision/csrc/io/video/video.cpp#L314
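Since the abort happens inside native code, the Python-level location can be captured with the standard library's faulthandler, which prints the Python traceback on fatal signals such as SIGABRT. A minimal sketch (not from the original report), assuming the same file:

```python
import faulthandler
import torchvision

# faulthandler installs handlers for fatal signals (SIGSEGV, SIGABRT, ...)
# and dumps the Python traceback before the process dies, showing which
# next(reader) call reached the corrupted allocation.
faulthandler.enable()

path = "/data/home/prabhatroy/data/output.mp4"  # file from the report
reader = torchvision.io.VideoReader(path, num_threads=1)
for _ in range(4):
    print(next(reader)["data"].shape)
```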
Versions
```text
PyTorch version: 1.11.0.dev20220203+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.20.4
Libc version: glibc-2.27

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1051-aws-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
  /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
  /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.2
[pip3] torch==1.11.0.dev20220203+cu111
[pip3] torchvision==0.12.0a0+22f8dc4
[conda] numpy 1.22.2 pypi_0 pypi
[conda] torch 1.11.0.dev20220203+cu111 pypi_0 pypi
[conda] torchvision 0.12.0a0+22f8dc4 dev_0 <develop>
```
Top GitHub Comments
Hi all, I've been digging into this for the past two weeks (and am continuing to do so), and there are a few confusing factors. As best I can tell from the tracebacks, the error comes from the data returned by FFmpeg being larger than the allocated tensor, even though the tensor's size should be guaranteed by the stream headers, so there is some sort of mismatch.
What I can't figure out is why that happens. I've tried high-resolution videos without any issue, but a video from #6204 does trigger it. Its codec looks the same as some other videos', and it passes ffprobe without an issue.
I've been getting some help from colleagues at QS, so hopefully we will be able to get to the bottom of this.
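One way to probe the size-mismatch hypothesis above from outside the C++ code is to cross-check each decoded frame's dimensions against the dimensions declared in the stream header, since torchvision sizes its output tensor from equivalent header fields. A sketch using PyAV (an assumption: PyAV is installed), run against the file from the report:

```python
import av  # PyAV: pip install av

path = "/data/home/prabhatroy/data/output.mp4"  # file from the report

container = av.open(path)
stream = container.streams.video[0]

# Dimensions the demuxer reports up front, before any frame is decoded.
hdr_w, hdr_h = stream.codec_context.width, stream.codec_context.height
print("header:", hdr_w, "x", hdr_h)

# If any decoded frame disagrees with the header, a fixed-size buffer
# allocated from the header fields would be overrun.
for i, frame in enumerate(container.decode(stream)):
    if (frame.width, frame.height) != (hdr_w, hdr_h):
        print(f"frame {i}: {frame.width}x{frame.height} != header")
    if i >= 200:
        break
```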
We've had to disable ffmpeg support at conda-forge.
We can reliably recreate a segfault that seems to occur during the video read tests.
Curiously, it doesn't occur on Python 3.9.
This occurs for CPU builds too, not just GPU builds.
Build logs can be followed at https://github.com/conda-forge/torchvision-feedstock/pull/60.
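For users hitting this crash, one possible stopgap (not from the original thread) is to route decoding through the pure-Python PyAV backend, which avoids the native video_reader code entirely; torchvision exposes this via set_video_backend. A sketch, assuming PyAV is installed:

```python
import torchvision

# Select the PyAV decode path instead of the native C++ video_reader.
# This affects read_video/read_video_timestamps; the affected VideoReader
# path may still require the native backend in this release.
torchvision.set_video_backend("pyav")
print(torchvision.get_video_backend())  # -> "pyav"

video, audio, info = torchvision.io.read_video(
    "/data/home/prabhatroy/data/output.mp4", pts_unit="sec"
)
print(video.shape, info)
```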