new video reading API crash
Describe the bug
I get a `malloc(): memory corruption` abort when running the following code on a video file:
```python
import torchvision

# Path of the file whose metadata is shown below.
path = "/data/home/prabhatroy/data/output.mp4"

reader = torchvision.io.VideoReader(path, num_threads=1)
data = next(reader)
print(data)
data = next(reader)
print(data)
data = next(reader)
print(data)
data = next(reader)
print(data)
```
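Not part of the original report: a slightly extended variant of the repro (a sketch, assuming the same file) that iterates frame by frame and prints each frame's shape and timestamp, which shows how many frames decode cleanly before the corruption. Note that a glibc `malloc()` abort kills the process outside Python's control, so no Python exception will be raised.

```python
import torchvision

path = "/data/home/prabhatroy/data/output.mp4"  # file from the report

reader = torchvision.io.VideoReader(path, num_threads=1)

# VideoReader yields dicts with a "data" tensor and a "pts" timestamp.
# The index printed last before the abort shows how far decoding got.
for i, frame in enumerate(reader):
    print(i, frame["data"].shape, frame["pts"])
    if i >= 10:  # a handful of frames is enough to reproduce
        break
```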
Video metadata (as reported by ffmpeg):

```text
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/data/home/prabhatroy/data/output.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
  Duration: 00:01:02.00, start: 0.000000, bitrate: 838 kb/s
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 360x360 [SAR 1:1 DAR 1:1], 694 kb/s, 60 fps, 60 tbr, 16k tbn, 2k tbc (default)
    Metadata:
      handler_name    : VideoHandler
  Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
```
Debugging points to this line as the culprit: https://github.com/pytorch/vision/blob/0db67d857d612b8b5f196d1b9e1314d07b8a7a29/torchvision/csrc/io/video/video.cpp#L314
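Since the abort happens inside native code, the Python-level location can be captured with the standard library's faulthandler, which prints the Python traceback on fatal signals such as SIGABRT. A minimal sketch (not from the original report), assuming the same file:

```python
import faulthandler
import torchvision

# faulthandler installs handlers for fatal signals (SIGSEGV, SIGABRT, ...)
# and dumps the Python traceback before the process dies, showing which
# next(reader) call reached the corrupted allocation.
faulthandler.enable()

path = "/data/home/prabhatroy/data/output.mp4"  # file from the report
reader = torchvision.io.VideoReader(path, num_threads=1)
for _ in range(4):
    print(next(reader)["data"].shape)
```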
Versions
```text
PyTorch version: 1.11.0.dev20220203+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.20.4
Libc version: glibc-2.27

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1051-aws-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
  /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
  /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
  /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
  /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.2
[pip3] torch==1.11.0.dev20220203+cu111
[pip3] torchvision==0.12.0a0+22f8dc4
[conda] numpy 1.22.2 pypi_0 pypi
[conda] torch 1.11.0.dev20220203+cu111 pypi_0 pypi
[conda] torchvision 0.12.0a0+22f8dc4 dev_0 <develop>
```
Top GitHub Comments
Hi all, I've been digging into this for the past two weeks (and am continuing to do so), and there are a few confusing factors. As best I can tell from the tracebacks, the error comes from the data returned by FFmpeg being larger than the allocated tensor, even though the tensor's size should be guaranteed by the stream headers, so there is some sort of mismatch.
What I can't figure out is why that happens. I've tried high-resolution videos without any issue, but a video from #6204 does trigger it. Its codec looks the same as some other videos', and it passes ffprobe without an issue.
I've been getting some help from colleagues at QS, so hopefully we will be able to get to the bottom of this.
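One way to probe the size-mismatch hypothesis above from outside the C++ code is to cross-check each decoded frame's dimensions against the dimensions declared in the stream header, since torchvision sizes its output tensor from equivalent header fields. A sketch using PyAV (an assumption: PyAV is installed), run against the file from the report:

```python
import av  # PyAV: pip install av

path = "/data/home/prabhatroy/data/output.mp4"  # file from the report

container = av.open(path)
stream = container.streams.video[0]

# Dimensions the demuxer reports up front, before any frame is decoded.
hdr_w, hdr_h = stream.codec_context.width, stream.codec_context.height
print("header:", hdr_w, "x", hdr_h)

# If any decoded frame disagrees with the header, a fixed-size buffer
# allocated from the header fields would be overrun.
for i, frame in enumerate(container.decode(stream)):
    if (frame.width, frame.height) != (hdr_w, hdr_h):
        print(f"frame {i}: {frame.width}x{frame.height} != header")
    if i >= 200:
        break
```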
We've had to disable ffmpeg support at conda-forge.
We can reliably recreate a segfault that seems to occur during the video read tests.
Curiously, it doesn't occur on Python 3.9.
This occurs for CPU builds too, not just GPU builds.
Build logs can be followed at https://github.com/conda-forge/torchvision-feedstock/pull/60.
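For users hitting this crash, one possible stopgap (not from the original thread) is to route decoding through the pure-Python PyAV backend, which avoids the native video_reader code entirely; torchvision exposes this via set_video_backend. A sketch, assuming PyAV is installed:

```python
import torchvision

# Select the PyAV decode path instead of the native C++ video_reader.
# This affects read_video/read_video_timestamps; the affected VideoReader
# path may still require the native backend in this release.
torchvision.set_video_backend("pyav")
print(torchvision.get_video_backend())  # -> "pyav"

video, audio, info = torchvision.io.read_video(
    "/data/home/prabhatroy/data/output.mp4", pts_unit="sec"
)
print(video.shape, info)
```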