new video reading API crash


🐛 Describe the bug

I get a "malloc(): memory corruption" error when running the following code on a video file.

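import torchvision

path = "output.mp4"  # placeholder: the video file described below
reader = torchvision.io.VideoReader(path, num_threads=1)
# Each next() call decodes one frame and returns a dict with "data" and "pts".
# Decode and print the first four frames.
for _ in range(4):
    data = next(reader)
    print(data)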
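Because the crash is a native heap corruption rather than a Python exception, it takes down the whole interpreter. A minimal sketch for rerunning the snippet above in a child process, so that the abort can be recorded instead of killing the caller. It assumes a POSIX system, where a negative return code -N means the child was killed by signal N; run_isolated, describe and the "output.mp4" path are hypothetical names, not part of torchvision:

```python
import signal
import subprocess
import sys
import textwrap
from typing import Tuple

# The reproduction from the issue, run in a child interpreter.
# "output.mp4" is a placeholder path (assumption).
REPRO = textwrap.dedent("""
    import torchvision
    reader = torchvision.io.VideoReader("output.mp4", num_threads=1)
    for _ in range(4):
        print(next(reader)["pts"])
""")


def run_isolated(code: str, timeout: float = 120.0) -> Tuple[int, str]:
    """Run code in a fresh Python process and return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a readable outcome (POSIX semantics)."""
    if returncode < 0:
        # -N means the child was terminated by signal N, e.g. SIGABRT after
        # glibc prints "malloc(): memory corruption", or SIGSEGV.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with status {returncode}"
```

For example, print(describe(run_isolated(REPRO)[0])) would report the terminating signal if the crash reproduces in the child, and the captured stderr keeps the glibc message for the bug report.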

Video metadata:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/data/home/prabhatroy/data/output.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
  Duration: 00:01:02.00, start: 0.000000, bitrate: 838 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 360x360 [SAR 1:1 DAR 1:1], 694 kb/s, 60 fps, 60 tbr, 16k tbn, 2k tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

When debugging, the crash points to this line as the culprit: https://github.com/pytorch/vision/blob/0db67d857d612b8b5f196d1b9e1314d07b8a7a29/torchvision/csrc/io/video/video.cpp#L314

Versions

Collecting environment information...
PyTorch version: 1.11.0.dev20220203+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.20.4
Libc version: glibc-2.27

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1051-aws-x86_64-with-glibc2.17
Is CUDA available: False
CUDA runtime version: 11.1.105
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.2
[pip3] torch==1.11.0.dev20220203+cu111
[pip3] torchvision==0.12.0a0+22f8dc4
[conda] numpy 1.22.2 pypi_0 pypi
[conda] torch 1.11.0.dev20220203+cu111 pypi_0 pypi
[conda] torchvision 0.12.0a0+22f8dc4 dev_0 <develop>

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

5 reactions
bjuncek commented, Jul 22, 2022

Hi all, I've dug quite a bit into this over the past two weeks (and am continuing to do so), and there are a few confusing factors. As far as I can trace it, the error comes from the fact that the data returned from FFmpeg is larger than the allocated tensor. The tensor size should be guaranteed by the stream headers, but there is some sort of mismatch.
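To illustrate that failure mode: if a buffer is sized from the stream header (e.g. 360x360 for the file in the original report) and the decoder hands back more bytes than that, an unchecked copy in C++ writes past the end of the heap allocation, which glibc may only detect later as "malloc(): memory corruption". The Python sketch below shows the kind of length check that turns this into an explicit error. It is a hypothetical illustration, not torchvision's video.cpp code, and it assumes a packed 3-bytes-per-pixel (RGB24) frame with no row padding:

```python
def expected_frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one packed frame (RGB24 = 3 bytes/pixel), assuming no row padding."""
    return width * height * bytes_per_pixel


def copy_frame_checked(dst: bytearray, src: bytes) -> int:
    """Copy decoded frame bytes into a buffer preallocated from the stream header.

    Raises instead of overrunning the buffer. In C/C++ an unchecked memcpy at
    this point is the kind of overflow that can later surface as heap
    corruption. (Plain Python slice assignment would silently grow dst instead,
    which is why the length check comes first.)
    """
    if len(src) > len(dst):
        raise ValueError(
            f"decoded frame has {len(src)} bytes, buffer holds only {len(dst)}"
        )
    dst[: len(src)] = src
    return len(src)


# Buffer sized from a 360x360 header: 360 * 360 * 3 = 388800 bytes.
buf = bytearray(expected_frame_bytes(360, 360))
copy_frame_checked(buf, bytes(388800))  # fits exactly
```

A frame even one byte larger than the header-derived size raises ValueError here rather than corrupting memory.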

What I can't figure out is why this mismatch happens. I've tried high-resolution videos without any issue, but a video from #6204 does trigger it. Its codec looks the same as that of some other videos, and it passes ffprobe without an issue.

I've been getting some help from colleagues at QS, so hopefully we will be able to get to the bottom of this.
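One way to look for a header/data mismatch in a file that otherwise passes ffprobe is to compare the dimensions declared in the video stream header with those of every decoded frame. A rough sketch, assuming ffprobe is on PATH; the helper names and the "output.mp4" path are hypothetical, and a clean result only rules out mid-stream resolution changes, not other causes of the size mismatch:

```python
import subprocess
from typing import List, Tuple

Dims = Tuple[int, int]


def parse_dims(csv_text: str) -> List[Dims]:
    """Parse 'width,height' lines as printed by ffprobe with -of csv=p=0."""
    dims = []
    for line in csv_text.splitlines():
        fields = [f for f in line.strip().split(",") if f]  # drop empty trailing fields
        if len(fields) >= 2:
            dims.append((int(fields[0]), int(fields[1])))
    return dims


def probe(path: str, entries: str) -> List[Dims]:
    """Run ffprobe on the first video stream, e.g. entries='frame=width,height'."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", entries, "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dims(out)


def mismatched_frames(header: Dims, frames: List[Dims]) -> List[int]:
    """Indices of decoded frames whose dimensions differ from the stream header."""
    return [i for i, dims in enumerate(frames) if dims != header]


# Usage (requires ffprobe and the video file):
# header = probe("output.mp4", "stream=width,height")[0]
# frames = probe("output.mp4", "frame=width,height")
# print(mismatched_frames(header, frames))
```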

1 reaction
hmaarrfk commented, Jul 25, 2022

We've had to disable FFmpeg support on conda-forge.

We can reliably reproduce a segfault that seems to occur during the video read tests.

Curiously, it doesn't occur on Python 3.9.

This occurs for CPU builds too, not just GPU.

Build logs can be followed at https://github.com/conda-forge/torchvision-feedstock/pull/60
