Add video feature

Feature request

Add a Video feature to the library so folks can include videos in their datasets.

Motivation

Being able to load Video data would be quite helpful. However, there are some challenges when it comes to videos:

Videos, unlike images, can end up being extremely large files.
When training video models, you often need to do some very specific sampling. A video might need to be broken down into some number of clips that are then used for training/inference.
Videos have an additional audio stream, which must be accounted for.
The feature needs to be able to encode/decode videos (with the right video settings) from bytes.
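
To make the sampling point concrete, here is a minimal sketch of uniform clip sampling over frame indices. The helper name `uniform_clip_indices` and its parameters are hypothetical and exist only for illustration; they are not part of any existing library, and a real pipeline might well sample by timestamp rather than by frame index.

```python
from typing import List


def uniform_clip_indices(num_frames: int, num_clips: int, frames_per_clip: int) -> List[List[int]]:
    """Return frame indices for `num_clips` evenly spaced clips.

    Hypothetical helper for illustration; not part of any library API.
    """
    if num_clips < 1 or frames_per_clip < 1:
        raise ValueError("num_clips and frames_per_clip must be positive")
    if frames_per_clip > num_frames:
        raise ValueError("video is shorter than one clip")

    # Last valid start index such that a full clip still fits in the video.
    max_start = num_frames - frames_per_clip
    if num_clips == 1:
        starts = [max_start // 2]  # single clip: take it from the middle
    else:
        # Spread the clip starts evenly from the first to the last valid start.
        starts = [round(i * max_start / (num_clips - 1)) for i in range(num_clips)]
    return [list(range(s, s + frames_per_clip)) for s in starts]


clips = uniform_clip_indices(num_frames=100, num_clips=3, frames_per_clip=8)
print([clip[0] for clip in clips])  # start index of each clip
```

Only the frames at these indices would then need to be decoded, which also helps with the file-size concern above.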

Your contribution

I did work on this a while back in this (now closed) PR. It used a library I made called encoded_video, which is basically the utils from pytorchvideo, but without the torch dep. It included the ability to read/write from bytes, as we need to do here. We don’t want to be using a sketchy library that I made as a dependency in this repo, though.

Would love to use this issue as a place to:

brainstorm ideas on how to do this right
list ways/examples to work around it for now

CC @sayakpaul @mariosasko @fcakyon

Issue Analytics

State:
Created 10 months ago
Reactions: 3
Comments: 7 (6 by maintainers)

Top GitHub Comments

3 reactions

mariosasko commented, Nov 29, 2022

@NielsRogge @rwightman may have additional requirements regarding this feature.

When adding a new (decodable) type, the hardest part is choosing the right decoding library. What I mean by “right” here is that it has all the features we need and is easy to install (with GPU support?).

Some candidates/options:

decord: no longer maintained, not trivial to install with GPU support
PyAV: used for CPU decoding in torchvision; GPU decoding is not supported, if I’m not mistaken; otherwise probably the best candidate
video_reader: used for GPU decoding in torchvision, depends on `torch`
OpenCV: uses ffmpeg for video decoding under the hood
…

And the last resort is building our own library, which is the most flexible solution but also requires the most work.
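
Since ease of installation is one of the criteria, one possible pattern is to treat the decoder as an optional dependency and detect what is installed at runtime. The sketch below is an assumption about how that could look, not how the library does it: the function names and the preference order are hypothetical, and only the import names (`av` for PyAV, `cv2` for OpenCV, `decord`) are real.

```python
from importlib.util import find_spec
from typing import List, Optional

# Import names of the candidate decoders, in an assumed order of preference.
# PyAV is imported as `av` and OpenCV as `cv2`.
CANDIDATE_BACKENDS = ["av", "decord", "cv2"]


def available_backends(candidates: Optional[List[str]] = None) -> List[str]:
    """Return the candidate modules that are importable, keeping their order."""
    names = CANDIDATE_BACKENDS if candidates is None else candidates
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if find_spec(name) is not None]


def pick_backend(candidates: Optional[List[str]] = None) -> Optional[str]:
    """Return the first importable candidate, or None if none is installed."""
    found = available_backends(candidates)
    return found[0] if found else None


print(pick_backend())  # depends on what is installed in the environment
```

This only checks that a module can be imported; it says nothing about whether that build has GPU support.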

PS: I’m adding a link to an article that compares various video decoding libraries: https://towardsdatascience.com/lightning-fast-video-reading-in-python-c1438771c4e6

2 reactions

sayakpaul commented, Nov 30, 2022

For standalone usage, decoding on GPU could be ideal, but isn’t asynchronous processing of inputs on CPUs, while keeping the accelerator busy with training, the de facto standard? Of course, I am aware of other advanced mechanisms such as CPU offloading, but I think my point is conveyed.
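
The pattern described here can be sketched with a background thread that decodes ahead of the consumer through a bounded buffer. This is a simplified illustration using only the standard library: `prefetch` and `fake_decode` are hypothetical names, and `fake_decode` merely stands in for real video decoding. Real training pipelines typically use worker processes for this, for example via the `num_workers` option of PyTorch's `DataLoader`.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")

_DONE = object()  # sentinel marking the end of the stream


def prefetch(items: Iterable[T], decode: Callable[[T], R], buffer_size: int = 4) -> Iterator[R]:
    """Decode `items` on a background thread and yield the results in order."""
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)

    def worker() -> None:
        try:
            for item in items:
                buf.put(decode(item))  # blocks while the buffer is full
        except Exception as exc:  # hand decoding errors to the consumer
            buf.put(exc)
        finally:
            buf.put(_DONE)

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=worker, daemon=True).start()
    while True:
        result = buf.get()
        if result is _DONE:
            return
        if isinstance(result, Exception):
            raise result
        yield result


def fake_decode(path: str) -> str:
    # Hypothetical stand-in for CPU video decoding.
    return path.upper()


for frames in prefetch(["a.mp4", "b.mp4"], fake_decode):
    print(frames)  # the training step on the accelerator would run here
```

While the consumer is busy with one item, the worker is already decoding the next ones, up to `buffer_size` items ahead.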