Allow automatic creation of subtitles through AutoSub, served as WebVTT files
An important feature to support is captions. Captions are especially important for accessibility for viewers who are deaf or hard of hearing, and they also help non-native English speakers understand content better. YouTube already offers them, so people should not feel they are sacrificing features by using Odysee; we should aim for feature parity with YouTube, or even additional features.
An issue already exists for supporting captioning here: https://github.com/lbryio/lbry-desktop/issues/2325
This ticket, however, covers only automatically generated captions; the ability for people to upload their own captions during the upload process should be implemented as a separate ticket.
I tested out the AutoSub module, a CLI tool that integrates the open-source Mozilla DeepSpeech engine for speech-to-text and then, through some clever programming, aligns the resulting text with the proper timestamps. It actually works quite well out of the box.
https://github.com/abhirooptalasila/AutoSub
They say AutoSub can output WebVTT directly, which I wasn't able to get working on a first attempt. Regardless, the .srt and .vtt formats are very similar, so converting between the two is trivial, and there are many packages that can do it.
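To illustrate how trivial the conversion is, here is a minimal sketch (a hypothetical helper, not part of AutoSub or any package mentioned above): the formats differ mainly in the `WEBVTT` header line and the decimal separator in timestamps (SRT uses a comma before milliseconds, WebVTT a period).

```javascript
// Minimal SRT -> WebVTT conversion sketch (hypothetical helper function).
// SRT's numeric cue identifiers are also valid WebVTT cue identifiers,
// so they can be left in place.
function srtToVtt(srt) {
  const body = srt
    .replace(/\r\n/g, '\n')
    // SRT timestamps: 00:00:01,000 -> WebVTT timestamps: 00:00:01.000
    .replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, '$1.$2');
  // WebVTT files must begin with a "WEBVTT" header line.
  return 'WEBVTT\n\n' + body.trim() + '\n';
}
```

A real converter would also need to handle SRT-specific styling tags, but for AutoSub's plain-text output the above covers the essential differences.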
Once the .vtt file is created, it is trivial to serve it via Video.js by extending this line:
ui/component/viewers/videoViewer/internal/videojs.jsx:220
Something along the lines of
tracks: [{src: 'https://servestatic.tv/mysub.vtt', kind:'captions', srclang: 'en', label: 'English'}]
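Fleshed out slightly, the options object passed to the Video.js player might look like the sketch below. This is a minimal illustration, not the actual Odysee frontend code; the `.vtt` URL is the placeholder from the line above, and `'player-id'` is a hypothetical element id.

```javascript
// Sketch: declaring a remote caption track in the Video.js options object
// (as would be built around ui/component/viewers/videoViewer/internal/videojs.jsx:220).
const playerOptions = {
  controls: true,
  tracks: [
    {
      src: 'https://servestatic.tv/mysub.vtt', // placeholder URL
      kind: 'captions',
      srclang: 'en',
      label: 'English',
    },
  ],
};
// Standard Video.js initialization call (commented out so this sketch
// runs without the video.js library loaded):
// const player = videojs('player-id', playerOptions);
```

Tracks declared this way show up automatically in the player's captions menu, so no extra UI work is needed on the frontend beyond supplying the URL.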
I implemented and tested this functionality and was quite impressed with how well AutoSub worked; it even correctly transcribed the word 'prophylactic'. The results were comparable to what you would expect from YouTube, so I would say AutoSub works well enough out of the box to ship.
AutoSub is built on top of Mozilla's DeepSpeech. I only ran it against a model trained on an English-speaking dataset, but models for other languages exist, so we would be able to use those as well, though I have not tested a non-English model myself. Since most of the content and viewers are English-speaking, this could, Pareto-style, cover perhaps 80% of content creators and users right out of the gate. It would also be a great way to begin supporting captioning, after which the ability for users to upload custom captions during the upload process could be added.
Issue Analytics
- State:
- Created 2 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
You have the option of using a paid third-party API, but if you don't want to, you can use the free version of Google Speech v2 to create the subtitles.
https://github.com/BingLingGroup/autosub#google-speech-v2
All I had to do was:
autosub -i file.mp4 -S en-US
Issue moved to OdyseeTeam/odysee-frontend #165 via ZenHub