Beam search decoding and language model integration for Wav2Vec2ForCTC models
- AFAIK, the `Wav2Vec2CTCTokenizer.decode` method only provides greedy decoding. Is there a beam search implementation for CTC available yet?
- As is common in ASR modelling, a language model is usually added on top of the acoustic model. It would be nice to be able to plug in a pretrained language model that is taken into account at beam search decoding time. Is there an out-of-the-box solution for that yet?
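For reference, greedy (best-path) CTC decoding can be sketched as follows. This is a minimal illustration and not the library's implementation: the function name `ctc_greedy_decode`, the toy vocabulary, and the choice of index 0 as the blank token are all assumptions made for the example.

```python
import numpy as np

def ctc_greedy_decode(log_probs, vocab, blank_id=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best_ids = np.argmax(log_probs, axis=-1)
    decoded = []
    prev = None
    for idx in best_ids:
        # Emit a label only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank_id:
            decoded.append(vocab[idx])
        prev = idx
    return "".join(decoded)

# Toy input: 5 frames over a hypothetical 3-symbol vocabulary (index 0 = blank).
vocab = ["<pad>", "a", "b"]
frames = np.log(np.array([
    [0.1, 0.8, 0.1],  # a
    [0.1, 0.8, 0.1],  # a (repeat, collapsed)
    [0.8, 0.1, 0.1],  # blank
    [0.1, 0.8, 0.1],  # a (new label, because a blank separates it)
    [0.1, 0.1, 0.8],  # b
]))
print(ctc_greedy_decode(frames, vocab))  # prints "aab"
```

Greedy decoding only ever looks at the single most likely symbol per frame, which is why a beam search over several hypotheses can do better.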
I'm also aware of the efforts to integrate a language model in #10794 and have had a look at the notebook here. Although it is a nice, simple way to integrate an LM, it is suboptimal with respect to CTC semantics. A more appropriate approach would be the one described in this paper and explained in this distill.pub blog post. It would be great to have these features added (if they are not already there and I somehow missed them).
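To illustrate what "CTC semantics" means here: under CTC, the probability of an output is the sum over all frame-level alignments that collapse to it, so the single best path is not necessarily the best output. The brute-force sketch below makes that concrete on a two-frame toy example. The helper names `collapse` and `labeling_probs` are made up for this illustration, and the enumeration is exponential in the number of frames, so it is only usable on tiny inputs.

```python
import itertools
from collections import defaultdict
import numpy as np

def collapse(path, blank_id=0):
    """Apply the CTC collapse rule: merge repeats, then drop blanks."""
    out, prev = [], None
    for idx in path:
        if idx != prev and idx != blank_id:
            out.append(int(idx))
        prev = idx
    return tuple(out)

def labeling_probs(probs, blank_id=0):
    """Sum path probabilities per collapsed labeling (toy sizes only)."""
    T, V = probs.shape
    totals = defaultdict(float)
    for path in itertools.product(range(V), repeat=T):
        p = 1.0
        for t, idx in enumerate(path):
            p *= probs[t, idx]
        totals[collapse(path, blank_id)] += p
    return dict(totals)

# Two frames, vocabulary = {0: blank, 1: "a"}; blank is the per-frame argmax.
probs = np.array([[0.6, 0.4],
                  [0.6, 0.4]])
totals = labeling_probs(probs)
best_path = collapse(np.argmax(probs, axis=1))   # greedy result: empty output
best_labeling = max(totals, key=totals.get)      # most probable output: (1,)
print(best_path, best_labeling, totals)
```

Greedy decoding returns the empty output (path probability 0.36), while the labeling "a" collects about 0.64 of the probability mass across its three alignments. A decoder that tracks this summed mass per prefix, and applies the LM there, respects that structure.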
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 3
- Comments: 13 (5 by maintainers)
Top GitHub Comments
I think we can try to add a dependency on wav2letter (https://github.com/flashlight/wav2letter) and add LM decoding as explained in the fairseq README: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md#evaluating-a-ctc-model . It would be awesome if we managed to create a nice `run_wav2vec2_eval_with_lm.py` script that people can use out of the box with every wav2vec2 model. We could also make a nice blog post out of this and publish it on our blog 😃

Hello @patrickvonplaten and @tanujjain,
I have already worked on prefix beam search decoding with language models for wav2vec2 and would like to implement it for Hugging Face, if you are okay with it.
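For readers unfamiliar with the technique, a CTC prefix beam search with an optional LM hook might look roughly like the sketch below. It is not the implementation proposed in this thread: the function name `ctc_prefix_beam_search` and the `lm_score(prefix, next_id)` callback are hypothetical, the LM weight is applied per label rather than per word, and plain probabilities are used for readability (a real decoder would work in log space to avoid underflow).

```python
from collections import defaultdict
import numpy as np

def ctc_prefix_beam_search(probs, beam_width=8, blank_id=0, lm_score=None):
    """probs: (T, V) per-frame probabilities. Returns (best_prefix, score).

    lm_score is a hypothetical hook: lm_score(prefix, next_id) returns a
    multiplicative weight applied when a prefix is extended by a new label.
    """
    T, V = probs.shape
    # prefix -> [p_blank, p_non_blank]: mass of alignments of this prefix
    # that end in a blank / a non-blank at the current frame.
    beams = {(): [1.0, 0.0]}
    for t in range(T):
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c in range(V):
                p = probs[t, c]
                if c == blank_id:
                    # A blank leaves the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                    continue
                new_prefix = prefix + (c,)
                w = lm_score(prefix, c) if lm_score else 1.0
                if prefix and c == prefix[-1]:
                    # Repeated label: a new symbol only if a blank came before...
                    next_beams[new_prefix][1] += p_b * p * w
                    # ...otherwise it collapses into the existing prefix.
                    next_beams[prefix][1] += p_nb * p
                else:
                    next_beams[new_prefix][1] += (p_b + p_nb) * p * w
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best_prefix, (p_b, p_nb) = max(beams.items(), key=lambda kv: sum(kv[1]))
    return best_prefix, p_b + p_nb

# Toy check: two frames, vocabulary = {0: blank, 1: "a"}.
probs = np.array([[0.6, 0.4],
                  [0.6, 0.4]])
prefix, score = ctc_prefix_beam_search(probs)
print(prefix, score)
```

On this toy input the search returns the prefix `(1,)` with a score of about 0.64, whereas the greedy path (blank, blank) would give the empty output. Passing an `lm_score` callback rescales each extension, which is where an n-gram or neural LM would plug in.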