Beam search decoding and language model integration for Wav2Vec2ForCTC models

  1. AFAIK, the Wav2Vec2CTCTokenizer.decode method only provides greedy decoding. Is there a beam search implementation for CTC available yet?
  2. As is common in ASR modelling, a language model is usually added on top of the acoustic model. It would be nice to be able to plug in a pretrained language model that is taken into account at beam search decoding time. Is there an out-of-the-box solution for that yet?
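
For context, greedy CTC decoding can be sketched in a few lines. This is a minimal illustration in plain Python, not the tokenizer's actual code: the function name `ctc_greedy_decode`, the toy vocabulary, and the choice of index 0 as the blank token are all assumptions made for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, vocab, blank_id=0):
    """Greedy CTC decoding: best token per frame, collapse repeats, drop blanks.

    frame_scores: one list of per-token scores (probabilities or logits) per frame.
    vocab: list mapping token id -> character (hypothetical toy vocabulary).
    """
    # 1. Pick the highest-scoring token independently in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Collapse consecutive repeats of the same token id.
    collapsed = [token_id for token_id, _ in groupby(best_path)]
    # 3. Remove the blank token and map the remaining ids to characters.
    return "".join(vocab[token_id] for token_id in collapsed if token_id != blank_id)


vocab = ["_", "a", "b"]  # index 0 is assumed to be the CTC blank
frame_scores = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b (repeat, collapsed)
]
print(ctc_greedy_decode(frame_scores, vocab))  # prints "aab"
```

Because each frame is decided independently, greedy decoding only ever follows the single best alignment and cannot take a language model into account, which is what motivates the two questions above.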

I’m also aware of the efforts to integrate a language model in #10794 and have had a look at the notebook here. Although it is a nice, simple way to integrate an LM, it is suboptimal with respect to CTC semantics. A more appropriate approach would be the one described in this paper and explained in this distill.pub blog. It would be great to have these features added (if they are not already there and I somehow missed them).
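
To make the request concrete, here is a rough sketch of CTC prefix beam search with an optional language-model bonus, in the spirit of the approach referred to above. It is not the implementation from any library or from the linked paper: `ctc_prefix_beam_search`, `lm_score` (a hypothetical callable returning a log-probability for extending a prefix with a token), and `lm_weight` are names invented for this example, and the blank token is assumed to have index 0.

```python
import math
from collections import defaultdict

NEG_INF = -float("inf")


def logsumexp(*values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    if all(v == NEG_INF for v in values):
        return NEG_INF
    v_max = max(values)
    return v_max + math.log(sum(math.exp(v - v_max) for v in values))


def ctc_prefix_beam_search(log_probs, beam_size=4, blank_id=0,
                           lm_score=None, lm_weight=0.0):
    """Return (best token-id prefix, its combined log score).

    log_probs: one list of per-token log-probabilities per frame.
    lm_score:  hypothetical callable (prefix, token) -> log-probability,
               applied once each time a prefix is extended by a new token.
    """
    # Each prefix tracks (log p of paths ending in blank, log p ending in non-blank).
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for token, lp in enumerate(frame):
                if token == blank_id:
                    # A blank keeps the prefix unchanged.
                    n_b, n_nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(n_b, p_b + lp, p_nb + lp), n_nb)
                    continue
                bonus = lm_weight * lm_score(prefix, token) if lm_score else 0.0
                new_prefix = prefix + (token,)
                n_b, n_nb = next_beams[new_prefix]
                if prefix and token == prefix[-1]:
                    # Repeated token: a new character only if a blank came in between.
                    next_beams[new_prefix] = (n_b, logsumexp(n_nb, p_b + lp + bonus))
                    # Without a blank, the repeat collapses into the same prefix.
                    s_b, s_nb = next_beams[prefix]
                    next_beams[prefix] = (s_b, logsumexp(s_nb, p_nb + lp))
                else:
                    next_beams[new_prefix] = (
                        n_b, logsumexp(n_nb, p_b + lp + bonus, p_nb + lp + bonus))
        # Keep only the beam_size most probable prefixes.
        ranked = sorted(next_beams.items(),
                        key=lambda item: logsumexp(*item[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix, scores = max(beams.items(), key=lambda item: logsumexp(*item[1]))
    return list(best_prefix), logsumexp(*scores)


# Two frames, tokens [blank, "a"], each frame: P(blank)=0.6, P("a")=0.4.
frames = [[math.log(0.6), math.log(0.4)]] * 2
prefix, score = ctc_prefix_beam_search(frames)
print(prefix, round(math.exp(score), 2))  # prints "[1] 0.64"
```

In this toy input, greedy decoding would pick blank in both frames and return the empty string (path probability 0.36), whereas summing over all alignments that collapse to "a" gives 0.64. Merging alignments per prefix is what the simple rescoring approach misses, and the `bonus` term marks where an external language model would enter the search. Note that once `lm_weight` is non-zero the returned score is a combined score rather than a normalized probability.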

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

3 reactions
patrickvonplaten commented, May 3, 2021

I think we can try to add a dependency to wav2letter: https://github.com/flashlight/wav2letter and add LM decoding as explained here on fairseq: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md#evaluating-a-ctc-model . It would be awesome if we manage to create a nice run_wav2vec2_eval_with_lm.py script that people can use out of the box with every wav2vec2 model. We can also make a nice blog post out of this and publish it on our blog 😃

3 reactions
deepang17 commented, Apr 28, 2021

Hello @patrickvonplaten and @tanujjain,

I have already worked with prefix beam search decoding with language models for wav2vec2 and would like to implement it for huggingface, if you guys are okay with it.


Top Results From Across the Web

Boosting Wav2Vec2 with n-grams in Transformers
Decoding audio data with Wav2Vec2 and a language model ... process of Wav2Vec2ProcessorWithLM as applying beam search through a matrix of ...

arXiv:2203.11325v2 [cs.CL] 5 Apr 2022
decoding scheme such as beam search [4, 13, 6]. Additional gains are likely to be obtained when an external language model.

Towards better decoding and language model integration in ...
2.5 Language Model Integration. The simplest solution to include a separate language model is to extend the beam search cost with a language...

Boosting your Sequence Generation Performance with 'Beam ...
We will start with a greedy-search-decoding technique and introduce beam-search-decoding fused with language model to further improve the ...

torchaudio Changelog - pyup.io
Custom language model support for CTC beam search decoding - StreamWriter for audio and video encoding [Beta] Source Separation Models and Bundles
