
Evaluating Wav2vec 2.0 with Transformer LM

See original GitHub issue

What is your question? How to reproduce the WER improvement obtained by using the proposed Transformer LM instead of Viterbi decoding?

What have you tried? Files used:

head -3 dict.txt
THE 49059384
AND 26362574
OF 24795903

Command used: python examples/speech_recognition/infer.py /path/to/librispeech --task audio_pretraining --nbest 1 --path /path/to/wav2vec2_vox_960h.pt --gen-subset dev_clean --results-path outputdir --w2l-decoder fairseqlm --lm-model /path/to/lm_librispeech_word_transformer.pt --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000

This produces a WER above 50, while Viterbi decoding gives roughly 2 WER. When I additionally pass the lexicon file from https://github.com/pytorch/fairseq/issues/2734 with --lexicon /path/to/librispeech_lexicon.lst, the WER drops to roughly 6.
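The fairseqlm decoder needs a lexicon mapping each word to its letter tokens; without one, the beam search over raw letter hypotheses degenerates, which is consistent with the > 50 WER above. A minimal sketch of building such a lexicon from dict.txt is shown below. It assumes the letter-target ("ltr") convention used in fairseq's wav2vec recipes, where each word is spelled out as space-separated letters followed by the word-boundary token "|"; verify the exact format against the lexicon from the linked issue before relying on it.

```python
# Build a letter lexicon from a fairseq-style word dictionary.
# Assumes dict.txt lines look like "THE 49059384" and that targets are
# letters ("ltr") with "|" marking word boundaries (an assumption to
# verify against the lexicon file from fairseq issue #2734).

def dict_to_lexicon(dict_lines):
    """Turn count-dictionary lines into lexicon lines: WORD<TAB>spelled letters + '|'."""
    lexicon = []
    for line in dict_lines:
        word = line.split()[0]
        spelling = " ".join(word) + " |"
        lexicon.append(f"{word}\t{spelling}")
    return lexicon

if __name__ == "__main__":
    sample = ["THE 49059384", "AND 26362574", "OF 24795903"]
    for entry in dict_to_lexicon(sample):
        print(entry)
```

Writing these lines to librispeech_lexicon.lst and passing it via --lexicon is what closes most of the gap reported above.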

What’s your environment?

I don’t know what I did wrong. Thank you for your answer!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 8
  • Comments: 5

Top GitHub Comments

1 reaction
Guruprasad68 commented, May 18, 2021

Hi, to recreate the results I noticed that the LM weight, the word insertion penalty, and the beam size also play an important role. The authors used a variety of values depending on the fine-tuning data, the LM (Transformer vs. KenLM), and the set being decoded. Please refer to the paper; the ablations section lists the values used in each experiment. I followed this for the 1-hour BASE fine-tuned model with both the 4-gram KenLM and the Transformer LM, and I reproduced their results. With an LM weight of 2 and the default word insertion penalty, I was getting a WER around 10–15% higher.
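The tuning described above amounts to a small grid search over the decoder flags. A sketch is given below; decode_and_score is a hypothetical stand-in for a wrapper that runs infer.py with the given --lm-weight, --word-score, and --beam values and parses the reported WER (it is not part of fairseq).

```python
import itertools

def grid_search(decode_and_score, lm_weights, word_scores, beam_sizes):
    """Return (best_wer, best_params) over all combinations of decoder settings.

    decode_and_score is a user-supplied callable, e.g. a wrapper that
    invokes fairseq's infer.py with these flags and extracts the WER.
    """
    best = None
    for lm_w, w_s, beam in itertools.product(lm_weights, word_scores, beam_sizes):
        wer = decode_and_score(lm_weight=lm_w, word_score=w_s, beam=beam)
        if best is None or wer < best[0]:
            best = (wer, {"lm_weight": lm_w, "word_score": w_s, "beam": beam})
    return best

if __name__ == "__main__":
    # Dummy scorer for illustration only: pretends WER is minimized
    # at lm_weight=1.5, word_score=-0.5, and large beams.
    dummy = lambda lm_weight, word_score, beam: (
        abs(lm_weight - 1.5) + abs(word_score + 0.5) + 100.0 / beam
    )
    print(grid_search(dummy, [1.0, 1.5, 2.0], [-1.0, -0.5, 0.0], [50, 500]))
```

Since each decode of a full dev set is expensive, in practice one sweeps a coarse grid on dev-other first and refines around the best point.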

0 reactions
stale[bot] commented, Apr 29, 2022

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!


