Evaluating Wav2vec 2.0 with Transformer LM
What is your question?
How can I reproduce the WER improvement obtained by using the proposed Transformer LM instead of Viterbi decoding?
What have you tried?
Files used:
- Letter dictionary: from https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md
- Wav2vec model: from https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md
- Transformer LM: from https://github.com/facebookresearch/wav2letter/tree/master/recipes/sota/2019
- LM dict: from https://github.com/facebookresearch/wav2letter/tree/master/recipes/sota/2019, with upper-case processing (`dict.txt` placed in the same directory as `lm_librispeech_word_transformer.pt`)
```
head -3 dict.txt
THE 49059384
AND 26362574
OF 24795903
```
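The upper-case processing mentioned above can be sketched as follows; the file names and the helper `uppercase_dict` are assumptions for illustration, not part of the recipe:

```python
# Sketch: upper-case the words in the wav2letter LM dict so they match
# fairseq's upper-case letter vocabulary. File names are assumed.
def uppercase_dict(in_path: str, out_path: str) -> None:
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            word, count = line.split()
            # Keep the "<WORD> <count>" format, upper-casing only the word.
            fout.write(f"{word.upper()} {count}\n")
```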
Command used:
```shell
python examples/speech_recognition/infer.py /path/to/librispeech --task audio_pretraining --nbest 1 --path /path/to/wav2vec2_vox_960h.pt --gen-subset dev_clean --results-path outputdir --w2l-decoder fairseqlm --lm-model /path/to/lm_librispeech_word_transformer.pt --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000
```
This produces a WER above 50, while Viterbi decoding gives ~2 WER.
When using the lexicon file from https://github.com/pytorch/fairseq/issues/2734 by adding the argument `--lexicon /path/to/librispeech_lexicon.lst`, I get ~6 WER.
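For reference, a lexicon file in the wav2letter format maps each word to its space-separated letter spelling followed by the word-boundary token `|`. A minimal sketch of generating one from the LM dict (the helper name and file paths are assumptions):

```python
# Sketch: build a wav2letter-style lexicon from the LM dict.
# Each output line maps a word to its letters plus the word-boundary
# token "|", e.g. "THE\tT H E |". File names are assumed.
def build_lexicon(dict_path: str, lexicon_path: str) -> None:
    with open(dict_path) as fin, open(lexicon_path, "w") as fout:
        for line in fin:
            word = line.split()[0].upper()
            spelling = " ".join(word) + " |"
            fout.write(f"{word}\t{spelling}\n")
```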
What’s your environment?
- fairseq 0.10.0 (latest stable release)
- wav2letter branch v0.2 for the python bindings, plus the patch from https://github.com/facebookresearch/wav2letter/issues/775 (otherwise imports from w2l_decoder.py fail due to the missing LexiconFreeDecoder)
I don’t know what I did wrong. Thank you for your answer!
Issue Analytics
- State:
- Created: 3 years ago
- Reactions: 8
- Comments: 5
Top GitHub Comments
Hi, to recreate the results, I noticed that the LM weight, word insertion penalty, and beam size also play an important role. The authors used a variety of values depending on the fine-tuning data, the Transformer/KenLM, and the set being decoded. Please refer to the paper; the ablations section has the values they used for the different experiments. I followed this for the 1hr BASE fine-tuned model with both the 4-gram KenLM and the Transformer LM and got their results. With an LM weight of 2 and that word insertion penalty, I was getting around 10-15% higher WER.
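The tuning described above can be sketched as a simple grid sweep over the decoder flags from the `infer.py` command in the question. The value grids below are illustrative, not the paper's:

```python
# Sketch: enumerate infer.py commands over a grid of decoder
# hyper-parameters. Flag names come from the command in the question;
# the grids themselves are illustrative assumptions.
import itertools

LM_WEIGHTS = [0.5, 1.0, 1.5, 2.0]
WORD_SCORES = [-3.0, -1.0, 0.0, 1.0]

def sweep_commands(base_cmd: str):
    """Yield one full command line per (lm-weight, word-score) pair."""
    for lw, ws in itertools.product(LM_WEIGHTS, WORD_SCORES):
        yield f"{base_cmd} --lm-weight {lw} --word-score {ws}"
```

One would run each command on a dev set and keep the pair with the lowest WER before decoding the test sets.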
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!