Question on Wav2Vec2 replication using words instead of letters and how/why lexicon?
❓ Questions and Help
What is your question?
Firstly, just want to say thank you for making all of the Wav2Vec2 resources and help available, it has made it much easier to replicate your original paper. I have spent the better part of a month replicating everything and going through all the source code in fairseq to get a full understanding of how it all works in relation to the paper. Unfortunately, a few key things are left very much unclear and I was hoping someone could please help me out.
All of the examples and config files (even the pretrained models available for download) seem to be set up for letter-based labels in the audio_pretraining task. Is it recommended, or even feasible, to train on a word-based label approach instead? The dictionary would be enormous by contrast; for LibriSpeech, let's imagine a word vocab size of 50000. Pretraining and even finetuning should be straightforward enough, but I imagine inference (using examples/speech_recognition/infer.py) would be problematic: Viterbi could not be used (since the state space is now 50000 and the search is O(50000^2)), and while you could probably use the 4-gram KenLM arpa from LibriSpeech's website (since it's trained on words, not letters), I'm unsure what you would use as a lexicon file in the word case.
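For reference, by a word-based approach I mean something like the following rough sketch, where I build a word-level dictionary in the fairseq "token count" format from the .wrd transcript files (as produced alongside the .ltr files by the wav2vec example's label preparation script); the dict.wrd.txt name is just my own placeholder, by analogy with dict.ltr.txt:

# Rough sketch: build a word-level dictionary (fairseq "token count" format),
# analogous to dict.ltr.txt; the dict.wrd.txt name is just my assumption.
from collections import Counter

counts = Counter()
for path in ("train.wrd", "valid.wrd"):  # word-level transcripts, one utterance per line
    with open(path) as f:
        for line in f:
            counts.update(line.split())

with open("dict.wrd.txt", "w") as out:
    for word, count in counts.most_common():
        out.write(f"{word} {count}\n")

print("word vocab size:", len(counts))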
I also wanted to ask about KenLM, what exactly the lexicon file accomplishes, and what the correct way to generate it is. Unfortunately, I can find no documentation anywhere about the lexicon file and how/why it's necessary, or about what a unit KenLM model is (or how you create one) such that you don't need a lexicon file. The closest thing I have found on how to generate an appropriate lexicon file from a KenLM arpa is this script: https://github.com/facebookresearch/wav2letter/blob/master/recipes/utilities/prepare_librispeech_official_lm.py
Looking at the lexicon file that script generates, it is of the form:
EVERY E V E R Y |
WORD W O R D |
THAT T H A T |
EXISTS E X I S T S |
IN I N |
YOUR Y O U R |
LABEL L A B E L |
OR O R |
TRANSCRIPTION T R A N S C R I P T I O N |
FILE F I L E |
WILL W I L L |
WRITE W R I T E |
DOWN D O W N |
LIKE L I K E |
THIS T H I S |
So it appears to be a mapping from words to letters, such that decoding with letter-based labels in wav2vec2 can use the word n-grams from the LM while mapping them back to their letters? In that case, what should the lexicon file look like if you are using word-based labels instead of letter-based labels? And finally, what is a unit KenLM model, such that you can use the LexiconFreeDecoder, and how do you create such a unit model using the KenLM binaries and the original text corpus?
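To make the question concrete, here is a rough sketch of how I currently generate a lexicon of that form. It mirrors the output shown above rather than the official recipe, and the input/output file names (librispeech-vocab.txt, lexicon.txt) are just my placeholders:

# Rough sketch: build a word -> letters lexicon matching the format shown above.
# File names are placeholders; the official recipe is the wav2letter script linked earlier.
words = set()
with open("librispeech-vocab.txt") as f:  # assumed: one word per line
    for line in f:
        word = line.strip().upper()
        if word:
            words.add(word)

with open("lexicon.txt", "w") as out:
    for word in sorted(words):
        spelling = " ".join(word) + " |"  # e.g. "EVERY" -> "E V E R Y |"
        out.write(f"{word} {spelling}\n")

On the unit KenLM question, my current (unconfirmed) understanding is that a unit LM is simply a KenLM model trained on the CTC units themselves, i.e. on text rewritten as space-separated letters with | as the word boundary, so that the decoder tokens and the LM tokens coincide and no lexicon is needed. Is that right?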
Finally, how can one use the pretrained LibriSpeech Transformer LM available for download in the Wav2Letter repo to decode in infer.py instead of the KenLM model? I ask because that Transformer LM is a word-based model, and the corresponding dict file has 221k words in it. Does that mean that in order to use the Transformer LM you need to set up your wav2vec2 finetuning to use word labels instead of letter labels (since the transformer is a word-based model)? And I can see in the W2lFairseqLMDecoder code that the Transformer LM also takes a lexicon file. What does that lexicon file need to look like and how can I generate it? I am assuming it is different in both format and generation from the lexicon file used by KenLM models?
Would greatly appreciate any help.
What have you tried?
What’s your environment?
- fairseq Version (e.g., 1.0 or master): Master
- PyTorch Version (e.g., 1.0): 1.7.1
- OS (e.g., Linux): Linux Ubuntu 16.04
- How you installed fairseq (pip, source): Source
- Build command you used (if compiling from source): python setup.py bdist_wheel
- Python version: 3.6.12
- CUDA/cuDNN version: 10.2 / 7.6.5
- GPU models and configuration: 4x V100s
- Any other relevant information:
Top GitHub Comments
That is certainly possible; take a look here: https://hydra.cc/docs/next/tutorials/basic/running_your_app/working_directory/
For sweeps (i.e. running with the -m flag) you can set hydra.sweep.dir (and hydra.sweep.subdir), for example as in the sketch below.
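Something like this (the config name and paths are just placeholders):

fairseq-hydra-train -m \
  hydra.sweep.dir=/path/to/sweep_outputs \
  hydra.sweep.subdir='${hydra.job.num}' \
  --config-dir /path/to/configs \
  --config-name base_100h

Each job in the sweep then writes its outputs under the sweep directory, in its own subdirectory.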
Re: the hydra training log being empty: that is an issue with hydra for which I recently implemented a workaround. If you grab the latest code it should now log properly to that file.
wav2letter will use whatever you put in your lexicon file to look up words in the LM, so it is as you say: if you have a lowercase word LM then the first column in the lexicon file should be lowercase. And also as you say, the second column should use the units from the CTC model.
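For example, if your word LM is lowercase but your CTC model was fine-tuned on uppercase letters (as in the lexicon snippet shown earlier in this issue), the lexicon lines would look something like:

every E V E R Y |
word W O R D |
that T H A T |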