wav2vec2 with freeze=False and facebook/wav2vec2-base-960h model does not learn
Hi,
I am trying to use the recipe speechbrain/recipes/LibriSpeech/ASR/CTC
with LibriSpeech-360h as training data and LS-test-clean as test data, with just the following changes:
wav2vec2_hub: facebook/wav2vec2-base-960h
dnn_neurons: 768
freeze_wav2vec: False
With this configuration the model does not learn: the loss stays flat, and so does the WER. Please have a look at the following train_log.
epoch: 1, lr_model: 5.00e-01, lr_wav2vec: 1.00e-04 - train loss: 2.88 - valid loss: 2.94, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 2, lr_model: 5.00e-01, lr_wav2vec: 1.00e-04 - train loss: 2.87 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 3, lr_model: 4.00e-01, lr_wav2vec: 9.00e-05 - train loss: 2.87 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 4, lr_model: 3.20e-01, lr_wav2vec: 8.10e-05 - train loss: 2.88 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 5, lr_model: 2.56e-01, lr_wav2vec: 7.29e-05 - train loss: 2.88 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 6, lr_model: 2.05e-01, lr_wav2vec: 6.56e-05 - train loss: 2.88 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02
The same setup works with
freeze_wav2vec: True
but the WER fluctuates:
epoch: 1, lr_model: 9.00e-01, lr_wav2vec: 1.00e-04 - train loss: 1.26e-01 - valid loss: 7.98e-02, valid CER: 2.22, valid WER: 14.62
epoch: 2, lr_model: 9.00e-01, lr_wav2vec: 1.00e-04 - train loss: 1.02e-01 - valid loss: 8.03e-02, valid CER: 2.00, valid WER: 11.95
epoch: 3, lr_model: 7.20e-01, lr_wav2vec: 9.00e-05 - train loss: 9.78e-02 - valid loss: 9.00e-02, valid CER: 2.82, valid WER: 21.35
epoch: 4, lr_model: 5.76e-01, lr_wav2vec: 8.10e-05 - train loss: 9.66e-02 - valid loss: 8.10e-02, valid CER: 1.98, valid WER: 12.73
epoch: 5, lr_model: 5.76e-01, lr_wav2vec: 8.10e-05 - train loss: 9.60e-02 - valid loss: 7.99e-02, valid CER: 2.37, valid WER: 17.03
epoch: 6, lr_model: 5.76e-01, lr_wav2vec: 8.10e-05 - train loss: 9.57e-02 - valid loss: 8.02e-02, valid CER: 2.10, valid WER: 13.91
epoch: 7, lr_model: 4.61e-01, lr_wav2vec: 7.29e-05 - train loss: 9.52e-02 - valid loss: 7.95e-02, valid CER: 1.79, valid WER: 10.42
epoch: 8, lr_model: 4.61e-01, lr_wav2vec: 7.29e-05 - train loss: 9.51e-02 - valid loss: 8.09e-02, valid CER: 1.74, valid WER: 9.14
epoch: 9, lr_model: 3.69e-01, lr_wav2vec: 6.56e-05 - train loss: 9.39e-02 - valid loss: 7.93e-02, valid CER: 2.11, valid WER: 14.41
epoch: 10, lr_model: 3.69e-01, lr_wav2vec: 6.56e-05 - train loss: 9.39e-02 - valid loss: 7.94e-02, valid CER: 2.07, valid WER: 14.09
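The learning-rate pattern in both logs is consistent with new-bob-style annealing: the rate is multiplied by a fixed factor (roughly 0.8 for lr_model, 0.9 for lr_wav2vec) only after epochs where the validation metric did not improve. A minimal sketch of that schedule (the function name and the improvement flags are illustrative, not the recipe's actual implementation):

```python
def lr_schedule(lr0, factor, improvements):
    """Return the learning rate used at each epoch, annealing by `factor`
    after every epoch whose validation metric did not improve."""
    lrs, lr = [], lr0
    for improved in improvements:
        lrs.append(lr)      # rate used for this epoch
        if not improved:
            lr *= factor    # anneal for the next epoch
    return lrs

# Reproduces the lr_model column of the first log:
print([round(lr, 4) for lr in lr_schedule(0.5, 0.8, [True] + [False] * 5)])
# [0.5, 0.5, 0.4, 0.32, 0.256, 0.2048]
```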
I also tried freeze_wav2vec: False
together with freeze_feature_extractor: True
With that, the model learns on LibriSpeech but does not learn on other datasets such as Switchboard, WSJ, and AMI.
Please comment if I am doing anything wrong. I haven't tried the wav2vec2-large model yet because I was interested in the smaller one.
Issue Analytics
- Created a year ago
- Comments: 7
Top GitHub Comments
Great! In practice, I think it would be even better to have a warmup phase: freeze the w2v2 for N steps, and then unfreeze it to fine-tune it.
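That warmup idea can be sketched in a few lines. This is illustrative plain Python using PyTorch-style `requires_grad` flags; `warmup_steps` and the helper names are hypothetical, not part of the recipe:

```python
def set_trainable(params, trainable):
    """Toggle gradient updates for a group of parameters (each object
    exposes a PyTorch-style `requires_grad` flag)."""
    for p in params:
        p.requires_grad = trainable

def wav2vec_trainable(step, warmup_steps):
    """Keep the wav2vec 2.0 encoder frozen for the first `warmup_steps`
    updates, then unfreeze it for joint fine-tuning."""
    return step >= warmup_steps
```

In an actual training loop one would call `set_trainable(wav2vec_params, wav2vec_trainable(step, warmup_steps))` at each step, or just once at the transition point.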
Hi, I just realized that reducing the value of
lr_wav2vec
makes the base model work. I set lr_wav2vec=0.000001
and it started learning on the Switchboard dataset. The results of a couple of epochs are as follows:
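For reference, the working configuration described in this thread, as a hparams fragment (only the fields mentioned here; the rest of the recipe's YAML is unchanged):

```yaml
wav2vec2_hub: facebook/wav2vec2-base-960h
dnn_neurons: 768
freeze_wav2vec: False
lr_wav2vec: 0.000001  # 1e-6, down from the 1e-4 used earlier in this thread
```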