
wav2vec2 with freeze=False and facebook/wav2vec2-base-960h model do not learn

See original GitHub issue

Hi, I am trying to use the speechbrain/recipes/LibriSpeech/ASR/CTC recipe with LibriSpeech-360h as training data and LS-test-clean as test data, with just the following changes:

  • wav2vec2_hub: facebook/wav2vec2-base-960h
  • dnn_neurons: 768
  • freeze_wav2vec: False
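(For context, dnn_neurons: 768 is chosen to match the hidden size of the base checkpoint. A quick way to verify that, assuming the Hugging Face transformers library:)

    from transformers import Wav2Vec2Model

    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
    print(model.config.hidden_size)  # 768 for the base checkpoint (1024 for large)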

With this configuration the model does not learn: both the loss and the WER stay flat. Please have a look at the following train_log.

epoch: 1, lr_model: 5.00e-01, lr_wav2vec: 1.00e-04 - train loss: 2.88 - valid loss: 2.94, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 2, lr_model: 5.00e-01, lr_wav2vec: 1.00e-04 - train loss: 2.87 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 3, lr_model: 4.00e-01, lr_wav2vec: 9.00e-05 - train loss: 2.87 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 4, lr_model: 3.20e-01, lr_wav2vec: 8.10e-05 - train loss: 2.88 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 5, lr_model: 2.56e-01, lr_wav2vec: 7.29e-05 - train loss: 2.88 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02
epoch: 6, lr_model: 2.05e-01, lr_wav2vec: 6.56e-05 - train loss: 2.88 - valid loss: 2.93, valid CER: 1.00e+02, valid WER: 1.00e+02

The same setup works with freeze_wav2vec: True, but the WER fluctuates.

epoch: 1, lr_model: 9.00e-01, lr_wav2vec: 1.00e-04 - train loss: 1.26e-01 - valid loss: 7.98e-02, valid CER: 2.22, valid WER: 14.62
epoch: 2, lr_model: 9.00e-01, lr_wav2vec: 1.00e-04 - train loss: 1.02e-01 - valid loss: 8.03e-02, valid CER: 2.00, valid WER: 11.95
epoch: 3, lr_model: 7.20e-01, lr_wav2vec: 9.00e-05 - train loss: 9.78e-02 - valid loss: 9.00e-02, valid CER: 2.82, valid WER: 21.35
epoch: 4, lr_model: 5.76e-01, lr_wav2vec: 8.10e-05 - train loss: 9.66e-02 - valid loss: 8.10e-02, valid CER: 1.98, valid WER: 12.73
epoch: 5, lr_model: 5.76e-01, lr_wav2vec: 8.10e-05 - train loss: 9.60e-02 - valid loss: 7.99e-02, valid CER: 2.37, valid WER: 17.03
epoch: 6, lr_model: 5.76e-01, lr_wav2vec: 8.10e-05 - train loss: 9.57e-02 - valid loss: 8.02e-02, valid CER: 2.10, valid WER: 13.91
epoch: 7, lr_model: 4.61e-01, lr_wav2vec: 7.29e-05 - train loss: 9.52e-02 - valid loss: 7.95e-02, valid CER: 1.79, valid WER: 10.42
epoch: 8, lr_model: 4.61e-01, lr_wav2vec: 7.29e-05 - train loss: 9.51e-02 - valid loss: 8.09e-02, valid CER: 1.74, valid WER: 9.14
epoch: 9, lr_model: 3.69e-01, lr_wav2vec: 6.56e-05 - train loss: 9.39e-02 - valid loss: 7.93e-02, valid CER: 2.11, valid WER: 14.41
epoch: 10, lr_model: 3.69e-01, lr_wav2vec: 6.56e-05 - train loss: 9.39e-02 - valid loss: 7.94e-02, valid CER: 2.07, valid WER: 14.09

I also tried freeze_wav2vec: False together with freeze_feature_extractor: True; the model learns on LibriSpeech but does not learn on other datasets such as Switchboard, WSJ, and AMI. A sketch of what that freezing amounts to is shown below.
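(Freezing only the convolutional feature extractor while leaving the transformer layers trainable can be sketched with the plain transformers Wav2Vec2Model; the feature_extractor attribute name is that library's, so treat it as an assumption for other wrappers:)

    from transformers import Wav2Vec2Model

    model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

    # Freeze only the convolutional feature extractor; the transformer
    # stack above it stays trainable.
    for p in model.feature_extractor.parameters():
        p.requires_grad = False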

Please comment if I am doing anything wrong. I haven't tried the wav2vec2-large model yet because I was interested in the smaller one.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7

Top GitHub Comments

1 reaction
TParcollet commented, Aug 17, 2022

Great! In practice, I think it would be even better to have a warmup phase: freeze the w2v2 for N steps, and then unfreeze it for fine-tuning.
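A rough sketch of that warmup in plain PyTorch; WARMUP_STEPS and train_loader are placeholders, not part of the recipe:

    import torch
    from transformers import Wav2Vec2Model

    wav2vec2 = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

    WARMUP_STEPS = 10_000  # placeholder; tune per dataset

    def set_trainable(module: torch.nn.Module, trainable: bool) -> None:
        for p in module.parameters():
            p.requires_grad = trainable

    set_trainable(wav2vec2, False)  # start with the encoder frozen

    for step, batch in enumerate(train_loader):  # train_loader: your CTC dataloader (placeholder)
        if step == WARMUP_STEPS:
            set_trainable(wav2vec2, True)  # unfreeze and fine-tune jointly
        # ... forward pass, CTC loss, backward, optimizer step ...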

0 reactions
Rehan-Ahmad commented, Aug 17, 2022

Hi, I just realized that reducing the value of lr_wav2vec makes the base model work. I set lr_wav2vec=0.000001 and it started learning on the Switchboard dataset. The results of a couple of epochs are as follows:

epoch: 1, lr_model: 9.00e-01, lr_wav2vec: 1.00e-06 - train loss: 9.60e-01 - valid loss: 1.30, valid CER: 16.84, valid WER: 35.99
epoch: 2, lr_model: 9.00e-01, lr_wav2vec: 1.00e-06 - train loss: 6.79e-01 - valid loss: 1.20, valid CER: 15.97, valid WER: 35.03
epoch: 3, lr_model: 9.00e-01, lr_wav2vec: 1.00e-06 - train loss: 6.06e-01 - valid loss: 1.19, valid CER: 14.90, valid WER: 33.67
epoch: 4, lr_model: 9.00e-01, lr_wav2vec: 1.00e-06 - train loss: 5.64e-01 - valid loss: 1.15, valid CER: 14.42, valid WER: 32.79
epoch: 5, lr_model: 9.00e-01, lr_wav2vec: 1.00e-06 - train loss: 5.35e-01 - valid loss: 1.19, valid CER: 14.29, valid WER: 32.77
epoch: 6, lr_model: 7.20e-01, lr_wav2vec: 9.00e-07 - train loss: 5.09e-01 - valid loss: 1.19, valid CER: 14.01, valid WER: 32.21
epoch: 7, lr_model: 7.20e-01, lr_wav2vec: 9.00e-07 - train loss: 4.91e-01 - valid loss: 1.17, valid CER: 13.48, valid WER: 31.41
epoch: 8, lr_model: 7.20e-01, lr_wav2vec: 9.00e-07 - train loss: 4.78e-01 - valid loss: 1.17, valid CER: 13.49, valid WER: 31.28
epoch: 9, lr_model: 7.20e-01, lr_wav2vec: 9.00e-07 - train loss: 4.63e-01 - valid loss: 1.20, valid CER: 13.15, valid WER: 30.96
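(For anyone reproducing this: the recipe keeps separate optimizers for the wav2vec2 encoder and the downstream layers, which is what lr_model and lr_wav2vec in the logs refer to. A minimal stand-alone sketch of that split; the Linear stand-ins and the optimizer classes are illustrative assumptions, not the recipe's exact modules:)

    import torch

    # Illustrative stand-ins for the two parameter sets trained separately:
    wav2vec2 = torch.nn.Linear(768, 768)  # stand-in for the pretrained encoder
    dnn = torch.nn.Linear(768, 30)        # stand-in for the downstream CTC layers

    # Two optimizers with very different learning rates, mirroring the logs:
    model_opt = torch.optim.Adadelta(dnn.parameters(), lr=0.9)      # lr_model
    wav2vec_opt = torch.optim.Adam(wav2vec2.parameters(), lr=1e-6)  # lr_wav2vec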


Top Results From Across the Web

Wav2Vec2 - Hugging Face
The Wav2Vec2 model was proposed in wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, ...

fairseq/README.md at main - Wav2vec 2.0 - GitHub
We learned speech representations in multiple languages as well in ... We release 2 models that are finetuned on data from 2 different...

Fine-tune and deploy a Wav2Vec2 model for speech ...
This part of training can be self-supervised; the transformer can be trained with unlabeled speech and learn from it. Then the model is...

Self-training and pre-training, understanding the wav2vec ...
It is not new that speech recognition tasks require huge amounts of ... This is similar to transfer learning where you pre-train a...

Wav2vec 2.0: Learning the structure of speech from raw audio
Facebook AI is releasing code and models for wav2vec 2.0, ... It's simply not feasible to obtain resources for each dialect and every...
