
Wav2Vec2ForPreTraining in 4.12 broke SpeechBrain implementation

See original GitHub issue

Environment info

  • transformers version:
  • Platform: Linux
  • Python version: 3.8
  • PyTorch version (GPU?): 1.9 (and 1.10)
  • Using GPU in script?: 1-32 Tesla V100
  • Using distributed or parallel set-up in script?: DDP

Who can help

@patrickvonplaten, @anton-l

Information

Model I am using (Bert, XLNet …): wav2vec2-base (the original is on the facebookai repo)

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Go to the SpeechBrain PR and use the corresponding branch
  2. Install speechbrain (pip install -r requirements.txt / pip install -e .)
  3. Install the extra requirements listed in recipes/CommonVoice/self-supervised-learning/wav2vec2/extra_requirements.txt
  4. Download and untar any CommonVoice English version (ideally an old one, so there are fewer hours of audio to debug with …)
  5. Start the training on a single GPU (it no longer works there either) with: python recipes/CommonVoice/self-supervised-learning/wav2vec2/train.py recipes/CommonVoice/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml --data_folder=/path/to/CV/en --batch_size=<adapt to your GPU; 12 for 32 GB> --gradient_accumulation=8 (or 16)
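
Since the regression is tied to the installed transformers version, a quick sanity check (not part of the recipe) is to print which arguments the installed Wav2Vec2ForPreTraining.forward() accepts; per the side note further below, 4.12+ expects the caller to supply the negative samples.

```python
# Print the installed transformers version and the forward() signature of
# Wav2Vec2ForPreTraining; 4.12+ accepts sampled_negative_indices from the caller.
import inspect

import transformers
from transformers import Wav2Vec2ForPreTraining

print("transformers", transformers.__version__)
print(inspect.signature(Wav2Vec2ForPreTraining.forward))
```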

Extra information about the code

The important code is located in recipes/CommonVoice/self-supervised-learning/wav2vec2/train.py, in the Brain class's compute_forward and compute_objectives methods. The wrapping of the HF model into SpeechBrain happens at the bottom of the speechbrain/lobes/models/hugginface_wav2vec2.py file.

The batch that is received is simply of the form (batch, signal), just as for HF.
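
For readers who don't know SpeechBrain, the structure described above looks roughly like the following. This is a minimal, illustrative sketch only: the class name, the self.modules.wav2vec2 handle, and what exactly the lobe returns are assumptions, not the actual recipe code.

```python
# Illustrative sketch of the Brain class described above; the real code is in
# recipes/CommonVoice/self-supervised-learning/wav2vec2/train.py.
import speechbrain as sb


class W2V2PretrainBrain(sb.Brain):  # hypothetical name
    def compute_forward(self, batch, stage):
        batch = batch.to(self.device)
        wavs, wav_lens = batch.sig  # the (batch, signal) pair mentioned above
        # Assumption: the lobe in speechbrain/lobes/models/hugginface_wav2vec2.py
        # wraps transformers' Wav2Vec2ForPreTraining and returns its output
        # together with the time mask it applied.
        out, mask_time_indices = self.modules.wav2vec2(wavs)
        return out, mask_time_indices

    def compute_objectives(self, predictions, batch, stage):
        out, mask_time_indices = predictions
        # The contrastive + diversity loss comes straight from the HF output;
        # normalizing by the number of masked frames is an assumption.
        return out.loss / mask_time_indices.sum()
```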

Expected behavior

With 4.11 (the code can be found in an earlier commit of the same PR), everything was working well! We were even able to submit papers based on this work. Here are the logs obtained with the old, working version:

epoch: 1, lr: 1.87e-05, steps: 1027, optimizer: AdamW - train loss: 6.41e+03 - valid loss: 4.53e+03, valid acc: 0.14673814177513123
epoch: 2, lr: 3.75e-05, steps: 2054, optimizer: AdamW - train loss: 6.18e+03 - valid loss: 4.45e+03, valid acc: 0.21184375882148743
epoch: 3, lr: 5.62e-05, steps: 3081, optimizer: AdamW - train loss: 5.67e+03 - valid loss: 3.70e+03, valid acc: 0.26702988147735596
epoch: 4, lr: 7.50e-05, steps: 4108, optimizer: AdamW - train loss: 5.19e+03 - valid loss: 3.70e+03, valid acc: 0.301466703414917
epoch: 5, lr: 9.37e-05, steps: 5135, optimizer: AdamW - train loss: 5.15e+03 - valid loss: 3.58e+03, valid acc: 0.33249199390411377
epoch: 6, lr: 1.12e-04, steps: 6162, optimizer: AdamW - train loss: 5.05e+03 - valid loss: 3.49e+03, valid acc: 0.3265174329280853

Now, with the new implementation:

epoch: 1, lr: 1.87e-05, steps: 1027, optimizer: AdamW - train loss: 7.09e+03 - valid loss: 4.87e+03, valid acc: 0.15861859917640686
epoch: 2, lr: 3.75e-05, steps: 2054, optimizer: AdamW - train loss: 6.67e+03 - valid loss: 4.67e+03, valid acc: 0.19915643334388733
epoch: 3, lr: 5.62e-05, steps: 3081, optimizer: AdamW - train loss: 6.39e+03 - valid loss: 4.41e+03, valid acc: 0.22449128329753876
epoch: 4, lr: 7.50e-05, steps: 4108, optimizer: AdamW - train loss: 6.18e+03 - valid loss: 4.25e+03, valid acc: 0.24435752630233765
epoch: 5, lr: 9.37e-05, steps: 5135, optimizer: AdamW - train loss: 6.01e+03 - valid loss: 4.15e+03, valid acc: 0.2056254893541336
epoch: 6, lr: 1.12e-04, steps: 6162, optimizer: AdamW - train loss: 5.88e+03 - valid loss: 4.11e+03, valid acc: 0.2493399679660797
epoch: 7, lr: 1.31e-04, steps: 7189, optimizer: AdamW - train loss: 5.76e+03 - valid loss: 4.02e+03, valid acc: 0.27252206206321716
epoch: 8, lr: 1.50e-04, steps: 8216, optimizer: AdamW - train loss: 5.66e+03 - valid loss: 3.97e+03, valid acc: 0.26998990774154663
epoch: 9, lr: 1.69e-04, steps: 9243, optimizer: AdamW - train loss: 5.59e+03 - valid loss: 3.85e+03, valid acc: 0.24951176345348358
epoch: 10, lr: 1.87e-04, steps: 10270, optimizer: AdamW - train loss: 5.51e+03 - valid loss: 3.80e+03, valid acc: 0.24127712845802307
epoch: 11, lr: 2.06e-04, steps: 11297, optimizer: AdamW - train loss: 5.43e+03 - valid loss: 3.72e+03, valid acc: 0.2344648540019989
epoch: 12, lr: 2.25e-04, steps: 12324, optimizer: AdamW - train loss: 5.37e+03 - valid loss: 3.74e+03, valid acc: 0.20351676642894745
epoch: 13, lr: 2.44e-04, steps: 13351, optimizer: AdamW - train loss: 5.30e+03 - valid loss: 3.72e+03, valid acc: 0.1984717845916748
epoch: 14, lr: 2.62e-04, steps: 14378, optimizer: AdamW - train loss: 5.29e+03 - valid loss: 3.66e+03, valid acc: 0.2088804990053177
epoch: 15, lr: 2.81e-04, steps: 15405, optimizer: AdamW - train loss: 5.25e+03 - valid loss: 3.64e+03, valid acc: 0.21932080388069153
epoch: 16, lr: 3.00e-04, steps: 16432, optimizer: AdamW - train loss: 5.21e+03 - valid loss: 3.62e+03, valid acc: 0.20787915587425232

As a side note, I think that exporting masking and negative sampling from the forward function is a bad idea for external toolkit compatibility. If everything were embedded in the .forward() function, any toolkit could just instantiate your model and run it without worrying about the library version. Now, every time HuggingFace releases a new transformers version, I will have to check for and adapt to the potential changes 😦
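
To make the side note concrete, here is roughly what calling Wav2Vec2ForPreTraining looks like against 4.12+, with the span mask and the negatives built outside the model the way HF's own pretraining example does it. This is a self-contained sketch with dummy audio and a fresh config rather than the actual wav2vec2-base checkpoint, and the mask_prob/mask_length values are illustrative, not necessarily the recipe's.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

config = Wav2Vec2Config()             # illustrative; the recipe uses wav2vec2-base
model = Wav2Vec2ForPreTraining(config)

input_values = torch.randn(2, 16000)  # dummy (batch, signal) audio
batch_size, raw_len = input_values.shape
seq_len = int(model._get_feat_extract_output_lengths(raw_len))

# Since 4.12, both the span mask and the negative samples are the caller's job.
mask_time_indices = _compute_mask_indices(
    shape=(batch_size, seq_len), mask_prob=0.65, mask_length=10
)
sampled_negative_indices = _sample_negative_indices(
    features_shape=(batch_size, seq_len),
    num_negatives=config.num_negatives,
    mask_time_indices=mask_time_indices,
)

outputs = model(
    input_values,
    mask_time_indices=torch.tensor(mask_time_indices, dtype=torch.long),
    sampled_negative_indices=torch.tensor(sampled_negative_indices, dtype=torch.long),
)
print(outputs.loss)                   # contrastive + diversity loss
```

With 4.11, negatives were sampled inside the model and masking could be left to it as well, so a wrapper could get away with passing little more than the waveform; that is the compatibility concern raised above.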

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
TParcollet commented, Jan 3, 2022

It’s fixed, but mysterious research-wise.

0 reactions
patrickvonplaten commented, Jan 3, 2022

Ok! And now having replaced mask_time_indices=mask_time_indices with mask_time_indices=torch.ones(...) fixed the problem or still not 100%?

Read more comments on GitHub >

Top Results From Across the Web

speechbrain/ssl-wav2vec2-base-librispeech - Hugging Face
This HuggingFace repository provides all the necessary tools to extract wav2vec2 embeddings from a pretrained model. For a better experience, we ...

SpeechBrain: Unifying Speech Technologies and Deep ...
Title: SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit. Authors: Titouan Parcollet. Category: ...

SpeechBrain Basics
SpeechBrain provides a convenient framework for organizing the training loop, in the form of a class known as the "Brain" class, implemented in ...

Contributing — SpeechBrain 0.5.0 documentation
Also note the automatic doctests (see here). Comments: We encourage developers to write self-documenting code, and use proper comments where the implementation ...
