Wav2Vec2ForPreTraining in 4.12 broke SpeechBrain implementation
Environment info
- transformers version:
- Platform: Linux
- Python version: 3.8
- PyTorch version (GPU?): 1.9 (and 1.10)
- Using GPU in script?: 1-32 Tesla V100
- Using distributed or parallel set-up in script?: DDP
Who can help
Information
Model I am using (Bert, XLNet …): wav2vec2-base (the original is in the facebook repo)
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
- Go to the SpeechBrain PR and use the corresponding branch
- Install speechbrain (pip install -r requirements.txt / pip install -e .)
- Install the extra requirements from recipes/CommonVoice/self-supervised-learning/wav2vec2/extra_requirements.txt
- Download and untar any CommonVoice English version (preferably an old one, so there are fewer hours of audio to debug with …)
- Start the training with a single GPU (it doesn't work anymore even in that setting) with:
python recipes/CommonVoice/self-supervised-learning/wav2vec2/train.py recipes/CommonVoice/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml --data_folder=/path/to/CV/en --batch_size=<adapt to your GPU; 12 for 32GB> --gradient_accumulation=<8 or 16>
Extra information about the code
The important code is located in recipes/CommonVoice/self-supervised-learning/wav2vec2/train.py, under the Brain class, in compute_forward and compute_objectives. The entire wrapping of the HF model into SpeechBrain happens at the bottom of the speechbrain/lobes/models/hugginface_wav2vec2.py file.
The batch that is received is simply of the form (batch, signal), just as for HF; a stripped-down sketch of this interface follows below.
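For clarity, here is a stripped-down sketch (not the actual SpeechBrain code; the class name and model source are placeholders) of the interface the wrapper exposes: compute_forward just pushes the padded waveforms through the wrapped Wav2Vec2ForPreTraining model, and compute_objectives reads the pre-training outputs from the returned object.

```python
import torch
from transformers import Wav2Vec2ForPreTraining


class HFWav2Vec2PretrainWrapper(torch.nn.Module):
    """Minimal stand-in for the SpeechBrain lobe that wraps the HF model."""

    def __init__(self, source="facebook/wav2vec2-base"):
        super().__init__()
        self.model = Wav2Vec2ForPreTraining.from_pretrained(source)

    def forward(self, wav):
        # wav: (batch, signal) raw padded waveforms, exactly what the Brain class receives
        return self.model(input_values=wav)


wrapper = HFWav2Vec2PretrainWrapper()
wrapper.train()
outputs = wrapper(torch.randn(2, 16000))  # (batch, signal), 1 s of 16 kHz audio
# With 4.11, time masking and negative sampling happened inside forward(), so the
# returned loss / quantizer outputs could be used directly for training; with >= 4.12
# this call alone no longer produces a training loss (see the sketch further down).
```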
Expected behavior
With 4.11 (the code can be found in an earlier commit of the same PR), everything was working well! We were even able to submit papers based on this work. Here are the logs obtained with the old, working version:
epoch: 1, lr: 1.87e-05, steps: 1027, optimizer: AdamW - train loss: 6.41e+03 - valid loss: 4.53e+03, valid acc: 0.14673814177513123
epoch: 2, lr: 3.75e-05, steps: 2054, optimizer: AdamW - train loss: 6.18e+03 - valid loss: 4.45e+03, valid acc: 0.21184375882148743
epoch: 3, lr: 5.62e-05, steps: 3081, optimizer: AdamW - train loss: 5.67e+03 - valid loss: 3.70e+03, valid acc: 0.26702988147735596
epoch: 4, lr: 7.50e-05, steps: 4108, optimizer: AdamW - train loss: 5.19e+03 - valid loss: 3.70e+03, valid acc: 0.301466703414917
epoch: 5, lr: 9.37e-05, steps: 5135, optimizer: AdamW - train loss: 5.15e+03 - valid loss: 3.58e+03, valid acc: 0.33249199390411377
epoch: 6, lr: 1.12e-04, steps: 6162, optimizer: AdamW - train loss: 5.05e+03 - valid loss: 3.49e+03, valid acc: 0.3265174329280853
Now, with the new implementation:
epoch: 1, lr: 1.87e-05, steps: 1027, optimizer: AdamW - train loss: 7.09e+03 - valid loss: 4.87e+03, valid acc: 0.15861859917640686
epoch: 2, lr: 3.75e-05, steps: 2054, optimizer: AdamW - train loss: 6.67e+03 - valid loss: 4.67e+03, valid acc: 0.19915643334388733
epoch: 3, lr: 5.62e-05, steps: 3081, optimizer: AdamW - train loss: 6.39e+03 - valid loss: 4.41e+03, valid acc: 0.22449128329753876
epoch: 4, lr: 7.50e-05, steps: 4108, optimizer: AdamW - train loss: 6.18e+03 - valid loss: 4.25e+03, valid acc: 0.24435752630233765
epoch: 5, lr: 9.37e-05, steps: 5135, optimizer: AdamW - train loss: 6.01e+03 - valid loss: 4.15e+03, valid acc: 0.2056254893541336
epoch: 6, lr: 1.12e-04, steps: 6162, optimizer: AdamW - train loss: 5.88e+03 - valid loss: 4.11e+03, valid acc: 0.2493399679660797
epoch: 7, lr: 1.31e-04, steps: 7189, optimizer: AdamW - train loss: 5.76e+03 - valid loss: 4.02e+03, valid acc: 0.27252206206321716
epoch: 8, lr: 1.50e-04, steps: 8216, optimizer: AdamW - train loss: 5.66e+03 - valid loss: 3.97e+03, valid acc: 0.26998990774154663
epoch: 9, lr: 1.69e-04, steps: 9243, optimizer: AdamW - train loss: 5.59e+03 - valid loss: 3.85e+03, valid acc: 0.24951176345348358
epoch: 10, lr: 1.87e-04, steps: 10270, optimizer: AdamW - train loss: 5.51e+03 - valid loss: 3.80e+03, valid acc: 0.24127712845802307
epoch: 11, lr: 2.06e-04, steps: 11297, optimizer: AdamW - train loss: 5.43e+03 - valid loss: 3.72e+03, valid acc: 0.2344648540019989
epoch: 12, lr: 2.25e-04, steps: 12324, optimizer: AdamW - train loss: 5.37e+03 - valid loss: 3.74e+03, valid acc: 0.20351676642894745
epoch: 13, lr: 2.44e-04, steps: 13351, optimizer: AdamW - train loss: 5.30e+03 - valid loss: 3.72e+03, valid acc: 0.1984717845916748
epoch: 14, lr: 2.62e-04, steps: 14378, optimizer: AdamW - train loss: 5.29e+03 - valid loss: 3.66e+03, valid acc: 0.2088804990053177
epoch: 15, lr: 2.81e-04, steps: 15405, optimizer: AdamW - train loss: 5.25e+03 - valid loss: 3.64e+03, valid acc: 0.21932080388069153
epoch: 16, lr: 3.00e-04, steps: 16432, optimizer: AdamW - train loss: 5.21e+03 - valid loss: 3.62e+03, valid acc: 0.20787915587425232
As a side note, I think that moving masking and negative sampling out of the forward function is a bad idea for external-toolkit compatibility. If everything were embedded in the .forward() function, any toolkit could just instantiate your model and run it without worrying about the library version. Now, every time Hugging Face releases a new transformers version, I will have to check for and adapt to the potential changes 😦 A sketch of what a caller now has to do is given below.
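To make the point concrete, here is roughly what a caller now has to do with transformers >= 4.12. This is a sketch only: _compute_mask_indices and _sample_negative_indices are private helpers of transformers.models.wav2vec2.modeling_wav2vec2, and their exact signatures/shape conventions may differ across releases, which is exactly the maintenance burden described above.

```python
import torch
from transformers import Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")
model.train()

signal = torch.randn(2, 16000)  # (batch, signal)
batch_size = signal.shape[0]
seq_len = int(model._get_feat_extract_output_lengths(torch.tensor(signal.shape[1])))

# SpecAugment-style time mask, now computed by the caller (boolean numpy array)
mask_time_indices = _compute_mask_indices(
    shape=(batch_size, seq_len),
    mask_prob=model.config.mask_time_prob,
    mask_length=model.config.mask_time_length,
    min_masks=2,
)
# Negative (distractor) indices for the contrastive loss, also caller-side now
sampled_negative_indices = _sample_negative_indices(
    (batch_size, seq_len),
    model.config.num_negatives,
    mask_time_indices=mask_time_indices,
)

outputs = model(
    input_values=signal,
    mask_time_indices=torch.tensor(mask_time_indices, dtype=torch.bool),
    sampled_negative_indices=torch.tensor(sampled_negative_indices, dtype=torch.long),
)
loss = outputs.loss  # contrastive + diversity loss, only populated when both are passed
```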
Issue Analytics
- Created 2 years ago
- Comments: 8 (4 by maintainers)
It’s fixed, but mysterious research-wise.
Ok! And now, having replaced mask_time_indices=mask_time_indices with mask_time_indices=torch.ones(...), is the problem fixed, or still not 100%?
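For completeness, a minimal self-contained sketch of what that replacement amounts to (where exactly it sits in the SpeechBrain wrapper, and the shapes used here, are assumptions):

```python
import torch
from transformers import Wav2Vec2ForPreTraining

model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")
signal = torch.randn(2, 16000)  # (batch, signal)
seq_len = int(model._get_feat_extract_output_lengths(torch.tensor(signal.shape[1])))

# Before: mask_time_indices=mask_time_indices   (the real, partial time mask)
# After:  mask_time_indices=torch.ones(...)     (every frame counted as masked)
full_mask = torch.ones(signal.shape[0], seq_len, dtype=torch.bool)
outputs = model(input_values=signal, mask_time_indices=full_mask)
```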