Wav2Vec2ForPreTraining in 4.12 broke SpeechBrain implementation
Environment info
- transformers version:
- Platform: Linux
- Python version: 3.8
- PyTorch version (GPU?): 1.9 (and 1.10)
- Using GPU in script?: 1-32 Tesla V100
- Using distributed or parallel set-up in script?: DDP
Who can help
Information
Model I am using (Bert, XLNet …): wav2vec2-base (the original is in the facebook repo)
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
- Go to the SpeechBrain PR and use the corresponding branch
- Install speechbrain (pip install -r requirements.txt / pip install -e .)
- Install the extra requirements from recipes/CommonVoice/self-supervised-learning/wav2vec2/extra_requirements.txt
- Download and untar any CommonVoice English version (preferably an old one, so there are fewer hours of audio to debug with …)
- Start the training with a single GPU (it doesn't work anymore even in that setting) with:
python recipes/CommonVoice/self-supervised-learning/wav2vec2/train.py recipes/CommonVoice/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml --data_folder=/path/to/CV/en --batch_size=<adapt to your GPU; 12 for 32GB> --gradient_accumulation=<8 or 16>
Extra information about the code
The important code is located in recipes/CommonVoice/self-supervised-learning/wav2vec2/train.py, under the Brain class, in compute_forward and compute_objectives. The entire wrapping of the HF model into SpeechBrain happens at the bottom of the speechbrain/lobes/models/hugginface_wav2vec2.py file.
The batch that is received is simply of the form (batch, signal), just as for HF; a stripped-down sketch of this interface follows below.
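For clarity, here is a stripped-down sketch (not the actual SpeechBrain code; the class name and model source are placeholders) of the interface the wrapper exposes: compute_forward just pushes the padded waveforms through the wrapped Wav2Vec2ForPreTraining model, and compute_objectives reads the pre-training outputs from the returned object.

```python
import torch
from transformers import Wav2Vec2ForPreTraining


class HFWav2Vec2PretrainWrapper(torch.nn.Module):
    """Minimal stand-in for the SpeechBrain lobe that wraps the HF model."""

    def __init__(self, source="facebook/wav2vec2-base"):
        super().__init__()
        self.model = Wav2Vec2ForPreTraining.from_pretrained(source)

    def forward(self, wav):
        # wav: (batch, signal) raw padded waveforms, exactly what the Brain class receives
        return self.model(input_values=wav)


wrapper = HFWav2Vec2PretrainWrapper()
wrapper.train()
outputs = wrapper(torch.randn(2, 16000))  # (batch, signal), 1 s of 16 kHz audio
# With 4.11, time masking and negative sampling happened inside forward(), so the
# returned loss / quantizer outputs could be used directly for training; with >= 4.12
# this call alone no longer produces a training loss (see the sketch further down).
```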
Expected behavior
With 4.11 (the code can be found in an earlier commit of the same PR), everything was working well! We were even able to submit papers based on this work. Here are the logs obtained with the old, working version:
epoch: 1, lr: 1.87e-05, steps: 1027, optimizer: AdamW - train loss: 6.41e+03 - valid loss: 4.53e+03, valid acc: 0.14673814177513123
epoch: 2, lr: 3.75e-05, steps: 2054, optimizer: AdamW - train loss: 6.18e+03 - valid loss: 4.45e+03, valid acc: 0.21184375882148743
epoch: 3, lr: 5.62e-05, steps: 3081, optimizer: AdamW - train loss: 5.67e+03 - valid loss: 3.70e+03, valid acc: 0.26702988147735596
epoch: 4, lr: 7.50e-05, steps: 4108, optimizer: AdamW - train loss: 5.19e+03 - valid loss: 3.70e+03, valid acc: 0.301466703414917
epoch: 5, lr: 9.37e-05, steps: 5135, optimizer: AdamW - train loss: 5.15e+03 - valid loss: 3.58e+03, valid acc: 0.33249199390411377
epoch: 6, lr: 1.12e-04, steps: 6162, optimizer: AdamW - train loss: 5.05e+03 - valid loss: 3.49e+03, valid acc: 0.3265174329280853
Now, with the new implementation:
epoch: 1, lr: 1.87e-05, steps: 1027, optimizer: AdamW - train loss: 7.09e+03 - valid loss: 4.87e+03, valid acc: 0.15861859917640686
epoch: 2, lr: 3.75e-05, steps: 2054, optimizer: AdamW - train loss: 6.67e+03 - valid loss: 4.67e+03, valid acc: 0.19915643334388733
epoch: 3, lr: 5.62e-05, steps: 3081, optimizer: AdamW - train loss: 6.39e+03 - valid loss: 4.41e+03, valid acc: 0.22449128329753876
epoch: 4, lr: 7.50e-05, steps: 4108, optimizer: AdamW - train loss: 6.18e+03 - valid loss: 4.25e+03, valid acc: 0.24435752630233765
epoch: 5, lr: 9.37e-05, steps: 5135, optimizer: AdamW - train loss: 6.01e+03 - valid loss: 4.15e+03, valid acc: 0.2056254893541336
epoch: 6, lr: 1.12e-04, steps: 6162, optimizer: AdamW - train loss: 5.88e+03 - valid loss: 4.11e+03, valid acc: 0.2493399679660797
epoch: 7, lr: 1.31e-04, steps: 7189, optimizer: AdamW - train loss: 5.76e+03 - valid loss: 4.02e+03, valid acc: 0.27252206206321716
epoch: 8, lr: 1.50e-04, steps: 8216, optimizer: AdamW - train loss: 5.66e+03 - valid loss: 3.97e+03, valid acc: 0.26998990774154663
epoch: 9, lr: 1.69e-04, steps: 9243, optimizer: AdamW - train loss: 5.59e+03 - valid loss: 3.85e+03, valid acc: 0.24951176345348358
epoch: 10, lr: 1.87e-04, steps: 10270, optimizer: AdamW - train loss: 5.51e+03 - valid loss: 3.80e+03, valid acc: 0.24127712845802307
epoch: 11, lr: 2.06e-04, steps: 11297, optimizer: AdamW - train loss: 5.43e+03 - valid loss: 3.72e+03, valid acc: 0.2344648540019989
epoch: 12, lr: 2.25e-04, steps: 12324, optimizer: AdamW - train loss: 5.37e+03 - valid loss: 3.74e+03, valid acc: 0.20351676642894745
epoch: 13, lr: 2.44e-04, steps: 13351, optimizer: AdamW - train loss: 5.30e+03 - valid loss: 3.72e+03, valid acc: 0.1984717845916748
epoch: 14, lr: 2.62e-04, steps: 14378, optimizer: AdamW - train loss: 5.29e+03 - valid loss: 3.66e+03, valid acc: 0.2088804990053177
epoch: 15, lr: 2.81e-04, steps: 15405, optimizer: AdamW - train loss: 5.25e+03 - valid loss: 3.64e+03, valid acc: 0.21932080388069153
epoch: 16, lr: 3.00e-04, steps: 16432, optimizer: AdamW - train loss: 5.21e+03 - valid loss: 3.62e+03, valid acc: 0.20787915587425232
As a side note, I think that moving masking and negative sampling out of the forward function is a bad idea for external-toolkit compatibility. If everything were embedded in the .forward() function, any toolkit could just instantiate your model and run it without worrying about the library version. Now, every time Hugging Face releases a new transformers version, I will have to check for and adapt to the potential changes 😦 A sketch of what a caller now has to do is given below.
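To make the point concrete, here is roughly what a caller now has to do with transformers >= 4.12. This is a sketch only: _compute_mask_indices and _sample_negative_indices are private helpers of transformers.models.wav2vec2.modeling_wav2vec2, and their exact signatures/shape conventions may differ across releases, which is exactly the maintenance burden described above.

```python
import torch
from transformers import Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices,
    _sample_negative_indices,
)

model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")
model.train()

signal = torch.randn(2, 16000)  # (batch, signal)
batch_size = signal.shape[0]
seq_len = int(model._get_feat_extract_output_lengths(torch.tensor(signal.shape[1])))

# SpecAugment-style time mask, now computed by the caller (boolean numpy array)
mask_time_indices = _compute_mask_indices(
    shape=(batch_size, seq_len),
    mask_prob=model.config.mask_time_prob,
    mask_length=model.config.mask_time_length,
    min_masks=2,
)
# Negative (distractor) indices for the contrastive loss, also caller-side now
sampled_negative_indices = _sample_negative_indices(
    (batch_size, seq_len),
    model.config.num_negatives,
    mask_time_indices=mask_time_indices,
)

outputs = model(
    input_values=signal,
    mask_time_indices=torch.tensor(mask_time_indices, dtype=torch.bool),
    sampled_negative_indices=torch.tensor(sampled_negative_indices, dtype=torch.long),
)
loss = outputs.loss  # contrastive + diversity loss, only populated when both are passed
```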
Issue Analytics
- Created 2 years ago
- Comments: 8 (4 by maintainers)
It’s fixed, but mysterious research-wise.
Ok! And now, having replaced mask_time_indices=mask_time_indices with mask_time_indices=torch.ones(...), is the problem fixed, or still not 100%?
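For completeness, a minimal self-contained sketch of what that replacement amounts to (where exactly it sits in the SpeechBrain wrapper, and the shapes used here, are assumptions):

```python
import torch
from transformers import Wav2Vec2ForPreTraining

model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")
signal = torch.randn(2, 16000)  # (batch, signal)
seq_len = int(model._get_feat_extract_output_lengths(torch.tensor(signal.shape[1])))

# Before: mask_time_indices=mask_time_indices   (the real, partial time mask)
# After:  mask_time_indices=torch.ones(...)     (every frame counted as masked)
full_mask = torch.ones(signal.shape[0], seq_len, dtype=torch.bool)
outputs = model(input_values=signal, mask_time_indices=full_mask)
```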