Wav2Vec 2.0 pretraining stuck
❓ Questions and Help
What is your question?
Why do the loss and accuracy plots look like this?
Code
Running hydra train as shown in the README.md for the w2v2 base model.
Changes to the config:
restore_file: /root/wav2vec_small.pt
reset_dataloader: true
reset_lr_scheduler: true
reset_meters: true
reset_optimizer: true
num_workers: 24
max_tokens: 1500000
ddp_backend: no_c10d
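As a sanity check on the max_tokens override, here is a rough back-of-the-envelope look at how much audio goes into each optimizer update. This is a minimal sketch, assuming max_tokens counts raw 16 kHz waveform samples (as the wav2vec 2.0 audio pretraining task uses it) and that the released base recipe targets roughly 1.4M tokens per GPU across 64 GPUs; re-check both numbers against your fairseq checkout before relying on them.

# Rough estimate of audio per optimizer update implied by max_tokens.
# Assumption: max_tokens counts raw 16 kHz waveform samples, and the released
# wav2vec 2.0 base recipe uses ~1.4M tokens per GPU on 64 GPUs (verify in
# examples/wav2vec/config/pretraining before trusting these numbers).
SAMPLE_RATE = 16_000

def seconds_per_update(max_tokens, num_gpus=1, update_freq=1):
    """Seconds of audio contributing to a single optimizer update."""
    return max_tokens / SAMPLE_RATE * num_gpus * update_freq

print(seconds_per_update(1_500_000))               # this run, 1 GPU: ~94 s
print(seconds_per_update(1_400_000, num_gpus=64))  # reference recipe: ~5600 s (~1.6 h)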
I’ve parsed hydra_train.log file for my pretraining then plotted it.
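For reference, a minimal parsing/plotting sketch along those lines, assuming common.log_format is json so each train_inner line carries a JSON dict; the key names ("loss", "accuracy", "num_updates") may differ between fairseq versions, so adjust as needed.

# Parse fairseq's hydra_train.log and plot training loss and accuracy.
# Assumes JSON-formatted log lines (common.log_format: json); adapt the regex
# and key names if your log uses the "simple" format or other metric names.
import json
import re
import matplotlib.pyplot as plt

steps, losses, accs = [], [], []
with open("hydra_train.log") as f:
    for line in f:
        if "| train_inner |" not in line:
            continue
        match = re.search(r"\{.*\}", line)
        if match is None:
            continue
        stats = json.loads(match.group(0))
        steps.append(int(stats["num_updates"]))
        losses.append(float(stats["loss"]))
        accs.append(float(stats.get("accuracy", "nan")))

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
ax_loss.plot(steps, losses)
ax_loss.set(xlabel="updates", ylabel="loss")
ax_acc.plot(steps, accs)
ax_acc.set(xlabel="updates", ylabel="accuracy")
plt.tight_layout()
plt.show()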
What have you tried?
.
What’s your environment?
- fairseq Version (e.g., 1.0 or master): master
- PyTorch Version (e.g., 1.0): 1.9.0a0+c3d40fd
- OS (e.g., Linux): Ubuntu 20.04 LTS
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): pip install --editable ./
- Python version: 3.8.10
- CUDA/cuDNN version: 11.3
- GPU models and configuration: RTX 3090
- Any other relevant information: It seems like this strange shape appears once loss_1 and loss_2 become 0.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Went on and checked stable version 0.10.2. The process seems to be the same: loss freezes at 6.658 in the training phase, although validation looks quite nice, with validation loss going to low values as well as accuracy (about 0.02 and 0.99). Can someone explain this strange thing? Is it ok to use a checkpoint like this? P.S. The valid set is 0.01 of the original dataset.
Hi @Etoye,
I managed to solve it. I had to reduce the learning rate and also introduce gradient accumulation (simulating multiple GPUs). Specifically, in the optimization part of the base model config I have:
optimization:
  max_update: 400000
  lr: [0.0002]
  update_freq: [8]
I was able to train a base model successfully with this on just one GPU with fp16. I think the default learning rate was 0.0005.
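For anyone landing here: update_freq is fairseq's gradient-accumulation knob, so the sketch below illustrates the mechanism it implements (accumulate gradients over N mini-batches, then take one optimizer step). The model, data, and loss are placeholders, not the actual wav2vec 2.0 training loop.

# Toy illustration of gradient accumulation, i.e. what optimization.update_freq
# does: run update_freq forward/backward passes, then one optimizer step, which
# approximates the larger per-update batch of a multi-GPU run on a single card.
import torch
from torch import nn

update_freq = 8
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

for step in range(32):
    x, y = torch.randn(4, 10), torch.randn(4, 1)
    loss = nn.functional.mse_loss(model(x), y) / update_freq  # keep the gradient scale comparable
    loss.backward()                                           # gradients accumulate across iterations
    if (step + 1) % update_freq == 0:
        optimizer.step()        # one "real" update every update_freq mini-batches
        optimizer.zero_grad()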