Wav2Vec 2.0 pretraining stuck
❓ Questions and Help
What is your question?
Why do the loss and accuracy plots look like this?
Code
Running hydra train as shown in the README.md for the w2v2 base model.
Changes to the config:
restore_file: /root/wav2vec_small.pt
reset_dataloader: true
reset_lr_scheduler: true
reset_meters: true
reset_optimizer: true
num_workers: 24
max_tokens: 1500000
ddp_backend: no_c10d
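As a sanity check on the max_tokens override, here is a rough back-of-the-envelope look at how much audio goes into each optimizer update. This is a minimal sketch, assuming max_tokens counts raw 16 kHz waveform samples (as the wav2vec 2.0 audio pretraining task uses it) and that the released base recipe targets roughly 1.4M tokens per GPU across 64 GPUs; re-check both numbers against your fairseq checkout before relying on them.

# Rough estimate of audio per optimizer update implied by max_tokens.
# Assumption: max_tokens counts raw 16 kHz waveform samples, and the released
# wav2vec 2.0 base recipe uses ~1.4M tokens per GPU on 64 GPUs (verify in
# examples/wav2vec/config/pretraining before trusting these numbers).
SAMPLE_RATE = 16_000

def seconds_per_update(max_tokens, num_gpus=1, update_freq=1):
    """Seconds of audio contributing to a single optimizer update."""
    return max_tokens / SAMPLE_RATE * num_gpus * update_freq

print(seconds_per_update(1_500_000))               # this run, 1 GPU: ~94 s
print(seconds_per_update(1_400_000, num_gpus=64))  # reference recipe: ~5600 s (~1.6 h)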
I’ve parsed hydra_train.log file for my pretraining then plotted it.
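For reference, a minimal parsing/plotting sketch along those lines, assuming common.log_format is json so each train_inner line carries a JSON dict; the key names ("loss", "accuracy", "num_updates") may differ between fairseq versions, so adjust as needed.

# Parse fairseq's hydra_train.log and plot training loss and accuracy.
# Assumes JSON-formatted log lines (common.log_format: json); adapt the regex
# and key names if your log uses the "simple" format or other metric names.
import json
import re
import matplotlib.pyplot as plt

steps, losses, accs = [], [], []
with open("hydra_train.log") as f:
    for line in f:
        if "| train_inner |" not in line:
            continue
        match = re.search(r"\{.*\}", line)
        if match is None:
            continue
        stats = json.loads(match.group(0))
        steps.append(int(stats["num_updates"]))
        losses.append(float(stats["loss"]))
        accs.append(float(stats.get("accuracy", "nan")))

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
ax_loss.plot(steps, losses)
ax_loss.set(xlabel="updates", ylabel="loss")
ax_acc.plot(steps, accs)
ax_acc.set(xlabel="updates", ylabel="accuracy")
plt.tight_layout()
plt.show()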
What have you tried?
.
What’s your environment?
- fairseq Version (e.g., 1.0 or master): master
- PyTorch Version (e.g., 1.0): 1.9.0a0+c3d40fd
- OS (e.g., Linux): Ubuntu 20.04 LTS
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): pip install --editable ./
- Python version: 3.8.10
- CUDA/cuDNN version: 11.3
- GPU models and configuration: RTX 3090
- Any other relevant information: It seems like this strange shape appears once loss_1 and loss_2 become 0.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Went on and checked stable version 0.10.2. The process seems to be the same: loss freezes at 6.658 in the training phase, although validation looks quite nice, with validation loss going to low values as well as accuracy (about 0.02 and 0.99). Can someone explain this strange thing? Is it ok to use a checkpoint like this? P.S. The valid set is 0.01 of the original dataset.
Hi @Etoye,
I managed to solve it. I had to reduce the learning rate and also introduce gradient accumulation (simulating multiple GPUs). Specifically, in the optimization part of the base model config I have:
optimization:
  max_update: 400000
  lr: [0.0002]
  update_freq: [8]
I was able to train a base model successfully with this on just one GPU with fp16. I think the default learning rate was 0.0005.
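For anyone landing here: update_freq is fairseq's gradient-accumulation knob, so the sketch below illustrates the mechanism it implements (accumulate gradients over N mini-batches, then take one optimizer step). The model, data, and loss are placeholders, not the actual wav2vec 2.0 training loop.

# Toy illustration of gradient accumulation, i.e. what optimization.update_freq
# does: run update_freq forward/backward passes, then one optimizer step, which
# approximates the larger per-update batch of a multi-GPU run on a single card.
import torch
from torch import nn

update_freq = 8
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

for step in range(32):
    x, y = torch.randn(4, 10), torch.randn(4, 1)
    loss = nn.functional.mse_loss(model(x), y) / update_freq  # keep the gradient scale comparable
    loss.backward()                                           # gradients accumulate across iterations
    if (step + 1) % update_freq == 0:
        optimizer.step()        # one "real" update every update_freq mini-batches
        optimizer.zero_grad()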