wav2vec pretrain help
See original GitHub issue

❓ Questions and Help
I'm trying to pretrain a custom wav2vec2 model on my own dataset, which is about 10k hours. The official wav2vec2 base model was used for parameter initialization. After a few epochs of training, the training loss suddenly drops sharply while the validation loss increases.
Before asking:
- search the issues.
- search the docs.
What is your question?
- The training loss (purple) doesn't look right, and the validation loss (red) is increasing.
- Does the code perplexity curve look normal?
- Does the gradient curve look normal?
@alexeib can you kindly help? Thanks.
Code
I use the same config as the wav2vec2 base model.
What have you tried?
I tried lowering the learning rate and training in fp32 instead of fp16, but neither helped.
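For reference, both of those tweaks can be expressed as Hydra overrides on a fairseq pretraining launch. This is only a sketch: the data path and config name are placeholders, the learning-rate value is illustrative, and the key names (`optimization.lr`, `common.fp16`) follow fairseq's wav2vec2 pretraining config layout.

```shell
# Hedged sketch: the two mitigations described above as Hydra overrides.
# Paths, config name, and the lr value are placeholders, not from the issue.
fairseq-hydra-train \
  task.data=/path/to/manifests \
  optimization.lr=[0.0001] \
  common.fp16=false \
  --config-dir /path/to/fairseq/examples/wav2vec/config/pretraining \
  --config-name wav2vec2_base_librispeech
```

Note that fairseq expects the learning rate as a list, and `common.fp16=false` switches training to fp32.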
What’s your environment?
- fairseq Version (e.g., 1.0 or master): master
- PyTorch Version (e.g., 1.0): 1.7.1
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): pip install -e
- Python version: 3.7
- CUDA/cuDNN version: 11.0
- GPU models and configuration: 4x V100
- Any other relevant information:
Issue Analytics
- Created: 2 years ago
- Comments: 9 (3 by maintainers)
Top Results From Across the Web
- Self-training and pre-training, understanding the wav2vec series: "If a pre-trained model captures the structure of speech, then it should require few labeled examples to fine-tune it for speech recognition. …"
- Wav2Vec2 (Hugging Face): "The Wav2Vec2 model was proposed in wav2vec 2.0: A Framework for … A notebook on how to leverage a pretrained Wav2Vec2 model for …"
- fairseq/README.md at main - Wav2vec 2.0 (GitHub): "wav2vec 2.0 learns speech representations on unlabeled data as described in wav2vec … We also release multilingual pre-trained wav2vec 2.0 (XLSR) models: …"
- Wav2vec 2.0: Learning the structure of speech from raw audio: "To address this issue, we explore the idea of cross-lingual training. The idea is to pretrain a single model on multiple languages at …"
- Wav2vec could be more efficient, so we created our own pre-trained ASR Model for better Conversational AI (ASAPP): "By Felix Wu, PhD. Research Scientist at ASAPP."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
extractor_mode: layer_norm is much more stable and typically matches the performance of the default (just make sure you set feature_grad_mult to 1.0 and task.normalize=true).
layer_norm_first allows you to train beyond 500k updates without crashing in fp16 mode. By itself it is not as accurate as post layer norm, but when you train for longer you outperform post-layer-norm models. For this to be effective, you need to significantly increase the learning rate compared to a post-layer-norm model (by 20-30x).
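Putting those suggestions together as Hydra overrides might look like the following sketch. The key names follow fairseq's wav2vec2 config (`model.extractor_mode`, `model.layer_norm_first`, `model.feature_grad_mult`, `task.normalize`), but the paths and config name are placeholders, and no learning rate is shown because "significantly increased" depends on your post-layer-norm baseline.

```shell
# Hedged sketch of the stability settings suggested above.
# Paths and config name are placeholders; scale optimization.lr yourself
# relative to your post-layer-norm baseline.
fairseq-hydra-train \
  task.data=/path/to/manifests \
  task.normalize=true \
  model.extractor_mode=layer_norm \
  model.layer_norm_first=true \
  model.feature_grad_mult=1.0 \
  --config-dir /path/to/fairseq/examples/wav2vec/config/pretraining \
  --config-name wav2vec2_base_librispeech
```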
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!