wav2vec pretrain help
See original GitHub issue

❓ Questions and Help
I'm trying to pretrain a custom wav2vec2 model on my own dataset, which is about 10k hours. The official wav2vec2 base model was used for parameter initialization. After a few epochs of training, the training loss suddenly drops sharply while the validation loss increases.
Before asking:
- search the issues.
- search the docs.
What is your question?
- The training loss (purple) doesn't look right, and the validation loss (red) is increasing.
- Does the code perplexity curve look normal?
- Does the gradient curve look normal?
@alexeib can you kindly help? Thanks.
Code
I use the same config as the wav2vec2 base model.
What have you tried?
I tried lowering the learning rate and training in fp32 instead of fp16, but neither helped.
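For reference, both of those tweaks can be expressed as Hydra overrides on a fairseq pretraining launch. This is only a sketch: the data path and config name are placeholders, the learning-rate value is illustrative, and the key names (`optimization.lr`, `common.fp16`) follow fairseq's wav2vec2 pretraining config layout.

```shell
# Hedged sketch: the two mitigations described above as Hydra overrides.
# Paths, config name, and the lr value are placeholders, not from the issue.
fairseq-hydra-train \
  task.data=/path/to/manifests \
  optimization.lr=[0.0001] \
  common.fp16=false \
  --config-dir /path/to/fairseq/examples/wav2vec/config/pretraining \
  --config-name wav2vec2_base_librispeech
```

Note that fairseq expects the learning rate as a list, and `common.fp16=false` switches training to fp32.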
What’s your environment?
- fairseq Version (e.g., 1.0 or master): master
- PyTorch Version (e.g., 1.0): 1.7.1
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): pip install -e
- Python version: 3.7
- CUDA/cuDNN version: 11.0
- GPU models and configuration: 4x V100
- Any other relevant information:
Issue Analytics
- Created: 2 years ago
- Comments: 9 (3 by maintainers)
Top Results From Across the Web
- Self-training and pre-training, understanding the wav2vec series: "If a pre-trained model captures the structure of speech, then it should require few labeled examples to fine-tune it for speech recognition. …"
- Wav2Vec2 (Hugging Face): "The Wav2Vec2 model was proposed in wav2vec 2.0: A Framework for … A notebook on how to leverage a pretrained Wav2Vec2 model for …"
- fairseq/README.md at main - Wav2vec 2.0 (GitHub): "wav2vec 2.0 learns speech representations on unlabeled data as described in wav2vec … We also release multilingual pre-trained wav2vec 2.0 (XLSR) models: …"
- Wav2vec 2.0: Learning the structure of speech from raw audio: "To address this issue, we explore the idea of cross-lingual training. The idea is to pretrain a single model on multiple languages at …"
- Wav2vec could be more efficient, so we created our own pre-trained ASR Model for better Conversational AI (ASAPP): "By Felix Wu, PhD. Research Scientist at ASAPP."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
extractor_mode: layer_norm is much more stable and typically matches the performance of the default (just make sure you set feature_grad_mult to 1.0 and task.normalize=true).
layer_norm_first allows you to train beyond 500k updates without crashing in fp16 mode. By itself it is not as accurate as post layer norm, but when you train for longer you outperform post-layer-norm models. For this to be effective, you need to significantly increase the learning rate compared to a post-layer-norm model (by 20-30x).
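Putting those suggestions together as Hydra overrides might look like the following sketch. The key names follow fairseq's wav2vec2 config (`model.extractor_mode`, `model.layer_norm_first`, `model.feature_grad_mult`, `task.normalize`), but the paths and config name are placeholders, and no learning rate is shown because "significantly increased" depends on your post-layer-norm baseline.

```shell
# Hedged sketch of the stability settings suggested above.
# Paths and config name are placeholders; scale optimization.lr yourself
# relative to your post-layer-norm baseline.
fairseq-hydra-train \
  task.data=/path/to/manifests \
  task.normalize=true \
  model.extractor_mode=layer_norm \
  model.layer_norm_first=true \
  model.feature_grad_mult=1.0 \
  --config-dir /path/to/fairseq/examples/wav2vec/config/pretraining \
  --config-name wav2vec2_base_librispeech
```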
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!