I tried to run the LibriSpeech transformer recipe with 8 GPUs, but the word error rate remains very large.
I tried to run the LibriSpeech transformer recipe with 8 GPUs using DDP (https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/transformer/train.py), but the word error rate remains very large (around 100%) despite 10 epochs of training.
epoch: 1, lr: 1.00e+00, steps: 4056, optimizer: Adam - train loss: 2.50e+02 - valid loss: 1.32e+02, valid ACC: 1.97e-01
epoch: 2, lr: 1.17e-04, steps: 12845, optimizer: Adam - train loss: 2.11e+02 - valid loss: 1.27e+02, valid ACC: 2.24e-01
epoch: 3, lr: 1.97e-04, steps: 21634, optimizer: Adam - train loss: 2.04e+02 - valid loss: 1.25e+02, valid ACC: 2.43e-01
epoch: 4, lr: 2.07e-04, steps: 30423, optimizer: Adam - train loss: 1.98e+02 - valid loss: 1.23e+02, valid ACC: 2.58e-01
epoch: 5, lr: 1.82e-04, steps: 39212, optimizer: Adam - train loss: 1.93e+02 - valid loss: 1.21e+02, valid ACC: 2.67e-01
epoch: 6, lr: 1.65e-04, steps: 48001, optimizer: Adam - train loss: 1.89e+02 - valid loss: 1.21e+02, valid ACC: 2.71e-01
epoch: 7, lr: 1.51e-04, steps: 56790, optimizer: Adam - train loss: 1.85e+02 - valid loss: 1.21e+02, valid ACC: 2.70e-01
epoch: 8, lr: 1.41e-04, steps: 65579, optimizer: Adam - train loss: 1.82e+02 - valid loss: 1.22e+02, valid ACC: 2.67e-01
epoch: 9, lr: 1.32e-04, steps: 74368, optimizer: Adam - train loss: 1.79e+02 - valid loss: 1.23e+02, valid ACC: 2.64e-01
epoch: 10, lr: 1.25e-04, steps: 83157, optimizer: Adam - train loss: 1.76e+02 - valid loss: 1.24e+02, valid ACC: 2.61e-01, valid WER: 96.31
I ran the following command.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 train.py hparams/transformer.yaml --distributed_launch --distributed_backend='nccl'
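(For context, here is a minimal sketch of what each of the 8 processes launched by this command roughly does under NCCL DDP. This is not the recipe's code, just an illustration of the mechanism, and it assumes PyTorch 1.10's default behavior of passing --local_rank to every launched process.)

import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # injected by torch.distributed.launch
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)   # one GPU per process
dist.init_process_group(backend="nccl")  # launcher provides MASTER_ADDR/PORT, RANK, WORLD_SIZE

model = torch.nn.Linear(80, 80).cuda(args.local_rank)
ddp_model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank])
# gradients are all-reduced across the 8 ranks on every backward pass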
I reduced batch_size from 16 to 4 to avoid an out-of-memory error and changed gradient_accumulation from 4 to 1, following https://github.com/speechbrain/speechbrain/issues/899. I also tried training with gradient_accumulation set to 4 and to 2, but the results were no different (a rough effective-batch-size comparison is sketched at the end of this post). My environment is as follows.
PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.27
Python version: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-1063-aws-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
GPU 4: Tesla V100-SXM2-16GB
GPU 5: Tesla V100-SXM2-16GB
GPU 6: Tesla V100-SXM2-16GB
GPU 7: Tesla V100-SXM2-16GB
Nvidia driver version: 460.106.00
The SpeechBrain commit hash is d6bfe13. Could you give me any hint? Thanks for your help.
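P.S. For reference, here is the rough effective-batch-size arithmetic for the settings above, assuming batch_size in the yaml is per process (per GPU); the exact semantics in the recipe may differ, so treat this only as an estimate.

# Hypothetical comparison (assumes batch_size is per GPU/process)
def effective_batch(batch_size, n_gpus, grad_accumulation):
    return batch_size * n_gpus * grad_accumulation

reference = effective_batch(batch_size=16, n_gpus=1, grad_accumulation=4)  # 64 (single-GPU defaults)
this_run  = effective_batch(batch_size=4,  n_gpus=8, grad_accumulation=1)  # 32 (the DDP run above)

Under this assumption my run uses roughly half the reference effective batch, which could change convergence behavior, although on its own I would not expect it to leave the WER near 100%.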
Top GitHub Comments
Hey folks, we updated the whole LibriSpeech recipe. The model should now be 1. better; 2. much smaller and therefore easier to train with fewer GPUs 😃
Thank you for your very kind support. We were able to train models using the new script, so I will close this issue.