
conformer-ctc small does not converge


Describe the bug

Not a bug; I’m asking for help and training tips. conformer-ctc small can’t converge on LibriSpeech 960h.

Basic environments:

  • OS information: Ubuntu 18.04.1 LTS
  • python version: [e.g. 3.8.5 (default, Sep 24 2020, 16:55:52) [GCC 7.5.0]]
  • espnet version: [e.g. espnet 0.9.6]
  • Git hash: [e.g. c84da5743b7ef70c0c6212715859bdebdcf873b2]
    • Commit date: [e.g. Tue Sep 1 09:32:54 2020 -0400]
  • pytorch version: [e.g. pytorch 1.7.1]

Environments from torch.utils.collect_env:

PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2

OS: Ubuntu 18.04.1 LTS
GCC version: (GCC) 7.5.0
CMake version: version 3.10.2

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-PCIE
GPU 1: Tesla V100-PCIE
GPU 2: Tesla V100-PCIE
GPU 3: Tesla V100-PCIE

Nvidia driver version: 470.63.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] pytorch-wpe==0.0.1
[pip3] torch==1.7.1
[pip3] torch-complex==0.2.1
[conda] Could not collect

Task information:

  • Task: ASR
  • LibriSpeech 960h
  • ESPnet2

To Reproduce

I want to reproduce this model (https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_en_conformer_ctc_small) in ESPnet, but I can’t get it to converge.

I have tried several parameter settings; my config files are attached. The initial one is the default Conformer YAML in ESPnet, with the CTC weight changed to 1.0 and the encoder dimensions changed. It didn’t converge, so I modified the config following the NeMo one (https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/conformer/conformer_ctc_bpe.yaml).

Attached configs: train_asr_conformer_ctc.txt, train_asr_conformer_ctc_v1.txt, train_asr_conformer_ctc_v2.txt
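
For context on the CTC weight change: hybrid CTC/attention models in ESPnet interpolate the two losses with ctc_weight, so setting it to 1.0 should amount to pure CTC training, with the attention-decoder term dropping out. The sketch below only illustrates that interpolation; the function name is hypothetical and is not part of ESPnet’s API.

```python
from typing import Optional


def hybrid_ctc_attention_loss(
    loss_ctc: float, loss_att: Optional[float], ctc_weight: float
) -> float:
    """Hypothetical helper illustrating the hybrid CTC/attention objective.

    loss = ctc_weight * loss_ctc + (1 - ctc_weight) * loss_att
    With ctc_weight == 1.0 the decoder term vanishes (pure CTC), so
    loss_att may be None in that case.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    if ctc_weight == 1.0:
        # Attention branch contributes nothing: pure CTC training.
        return loss_ctc
    if loss_att is None:
        raise ValueError("loss_att is required when ctc_weight < 1.0")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att
```

With ctc_weight = 1.0 only the CTC loss is optimized, so any decoder settings left in the config should not affect training.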

I also found that the loss stops going down at about epoch 2 (maybe due to vanishing gradients?). The same happens for v0 and v2; v1 diverges because of the large learning rate.

[image: loss_ctc]
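
Because v1 diverged with a larger learning rate, it may help to look at the peak learning rate the schedule actually reaches. The sketch below reproduces the Noam-style warmup formula used by ESPnet2’s WarmupLR scheduler, as we understand it: the rate rises linearly for warmup_steps, peaks at the optimizer’s base lr, then decays as step ** -0.5. The base_lr and warmup_steps values are hypothetical; the real ones are in the attached configs.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Noam-style warmup schedule (assumed to match ESPnet2's WarmupLR).

    Linear increase up to warmup_steps, peak equal to base_lr at
    step == warmup_steps, then inverse-square-root decay.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps ** 0.5 * min(
        step ** -0.5, step * warmup_steps ** -1.5
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for step in (1_000, 25_000, 100_000):
        print(step, warmup_lr(step, base_lr=0.002, warmup_steps=25_000))
```

Since the peak equals base_lr here, learning-rate values from a schedule that additionally scales by d_model ** -0.5 (as in the original Noam formulation) are not directly interchangeable with this one when porting a config.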

The training logs for v0 and v2 are attached: trainv0.log, trainv2.log. There is a “no valid stats” warning in the logs; does this have any impact on the results?

I ran the whole pipeline only once; the other experiments were started from stage 10.

Do you have any suggestions? Thanks a lot for any help.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 16 (3 by maintainers)

Top GitHub Comments

1 reaction
BuaaAlban commented, Feb 10, 2022

train_asr_conformer_ctc_v5.txt

It should be this one; you can try it.

0 reactions
cuongld-vbee commented, Feb 10, 2022

Quoting an earlier comment: “2021.11.15 Update: The model has converged. Thanks a lot for your help! The following are the results.” [image]

Could you upload your config file for this result?


Top Results From Across the Web

Conformer training from scratch not achieving the reported WER
Unfortunately Conformer-CTC models for LS need a lot of time to converge to SOTA numbers. Among them the small version is the worst....
STT En Conformer-CTC Small - NVIDIA NGC
Conformer -CTC model is a non-autoregressive variant of Conformer model [1] for Automatic Speech Recognition which uses CTC loss/decoding instead ...
EFFICIENT CONFORMER - Archive ouverte HAL
We found RNN-T models to converge faster with fewer epochs than CTC models, achieving lower greedy WER.
arXiv:2109.01163v2 [eess.AS] 8 Sep 2021
We found RNN-T models to converge faster with fewer epochs than CTC models, achieving lower greedy WER.
An Improvement to Conformer-Based Model for High ... - NCBI
However, current capsule networks have a small number of layers, ... in the conformer is replaced by the capsule network, the convergence ......
