
CTC loss was 0 when joint training on Ubuntu

See original GitHub issue

Hi all, I have reproduced the result on CentOS 7.2 + CUDA 8.0, and I also installed the same code on Ubuntu 16.04 + CUDA 9.1; everything was fine during installation. However, while the attention task was OK, the CTC loss was 0 the whole time during joint CTC/attention training with "mtlalpha=0.5". Here is the log file:

    {
        "main/loss": 35.740950293540955, 
        "main/loss_ctc": 0.0, 
        "iteration": 100, 
        "eps": 1e-08, 
        "main/loss_att": 71.48190058708191, 
        "elapsed_time": 41.73378896713257, 
        "epoch": 0, 
        "main/acc": 0.11974184565721473
    }, 
    {
        "main/loss": 28.910011546611784, 
        "main/loss_ctc": 0.0, 
        "iteration": 200, 
        "eps": 1e-08, 
        "main/loss_att": 57.82002309322357, 
        "elapsed_time": 81.66535997390747, 
        "epoch": 0, 
        "main/acc": 0.17701563459985487
    }, 

The backend is pytorch, and I suspect it is a problem with warp-ctc on Ubuntu. I have no idea how to fix it, please help!
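
One thing worth noting in the log above: "main/loss" is exactly half of "main/loss_att" (35.74 vs. 71.48), which is what you would expect if the two objectives are interpolated with mtlalpha and the CTC branch contributes nothing. A minimal sketch of that interpolation, assuming the usual multi-task combination loss = alpha * loss_ctc + (1 - alpha) * loss_att and plugging in the numbers from the first log entry:

    # Joint CTC/attention interpolation (sketch; "alpha" is the mtlalpha option).
    alpha = 0.5
    loss_ctc = 0.0                      # what the reporter sees on Ubuntu
    loss_att = 71.48190058708191        # "main/loss_att" at iteration 100
    loss = alpha * loss_ctc + (1 - alpha) * loss_att
    print(loss)                         # 35.740950293540955 == "main/loss" in the log,
                                        # i.e. the CTC branch is contributing nothing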

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
fotwo commented, Jul 4, 2018

Hi, @Fhrozen, @sw005320. I have reinstalled chainer_ctc and warp-ctc, strictly following the latest Makefile. Both --ctc_type warpctc --backend chainer and --ctc_type warpctc --backend pytorch work now. During my previous installation of warp-ctc I did not git checkout the right version, although that build still works on my CentOS machine. The earlier failure of the chainer_ctc installation was probably caused by using a soft link to the warp-ctc build that chainer_ctc/ext needs. (I have not tested chainer_ctc on CentOS, but I think it would probably raise the same errors.) Sorry for my silly mistakes. Thanks again!
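
For anyone hitting the same symptom, a quick way to check whether a rebuilt warp-ctc binding is healthy is to run it on a random batch and confirm the loss is clearly positive and backpropagates. A minimal sketch, assuming the SeanNaren warpctc_pytorch binding with its documented call signature (activations of shape (T, N, C) before softmax, a flat int32 label tensor, and int32 length tensors on the CPU):

    # Sanity check for a freshly built warp-ctc PyTorch binding (sketch).
    import torch
    from warpctc_pytorch import CTCLoss   # SeanNaren binding, assumed installed

    T, N, C = 50, 4, 30                                # frames, batch, labels incl. blank (index 0)
    acts = torch.randn(T, N, C, requires_grad=True)    # pre-softmax network outputs
    labels = torch.randint(1, C, (N * 10,), dtype=torch.int)   # flat targets, blank excluded
    label_lens = torch.full((N,), 10, dtype=torch.int)
    act_lens = torch.full((N,), T, dtype=torch.int)

    loss = CTCLoss()(acts, labels, act_lens, label_lens)
    print(loss.item())   # a healthy build prints a clearly positive value, never 0.0
    loss.backward()      # and the backward pass should not raise

If this prints 0.0 or raises, the binding itself is broken rather than the ESPnet configuration.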

0 reactions
Fhrozen commented, Jul 4, 2018

Just to confirm, I would like to know which recipe you are using. I tested with voxforge. With --backend pytorch --ctc_type warpctc, the CTC loss is not zero:

    {
        "main/loss": 210.09339630126954, 
        "main/loss_ctc": 231.01422760009766, 
        "iteration": 100, 
        "eps": 1e-08, 
        "main/loss_att": 189.17256477355957, 
        "elapsed_time": 100.62332081794739, 
        "epoch": 0, 
        "main/acc": 0.2435916450817478
    }, 
    {
        "main/loss": 187.8331169128418, 
        "main/loss_ctc": 214.7629539489746, 
        "iteration": 200, 
        "eps": 1e-08, 
        "main/loss_att": 160.9032810974121, 
        "elapsed_time": 199.0161759853363, 
        "epoch": 0, 
        "main/acc": 0.3145206625358909
    },

When using --ctc_type warpctc --backend chainer:

    {
        "main/loss": 203.49114990234375, 
        "main/loss_ctc": 226.7288055419922, 
        "iteration": 100, 
        "eps": 1e-08, 
        "main/loss_att": 180.25355529785156, 
        "elapsed_time": 124.7856810092926, 
        "epoch": 0, 
        "main/acc": 0.26898226141929626
    }, 
    {
        "main/loss": 183.5249481201172, 
        "main/loss_ctc": 216.84886169433594, 
        "iteration": 200, 
        "eps": 1e-08, 
        "main/loss_att": 150.20103454589844, 
        "elapsed_time": 245.9992392063141, 
        "epoch": 0, 
        "main/acc": 0.3597940504550934
    }

I tried several ways to reproduce your error (using more GPU memory, an incorrect number of GPUs), but without success. Just to outline my setup: Ubuntu 16.04 with CUDA 9.0 and no virtualenv (IMO this is irrelevant; I also tried with Docker and it executed successfully).

The log does not give much detail about where the error is generated. Line 77 runs a GPU instruction, but I am not sure if that is related.

My only suggestion is to delete warp-ctc and chainer_ctc and build both of them again; if possible, try to capture the logs of the installation.
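
After rebuilding, a quick import check can also confirm which copies of the bindings Python actually picks up (useful when a stale soft link or an old build is still on the path). A small sketch; the module names warpctc_pytorch and chainer_ctc are assumed to match the installed packages:

    # Report where each CTC binding is imported from, or why the import fails.
    for name in ("warpctc_pytorch", "chainer_ctc"):
        try:
            mod = __import__(name)
            print(name, "->", mod.__file__)
        except ImportError as err:
            print(name, "not importable:", err)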

Read more comments on GitHub >

