
CTC loss was 0 when joint training on Ubuntu

See original GitHub issue

Hi all, I have reproduced the result on CentOS 7.2 + CUDA 8.0, and I also installed the same code on Ubuntu 16.04 + CUDA 9.1; everything was fine during installation. However, while the attention task was OK, the CTC loss was 0 the whole time during joint CTC/attention training with "mtlalpha=0.5". Here is the log file:

    {
        "main/loss": 35.740950293540955, 
        "main/loss_ctc": 0.0, 
        "iteration": 100, 
        "eps": 1e-08, 
        "main/loss_att": 71.48190058708191, 
        "elapsed_time": 41.73378896713257, 
        "epoch": 0, 
        "main/acc": 0.11974184565721473
    }, 
    {
        "main/loss": 28.910011546611784, 
        "main/loss_ctc": 0.0, 
        "iteration": 200, 
        "eps": 1e-08, 
        "main/loss_att": 57.82002309322357, 
        "elapsed_time": 81.66535997390747, 
        "epoch": 0, 
        "main/acc": 0.17701563459985487
    }, 

The backend is pytorch, and I suspect it is a problem with warp-ctc on Ubuntu. I have no idea how to fix it, please help!
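
One thing worth noting in the log above: "main/loss" is exactly half of "main/loss_att" (35.74 vs. 71.48), which is what you would expect if the two objectives are interpolated with mtlalpha and the CTC branch contributes nothing. A minimal sketch of that interpolation, assuming the usual multi-task combination loss = alpha * loss_ctc + (1 - alpha) * loss_att and plugging in the numbers from the first log entry:

    # Joint CTC/attention interpolation (sketch; "alpha" is the mtlalpha option).
    alpha = 0.5
    loss_ctc = 0.0                      # what the reporter sees on Ubuntu
    loss_att = 71.48190058708191        # "main/loss_att" at iteration 100
    loss = alpha * loss_ctc + (1 - alpha) * loss_att
    print(loss)                         # 35.740950293540955 == "main/loss" in the log,
                                        # i.e. the CTC branch is contributing nothing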

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
fotwo commented, Jul 4, 2018

Hi, @Fhrozen, @sw005320. I have reinstalled chainer_ctc and warp-ctc, strictly following the latest Makefile. Both --ctc_type warpctc --backend chainer and --ctc_type warpctc --backend pytorch work now. During my previous installation of warp-ctc I did not git checkout the right version, although that build still works on my CentOS machine. The earlier failure of the chainer_ctc installation was probably caused by using a soft link to the warp-ctc build that chainer_ctc/ext needs. (I have not tested chainer_ctc on CentOS, but I think it would probably raise the same errors.) Sorry for my silly mistakes. Thanks again!
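
For anyone hitting the same symptom, a quick way to check whether a rebuilt warp-ctc binding is healthy is to run it on a random batch and confirm the loss is clearly positive and backpropagates. A minimal sketch, assuming the SeanNaren warpctc_pytorch binding with its documented call signature (activations of shape (T, N, C) before softmax, a flat int32 label tensor, and int32 length tensors on the CPU):

    # Sanity check for a freshly built warp-ctc PyTorch binding (sketch).
    import torch
    from warpctc_pytorch import CTCLoss   # SeanNaren binding, assumed installed

    T, N, C = 50, 4, 30                                # frames, batch, labels incl. blank (index 0)
    acts = torch.randn(T, N, C, requires_grad=True)    # pre-softmax network outputs
    labels = torch.randint(1, C, (N * 10,), dtype=torch.int)   # flat targets, blank excluded
    label_lens = torch.full((N,), 10, dtype=torch.int)
    act_lens = torch.full((N,), T, dtype=torch.int)

    loss = CTCLoss()(acts, labels, act_lens, label_lens)
    print(loss.item())   # a healthy build prints a clearly positive value, never 0.0
    loss.backward()      # and the backward pass should not raise

If this prints 0.0 or raises, the binding itself is broken rather than the ESPnet configuration.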

0 reactions
Fhrozen commented, Jul 4, 2018

Just to confirm, I would like to know which recipe you are using. I tested with voxforge. With --backend pytorch --ctc_type warpctc, the CTC loss is not zero:

    {
        "main/loss": 210.09339630126954, 
        "main/loss_ctc": 231.01422760009766, 
        "iteration": 100, 
        "eps": 1e-08, 
        "main/loss_att": 189.17256477355957, 
        "elapsed_time": 100.62332081794739, 
        "epoch": 0, 
        "main/acc": 0.2435916450817478
    }, 
    {
        "main/loss": 187.8331169128418, 
        "main/loss_ctc": 214.7629539489746, 
        "iteration": 200, 
        "eps": 1e-08, 
        "main/loss_att": 160.9032810974121, 
        "elapsed_time": 199.0161759853363, 
        "epoch": 0, 
        "main/acc": 0.3145206625358909
    },

When using --ctc_type warpctc --backend chainer:

    {
        "main/loss": 203.49114990234375, 
        "main/loss_ctc": 226.7288055419922, 
        "iteration": 100, 
        "eps": 1e-08, 
        "main/loss_att": 180.25355529785156, 
        "elapsed_time": 124.7856810092926, 
        "epoch": 0, 
        "main/acc": 0.26898226141929626
    }, 
    {
        "main/loss": 183.5249481201172, 
        "main/loss_ctc": 216.84886169433594, 
        "iteration": 200, 
        "eps": 1e-08, 
        "main/loss_att": 150.20103454589844, 
        "elapsed_time": 245.9992392063141, 
        "epoch": 0, 
        "main/acc": 0.3597940504550934
    }

I tried several ways to reproduce your error (using more GPU memory, an incorrect number of GPUs), but without success. Just to outline my setup: Ubuntu 16.04 with CUDA 9.0 and no virtualenv (IMO this is irrelevant; I also tried with Docker and it executed successfully).

The log does not give much detail about where the error is generated. Line 77 runs a GPU instruction, but I am not sure if that is related.

My only suggestion is to delete warp-ctc and chainer_ctc and build both of them again; if possible, try to capture the logs of the installation.
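
After rebuilding, a quick import check can also confirm which copies of the bindings Python actually picks up (useful when a stale soft link or an old build is still on the path). A small sketch; the module names warpctc_pytorch and chainer_ctc are assumed to match the installed packages:

    # Report where each CTC binding is imported from, or why the import fails.
    for name in ("warpctc_pytorch", "chainer_ctc"):
        try:
            mod = __import__(name)
            print(name, "->", mod.__file__)
        except ImportError as err:
            print(name, "not importable:", err)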

Read more comments on GitHub >

