Incorrect use of torchaudio's rnnt_loss
It is surprising that no one has noticed that the current code feeds the output of log_softmax to torchaudio's rnnt_loss, which is not correct: torchaudio's rnnt_loss accepts only raw logits as input.
I am wondering whether anyone has succeeded in training a transducer model with SpeechBrain's current usage of torchaudio's rnnt_loss.
The relevant code is listed below:
Issue Analytics
- State:
- Created a year ago
- Reactions:1
- Comments:6
Top GitHub Comments
It’s strange, but I didn’t notice any improvements during the training process; after fixing this bug it produces even worse results during validation… I’ll post an update once I decode it on the test set.
Hello,
This issue has been solved with #1368, so I’m closing this issue. Thanks again for reporting this problem! 😃