
Negative CTC loss while training TFWav2Vec2ForCTC model


System Info

  • transformers version: 4.21.0.dev0
  • Platform: Linux-5.13.0-48-generic-x86_64-with-glibc2.31
  • Python version: 3.7.13
  • PyTorch version (GPU?): 1.11.0 (False)
  • Tensorflow version (GPU?): 2.9.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: <fill in>
  • Using distributed or parallel set-up in script?: <fill in>

Who can help?

@Rocketknight1 @gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Colab link to reproduce: https://colab.research.google.com/drive/1HXOdDhaIWcLF_4xF-zKZ_gYRf-sMfHkL?usp=sharing

Epoch 1/5
 28/3859 [..............................] - ETA: 47:03 - loss: -0.5141

Expected behavior

The model should train with a positive CTC loss. I have traced the source of the error: the target sequence never reaches the model during the forward pass, so the CTC loss is calculated over empty targets (None).
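As a sanity check: CTC loss is a negative log-likelihood, so for any valid probability it can never be negative. A negative training loss is therefore a strong signal that the targets are malformed or missing, as a minimal sketch shows:

```python
import math

# CTC loss is -log P(target | input). For any probability 0 < p <= 1,
# the loss is non-negative, so a negative loss (like the -0.5141 above)
# implies the targets never reached the loss computation correctly.
for p in (1e-6, 0.5, 1.0):
    assert -math.log(p) >= 0.0
```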

I have also figured out the solution:

add @unpack_inputs here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py#L1583
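For context, the decorator's job is to spread a packed input dict back into keyword arguments so that `labels` actually reaches the forward pass. A simplified sketch of the idea (the real @unpack_inputs in transformers.modeling_tf_utils handles many more cases; DummyModel here is purely illustrative):

```python
import functools

def unpack_inputs(func):
    """Simplified sketch: if Keras packed all inputs into one dict,
    spread it back into keyword arguments before calling `func`."""
    @functools.wraps(func)
    def wrapper(self, input_values, **kwargs):
        if isinstance(input_values, dict):
            kwargs = {**input_values, **kwargs}
            input_values = kwargs.pop("input_values", None)
        return func(self, input_values, **kwargs)
    return wrapper

class DummyModel:
    @unpack_inputs
    def call(self, input_values, labels=None):
        # Without the decorator, a dict input would leave labels=None
        # and the CTC loss would be computed over empty targets.
        return input_values, labels
```

With the decorator applied, `DummyModel().call({"input_values": [1, 2], "labels": [3]})` returns `([1, 2], [3])` instead of leaving `labels` as None.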

With this change, the CTC loss now receives the targets and computes a value, but training then raises another error:

InvalidArgumentError                      Traceback (most recent call last)
/tmp/ipykernel_33658/3396866883.py in <module>
      3 tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
      4 
----> 5 model.fit(train, validation_data = validation, epochs=5)

~/anaconda3/envs/gsoc-2/lib/python3.7/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

~/anaconda3/envs/gsoc-2/lib/python3.7/site-packages/transformers/modeling_tf_utils.py in train_step(self, data)
   1024 
   1025             if self._using_dummy_loss:
-> 1026                 loss = self.compiled_loss(y_pred.loss, y_pred.loss, sample_weight, regularization_losses=self.losses)
   1027             else:
   1028                 loss = None

InvalidArgumentError: slice index 0 of dimension 0 out of bounds. [Op:StridedSlice] name: strided_slice/

To solve this, I added loss = tf.reshape(loss, (1,)) after the CTC loss calculation here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py#L1707
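The reshape matters because Keras's dummy-loss path slices the loss along dimension 0, and a reduced CTC loss is a rank-0 scalar with no dimension to slice. A minimal NumPy analogy of the failure and the fix (tf.reshape behaves the same way on tensors):

```python
import numpy as np

loss = np.array(-0.5141, dtype=np.float32)  # rank-0 scalar, like a reduced loss
assert loss.ndim == 0                       # no dimension 0 to slice

try:
    loss[0]                                 # analogous to tf's StridedSlice on dim 0
    sliced_ok = True
except IndexError:                          # "slice index 0 of dimension 0 out of bounds"
    sliced_ok = False
assert not sliced_ok

loss = np.reshape(loss, (1,))               # the fix: give the loss an explicit dim
assert loss.shape == (1,)
```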

These two changes fix the errors, and I can now train my model. I am hoping the changes get pushed to the main branch.

The issue was previously mentioned here: https://github.com/huggingface/transformers/issues/15114. But since @Rocketknight1 mentioned that he is working on loss calculation across the HF TF models, I thought I would open a new issue.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
Sreyan88 commented, Jul 12, 2022

Hi @Rocketknight1 ,

Yes, I am trying to figure out #18096, though it’s a bit difficult for me as I am fairly new to Keras/TensorFlow. @gante’s suggestion did not work, so I am still investigating!

Thank you for the reply!

0 reactions
Rocketknight1 commented, Jul 12, 2022

Hi @Sreyan88 - I can’t figure out where that error is coming from. In your example scripts above, you’re running everything eagerly, which means that AutoGraph should not be doing anything. I think this is probably related to the issues in #18096, but let me know if you resolve those and this issue is still occurring!
