Negative CTC loss while training TFWav2Vec2ForCTC model
System Info
- `transformers` version: 4.21.0.dev0
- Platform: Linux-5.13.0-48-generic-x86_64-with-glibc2.31
- Python version: 3.7.13
- PyTorch version (GPU?): 1.11.0 (False)
- Tensorflow version (GPU?): 2.9.1 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Colab link to reproduce: https://colab.research.google.com/drive/1HXOdDhaIWcLF_4xF-zKZ_gYRf-sMfHkL?usp=sharing
Epoch 1/5
28/3859 [..............................] - ETA: 47:03 - loss: -0.5141
Expected behavior
The model should train with a positive CTC loss. I traced the source of the error: the target sequence never reaches the model's forward pass, so the CTC loss is computed over empty targets (None).
I have also figured out the fix: add @unpack_inputs here:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py#L1583
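For illustration, here is a simplified, pure-Python sketch of the idea behind @unpack_inputs (the decorator and model below are hypothetical stand-ins, not the actual transformers implementation): when Keras hands the model a single dict of inputs, the decorator unpacks it into keyword arguments, so `labels` actually reaches `call`.

```python
import functools

# Hypothetical mini-decorator illustrating what @unpack_inputs does:
# unpack a single input dict into keyword arguments for `call`.
def unpack_inputs_sketch(fn):
    @functools.wraps(fn)
    def wrapper(self, inputs, **kwargs):
        if isinstance(inputs, dict):
            merged = {**inputs, **kwargs}
            return fn(self, **merged)
        return fn(self, inputs, **kwargs)
    return wrapper

class TinyCTCModel:
    @unpack_inputs_sketch
    def call(self, input_values=None, labels=None):
        # Without unpacking, `labels` would stay None here and the
        # CTC loss would be computed over empty targets.
        return {"got_labels": labels is not None}

model = TinyCTCModel()
print(model.call({"input_values": [0.1, 0.2], "labels": [5, 7]}))
# → {'got_labels': True}
```

Without the decorator, the whole dict would be bound to the first positional argument and `labels` would never be seen, which matches the symptom described above.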
However, with this change the CTC loss now receives the targets and computes a value, but another error is raised:
InvalidArgumentError Traceback (most recent call last)
/tmp/ipykernel_33658/3396866883.py in <module>
3 tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
4
----> 5 model.fit(train, validation_data = validation, epochs=5)
~/anaconda3/envs/gsoc-2/lib/python3.7/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
~/anaconda3/envs/gsoc-2/lib/python3.7/site-packages/transformers/modeling_tf_utils.py in train_step(self, data)
1024
1025 if self._using_dummy_loss:
-> 1026 loss = self.compiled_loss(y_pred.loss, y_pred.loss, sample_weight, regularization_losses=self.losses)
1027 else:
1028 loss = None
InvalidArgumentError: slice index 0 of dimension 0 out of bounds. [Op:StridedSlice] name: strided_slice/
To solve this I added loss = tf.reshape(loss, (1,)) after the CTC loss calculation here:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py#L1707
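The StridedSlice failure appears to come from the dummy-loss path indexing into a loss tensor that is a 0-d scalar. A small NumPy analogy (not the TensorFlow code itself, just an illustration of the shape problem) shows why reshaping to (1,) helps:

```python
import numpy as np

# A 0-d (scalar) array has no dimension 0, so slicing index 0 fails,
# analogous to the StridedSlice error raised inside compiled_loss.
scalar_loss = np.array(-0.5141, dtype=np.float32)
try:
    scalar_loss[0]
except IndexError as err:
    print("0-d slice fails:", err)

# After reshaping to shape (1,), index 0 is valid again -- the same idea
# as adding `loss = tf.reshape(loss, (1,))` after the CTC loss.
vector_loss = np.reshape(scalar_loss, (1,))
print(vector_loss.shape, float(vector_loss[0]))
```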
These changes fix the errors, and I can now train my model. I hope they can be pushed to the main branch.
This issue was previously mentioned in https://github.com/huggingface/transformers/issues/15114, but since @Rocketknight1 mentioned that he is working on loss calculation across HF TF models, I thought I would open a new issue.
Issue Analytics
- Created a year ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Hi @Rocketknight1 ,
Yes, I am trying to figure out #18096, though it's a bit difficult for me as I am new to Keras/TensorFlow. @gante's suggestion did not work, so I am still investigating!
Thank you for the reply!
Hi @Sreyan88 - I can’t figure out where that error is coming from. In your example scripts above, you’re running everything eagerly, which means that AutoGraph should not be doing anything. I think this is probably related to the issues in #18096, but let me know if you resolve those and this issue is still occurring!