Evaluation of wav2vec2 model all labeled string return "<unk>" value
System Info
- transformers version: 4.22.0.dev0
- Platform: Linux-5.15.0-48-generic-x86_64-with-glibc2.10
- Python version: 3.8.8
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.1+cu116 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Both have the same issue
- pip freeze | grep datasets: datasets==2.4.0
Who can help?
@patrickvonplaten @anton-l @sanchit-gandhi
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Steps to reproduce the issue:
- Download the issue_report folder to your local machine
- Open a command prompt and cd into issue_report
- Run the eval command: python ctc_finetune.py --eval
- The eval_wer is 1.0086 because the value of label_str is always <unk>, as printed at line 566 of ctc_finetune.py
- To regenerate the dataset cache files, run: python customise_dataset.py
Here is the log printed at the end of evaluation (see full_log.log for more details):

```
***** Running Evaluation *****
  Num examples = 91
  Batch size = 4
100%|███████████████████████████████████████████| 23/23 [00:03<00:00,  5.54it/s]
pred_str[0]: THERE WERE BARRELS OF WINE IN THE SHU CELLOR
label_str[0]: <unk><unk><unk><unk><unk> <unk><unk><unk><unk> <unk><unk><unk><unk><unk><unk><unk> <unk><unk> <unk><unk><unk><unk> <unk><unk> <unk><unk><unk> <unk><unk><unk><unk> <unk><unk><unk><unk><unk><unk>
100%|███████████████████████████████████████████| 23/23 [00:03<00:00,  6.12it/s]
***** eval metrics *****
  eval_loss               = 4704.6416
  eval_runtime            = 0:00:06.64
  eval_samples            = 91
  eval_samples_per_second = 13.697
  eval_steps_per_second   = 3.462
  eval_wer                = 1.0086
```
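An eval_wer of roughly 1.0 is exactly what you would expect when every reference word decodes to <unk>: each word in the reference counts as a substitution. A minimal stdlib sketch of word error rate via Levenshtein distance (illustrative only; the example scripts compute this with a metric library):

```python
# Why eval_wer lands near 1.0: when every reference token is <unk>, every
# word is a substitution, so WER = substitutions / reference length = 100%.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[len(r)][len(h)] / len(r)

print(wer("<unk><unk> <unk><unk>", "THERE WERE"))  # 1.0: every word substituted
```

Values slightly above 1.0, such as the 1.0086 reported here, occur when the hypothesis also contains insertions relative to the reference.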
Expected behavior
I use the original pre-trained model facebook/wav2vec2-large-robust-ft-libri-960h for evaluation; the only change is my customised dataset. I could not figure out what is wrong with my own modified scripts, which contain only minor changes from the official example scripts. So I am not sure whether the issue is in my scripts or in the fine-tuning libraries. Thanks in advance for helping me with this matter.
Issue Analytics
- State:
- Created: a year ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Yeah, that was the root cause. After I changed it to upper case, the issue went away. Thank you so much for the troubleshooting.
```
***** Running Evaluation *****
  Num examples = 91
  Batch size = 4
100%|███████████████████████████████████████████| 23/23 [00:03<00:00,  6.42it/s]
pred_str[0]: THERE WERE BARRELS OF WINE IN THE SHU CELLOR
label_str[0]: THERE WERE BARRELS OF WINE IN THE HUGE CELLAR
100%|███████████████████████████████████████████| 23/23 [00:03<00:00,  6.68it/s]
***** eval metrics *****
  eval_loss               = 118.7373
  eval_runtime            = 0:00:05.39
  eval_samples            = 91
  eval_samples_per_second = 16.856
  eval_steps_per_second   = 4.26
  eval_wer                = 0.1228
```
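The upper-casing change can be sketched as a simple preprocessing step applied to the transcriptions before tokenization. The column name "text" and the batch-dict shape are assumptions about the dataset schema; adapt them to the preprocessing in customise_dataset.py:

```python
# Hypothetical sketch: uppercase the transcription column before tokenization.
# The "text" column name is an assumption, not taken from the issue's scripts.
def to_upper(batch):
    batch["text"] = batch["text"].upper()
    return batch

# With Hugging Face datasets this would typically be applied via
# dataset = dataset.map(to_upper); shown standalone here:
batch = {"text": "there were barrels of wine in the huge cellar"}
print(to_upper(batch)["text"])  # THERE WERE BARRELS OF WINE IN THE HUGE CELLAR
```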
Ok, I think that's the issue. Your vocabulary likely only contains upper-case letters. The tokenizer doesn't recognise lower-case letters, so it uses <unk> instead. Try converting your transcription column to upper case and see if that fixes it.
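The explanation above can be illustrated with a toy character-level vocabulary. This is a sketch of the fallback behaviour only, not the actual Wav2Vec2CTCTokenizer internals:

```python
# Toy character-level vocabulary containing only upper-case letters and space,
# mimicking a wav2vec2-style CTC vocab built from upper-case transcriptions.
VOCAB = {c: i for i, c in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ ")}
UNK = "<unk>"

def encode(text, vocab):
    # Any character missing from the vocab falls back to the unknown token.
    return [c if c in vocab else UNK for c in text]

print(encode("ab", VOCAB))          # ['<unk>', '<unk>']: lower case is unknown
print(encode("ab".upper(), VOCAB))  # ['A', 'B']: upper case is in the vocab
```

This is why every label_str in the original log came out as runs of <unk> separated by spaces: the spaces were in the vocabulary, but none of the lower-case letters were.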