Evaluation of wav2vec2 model all labeled string return "<unk>" value
System Info
- transformers version: 4.22.0.dev0
- Platform: Linux-5.15.0-48-generic-x86_64-with-glibc2.10
- Python version: 3.8.8
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.1+cu116 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Both have the same issue
- pip freeze | grep datasets: datasets==2.4.0
Who can help?
@patrickvonplaten @anton-l @sanchit-gandhi
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Steps to reproduce the issue:
- Download the issue_report folder to your local machine
- Open a command prompt and cd into issue_report
- Run the eval command: python ctc_finetune.py --eval
- The eval_wer is 1.0086 because the value of label_str is always <unk>, as printed at line 566 of ctc_finetune.py
- To regenerate the dataset cache files, run: python customise_dataset.py
Here is the log printed at the end of evaluation (see full_log.log for more details):

```
***** Running Evaluation *****
  Num examples = 91
  Batch size = 4
100%|███████████████████████████████████████████| 23/23 [00:03<00:00,  5.54it/s]
pred_str[0]: THERE WERE BARRELS OF WINE IN THE SHU CELLOR
label_str[0]: <unk><unk><unk><unk><unk> <unk><unk><unk><unk> <unk><unk><unk><unk><unk><unk><unk> <unk><unk> <unk><unk><unk><unk> <unk><unk> <unk><unk><unk> <unk><unk><unk><unk> <unk><unk><unk><unk><unk><unk>
100%|███████████████████████████████████████████| 23/23 [00:03<00:00,  6.12it/s]
***** eval metrics *****
  eval_loss               = 4704.6416
  eval_runtime            = 0:00:06.64
  eval_samples            = 91
  eval_samples_per_second = 13.697
  eval_steps_per_second   = 3.462
  eval_wer                = 1.0086
```
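An eval_wer of roughly 1.0 is exactly what you would expect when every reference word decodes to <unk>: each word in the reference counts as a substitution. A minimal stdlib sketch of word error rate via Levenshtein distance (illustrative only; the example scripts compute this with a metric library):

```python
# Why eval_wer lands near 1.0: when every reference token is <unk>, every
# word is a substitution, so WER = substitutions / reference length = 100%.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[len(r)][len(h)] / len(r)

print(wer("<unk><unk> <unk><unk>", "THERE WERE"))  # 1.0: every word substituted
```

Values slightly above 1.0, such as the 1.0086 reported here, occur when the hypothesis also contains insertions relative to the reference.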
Expected behavior
I use the original pre-trained model facebook/wav2vec2-large-robust-ft-libri-960h for evaluation; the only change is my customised dataset. I could not figure out what is wrong with my own modified scripts, which contain only minor changes from the official example scripts. So I am not sure whether the issue is in my scripts or in the fine-tuning libraries. Thanks in advance for helping me with this matter.
Issue Analytics
- State:
- Created: a year ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Yeah, that was the root cause. After I changed it to upper case, the issue went away. Thank you so much for the troubleshooting.
```
***** Running Evaluation *****
  Num examples = 91
  Batch size = 4
100%|███████████████████████████████████████████| 23/23 [00:03<00:00,  6.42it/s]
pred_str[0]: THERE WERE BARRELS OF WINE IN THE SHU CELLOR
label_str[0]: THERE WERE BARRELS OF WINE IN THE HUGE CELLAR
100%|███████████████████████████████████████████| 23/23 [00:03<00:00,  6.68it/s]
***** eval metrics *****
  eval_loss               = 118.7373
  eval_runtime            = 0:00:05.39
  eval_samples            = 91
  eval_samples_per_second = 16.856
  eval_steps_per_second   = 4.26
  eval_wer                = 0.1228
```
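The upper-casing change can be sketched as a simple preprocessing step applied to the transcriptions before tokenization. The column name "text" and the batch-dict shape are assumptions about the dataset schema; adapt them to the preprocessing in customise_dataset.py:

```python
# Hypothetical sketch: uppercase the transcription column before tokenization.
# The "text" column name is an assumption, not taken from the issue's scripts.
def to_upper(batch):
    batch["text"] = batch["text"].upper()
    return batch

# With Hugging Face datasets this would typically be applied via
# dataset = dataset.map(to_upper); shown standalone here:
batch = {"text": "there were barrels of wine in the huge cellar"}
print(to_upper(batch)["text"])  # THERE WERE BARRELS OF WINE IN THE HUGE CELLAR
```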
Ok, I think that's the issue. Your vocabulary likely only contains upper-case letters. The tokenizer doesn't recognise lower-case letters, so it uses <unk> instead. Try converting your transcription column to upper case and see if that fixes it.
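The explanation above can be illustrated with a toy character-level vocabulary. This is a sketch of the fallback behaviour only, not the actual Wav2Vec2CTCTokenizer internals:

```python
# Toy character-level vocabulary containing only upper-case letters and space,
# mimicking a wav2vec2-style CTC vocab built from upper-case transcriptions.
VOCAB = {c: i for i, c in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ ")}
UNK = "<unk>"

def encode(text, vocab):
    # Any character missing from the vocab falls back to the unknown token.
    return [c if c in vocab else UNK for c in text]

print(encode("ab", VOCAB))          # ['<unk>', '<unk>']: lower case is unknown
print(encode("ab".upper(), VOCAB))  # ['A', 'B']: upper case is in the vocab
```

This is why every label_str in the original log came out as runs of <unk> separated by spaces: the spaces were in the vocabulary, but none of the lower-case letters were.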