Add error message to Wav2Vec2 & Hubert if labels > vocab_size
🚀 Feature request
Add a better error message to HubertForCTC and Wav2Vec2ForCTC if labels are bigger than the vocab size.
Motivation
Following this issue: https://github.com/huggingface/transformers/issues/12264 it is clear that an error should be raised if any of the labels are > self.config.vocab_size, or else silent errors can sneak into the training script.
So we should modify Wav2Vec2ForCTC, TFWav2Vec2ForCTC, and HubertForCTC to add a nice error message in this case.
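A minimal sketch of what such a check could look like, written in plain Python so it runs without any framework. The helper name check_ctc_labels and the IGNORE_INDEX constant are hypothetical and not part of the library; the sketch assumes that padded label positions are marked with -100 and that valid label ids run from 0 to vocab_size - 1, so it rejects labels >= vocab_size, which is slightly stricter than the ">" wording above.

```python
from typing import Iterable, Sequence

# Assumption: padded label positions are marked with -100 and must be skipped.
IGNORE_INDEX = -100


def check_ctc_labels(labels: Iterable[Sequence[int]], vocab_size: int) -> None:
    """Raise ValueError if any non-padding label id is >= vocab_size.

    Hypothetical helper for illustration; not the library's implementation.
    """
    # Largest real (non-padding) label id across the whole batch, if any.
    max_label = max(
        (label for row in labels for label in row if label != IGNORE_INDEX),
        default=None,
    )
    if max_label is not None and max_label >= vocab_size:
        raise ValueError(
            f"Label values must be < vocab_size ({vocab_size}), "
            f"but found a label with value {max_label}."
        )


# Usage: a batch whose labels fit the vocabulary passes silently.
check_ctc_labels([[1, 2, IGNORE_INDEX], [3, 0, 31]], vocab_size=32)
```

Inside a model, a check along these lines would presumably run at the start of the forward pass whenever labels are provided, before the CTC loss is computed, so that the failure surfaces with a readable message instead of a silent error.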
Your contribution
This is a good first issue and should be rather easy to accomplish. I’m happy to give more guidance if needed.
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:5 (3 by maintainers)
Top GitHub Comments
Thanks for letting me know. I had seen it, but since the issue is still open, I thought something might be left.
I will create a PR to fix this.