wav2vec 2.0 - provide example dataset line in data prep scripts
🚀 Feature Request
Provide an example line, formatted the way the training/fine-tuning scripts expect it. Like:
file_a
file_a_line_format
file_b
file_b_line_format
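For reference, even something as small as the sketch below would be plenty. This is only my guess at the expected `train.tsv` manifest format, based on what `wav2vec_manifest.py` appears to produce (root directory of the audio on the first line, then a tab-separated relative path and frame count per file); the paths are made up:

```python
import os
import soundfile as sf

# Hypothetical paths -- substitute your own dataset location.
audio_root = "/data/my_corpus/audio"
out_path = "/data/my_corpus/train.tsv"

with open(out_path, "w") as tsv:
    # First line of the manifest is the root directory of the audio files.
    print(audio_root, file=tsv)
    for name in sorted(os.listdir(audio_root)):
        if not name.endswith(".wav"):
            continue
        frames = sf.info(os.path.join(audio_root, name)).frames
        # Each subsequent line: <relative path>\t<number of samples>.
        print(f"{name}\t{frames}", file=tsv)
```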
Motivation
Right now you have to read through the entire data preparation scripts to work out what the expected format is, and they only work for LibriSpeech anyway: https://github.com/pytorch/fairseq/tree/master/examples/wav2vec. Even so, I still don't understand which files are necessary to run the fine-tuning and what each file is supposed to look like.
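For what it's worth, here is my current understanding of the label files used for fine-tuning, pieced together from `libri_labels.py`; treat it as an educated guess rather than documentation. For each audio file listed in `train.tsv` there is one line in `train.wrd` (the transcript as plain words) and one line in `train.ltr` (the same transcript spelled out letter by letter, with `|` marking word boundaries), plus a `dict.ltr.txt` with one `<symbol> <count>` pair per line:

```python
# Hypothetical transcripts, in the same order as the lines of train.tsv.
transcripts = ["HELLO WORLD", "ANOTHER EXAMPLE"]

with open("train.wrd", "w") as wrd, open("train.ltr", "w") as ltr:
    for text in transcripts:
        # train.wrd: the transcript as-is, one utterance per line.
        print(text, file=wrd)
        # train.ltr: letters separated by spaces, "|" as the word boundary,
        # e.g. "H E L L O | W O R L D |".
        print(" ".join(list(text.replace(" ", "|"))) + " |", file=ltr)
```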
Pitch
Adding a few lines of documentation to each preprocessing script, or to the README, shouldn't be too hard, I imagine.
Top GitHub Comments
Hey, for starters, I think the answer here gets a lot of things right, especially when it comes to the proper dataset formatting, so you could try starting there: https://github.com/pytorch/fairseq/issues/2493#issuecomment-719915281
By the way, I managed to put everything in place and now have working inference and training pipelines; maybe I will open a pull request with the relevant docs to help future users.
One new thing that bothers me is that `dict.ltr.txt` is also looked up in the dataset directory during inference, which cost me an hour of debugging because my inference directory contained a dict file compatible with a previous version of the model, not the one I was testing (the order of the labels was wrong). So, first of all, during inference (`examples/speech_recognition/infer.py`) the code should search for the labels file in the `--path` directory, not in the dataset directory, and `dict.ltr.txt` should be copied to the model save directory right at the beginning of training, since it is inherently part of the model and the model won't work correctly unless a properly ordered label file with all the labels is supplied. The second thing is, if, as mentioned here: https://github.com/pytorch/fairseq/issues/2514, the counts don't really matter, then I think it would be more intuitive to just provide the dictionary in ASCII/UTF-8 order, so that it stays the same across datasets instead of changing order whenever the counts differ.
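As a stop-gap on my side, I just copy the dict next to the checkpoints before launching training, so that inference later picks up the label file that actually matches the model. A minimal sketch (the paths are made up, adjust to your own setup):

```python
import shutil
from pathlib import Path

# Hypothetical locations -- substitute your own data and checkpoint dirs.
data_dir = Path("/data/my_corpus")
save_dir = Path("/checkpoints/wav2vec2_finetune")

save_dir.mkdir(parents=True, exist_ok=True)
# Keep the label dictionary with the model it belongs to; the label order
# must match the one used when this model was trained.
shutil.copy2(data_dir / "dict.ltr.txt", save_dir / "dict.ltr.txt")
```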