wav2vec 2.0 - provide example dataset line in data prep scripts
🚀 Feature Request
Provide an example line, formatted the way the training/fine-tuning scripts expect it. Like:
file_a
file_a_line_format
file_b
file_b_line_format
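For reference, even something as small as the sketch below would be plenty. This is only my guess at the expected `train.tsv` manifest format, based on what `wav2vec_manifest.py` appears to produce (root directory of the audio on the first line, then a tab-separated relative path and frame count per file); the paths are made up:

```python
import os
import soundfile as sf

# Hypothetical paths -- substitute your own dataset location.
audio_root = "/data/my_corpus/audio"
out_path = "/data/my_corpus/train.tsv"

with open(out_path, "w") as tsv:
    # First line of the manifest is the root directory of the audio files.
    print(audio_root, file=tsv)
    for name in sorted(os.listdir(audio_root)):
        if not name.endswith(".wav"):
            continue
        frames = sf.info(os.path.join(audio_root, name)).frames
        # Each subsequent line: <relative path>\t<number of samples>.
        print(f"{name}\t{frames}", file=tsv)
```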
Motivation
Right now you have to read through the entire data preparation scripts to work out what the expected format is, and they only work for LibriSpeech anyway: https://github.com/pytorch/fairseq/tree/master/examples/wav2vec. Even so, I still don't understand which files are necessary to run the fine-tuning and what each file is supposed to look like.
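For what it's worth, here is my current understanding of the label files used for fine-tuning, pieced together from `libri_labels.py`; treat it as an educated guess rather than documentation. For each audio file listed in `train.tsv` there is one line in `train.wrd` (the transcript as plain words) and one line in `train.ltr` (the same transcript spelled out letter by letter, with `|` marking word boundaries), plus a `dict.ltr.txt` with one `<symbol> <count>` pair per line:

```python
# Hypothetical transcripts, in the same order as the lines of train.tsv.
transcripts = ["HELLO WORLD", "ANOTHER EXAMPLE"]

with open("train.wrd", "w") as wrd, open("train.ltr", "w") as ltr:
    for text in transcripts:
        # train.wrd: the transcript as-is, one utterance per line.
        print(text, file=wrd)
        # train.ltr: letters separated by spaces, "|" as the word boundary,
        # e.g. "H E L L O | W O R L D |".
        print(" ".join(list(text.replace(" ", "|"))) + " |", file=ltr)
```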
Pitch
Adding a few lines of documentation to each preprocessing script, or to the README, shouldn't be too hard, I imagine.
Top GitHub Comments
Hey, for starters, I think the answer here gets a lot of things right, especially when it comes to the proper dataset formatting, so you could try starting there: https://github.com/pytorch/fairseq/issues/2493#issuecomment-719915281
By the way, I managed to put everything in place and now have working inference and training pipelines; maybe I will open a pull request with the relevant docs to help future users.
One new thing that bothers me is that `dict.ltr.txt` is also looked up in the dataset directory during inference, which cost me an hour of debugging because my inference directory contained a dict file compatible with a previous version of the model, not the one I was testing (the order of the labels was wrong). So, first of all, during inference (`examples/speech_recognition/infer.py`) the code should search for the labels file in the `--path` directory, not in the dataset directory, and `dict.ltr.txt` should be copied to the model save directory right at the beginning of training, since it is inherently part of the model and the model won't work correctly unless a properly ordered label file with all the labels is supplied. The second thing is, if, as mentioned here: https://github.com/pytorch/fairseq/issues/2514, the counts don't really matter, then I think it would be more intuitive to just provide the dictionary in ASCII/UTF-8 order, so that it stays the same across datasets instead of changing order whenever the counts differ.
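As a stop-gap on my side, I just copy the dict next to the checkpoints before launching training, so that inference later picks up the label file that actually matches the model. A minimal sketch (the paths are made up, adjust to your own setup):

```python
import shutil
from pathlib import Path

# Hypothetical locations -- substitute your own data and checkpoint dirs.
data_dir = Path("/data/my_corpus")
save_dir = Path("/checkpoints/wav2vec2_finetune")

save_dir.mkdir(parents=True, exist_ok=True)
# Keep the label dictionary with the model it belongs to; the label order
# must match the one used when this model was trained.
shutil.copy2(data_dir / "dict.ltr.txt", save_dir / "dict.ltr.txt")
```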