issue while fine tuning the wav2vec model

🐛 Bug

When I run the fine-tuning script on the trained wav2vec model, I get the error below every time.

Traceback (most recent call last):
  File "fairseq_cli/hydra_train.py", line 45, in hydra_main
    distributed_utils.call_main(cfg, pre_main)
  File "/home/robot/airobotics/speech/fairseq/fairseq/distributed/utils.py", line 366, in call_main
    main(cfg, **kwargs)
  File "/home/robot/airobotics/speech/fairseq/fairseq_cli/train.py", line 85, in main
    task.load_dataset(valid_sub_split, combine=False, epoch=1)
  File "/home/robot/airobotics/speech/fairseq/fairseq/tasks/audio_pretraining.py", line 206, in load_dataset
    **self._get_mask_precompute_kwargs(task_cfg),
  File "/home/robot/airobotics/speech/fairseq/fairseq/data/audio/raw_audio_dataset.py", line 256, in __init__
    with open(manifest_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/robot/dataset/fairseq/libri_speech/audio/dev_other.tsv'

I'm following this documentation to train and fine-tune the wav2vec model on the LibriSpeech dataset.

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

I generated the manifest and label data according to the instructions provided on that page, but when I run this command:

fairseq-hydra-train task.data=/home/robot/dataset/fairseq/libri_speech/audio model.w2v_path=/home/robot/dataset/fairseq/libri_speech/models/wav2vec_small.pt --config-dir config/finetuning/ --config-name base_100h

I get the error above.

Environment

fairseq branch: master
OS: Ubuntu 20.04
CUDA (nvcc output):
  nvcc: NVIDIA ® Cuda compiler driver
  Copyright © 2005-2021 NVIDIA Corporation
  Built on Sun_Feb_14_21:12:58_PST_2021
  Cuda compilation tools, release 11.2, V11.2.152
  Build cuda_11.2.r11.2/compiler.29618528_0

Additional context

The documentation for speech recognition with wav2vec in the README guide is a little complex. Many developers run into issues and take longer than necessary to start using this framework, so I think the guide should be improved to make it easier to understand and less error-prone.

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Comments:5

Top GitHub Comments

2 reactions
dzubke commented, Apr 8, 2021

If you look at the base_100h config file at examples/wav2vec/config/finetuning/base_100h.yaml, you'll see that the dataset.valid_subset field is set to dev_other. This means the script looks in your data directory (/home/robot/dataset/fairseq/libri_speech/audio/) for the files dev_other.tsv, dev_other.wrd, and dev_other.ltr (if you're doing character recognition), in the same way as it looks for your training dataset files.
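
As a quick sanity check before launching training, something like the following could list which of those files are absent. This is only a sketch: `missing_subset_files` is a hypothetical helper, not part of fairseq, and the subset name `train` in the usage part is an assumption about how the training files were named.

```python
from pathlib import Path


def missing_subset_files(data_dir, subset, label_exts=("wrd", "ltr")):
    """Return the manifest/label file names for `subset` that are absent from `data_dir`.

    Hypothetical helper (not a fairseq API). It only checks for
    <subset>.tsv plus one <subset>.<ext> file per label extension.
    """
    expected = [f"{subset}.tsv"] + [f"{subset}.{ext}" for ext in label_exts]
    return [name for name in expected if not (Path(data_dir) / name).is_file()]


if __name__ == "__main__":
    # Data directory taken from the command in the issue.
    data_dir = "/home/robot/dataset/fairseq/libri_speech/audio"
    # "train" is an assumed training subset name; "dev_other" comes from the config.
    for subset in ("train", "dev_other"):
        missing = missing_subset_files(data_dir, subset)
        print(subset, "OK" if not missing else f"missing: {missing}")
```

If `dev_other.tsv` shows up as missing, that matches the FileNotFoundError in the traceback above.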

I don't know how to turn validation off, so you'll have to create a dev/validation set in the same way as you created your training set and put those dev_other files in the data directory. If your validation set is named something else, such as valid.tsv, you can just change the value of the dataset.valid_subset field in base_100h.yaml.
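
If no separate validation audio is at hand, one alternative (not what the comment above describes) is to hold out part of an existing training manifest. The sketch below assumes the first line of the .tsv manifest is the audio root directory, and that each label file has one line per manifest entry in the same order. `split_manifest` and the names `valid` and `train_rest` are hypothetical, chosen for illustration only.

```python
from pathlib import Path


def split_manifest(data_dir, source="train", valid="valid", rest="train_rest",
                   every=20, label_exts=("wrd", "ltr")):
    """Write every `every`-th entry of `source` to `valid` and the others to `rest`.

    Hypothetical helper (not a fairseq API). The original files are left untouched.
    """
    data_dir = Path(data_dir)
    # Assumed layout: first manifest line is the audio root, the rest are entries.
    header, *entries = (data_dir / f"{source}.tsv").read_text().splitlines()
    is_valid = [i % every == 0 for i in range(len(entries))]

    def write(name, ext, lines, first_line=None):
        out = ([first_line] if first_line is not None else []) + lines
        (data_dir / f"{name}.{ext}").write_text("".join(line + "\n" for line in out))

    write(valid, "tsv", [e for e, v in zip(entries, is_valid) if v], header)
    write(rest, "tsv", [e for e, v in zip(entries, is_valid) if not v], header)

    for ext in label_exts:
        labels = (data_dir / f"{source}.{ext}").read_text().splitlines()
        if len(labels) != len(entries):
            raise ValueError(
                f"{source}.{ext} has {len(labels)} lines, expected {len(entries)}"
            )
        # Keep label lines aligned with the manifest entries they belong to.
        write(valid, ext, [l for l, v in zip(labels, is_valid) if v])
        write(rest, ext, [l for l, v in zip(labels, is_valid) if not v])
```

After a split like this, dataset.valid_subset would need to be set to the new validation name, and training should use the remaining entries rather than the full original manifest so the two sets do not overlap.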

0 reactions
shuvohishab commented, Oct 2, 2022

Thank you!! @dzubke