
How to use a pretrained wav2vec encoder for fine-tuning transformer-based ASR?


Hi,

I am trying to use a pretrained wav2vec model stored on my local machine for ASR training (there is no existing w2v_url for it). I am not sure how to change the YAML config file.

For example, this is the default encoder_conf in train_asr_transformer3_w2v_large_lv60_960h_finetuning_last_1layer.yaml right now:

encoder_conf:
    output_size: 1024
    normalize_before: false
    freeze_finetune_updates: 100000
    w2v_url: https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec2_vox_960h_new.pt
    w2v_dir_path: ./downloads/wav2vec_pretrained_models

My pretrained model file is checkpoint_best.pt and it is located in ./pretrained_wav2vec_model/.
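
A minimal sketch of how the encoder_conf might be adapted for a local checkpoint, assuming (as the maintainer's log further down suggests) that the loader skips the download whenever a file with the URL's basename already exists in w2v_dir_path. The URL below is a hypothetical placeholder; only its basename needs to match the local filename:

encoder_conf:
    output_size: 1024
    normalize_before: false
    freeze_finetune_updates: 100000
    # placeholder URL (assumption): its basename matches the local checkpoint,
    # so no download should be attempted while the file is already present
    w2v_url: https://example.com/checkpoint_best.pt
    w2v_dir_path: ./pretrained_wav2vec_model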

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

1 reaction
nellorebhanuteja commented, May 25, 2021

@simpleoier, thanks for the reply.

I noticed the error is coming from the file fairseq/checkpoint_utils.py installed in the conda environment: args is being assigned None.

However, this error is now solved after updating the PyTorch version.

Earlier PyTorch version: 1.6.0. Current version: 1.8.1.

Everything’s now working fine! Thank you!

1 reaction
simpleoier commented, May 25, 2021

@nellorebhanuteja Yes, I tried it and it works. Here is the output log showing that the model is loaded:

(asr:360) INFO: Vocabulary size: 5000
(filelock:274) INFO: Lock 46999932003984 acquired on ./downloads_try/wav2vec_pretrained_models/wav2vec2.0_base_960h_finetuning/wav2vec_small_960h.pt.lock
INFO: Wav2Vec model ./downloads_try/wav2vec_pretrained_models/wav2vec2.0_base_960h_finetuning/wav2vec_small_960h.pt already exists.
INFO: Lock 46999932003984 released on ./downloads_try/wav2vec_pretrained_models/wav2vec2.0_base_960h_finetuning/wav2vec_small_960h.pt.lock
(wav2vec2_encoder:145) INFO: Pretrained Wav2Vec model parameters reloaded!
(abs_task:1090) INFO: Setting encoder.encoders.mask_emb.requires_grad = False
...

Here is the config:

encoder: wav2vec2
encoder_conf:
    output_size: 512
    normalize_before: true
    freeze_finetune_updates: 1000000
    w2v_url: https://downloads_try/wav2vec_pretrained_models/wav2vec2.0_base_960h_finetuning/wav2vec_small_960h.pt
    w2v_dir_path: ./downloads_try/wav2vec_pretrained_models/wav2vec2.0_base_960h_finetuning

Here are the contents of the download directory:

$ ls downloads_try/wav2vec_pretrained_models/wav2vec2.0_base_960h_finetuning 
dict.ltr.txt  wav2vec_small_960h.pt  wav2vec_small_960h.pt.lock

AttributeError: 'NoneType' object has no attribute 'data'

The error information you provided is very limited. Can you check whether the model file is correctly passed to the fairseq model loader? What exactly is the NoneType referring to?
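
One quick way to narrow this down is to inspect the checkpoint before fairseq touches it. Below is a minimal sketch, assuming the file is a standard fairseq checkpoint (a torch pickle holding a dict whose training hyperparameters live under "args" in older fairseq releases and "cfg" in newer ones); the path is the local one from the question:

import torch

# hypothetical local path, taken from the original question
ckpt_path = "./pretrained_wav2vec_model/checkpoint_best.pt"

# fairseq checkpoints are torch pickles holding a plain dict
state = torch.load(ckpt_path, map_location="cpu")
print(sorted(state.keys()))  # e.g. ['args', 'model', ...] or ['cfg', 'model', ...]
print(state.get("args"))     # None here matches the finding above: accessing
                             # args.data would raise "'NoneType' object has
                             # no attribute 'data'"
print(state.get("cfg"))

If args prints as None, that is consistent with the observation above that args is assigned None inside fairseq/checkpoint_utils.py.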

