Unable to load Wav2Vec 2.0 models - wav2vec2_vox_960h_new.pt

🐛 Bug

Hello.

Firstly, thank you for sharing all of the work, results, and code. It's no small task.

I am attempting to load wav2vec2_vox_960h_new.pt but am getting the following errors:

TypeError: object of type 'NoneType' has no len()

after calling

model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec2_vox_960h_new.pt'])

To Reproduce

Install torch for CUDA 11.6 following the website docs:

conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge

Install dev fairseq:

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

In a Python notebook or wherever:

import torch
import fairseq

print(torch.__version__)
print(fairseq.__version__)
# I see 
# 1.12.1
# 0.12.2

use_cuda = torch.cuda.is_available()

print(use_cuda)
# True for me

# load model

model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec2_vox_960h_new.pt'])

I am then greeted with the following error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec2_vox_960h_new.pt'])

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/checkpoint_utils.py:473, in load_model_ensemble_and_task(filenames, arg_overrides, task, strict, suffix, num_shards, state)
    471 argspec = inspect.getfullargspec(task.build_model)
    472 if "from_checkpoint" in argspec.args:
--> 473     model = task.build_model(cfg.model, from_checkpoint=True)
    474 else:
    475     model = task.build_model(cfg.model)

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/tasks/audio_pretraining.py:197, in AudioPretrainingTask.build_model(self, model_cfg, from_checkpoint)
    196 def build_model(self, model_cfg: FairseqDataclass, from_checkpoint=False):
--> 197     model = super().build_model(model_cfg, from_checkpoint)
    199     actualized_cfg = getattr(model, "cfg", None)
    200     if actualized_cfg is not None:
    201         # if "w2v_args" in actualized_cfg:

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/tasks/fairseq_task.py:338, in FairseqTask.build_model(self, cfg, from_checkpoint)
    326 """
    327 Build the :class:`~fairseq.models.BaseFairseqModel` instance for this
    328 task.
   (...)
    334     a :class:`~fairseq.models.BaseFairseqModel` instance
    335 """
    336 from fairseq import models, quantization_utils
--> 338 model = models.build_model(cfg, self, from_checkpoint)
    339 model = quantization_utils.quantize_model_scalar(model, cfg)
    340 return model

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/models/__init__.py:106, in build_model(cfg, task, from_checkpoint)
     98             ARCH_CONFIG_REGISTRY[model_type](cfg)
    100 assert model is not None, (
    101     f"Could not infer model type from {cfg}. "
    102     "Available models: {}".format(MODEL_DATACLASS_REGISTRY.keys())
    103     + f" Requested model type: {model_type}"
    104 )
--> 106 return model.build_model(cfg, task)

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/models/wav2vec/wav2vec2_asr.py:208, in Wav2VecCtc.build_model(cls, cfg, task)
    205 @classmethod
    206 def build_model(cls, cfg: Wav2Vec2CtcConfig, task: FairseqTask):
    207     """Build a new model instance."""
--> 208     w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary))
    209     return cls(cfg, w2v_encoder)

TypeError: object of type 'NoneType' has no len()

Code sample

See above

Expected behavior

A properly loaded model.

Environment

  • fairseq Version 0.12.2

  • PyTorch Version 1.12.1

  • OS (e.g., Linux): Linux frank-exchange-of-views 5.15.0-43-generic #46~20.04.1-Ubuntu SMP Thu Jul 14 15:20:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  • How you installed fairseq (pip, source): pip install --editable ./

  • Build command you used (if compiling from source):

  • Python version: 3.8.10

  • CUDA/cuDNN version: 11.6 / 510.85.02

  • GPU models and configuration: 2x 3090

  • Any other relevant information:

It seems almost all wav2vec2 models don't load properly. I've tried a variety of calls and looked through git. Documentation for properly loading these models is sorely lacking.

I understand HuggingFace Transformers may be the preferred way to use these models these days, but it seems very odd to me that there's such a variety of model loading methods, quirks, and special sauce, none of which seems properly documented, reproducible, or available.

Is there a resource that I perhaps have missed that properly documents how to use these models?

Thank you in advance

Issue Analytics

  • State:open
  • Created a year ago
  • Reactions:3
  • Comments:5

Top GitHub Comments

3 reactions
SeunghyunSEO commented, Sep 15, 2022

This happens because the code was refactored. wav2vec2_vox_960h_new.pt was originally trained with the audio_pretraining task, but the latest fairseq version needs the audio_finetuning task to load it. If you try to load the fine-tuned model with audio_pretraining, the error occurs because that task has no target dictionary: task.target_dictionary is None, so the CTC projection head cannot be sized with len(task.target_dictionary).
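
To see which task a checkpoint was saved with before overriding it, a small helper along these lines may help. This is only a sketch: saved_task_name is a hypothetical name, and the key layout (cfg -> task -> _name for newer checkpoints, an args namespace with a task attribute for older ones) is an assumption about how fairseq stores its configuration, not something confirmed in this thread.

```python
def saved_task_name(state):
    """Return the task name recorded in a loaded fairseq checkpoint dict.

    Assumption: newer checkpoints store it under state["cfg"]["task"]["_name"],
    older ones under state["args"].task. Returns None if neither is present.
    """
    cfg = state.get("cfg")
    if cfg is not None and cfg.get("task") is not None:
        return cfg["task"].get("_name")
    args = state.get("args")
    return getattr(args, "task", None)


# Stand-in for a loaded checkpoint, only to show the expected shape of the dict.
example_state = {"cfg": {"task": {"_name": "audio_pretraining"}}}
print(saved_task_name(example_state))
```

In practice, state would be the dictionary returned by torch.load(model_path, map_location="cpu"). If the helper reports audio_pretraining for a fine-tuned CTC model, the task override described in this comment is likely what you need.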

So follow the steps below.

First, download the model:

mkdir -p /tmp_w2v2
cd /tmp_w2v2
wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec2_vox_960h_new.pt

Now try loading it with the audio_pretraining task stored in the checkpoint:

import torch
import fairseq

model_path='/tmp_w2v2/wav2vec2_vox_960h_new.pt'
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([model_path])

We get the familiar error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/fairseq/fairseq/checkpoint_utils.py", line 486, in load_model_ensemble_and_task
    model = task.build_model(cfg.model, from_checkpoint=True)
  File "/workspace/fairseq/fairseq/tasks/audio_pretraining.py", line 218, in build_model
    model = super().build_model(model_cfg, from_checkpoint)
  File "/workspace/fairseq/fairseq/tasks/fairseq_task.py", line 340, in build_model
    model = models.build_model(cfg, self, from_checkpoint)
  File "/workspace/fairseq/fairseq/models/__init__.py", line 111, in build_model
    return model.build_model(cfg, task)
  File "/workspace/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 242, in build_model
    w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary))
TypeError: object of type 'NoneType' has no len()

You need to load the model with overrides, as below. Before running the code, you need a dictionary file for wav2vec2_vox_960h_new.pt; its expected contents are shown further down.

import os
import torch
import fairseq

model_path = '/tmp_w2v2/wav2vec2_vox_960h_new.pt'
# "data" must be the directory that holds dict.ltr.txt
data_dir = os.path.dirname(model_path)

# Override the task stored in the checkpoint with audio_finetuning
overrides = {
    "task": 'audio_finetuning',
    "data": data_dir,
}
# Pass the full checkpoint path so loading does not depend on the working directory
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    [model_path],
    arg_overrides=overrides,
    strict=True,
)
model = models[0]
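
The overrides above only work if the dictionary file sits in the directory passed as "data". A pre-flight check like the following can turn a confusing loader error into a clear message. It is a minimal sketch: finetuning_overrides is a hypothetical helper, not a fairseq function, and it assumes the task looks for dict.ltr.txt inside the "data" directory.

```python
from pathlib import Path


def finetuning_overrides(model_path, dict_name="dict.ltr.txt"):
    """Build arg_overrides for a fine-tuned checkpoint.

    Raises FileNotFoundError if the letter dictionary is not in the
    checkpoint's directory (assumed to be where the task looks for it).
    """
    data_dir = Path(model_path).parent
    dict_path = data_dir / dict_name
    if not dict_path.is_file():
        raise FileNotFoundError(
            f"{dict_path} not found; place it next to the checkpoint first"
        )
    return {"task": "audio_finetuning", "data": str(data_dir)}
```

The returned dictionary can be passed as arg_overrides to fairseq.checkpoint_utils.load_model_ensemble_and_task in place of the hand-written overrides.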

You need dict.ltr.txt for wav2vec2_vox_960h_new.pt. You can get it easily by following the wav2vec2 training guide.

(py38) root@557bec2a5c9d:/tmp_w2v2# pwd && ls
/tmp_w2v2
dict.ltr.txt  wav2vec2_vox_960h_new.pt

Contents of dict.ltr.txt:

| 94802
E 51860
T 38431
A 33152
O 31495
N 28855
I 28794
H 27187
S 26071
R 23546
D 18289
L 16308
U 12400
M 10685
W 10317
C 9844
F 9062
G 8924
Y 8226
P 6890
B 6339
V 3936
K 3456
' 1023
X 636
J 598
Q 437
Z 213

Or you can use my preprocessed dict, dict.ltr.txt.
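
If you would rather create the file by hand, the listing above can be written out and checked with a few lines of Python. This is a sketch, not part of fairseq: write_dict and read_dict are hypothetical helpers, the counts are copied from the listing, and the temporary directory stands in for the checkpoint folder.

```python
import tempfile
from pathlib import Path

# Letter dictionary copied from the listing: one "<symbol> <count>" pair per line.
DICT_LTR = """| 94802
E 51860
T 38431
A 33152
O 31495
N 28855
I 28794
H 27187
S 26071
R 23546
D 18289
L 16308
U 12400
M 10685
W 10317
C 9844
F 9062
G 8924
Y 8226
P 6890
B 6339
V 3936
K 3456
' 1023
X 636
J 598
Q 437
Z 213
"""


def write_dict(directory, name="dict.ltr.txt"):
    """Write the letter dictionary into `directory` and return its path."""
    path = Path(directory) / name
    path.write_text(DICT_LTR, encoding="utf-8")
    return path


def read_dict(path):
    """Parse the file into (symbol, count) pairs, skipping blank lines."""
    entries = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            symbol, count = line.rsplit(" ", 1)
            entries.append((symbol, int(count)))
    return entries


# A temporary directory stands in for the checkpoint folder here.
entries = read_dict(write_dict(tempfile.mkdtemp()))
print(len(entries))  # 28
```

The 28 symbols, plus the four special symbols that fairseq's Dictionary adds by default (`<s>`, `<pad>`, `</s>`, `<unk>`), give a vocabulary of 32.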

Now the model is loaded successfully. Try running inference on a toy input:

use_fp16 = cfg.common.fp16
use_cuda = torch.cuda.is_available()

if use_cuda:
    model.cuda()
if use_fp16:
    model.half()
model.eval()

# Random waveform as a stand-in for real audio
# (torch.FloatTensor(1, 150000) would be uninitialized memory)
toy_net_input = {
    "source": torch.randn(1, 150000),
    "padding_mask": None,
}

def apply_half(t):
    # Cast float32 tensors to half precision; leave everything else untouched
    if t.dtype is torch.float32:
        return t.to(dtype=torch.half)
    return t

if use_fp16:
    toy_net_input = fairseq.utils.apply_to_sample(apply_half, toy_net_input)
if use_cuda:
    toy_net_input = fairseq.utils.move_to_cuda(toy_net_input)

with torch.no_grad():
    toy_net_output = model(**toy_net_input)

print(toy_net_output['encoder_out'].size())
# torch.Size([468, 1, 32])
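
The first dimension of that output (468) can be sanity-checked from the input length. The sketch below assumes the checkpoint uses the default wav2vec 2.0 convolutional feature extractor (seven layers with the kernel sizes and strides listed in the code); num_frames is a hypothetical helper, not a fairseq function.

```python
# Assumed default feature extractor: (channels, kernel, stride) per conv layer.
CONV_LAYERS = [(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512, 2, 2)] * 2


def num_frames(num_samples, conv_layers=CONV_LAYERS):
    """Number of output frames after the unpadded conv stack."""
    length = num_samples
    for _, kernel, stride in conv_layers:
        length = (length - kernel) // stride + 1
    return length


print(num_frames(150000))  # 468
```

So encoder_out is laid out as (time, batch, vocabulary): 468 frames for one 150000-sample input.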

1 reaction
vade commented, Sep 21, 2022

Thank you @SeunghyunSEO - will try that shortly. Much obliged.
