Unable to load Wav2Vec 2.0 models - wav2vec2_vox_960h_new.pt
🐛 Bug
Hello.
First, thank you for sharing all of the work, results, and code. It's no small task.
I am attempting to load wav2vec2_vox_960h_new.pt but am getting the following error:

```
TypeError: object of type 'NoneType' has no len()
```

after calling:

```python
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec2_vox_960h_new.pt'])
```
To Reproduce
Install torch for CUDA 11.6 via the website docs:

```bash
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
```

Install dev fairseq:

```bash
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
```
In a Python notebook or wherever:

```python
import torch
import fairseq

print(torch.__version__)
print(fairseq.__version__)
# I see
# 1.12.1
# 0.12.2

use_cuda = torch.cuda.is_available()
print(use_cuda)
# True for me

# load model
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec2_vox_960h_new.pt'])
```
I am then greeted with the following error:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec2_vox_960h_new.pt'])

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/checkpoint_utils.py:473, in load_model_ensemble_and_task(filenames, arg_overrides, task, strict, suffix, num_shards, state)
    471 argspec = inspect.getfullargspec(task.build_model)
    472 if "from_checkpoint" in argspec.args:
--> 473     model = task.build_model(cfg.model, from_checkpoint=True)
    474 else:
    475     model = task.build_model(cfg.model)

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/tasks/audio_pretraining.py:197, in AudioPretrainingTask.build_model(self, model_cfg, from_checkpoint)
    196 def build_model(self, model_cfg: FairseqDataclass, from_checkpoint=False):
--> 197     model = super().build_model(model_cfg, from_checkpoint)
    199     actualized_cfg = getattr(model, "cfg", None)
    200     if actualized_cfg is not None:
    201         # if "w2v_args" in actualized_cfg:

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/tasks/fairseq_task.py:338, in FairseqTask.build_model(self, cfg, from_checkpoint)
    326 """
    327 Build the :class:`~fairseq.models.BaseFairseqModel` instance for this
    328 task.
   (...)
    334     a :class:`~fairseq.models.BaseFairseqModel` instance
    335 """
    336 from fairseq import models, quantization_utils
--> 338 model = models.build_model(cfg, self, from_checkpoint)
    339 model = quantization_utils.quantize_model_scalar(model, cfg)
    340 return model

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/models/__init__.py:106, in build_model(cfg, task, from_checkpoint)
     98     ARCH_CONFIG_REGISTRY[model_type](cfg)
    100 assert model is not None, (
    101     f"Could not infer model type from {cfg}. "
    102     "Available models: {}".format(MODEL_DATACLASS_REGISTRY.keys())
    103     + f" Requested model type: {model_type}"
    104 )
--> 106 return model.build_model(cfg, task)

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/models/wav2vec/wav2vec2_asr.py:208, in Wav2VecCtc.build_model(cls, cfg, task)
    205 @classmethod
    206 def build_model(cls, cfg: Wav2Vec2CtcConfig, task: FairseqTask):
    207     """Build a new model instance."""
--> 208     w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary))
    209     return cls(cfg, w2v_encoder)

TypeError: object of type 'NoneType' has no len()
```
Code sample
See above
Expected behavior
A properly loaded model.
Environment
- fairseq Version: 0.12.2
- PyTorch Version: 1.12.1
- OS (e.g., Linux): Linux frank-exchange-of-views 5.15.0-43-generic #46~20.04.1-Ubuntu SMP Thu Jul 14 15:20:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- How you installed fairseq (pip, source): pip install --editable ./
- Build command you used (if compiling from source):
- Python version: 3.8.10
- CUDA/cuDNN version: 11.6 / 510.85.02
- GPU models and configuration: 2x 3090
- Any other relevant information:
It seems almost all wav2vec2 models don't load properly. I've tried a variety of calls and looked through the git history. Documentation for properly loading these models is sorely lacking.
I understand HuggingFace Transformers may be the preferred way to use these models these days, but it seems very odd to me that there's such a variety of model-loading methods, quirks, and special sauce, none of which seems properly documented, reproducible, or available.
Is there a resource that I have perhaps missed that properly documents how to use these models?
Thank you in advance
Top GitHub Comments
This is because someone refactored the code. wav2vec2_vox_960h_new.pt was originally trained with the audio_pretraining task, but in the latest fairseq version the audio_finetuning task is needed to load the model. If you try to use audio_pretraining to load a fine-tuned model, the error occurs because contrastive learning has no CTC projection head: the pretraining task provides no target dictionary, so len(task.target_dictionary) fails on None. So follow the steps below.
First, download the model.
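Something like the following should work; the URL is the one listed in the fairseq wav2vec 2.0 README for this checkpoint, so verify it is still current before relying on it:

```bash
# checkpoint URL as listed in fairseq's examples/wav2vec README
wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec2_vox_960h_new.pt
```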
If we now try to load it with the audio_pretraining task, we get the familiar error.
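That is, the naive load call from the bug report reproduces the TypeError, because the checkpoint's saved task has no target dictionary:

```python
import fairseq

# naive load: the saved task is audio_pretraining, which carries no target
# dictionary, so building the CTC head fails
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    ['wav2vec2_vox_960h_new.pt']
)
# TypeError: object of type 'NoneType' has no len()
```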
You need to load the model with overrides, as in the sketches below. Before running that code, you need the dictionary for wav2vec2_vox_960h_new.pt: dict.ltr.txt. You can get dict.ltr.txt easily if you follow the wav2vec2 training guide, or you can use my preprocessed dict. The dict file should look like the example below.
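A sketch of the dict.ltr.txt format: one `<symbol> <count>` pair per line over the letter vocabulary plus the `|` word boundary; the counts shown here are placeholders, not the real ones:

```
| 94802
E 51860
T 38431
A 33152
...
' 128
```

And a minimal loading sketch. The exact override keys are an assumption against fairseq 0.12.x: `_name` switches the saved task to audio_finetuning, and `data` must point at the directory containing dict.ltr.txt (the path below is a placeholder):

```python
import fairseq

# assumed overrides for fairseq 0.12.x: swap the saved task for
# audio_finetuning and point 'data' at the directory holding dict.ltr.txt
overrides = {
    "task": {
        "_name": "audio_finetuning",
        "data": "/path/to/dir_with_dict_ltr",  # placeholder path
    },
}
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    ["wav2vec2_vox_960h_new.pt"],
    arg_overrides=overrides,
)
```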
Now you have successfully loaded the model. Try inference using a toy net input.
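A minimal sketch of such a toy forward pass, assuming the Wav2VecCtc interface in fairseq 0.12.x; random noise stands in for 16 kHz audio, and a real decoder would also strip CTC blank tokens:

```python
import torch

ctc_model = model[0]  # load_model_ensemble_and_task returns a list of models
ctc_model.eval()

with torch.no_grad():
    source = torch.randn(1, 100000)  # ~6 s of fake 16 kHz audio
    padding_mask = torch.zeros_like(source, dtype=torch.bool)
    net_output = ctc_model(source=source, padding_mask=padding_mask)

# encoder_out is (time, batch, vocab); greedy argmax gives per-frame token ids
logits = net_output["encoder_out"]
pred_ids = logits.argmax(dim=-1).squeeze(1)

# collapse repeated frames; proper CTC decoding would also drop blanks
print(task.target_dictionary.string(pred_ids.unique_consecutive()))
```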
Thank you @SeunghyunSEO - will try that shortly. Much obliged.