Unable to load Wav2Vec 2.0 models - wav2vec2_vox_960h_new.pt
🐛 Bug
Hello.
First, thank you for sharing all of the work, results, and code. It's no small task.
I am attempting to load wav2vec2_vox_960h_new.pt but am getting the following error:

```
TypeError: object of type 'NoneType' has no len()
```

after calling:

```python
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec2_vox_960h_new.pt'])
```
To Reproduce
Install torch for CUDA 11.6 via the website docs:

```bash
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
```

Install dev fairseq:

```bash
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
```
In a Python notebook or wherever:

```python
import torch
import fairseq

print(torch.__version__)
print(fairseq.__version__)
# I see
# 1.12.1
# 0.12.2

use_cuda = torch.cuda.is_available()
print(use_cuda)
# True for me

# load model
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec2_vox_960h_new.pt'])
```
I am then greeted with the following error:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(['wav2vec2_vox_960h_new.pt'])

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/checkpoint_utils.py:473, in load_model_ensemble_and_task(filenames, arg_overrides, task, strict, suffix, num_shards, state)
    471 argspec = inspect.getfullargspec(task.build_model)
    472 if "from_checkpoint" in argspec.args:
--> 473     model = task.build_model(cfg.model, from_checkpoint=True)
    474 else:
    475     model = task.build_model(cfg.model)

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/tasks/audio_pretraining.py:197, in AudioPretrainingTask.build_model(self, model_cfg, from_checkpoint)
    196 def build_model(self, model_cfg: FairseqDataclass, from_checkpoint=False):
--> 197     model = super().build_model(model_cfg, from_checkpoint)
    199     actualized_cfg = getattr(model, "cfg", None)
    200     if actualized_cfg is not None:
    201         # if "w2v_args" in actualized_cfg:

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/tasks/fairseq_task.py:338, in FairseqTask.build_model(self, cfg, from_checkpoint)
    326 """
    327 Build the :class:`~fairseq.models.BaseFairseqModel` instance for this
    328 task.
   (...)
    334     a :class:`~fairseq.models.BaseFairseqModel` instance
    335 """
    336 from fairseq import models, quantization_utils
--> 338 model = models.build_model(cfg, self, from_checkpoint)
    339 model = quantization_utils.quantize_model_scalar(model, cfg)
    340 return model

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/models/__init__.py:106, in build_model(cfg, task, from_checkpoint)
     98     ARCH_CONFIG_REGISTRY[model_type](cfg)
    100 assert model is not None, (
    101     f"Could not infer model type from {cfg}. "
    102     "Available models: {}".format(MODEL_DATACLASS_REGISTRY.keys())
    103     + f" Requested model type: {model_type}"
    104 )
--> 106 return model.build_model(cfg, task)

File ~/miniconda3/envs/pyav-wav2vec/lib/python3.9/site-packages/fairseq/models/wav2vec/wav2vec2_asr.py:208, in Wav2VecCtc.build_model(cls, cfg, task)
    205 @classmethod
    206 def build_model(cls, cfg: Wav2Vec2CtcConfig, task: FairseqTask):
    207     """Build a new model instance."""
--> 208     w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary))
    209     return cls(cfg, w2v_encoder)

TypeError: object of type 'NoneType' has no len()
```
Code sample
See above
Expected behavior
A properly loaded model.
Environment
- fairseq Version: 0.12.2
- PyTorch Version: 1.12.1
- OS (e.g., Linux): Linux frank-exchange-of-views 5.15.0-43-generic #46~20.04.1-Ubuntu SMP Thu Jul 14 15:20:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- How you installed fairseq (pip, source): pip install --editable ./
- Build command you used (if compiling from source):
- Python version: 3.8.10
- CUDA/cuDNN version: 11.6 / 510.85.02
- GPU models and configuration: 2x 3090
- Any other relevant information:
It seems almost all wav2vec2 models don't load properly. I've tried a variety of calls and looked through the git history. Documentation for properly loading these models is sorely lacking.
I understand HuggingFace Transformers may be the preferred way to use these models these days, but it seems very odd to me that there's such a variety of model-loading methods, quirks, and special sauce, none of which seems properly documented, reproducible, or available.
Is there a resource that I have perhaps missed that properly documents how to use these models?
Thank you in advance
Top GitHub Comments
This is because someone refactored the code. wav2vec2_vox_960h_new.pt was originally trained with the audio_pretraining task, but in the latest fairseq version the audio_finetuning task is needed to load the model. If you try to use audio_pretraining to load a fine-tuned model, the error occurs because contrastive learning has no CTC projection head: the pretraining task provides no target dictionary, so len(task.target_dictionary) fails on None. So follow the steps below.
First, download the model.
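Something like the following should work; the URL is the one listed in the fairseq wav2vec 2.0 README for this checkpoint, so verify it is still current before relying on it:

```bash
# checkpoint URL as listed in fairseq's examples/wav2vec README
wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec2_vox_960h_new.pt
```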
If we now try to load it with the audio_pretraining task, we get the familiar error.
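That is, the naive load call from the bug report reproduces the TypeError, because the checkpoint's saved task has no target dictionary:

```python
import fairseq

# naive load: the saved task is audio_pretraining, which carries no target
# dictionary, so building the CTC head fails
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    ['wav2vec2_vox_960h_new.pt']
)
# TypeError: object of type 'NoneType' has no len()
```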
You need to load the model with overrides, as in the sketches below. Before running that code, you need the dictionary for wav2vec2_vox_960h_new.pt: dict.ltr.txt. You can get dict.ltr.txt easily if you follow the wav2vec2 training guide, or you can use my preprocessed dict. The dict file should look like the example below.
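A sketch of the dict.ltr.txt format: one `<symbol> <count>` pair per line over the letter vocabulary plus the `|` word boundary; the counts shown here are placeholders, not the real ones:

```
| 94802
E 51860
T 38431
A 33152
...
' 128
```

And a minimal loading sketch. The exact override keys are an assumption against fairseq 0.12.x: `_name` switches the saved task to audio_finetuning, and `data` must point at the directory containing dict.ltr.txt (the path below is a placeholder):

```python
import fairseq

# assumed overrides for fairseq 0.12.x: swap the saved task for
# audio_finetuning and point 'data' at the directory holding dict.ltr.txt
overrides = {
    "task": {
        "_name": "audio_finetuning",
        "data": "/path/to/dir_with_dict_ltr",  # placeholder path
    },
}
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    ["wav2vec2_vox_960h_new.pt"],
    arg_overrides=overrides,
)
```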
Now you have successfully loaded the model. Try inference using a toy net input.
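A minimal sketch of such a toy forward pass, assuming the Wav2VecCtc interface in fairseq 0.12.x; random noise stands in for 16 kHz audio, and a real decoder would also strip CTC blank tokens:

```python
import torch

ctc_model = model[0]  # load_model_ensemble_and_task returns a list of models
ctc_model.eval()

with torch.no_grad():
    source = torch.randn(1, 100000)  # ~6 s of fake 16 kHz audio
    padding_mask = torch.zeros_like(source, dtype=torch.bool)
    net_output = ctc_model(source=source, padding_mask=padding_mask)

# encoder_out is (time, batch, vocab); greedy argmax gives per-frame token ids
logits = net_output["encoder_out"]
pred_ids = logits.argmax(dim=-1).squeeze(1)

# collapse repeated frames; proper CTC decoding would also drop blanks
print(task.target_dictionary.string(pred_ids.unique_consecutive()))
```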
Thank you @SeunghyunSEO - will try that shortly. Much obliged.