Loading a pretrained HuBERT model: the dictionaries cannot be properly loaded
🐛 Bug
I want to extract pretrained representations from HuBERT. Following https://github.com/pytorch/fairseq/tree/master/examples/hubert#load-a-pretrained-model, I tried to load a pretrained model from the provided checkpoints, but an error related to the task dictionaries is raised.
To Reproduce
- Install the latest fairseq from source and download the pretrained model checkpoint.
- Run the following in Python.
import fairseq

ckpt_path = "/path/to/the/checkpoint.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False)
The stack trace:
...
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 446, in load_model_ensemble_and_task
model = task.build_model(cfg.model)
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/tasks/fairseq_task.py", line 324, in build_model
model = models.build_model(cfg, self)
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/models/__init__.py", line 96, in build_model
return model.build_model(cfg, task)
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/models/hubert/hubert.py", line 322, in build_model
model = HubertModel(cfg, task.cfg, task.dictionaries)
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/models/hubert/hubert.py", line 299, in __init__
if any([d is None for d in dictionaries]):
TypeError: 'method' object is not iterable
If I print dictionaries, it gives <bound method HubertPretrainingTask.load_dictionaries of <fairseq.tasks.hubert_pretraining.HubertPretrainingTask object at 0x7fa533d4e2b0>>. I think it is the method defined here: https://github.com/pytorch/fairseq/blob/afc77bdf4bb51453ce76f1572ef2ee6ddcda8eeb/fairseq/tasks/hubert_pretraining.py#L148. Instead of returning the dictionaries, the method itself seems to be returned by mistake. I don’t know whether this is an error in the codebase or in the contents of the checkpoint.
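For illustration only (this is not fairseq code, just a minimal sketch of the failure mode): accessing a plain method without calling it yields the bound method itself, which is exactly what the TypeError above complains about.

class Task:
    def dictionaries(self):          # plain method, not a @property
        return ["dict_a", "dict_b"]

t = Task()

try:
    any(d is None for d in t.dictionaries)   # bound method, not a list
except TypeError as e:
    print(e)                                 # 'method' object is not iterable

print(any(d is None for d in t.dictionaries()))  # calling it returns the list -> False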
Expected behavior
Successful load of pretrained model.
Environment
- fairseq Version (e.g., 1.0 or master): fairseq-1.0.0a0+afc77bd
- PyTorch Version (e.g., 1.0): 1.8.1
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): pip install --upgrade git+https://github.com/pytorch/fairseq.git
- Python version: 3.7
- CUDA/cuDNN version: 11.2
- GPU models and configuration: Not relevant
- Any other relevant information: none.
Additional context
None.
Top GitHub Comments
@xiyue961 I found a useful link: https://github.com/pytorch/fairseq/issues/2514
Just create a dict.km.txt file like the one sketched below: the first column is a label that occurs in train.km, and the second column is the count of that label. This worked for me for pretraining HuBERT.
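A minimal sketch of how such a file could be generated, assuming train.km holds one utterance per line with space-separated k-means labels (the file names and label format are assumptions, not part of the original comment):

# Hypothetical helper: build dict.km.txt by counting the k-means labels in train.km.
# fairseq's Dictionary.load expects one "<symbol> <count>" pair per line.
from collections import Counter

counts = Counter()
with open("train.km") as f:               # assumed: space-separated labels per utterance
    for line in f:
        counts.update(line.split())

with open("dict.km.txt", "w") as f:
    for label, count in sorted(counts.items(), key=lambda x: int(x[0])):
        f.write(f"{label} {count}\n")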
I have a similar problem. I am trying to load a pretrained HuBERT model with the following simple code:
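(The original snippet was not preserved in this mirror; the following is a reconstruction based on the traceback below, with an illustrative checkpoint path.)

import fairseq

ckpt_path = "/path/to/hubert_base_ls960.pt"  # illustrative path
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False)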
While loading, an error occurs with the following stack trace:
Traceback (most recent call last):
File "simplest_example.py", line 4, in <module>
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False)
File "/mnt/lc/korenevsky/fairseq/fairseq/checkpoint_utils.py", line 446, in load_model_ensemble_and_task
model = task.build_model(cfg.model)
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/fairseq_task.py", line 324, in build_model
model = models.build_model(cfg, self)
File "/mnt/lc/korenevsky/fairseq/fairseq/models/__init__.py", line 96, in build_model
return model.build_model(cfg, task)
File "/mnt/lc/korenevsky/fairseq/fairseq/models/hubert/hubert.py", line 321, in build_model
model = HubertModel(cfg, task.cfg, task.dictionaries)
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/hubert_pretraining.py", line 140, in dictionaries
return self.state.dictionaries
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/fairseq_task.py", line 41, in __getattr__
self._state[name] = self._factories[name]()
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/hubert_pretraining.py", line 150, in load_dictionaries
dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels]
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/hubert_pretraining.py", line 150, in <listcomp>
dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels]
File "/mnt/lc/korenevsky/fairseq/fairseq/data/dictionary.py", line 216, in load
d.add_from_file(f)
File "/mnt/lc/korenevsky/fairseq/fairseq/data/dictionary.py", line 229, in add_from_file
raise fnfe
File "/mnt/lc/korenevsky/fairseq/fairseq/data/dictionary.py", line 226, in add_from_file
with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all/dict.lyr9.km500.txt'
It seems the code tries to load the dictionary from a path that only exists in the original training environment, /checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all/dict.lyr9.km500.txt, and fails.
The error occurs with both pretrained models (hubert_base_ls960.pt and hubert_large_ll60k.pt) but DOES NOT occur with the finetuned model hubert_large_ll60k_finetune_ls960.pt.
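To see where that path comes from, one can inspect the task config stored in the checkpoint. This is a hedged sketch: it assumes the checkpoint keeps its configuration under the "cfg" key, as recent fairseq checkpoints do; key names may differ for other checkpoints.

import torch

# Load only the checkpoint metadata on CPU and look at the task config.
state = torch.load("/path/to/hubert_base_ls960.pt", map_location="cpu")
task_cfg = state["cfg"]["task"]
print(task_cfg["labels"])      # label suffix used in dict.<label>.txt
print(task_cfg["label_dir"])   # directory the dictionaries are loaded from
print(task_cfg["data"])        # fallback directory when label_dir is None

If those directories do not exist locally, creating the expected dict.<label>.txt files in a local folder (as in the comment above) and redirecting the paths, for example via the arg_overrides argument of load_model_ensemble_and_task, is the kind of workaround suggested in the linked issue.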