
Loading HuBERT pretrained model, dictionaries cannot be properly loaded.


🐛 Bug

I want to extract pretrained representations from HuBERT. Following https://github.com/pytorch/fairseq/tree/master/examples/hubert#load-a-pretrained-model, when loading a pretrained model from one of the provided checkpoints, an error related to the dictionaries is raised.

To Reproduce

  1. Install the latest fairseq from source and download the pretrained model checkpoint.
  2. Run the following with Python:

import fairseq

ckpt_path = "/path/to/the/checkpoint.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False)

The stack trace:

...
  File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 446, in load_model_ensemble_and_task
    model = task.build_model(cfg.model)
  File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/tasks/fairseq_task.py", line 324, in build_model
    model = models.build_model(cfg, self)
  File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/models/__init__.py", line 96, in build_model
    return model.build_model(cfg, task)
  File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/models/hubert/hubert.py", line 322, in build_model
    model = HubertModel(cfg, task.cfg, task.dictionaries)
  File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/models/hubert/hubert.py", line 299, in __init__
    if any([d is None for d in dictionaries]):
TypeError: 'method' object is not iterable

If I print the dictionaries, it gives <bound method HubertPretrainingTask.load_dictionaries of <fairseq.tasks.hubert_pretraining.HubertPretrainingTask object at 0x7fa533d4e2b0>>. I think it is the function defined here: https://github.com/pytorch/fairseq/blob/afc77bdf4bb51453ce76f1572ef2ee6ddcda8eeb/fairseq/tasks/hubert_pretraining.py#L148. Instead of the dictionaries it returns, the bound method itself is mistakenly passed along. I don't know whether this is an error in the codebase or in the contents of the checkpoints.
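For context, here is a minimal, self-contained sketch (not fairseq code; FakeTask is a made-up stand-in) showing why iterating over a bound method raises exactly this TypeError — the method object is passed where its return value was expected:

```python
class FakeTask:
    """Hypothetical stand-in for HubertPretrainingTask (not fairseq code)."""

    def load_dictionaries(self):
        # Would normally return a list of Dictionary objects.
        return [{"unit_0": 0}, {"unit_1": 1}]


task = FakeTask()

# Passing the method itself (note: no parentheses) reproduces the error...
dictionaries = task.load_dictionaries
try:
    any([d is None for d in dictionaries])
except TypeError as e:
    print(e)  # 'method' object is not iterable

# ...while calling it returns the actual list, and iteration works.
dictionaries = task.load_dictionaries()
assert not any([d is None for d in dictionaries])
```

This matches the stack trace above: HubertModel.__init__ receives task.dictionaries, and if that attribute resolves to the loader method rather than the loaded list, the `any([d is None for d in dictionaries])` check fails with this exact message.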

Expected behavior

The pretrained model loads successfully.

Environment

  • fairseq Version (e.g., 1.0 or master): fairseq-1.0.0a0+afc77bd
  • PyTorch Version (e.g., 1.0): 1.8.1
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): pip install --upgrade git+https://github.com/pytorch/fairseq.git
  • Python version: 3.7
  • CUDA/cuDNN version: 11.2
  • GPU models and configuration: Not related
  • Any other relevant information: none.

Additional context

None.

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 6
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

chiluen commented on Jul 19, 2021 (2 reactions)

@xiyue961 I found a useful link: https://github.com/pytorch/fairseq/issues/2514

Just create a dict.km.txt file like the one below: the first column lists the labels that occur in train.km, and the second column is the count of each label. This worked for me for pretraining HuBERT.

10 100
23 300
35 500
...
kfmn commented on Jun 28, 2021 (1 reaction)

I have similar problems. I am trying to load a pretrained HuBERT model with the following simple code:

import fairseq

ckpt_path = "./hubert_large_ll60k.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False)

While loading, an error occurs with the following stack trace:

Traceback (most recent call last):
  File "simplest_example.py", line 4, in <module>
    models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False)
  File "/mnt/lc/korenevsky/fairseq/fairseq/checkpoint_utils.py", line 446, in load_model_ensemble_and_task
    model = task.build_model(cfg.model)
  File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/fairseq_task.py", line 324, in build_model
    model = models.build_model(cfg, self)
  File "/mnt/lc/korenevsky/fairseq/fairseq/models/__init__.py", line 96, in build_model
    return model.build_model(cfg, task)
  File "/mnt/lc/korenevsky/fairseq/fairseq/models/hubert/hubert.py", line 321, in build_model
    model = HubertModel(cfg, task.cfg, task.dictionaries)
  File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/hubert_pretraining.py", line 140, in dictionaries
    return self.state.dictionaries
  File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/fairseq_task.py", line 41, in __getattr__
    self._state[name] = self._factories[name]()
  File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/hubert_pretraining.py", line 150, in load_dictionaries
    dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels]
  File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/hubert_pretraining.py", line 150, in <listcomp>
    dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels]
  File "/mnt/lc/korenevsky/fairseq/fairseq/data/dictionary.py", line 216, in load
    d.add_from_file(f)
  File "/mnt/lc/korenevsky/fairseq/fairseq/data/dictionary.py", line 229, in add_from_file
    raise fnfe
  File "/mnt/lc/korenevsky/fairseq/fairseq/data/dictionary.py", line 226, in add_from_file
    with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all/dict.lyr9.km500.txt'

It seems the code tries to load the dictionary from a path stored in the checkpoint, /checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all/dict.lyr9.km500.txt, which only exists on the original training machine, and fails.

The error occurs with both pretrained models (hubert_base_ls960.pt and hubert_large_ll60k.pt) but DOES NOT occur with the finetuned model hubert_large_ll60k_finetune_ls960.pt.
