Loading a pretrained HuBERT model: the dictionaries cannot be properly loaded
🐛 Bug
I want to extract pretrained representations from HuBERT. Following https://github.com/pytorch/fairseq/tree/master/examples/hubert#load-a-pretrained-model, I tried to load a pretrained model from the provided checkpoints, but an error related to the task dictionaries is raised.
To Reproduce
- Install the latest fairseq from source and download the pretrained model checkpoint.
- Run the following in Python.
import fairseq

ckpt_path = "/path/to/the/checkpoint.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False)
The stack trace:
...
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 446, in load_model_ensemble_and_task
model = task.build_model(cfg.model)
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/tasks/fairseq_task.py", line 324, in build_model
model = models.build_model(cfg, self)
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/models/__init__.py", line 96, in build_model
return model.build_model(cfg, task)
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/models/hubert/hubert.py", line 322, in build_model
model = HubertModel(cfg, task.cfg, task.dictionaries)
File "/home/huang18/VC/Experiments/espnet/tools/venv/envs/espnet/lib/python3.7/site-packages/fairseq/models/hubert/hubert.py", line 299, in __init__
if any([d is None for d in dictionaries]):
TypeError: 'method' object is not iterable
If I print dictionaries, it gives <bound method HubertPretrainingTask.load_dictionaries of <fairseq.tasks.hubert_pretraining.HubertPretrainingTask object at 0x7fa533d4e2b0>>. I think it is the method defined here: https://github.com/pytorch/fairseq/blob/afc77bdf4bb51453ce76f1572ef2ee6ddcda8eeb/fairseq/tasks/hubert_pretraining.py#L148. Instead of returning the dictionaries, the method itself seems to be returned by mistake. I don’t know whether this is an error in the codebase or in the contents of the checkpoint.
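For illustration only (this is not fairseq code, just a minimal sketch of the failure mode): accessing a plain method without calling it yields the bound method itself, which is exactly what the TypeError above complains about.

class Task:
    def dictionaries(self):          # plain method, not a @property
        return ["dict_a", "dict_b"]

t = Task()

try:
    any(d is None for d in t.dictionaries)   # bound method, not a list
except TypeError as e:
    print(e)                                 # 'method' object is not iterable

print(any(d is None for d in t.dictionaries()))  # calling it returns the list -> False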
Expected behavior
Successful load of pretrained model.
Environment
- fairseq Version (e.g., 1.0 or master): fairseq-1.0.0a0+afc77bd
- PyTorch Version (e.g., 1.0): 1.8.1
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): pip install --upgrade git+https://github.com/pytorch/fairseq.git
- Python version: 3.7
- CUDA/cuDNN version: 11.2
- GPU models and configuration: Not relevant
- Any other relevant information: none.
Additional context
None.
Top GitHub Comments
@xiyue961 I found a useful link: https://github.com/pytorch/fairseq/issues/2514
Just create a dict.km.txt file like the one sketched below: the first column is a label that occurs in train.km, and the second column is the count of that label. This worked for me for pretraining HuBERT.
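A minimal sketch of how such a file could be generated, assuming train.km holds one utterance per line with space-separated k-means labels (the file names and label format are assumptions, not part of the original comment):

# Hypothetical helper: build dict.km.txt by counting the k-means labels in train.km.
# fairseq's Dictionary.load expects one "<symbol> <count>" pair per line.
from collections import Counter

counts = Counter()
with open("train.km") as f:               # assumed: space-separated labels per utterance
    for line in f:
        counts.update(line.split())

with open("dict.km.txt", "w") as f:
    for label, count in sorted(counts.items(), key=lambda x: int(x[0])):
        f.write(f"{label} {count}\n")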
I have a similar problem. I am trying to load a pretrained HuBERT model with the following simple code:
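(The original snippet was not preserved in this mirror; the following is a reconstruction based on the traceback below, with an illustrative checkpoint path.)

import fairseq

ckpt_path = "/path/to/hubert_base_ls960.pt"  # illustrative path
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False)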
While loading, an error occurs with the following stack trace:
Traceback (most recent call last):
File "simplest_example.py", line 4, in <module>
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path], strict=False)
File "/mnt/lc/korenevsky/fairseq/fairseq/checkpoint_utils.py", line 446, in load_model_ensemble_and_task
model = task.build_model(cfg.model)
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/fairseq_task.py", line 324, in build_model
model = models.build_model(cfg, self)
File "/mnt/lc/korenevsky/fairseq/fairseq/models/__init__.py", line 96, in build_model
return model.build_model(cfg, task)
File "/mnt/lc/korenevsky/fairseq/fairseq/models/hubert/hubert.py", line 321, in build_model
model = HubertModel(cfg, task.cfg, task.dictionaries)
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/hubert_pretraining.py", line 140, in dictionaries
return self.state.dictionaries
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/fairseq_task.py", line 41, in __getattr__
self._state[name] = self._factories[name]()
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/hubert_pretraining.py", line 150, in load_dictionaries
dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels]
File "/mnt/lc/korenevsky/fairseq/fairseq/tasks/hubert_pretraining.py", line 150, in <listcomp>
dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels]
File "/mnt/lc/korenevsky/fairseq/fairseq/data/dictionary.py", line 216, in load
d.add_from_file(f)
File "/mnt/lc/korenevsky/fairseq/fairseq/data/dictionary.py", line 229, in add_from_file
raise fnfe
File "/mnt/lc/korenevsky/fairseq/fairseq/data/dictionary.py", line 226, in add_from_file
with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all/dict.lyr9.km500.txt'
It seems the code tries to load the dictionary from a path that only exists in the original training environment, /checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all/dict.lyr9.km500.txt, and fails.
The error occurs with both pretrained models (hubert_base_ls960.pt and hubert_large_ll60k.pt) but DOES NOT occur with the finetuned model hubert_large_ll60k_finetune_ls960.pt.
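To see where that path comes from, one can inspect the task config stored in the checkpoint. This is a hedged sketch: it assumes the checkpoint keeps its configuration under the "cfg" key, as recent fairseq checkpoints do; key names may differ for other checkpoints.

import torch

# Load only the checkpoint metadata on CPU and look at the task config.
state = torch.load("/path/to/hubert_base_ls960.pt", map_location="cpu")
task_cfg = state["cfg"]["task"]
print(task_cfg["labels"])      # label suffix used in dict.<label>.txt
print(task_cfg["label_dir"])   # directory the dictionaries are loaded from
print(task_cfg["data"])        # fallback directory when label_dir is None

If those directories do not exist locally, creating the expected dict.<label>.txt files in a local folder (as in the comment above) and redirecting the paths, for example via the arg_overrides argument of load_model_ensemble_and_task, is the kind of workaround suggested in the linked issue.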