Add new speakers and resume training from checkpoint in speaker_id
Hi,
I’ve used your speaker_id module to train a model on a custom dataset. Initially, the n_classes parameter was set to 4 in train.yaml. Now I would like to increase this parameter to add new speakers and resume training from the saved checkpoint. When I tried this, I encountered the following error:
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
torchvision is not available - cannot save figures
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
./data\rirs_noises.zip exists. Skipping download
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: ./results/custom_augment
mini_librispeech_prepare - Creating train.json, valid.json, and test.json
mini_librispeech_prepare - train.json successfully created!
mini_librispeech_prepare - valid.json successfully created!
mini_librispeech_prepare - test.json successfully created!
speechbrain.dataio.encoder - Load called, but CategoricalEncoder is not empty. Loaded data will overwrite everything. This is normal if there is e.g. an unk label defined at init.
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - 4.5M trainable parameters in SpkIdBrain
speechbrain.utils.checkpoints - Loading a checkpoint from results\custom_augment\save\CKPT+2022-09-26+05-27-32+00
speechbrain.core - Exception:
Traceback (most recent call last):
File "E:\SpeechBrain\speechbrain\templates\speaker_id\train.py", line 328, in <module>
spk_id_brain.fit(
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\core.py", line 1143, in fit
self.on_fit_start()
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\core.py", line 797, in on_fit_start
self.checkpointer.recover_if_possible(
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\utils\checkpoints.py", line 840, in recover_if_possible
self.load_checkpoint(chosen_ckpt, device)
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\utils\checkpoints.py", line 853, in load_checkpoint
self._call_load_hooks(checkpoint, device)
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\utils\checkpoints.py", line 988, in _call_load_hooks
default_hook(obj, loadpath, end_of_epoch, device)
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\utils\checkpoints.py", line 93, in torch_recovery
obj.load_state_dict(torch.load(path, map_location=device), strict=True)
File "E:\SpeechBrain\long-speech\lib\site-packages\torch\nn\modules\module.py", line 1482, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Classifier:
size mismatch for out.w.weight: copying a param with shape torch.Size([4, 512]) from checkpoint, the shape in current model is torch.Size([7, 512]).
size mismatch for out.w.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([7]).
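The mismatch can be reproduced in isolation, independent of SpeechBrain: loading a [4, 512] head into a [7, 512] one raises the same error, and strict loading (the default, and what SpeechBrain's torch_recovery uses) refuses any shape mismatch. A minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

# Stand-ins for the old (n_classes = 4) and new (n_classes = 7) classifier heads.
old_head = nn.Linear(512, 4)
new_head = nn.Linear(512, 7)

try:
    # strict=True is the default; shape mismatches always raise.
    new_head.load_state_dict(old_head.state_dict())
except RuntimeError as e:
    error_message = str(e)  # mentions "size mismatch" for weight and bias
```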
I’ve tried increasing the n_classes parameter to 7 and made the additions in label_encoder.txt manually. I understand that adding new speakers is what causes this issue. Is there a solution or workaround that would let me continue building on the existing model while taking advantage of the checkpointing feature?
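One common workaround (a general PyTorch pattern, not a SpeechBrain-specific API) is to copy only the checkpoint entries whose shapes still match the enlarged model and let the output layer keep its fresh initialisation. A minimal sketch, with a hypothetical stand-in model (layer names and sizes are assumptions, not SpeechBrain's actual architecture):

```python
import torch
import torch.nn as nn

# Hypothetical speaker-ID model: only the output layer's shape
# depends on n_classes.
class SpkClassifier(nn.Module):
    def __init__(self, n_classes, emb_dim=512):
        super().__init__()
        self.emb = nn.Linear(40, emb_dim)         # shape independent of n_classes
        self.out = nn.Linear(emb_dim, n_classes)  # [n_classes, emb_dim] weight

old_model = SpkClassifier(n_classes=4)  # what the checkpoint was saved from
new_model = SpkClassifier(n_classes=7)  # enlarged for the new speakers

old_state = old_model.state_dict()
new_state = new_model.state_dict()

# Copy only the entries whose shapes still match; the output layer keeps
# its fresh initialisation and is retrained on the enlarged label set.
compatible = {k: v for k, v in old_state.items()
              if k in new_state and v.shape == new_state[k].shape}
new_state.update(compatible)
new_model.load_state_dict(new_state)  # all keys present, so strict=True is fine
```

Note that passing strict=False to load_state_dict would not help here: it only tolerates missing or unexpected keys, not shape mismatches, which is why the filtering step is needed.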
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @anautsch,
- `out.w.weight` has dimensions `[n_classes, emb_dim]`; append n rows of zeros, where n is the number of speakers you want to add.
- `out.w.bias` has dimensions `[n_classes]`; for this, append n zeros at the end.
- `model['state'][28]['exp_avg']` and `model['state'][28]['exp_avg_sq']` have dimensions `[n_classes, emb_dim]` (2D), so append n rows of zeros.
- `model['state'][29]['exp_avg']` and `model['state'][29]['exp_avg_sq']` have dimensions `[n_classes]`, which requires appending n zeros at the end.

All of these values are of type `Tensor`, so I had to convert them to lists, append, and convert back to tensors before assigning them to their respective keys. I’m not sure how adding zeros affects the performance of the model, or even whether it’s the right approach. The training rate seemed to decrease every time training resumed; I still need to check that. I trained on a very limited dataset, but the score with which it verified speakers seemed decent, although that verification was on the audio files used for training, not on new ones.
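The padding procedure above can be sketched with `torch.cat`, which avoids the tensor-to-list-and-back round trip. The tensor shapes are taken from the error message; the dictionary keys here are illustrative (in practice the tensors come from `torch.load()` on the checkpoint files, and the `[28]`/`[29]` optimizer-state indices depend on the checkpoint):

```python
import torch

n_old, n_new, emb_dim = 4, 7, 512
extra = n_new - n_old  # number of speakers being added

# Hypothetical tensors with the shapes reported in the error message.
model_state = {
    "out.w.weight": torch.randn(n_old, emb_dim),
    "out.w.bias": torch.randn(n_old),
}

# Append rows/entries of zeros for the new speakers along dim 0.
model_state["out.w.weight"] = torch.cat(
    [model_state["out.w.weight"], torch.zeros(extra, emb_dim)])
model_state["out.w.bias"] = torch.cat(
    [model_state["out.w.bias"], torch.zeros(extra)])

# The optimizer moments (exp_avg / exp_avg_sq) for these parameters have
# the same shapes and would be padded identically before saving the
# modified state dicts back to the checkpoint.
```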
Thanks for your help and support!