Add new speakers and resume training from checkpoint in speaker_id
Hi,
I’ve used your speaker_id module to train a model on a custom dataset. Initially, the n_classes parameter was set to 4 in train.yaml. Now I would like to increase this parameter to add new speakers and resume training from the saved checkpoint. When I tried this, I encountered the following error:
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
torchvision is not available - cannot save figures
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
./data\rirs_noises.zip exists. Skipping download
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: ./results/custom_augment
mini_librispeech_prepare - Creating train.json, valid.json, and test.json
mini_librispeech_prepare - train.json successfully created!
mini_librispeech_prepare - valid.json successfully created!
mini_librispeech_prepare - test.json successfully created!
speechbrain.dataio.encoder - Load called, but CategoricalEncoder is not empty. Loaded data will overwrite everything. This is normal if there is e.g. an unk label defined at init.
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - 4.5M trainable parameters in SpkIdBrain
speechbrain.utils.checkpoints - Loading a checkpoint from results\custom_augment\save\CKPT+2022-09-26+05-27-32+00
speechbrain.core - Exception:
Traceback (most recent call last):
File "E:\SpeechBrain\speechbrain\templates\speaker_id\train.py", line 328, in <module>
spk_id_brain.fit(
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\core.py", line 1143, in fit
self.on_fit_start()
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\core.py", line 797, in on_fit_start
self.checkpointer.recover_if_possible(
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\utils\checkpoints.py", line 840, in recover_if_possible
self.load_checkpoint(chosen_ckpt, device)
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\utils\checkpoints.py", line 853, in load_checkpoint
self._call_load_hooks(checkpoint, device)
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\utils\checkpoints.py", line 988, in _call_load_hooks
default_hook(obj, loadpath, end_of_epoch, device)
File "E:\SpeechBrain\long-speech\lib\site-packages\speechbrain\utils\checkpoints.py", line 93, in torch_recovery
obj.load_state_dict(torch.load(path, map_location=device), strict=True)
File "E:\SpeechBrain\long-speech\lib\site-packages\torch\nn\modules\module.py", line 1482, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Classifier:
size mismatch for out.w.weight: copying a param with shape torch.Size([4, 512]) from checkpoint, the shape in current model is torch.Size([7, 512]).
size mismatch for out.w.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([7]).
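The mismatch can be reproduced in isolation, independent of SpeechBrain: loading a [4, 512] head into a [7, 512] one raises the same error, and strict loading (the default, and what SpeechBrain's torch_recovery uses) refuses any shape mismatch. A minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

# Stand-ins for the old (n_classes = 4) and new (n_classes = 7) classifier heads.
old_head = nn.Linear(512, 4)
new_head = nn.Linear(512, 7)

try:
    # strict=True is the default; shape mismatches always raise.
    new_head.load_state_dict(old_head.state_dict())
except RuntimeError as e:
    error_message = str(e)  # mentions "size mismatch" for weight and bias
```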
I’ve tried increasing the n_classes parameter to 7 and made the additions in label_encoder.txt manually. I understand that adding new speakers is what causes this issue. Is there a solution or workaround that would let me continue building on the existing model while taking advantage of the checkpointing feature?
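One common workaround (a general PyTorch pattern, not a SpeechBrain-specific API) is to copy only the checkpoint entries whose shapes still match the enlarged model and let the output layer keep its fresh initialisation. A minimal sketch, with a hypothetical stand-in model (layer names and sizes are assumptions, not SpeechBrain's actual architecture):

```python
import torch
import torch.nn as nn

# Hypothetical speaker-ID model: only the output layer's shape
# depends on n_classes.
class SpkClassifier(nn.Module):
    def __init__(self, n_classes, emb_dim=512):
        super().__init__()
        self.emb = nn.Linear(40, emb_dim)         # shape independent of n_classes
        self.out = nn.Linear(emb_dim, n_classes)  # [n_classes, emb_dim] weight

old_model = SpkClassifier(n_classes=4)  # what the checkpoint was saved from
new_model = SpkClassifier(n_classes=7)  # enlarged for the new speakers

old_state = old_model.state_dict()
new_state = new_model.state_dict()

# Copy only the entries whose shapes still match; the output layer keeps
# its fresh initialisation and is retrained on the enlarged label set.
compatible = {k: v for k, v in old_state.items()
              if k in new_state and v.shape == new_state[k].shape}
new_state.update(compatible)
new_model.load_state_dict(new_state)  # all keys present, so strict=True is fine
```

Note that passing strict=False to load_state_dict would not help here: it only tolerates missing or unexpected keys, not shape mismatches, which is why the filtering step is needed.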
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @anautsch,
- `out.w.weight` has dimensions `[n_classes, emb_dim]`; append n rows of zeros, where n is the number of speakers you want to add.
- `out.w.bias` has dimensions `[n_classes]`; for this, append n zeros at the end.
- `model['state'][28]['exp_avg']` and `model['state'][28]['exp_avg_sq']` have dimensions `[n_classes, emb_dim]` (2D), so append n rows of zeros.
- `model['state'][29]['exp_avg']` and `model['state'][29]['exp_avg_sq']` have dimensions `[n_classes]`, which requires appending n zeros at the end.

All of these values are of type `Tensor`, so I had to convert them to lists, append, and convert back to tensors before assigning them to their respective keys. I’m not sure how adding zeros affects the performance of the model, or even whether it’s the right approach. The training rate seemed to decrease every time training resumed; I still need to check that. I trained on a very limited dataset, but the score with which it verified speakers seemed decent, although that verification was on the audio files used for training, not on new ones.
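The padding procedure above can be sketched with `torch.cat`, which avoids the tensor-to-list-and-back round trip. The tensor shapes are taken from the error message; the dictionary keys here are illustrative (in practice the tensors come from `torch.load()` on the checkpoint files, and the `[28]`/`[29]` optimizer-state indices depend on the checkpoint):

```python
import torch

n_old, n_new, emb_dim = 4, 7, 512
extra = n_new - n_old  # number of speakers being added

# Hypothetical tensors with the shapes reported in the error message.
model_state = {
    "out.w.weight": torch.randn(n_old, emb_dim),
    "out.w.bias": torch.randn(n_old),
}

# Append rows/entries of zeros for the new speakers along dim 0.
model_state["out.w.weight"] = torch.cat(
    [model_state["out.w.weight"], torch.zeros(extra, emb_dim)])
model_state["out.w.bias"] = torch.cat(
    [model_state["out.w.bias"], torch.zeros(extra)])

# The optimizer moments (exp_avg / exp_avg_sq) for these parameters have
# the same shapes and would be padded identically before saving the
# modified state dicts back to the checkpoint.
```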
Thanks for your help and support!