RuntimeError: The expanded size of the tensor (64) must match the existing size (112) at non-singleton dimension 2. Target sizes: [64, 80, 64]. Tensor sizes: [64, 1, 112]
Hey,
I’m trying to train Tacotron 1 with GST, and I get this error already on the first batch.
PyTorch version: 1.8 and 1.7.1 (both yielded the same error)
Python version: 3.8.0
Traceback (most recent call last):
  File "TTS/bin/train_tacotron.py", line 721, in <module>
    main(args)
  File "TTS/bin/train_tacotron.py", line 619, in main
    train_avg_loss_dict, global_step = train(train_loader, model,
  File "TTS/bin/train_tacotron.py", line 168, in train
    decoder_output, postnet_output, alignments, stop_tokens = model(
  File "/home/big-boy/anaconda3/envs/PyCapacitron/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/big-boy/projects/TTS/TTS/tts/models/tacotron.py", line 173, in forward
    decoder_outputs = decoder_outputs * output_mask.unsqueeze(1).expand_as(decoder_outputs)
RuntimeError: The expanded size of the tensor (64) must match the existing size (112) at non-singleton dimension 2. Target sizes: [64, 80, 64]. Tensor sizes: [64, 1, 112]
My hyperparams:
// TRAINING
"batch_size": 64,
"eval_batch_size": 16,
"r": 4,
"gradual_training": [[0, 7, 64], [1, 5, 64], [50000, 3, 32], [130000, 2, 32], [290000, 1, 32]],
"mixed_precision": true,

// MULTI-SPEAKER and GST
"use_speaker_embedding": false, // use speaker embedding to enable multi-speaker learning.
"use_gst": true,
"use_external_speaker_embedding_file": false,
"external_speaker_embedding_file": "…/…/speakers-vctk-en.json",
"gst": { // gst parameter if gst is enabled
    "gst_style_input": null, // Condition the style input either on a
                             // -> wave file [path to wave] or
                             // -> dictionary using the style tokens {'token1': 'value', 'token2': 'value'} example {"0": 0.15, "1": 0.15, "5": -0.15}
                             // with the dictionary being len(dict) <= len(gst_style_tokens).
    "gst_embedding_dim": 512,
    "gst_num_heads": 4,
    "gst_style_tokens": 10,
    "gst_use_speaker_embedding": false
},
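For readers hitting the same message, here is a minimal sketch of the shape rule that expand_as enforces, applied to the two shapes from the traceback. The helper check_expand is hypothetical (it is not part of PyTorch or TTS) and only mimics the rule that a dimension can be expanded if it has size 1 or already matches the target. The arithmetic at the end is one possible reading of the numbers, assuming each gradual_training entry is [start_step, r, batch_size]; it is not a diagnosis confirmed in this thread.

```python
def check_expand(src_shape, target_shape):
    """Hypothetical helper mimicking the expand rule: a source dimension may
    be expanded only if it is 1 or already equals the target size.

    Returns None if the expansion is allowed, otherwise a tuple
    (dimension index, source size, target size) for a conflicting dimension.
    """
    if len(target_shape) < len(src_shape):
        raise ValueError("target needs at least as many dimensions as source")
    # Shapes are aligned on their trailing dimensions.
    offset = len(target_shape) - len(src_shape)
    for i, src in enumerate(src_shape):
        dim = i + offset
        tgt = target_shape[dim]
        if src != 1 and src != tgt:
            return (dim, src, tgt)
    return None


mask_shape = (64, 1, 112)     # output_mask.unsqueeze(1), from the traceback
decoder_shape = (64, 80, 64)  # decoder_outputs, from the traceback

conflict = check_expand(mask_shape, decoder_shape)
print(conflict)  # (2, 112, 64): dimension 2 is 112, not 1 and not 64

# Assumption, not confirmed in the thread: if the batch was padded for r=7
# (first gradual_training entry) while the decoder still ran with "r": 4,
# the frame counts would differ exactly as in the error message.
decoder_steps = 112 // 7         # 16 steps if the mask length is a multiple of 7
frames_with_r4 = decoder_steps * 4
print(frames_with_r4)            # 64
```

If that reading is right, the mask and the decoder output disagree on the time axis because they were built with different reduction factors; printing both shapes just before line 173 of tacotron.py would be one way to check.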
Top GitHub Comments
Training is running, thank you guys so much!
I had to downgrade to librosa==0.6.3, though, because of this error:
librosa.util.exceptions.ParameterError: Audio buffer is not Fortran-contiguous. Use numpy.asfortranarray to ensure Fortran contiguity.
Did you run pip install -e . again after the checkout?