RuntimeError: The expanded size of the tensor (64) must match the existing size (112) at non-singleton dimension 2. Target sizes: [64, 80, 64]. Tensor sizes: [64, 1, 112]

Hey,

I’m trying to train Tacotron 1 with GST, and I get the error below on the very first batch.

PyTorch version: 1.8 and 1.7.1 (both yielded the same error)
Python version: 3.8.0

Traceback (most recent call last):
  File "TTS/bin/train_tacotron.py", line 721, in <module>
    main(args)
  File "TTS/bin/train_tacotron.py", line 619, in main
    train_avg_loss_dict, global_step = train(train_loader, model,
  File "TTS/bin/train_tacotron.py", line 168, in train
    decoder_output, postnet_output, alignments, stop_tokens = model(
  File "/home/big-boy/anaconda3/envs/PyCapacitron/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/big-boy/projects/TTS/TTS/tts/models/tacotron.py", line 173, in forward
    decoder_outputs = decoder_outputs * output_mask.unsqueeze(1).expand_as(decoder_outputs)
RuntimeError: The expanded size of the tensor (64) must match the existing size (112) at non-singleton dimension 2. Target sizes: [64, 80, 64]. Tensor sizes: [64, 1, 112]
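For readers unfamiliar with the failing call: `expand_as` can only stretch dimensions of size 1, which is why the mask's last dimension (112) cannot be made to match the decoder output's (64). Below is a minimal pure-Python sketch of that rule for shapes of equal rank. `expand_mismatches` is a hypothetical helper written for illustration; it is not part of PyTorch or TTS.

```python
def expand_mismatches(source_shape, target_shape):
    """Return the dimensions at which source_shape cannot be expanded to target_shape.

    Simplified model of the rule behind torch.Tensor.expand_as for shapes of
    equal rank: a dimension is expandable only if it has size 1 or already
    matches the target size.
    """
    if len(source_shape) != len(target_shape):
        raise ValueError("this sketch only handles shapes of equal rank")
    return [
        dim
        for dim, (src, tgt) in enumerate(zip(source_shape, target_shape))
        if src != 1 and src != tgt
    ]


# Shapes taken from the error message above.
mask_shape = [64, 1, 112]      # output_mask.unsqueeze(1)
decoder_shape = [64, 80, 64]   # decoder_outputs

bad_dims = expand_mismatches(mask_shape, decoder_shape)
print(bad_dims)
```

The only offending dimension is the last one, the time axis, so the mask and the decoder output disagree about how many frames the batch contains.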

My hyperparams:

// TRAINING
"batch_size": 64,
"eval_batch_size": 16,
"r": 4,
"gradual_training": [
    [0, 7, 64],
    [1, 5, 64],
    [50000, 3, 32],
    [130000, 2, 32],
    [290000, 1, 32]
],
"mixed_precision": true,

// MULTI-SPEAKER and GST
"use_speaker_embedding": false, // use speaker embedding to enable multi-speaker learning.
"use_gst": true,
"use_external_speaker_embedding_file": false,
"external_speaker_embedding_file": "…/…/speakers-vctk-en.json",
"gst": { // gst parameter if gst is enabled
    "gst_style_input": null, // Condition the style input either on a
                             // -> wave file [path to wave] or
                             // -> dictionary using the style tokens {'token1': 'value', 'token2': 'value'} example {"0": 0.15, "1": 0.15, "5": -0.15}
                             // with the dictionary being len(dict) <= len(gst_style_tokens).
    "gst_embedding_dim": 512,
    "gst_num_heads": 4,
    "gst_style_tokens": 10,
    "gst_use_speaker_embedding": false
},
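One possible reading of the numbers, offered as a hypothesis and not a confirmed diagnosis: the config sets "r" to 4, while the first "gradual_training" entry uses 7. If the mask length (112) comes from padding the batch to a multiple of 7, and the decoder emits 4 frames per step, then 112 / 7 = 16 decoder steps times 4 frames gives exactly the 64 frames in the error message. The sketch below only checks that arithmetic. Both assumptions (how TTS pads batches, and that the second value of each "gradual_training" entry is the reduction factor r) are unverified here, and the variable names are illustrative.

```python
# Values copied from the error message and the config above.
mask_frames = 112        # last dimension of the mask
decoder_frames = 64      # last dimension of decoder_outputs
config_r = 4             # "r" in the config
gradual_r = 7            # second value of the first "gradual_training" entry
                         # (assumed to be the reduction factor r)

# Assumption: the batch was padded to a multiple of gradual_r,
# so the decoder ran mask_frames / gradual_r steps.
assert mask_frames % gradual_r == 0
decoder_steps = mask_frames // gradual_r

# Assumption: the decoder emitted config_r frames per step.
predicted_frames = decoder_steps * config_r

print(decoder_steps, predicted_frames, predicted_frames == decoder_frames)
```

If this reading is right, making "r" agree with the first "gradual_training" entry would be one thing to try; the issue text itself does not confirm the cause.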