time dimension doesn't match
See original GitHub issue

```
Prepare training ...
Number of StyleSpeech Parameters: 28197333
Removing weight norm...
Traceback (most recent call last):
  File "train.py", line 224, in <module>
    main(args, configs)
  File "train.py", line 98, in main
    output = (None, None, model((batch[2:-5])))
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 144, in forward
    d_control,
  File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 88, in G
    d_control,
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/modules.py", line 417, in forward
    x = x + pitch_embedding
RuntimeError: The size of tensor a (132) must match the size of tensor b (130) at non-singleton dimension 1
```

Training crashes at the very first step (1/200000).
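The final frame of the traceback points at `modules.py` line 417, where the encoder output and the pitch embedding are added elementwise. A minimal sketch (not StyleSpeech's actual code; the function name below is illustrative) of why the 132-vs-130 mismatch makes that add fail: the encoder side gets its length from `text_to_sequence`, the pitch side from the MFA-aligned TextGrid, and when the two phoneme inventories disagree, the lengths diverge.

```python
# Illustrative sketch of the failing add at modules.py:417
# (x + pitch_embedding): an elementwise add requires both
# phoneme-sequence lengths to agree along the time dimension.

def can_add_elementwise(encoder_len, pitch_len):
    """True only when the two time dimensions match."""
    return encoder_len == pitch_len

print(can_add_elementwise(132, 130))  # False: the error in the traceback
print(can_add_elementwise(130, 130))  # True: what training expects
```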
I think it might be because of the MFA setup I used. As described in https://montreal-forced-aligner.readthedocs.io/en/latest/getting_started.html, I installed MFA through conda.
Then I ran
mfa align raw_data/LibriTTS lexicon/librispeech-lexicon.txt english preprocessed_data/LibriTTS
instead of the way you showed. I can't find a way to run it the way you showed, because I installed MFA through conda.
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:24 (11 by maintainers)
Top GitHub Comments
Gotcha, I should have mentioned this first: you have to modify `/text`, since in your case the target language is not English. In the current code, the output of the `text_to_sequence` function differs from the MFA output based on 'raw_data/mls/german-lexicon.txt'. To resolve this, you have to match the output of the two. This also matters at inference time, where we use the same function in `/text`.
Exactly. The missing phonemes must also be handled here, which is the part you must modify for your language. Again, you need to make sure that the output of the `text_to_sequence` function always matches the TextGrid's phoneme sequence (MFA lexicons).
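One way to act on this advice is a per-utterance sanity check during preprocessing that compares the phoneme sequence from `text_to_sequence` against the phones stored in the MFA TextGrid. The sketch below is an assumption, not StyleSpeech code: `check_utterance` and the example phoneme lists are illustrative, and in practice the second list would come from parsing the TextGrid that MFA produced.

```python
# Hedged sketch of the consistency check the maintainer describes:
# report the first place where the text-side and aligner-side phoneme
# sequences diverge, or a length mismatch if one is a prefix of the other.

def check_utterance(text_phonemes, textgrid_phonemes):
    """Compare the two phoneme sequences and describe the first problem."""
    for i, (a, b) in enumerate(zip(text_phonemes, textgrid_phonemes)):
        if a != b:
            return f"mismatch at index {i}: {a!r} vs {b!r}"
    if len(text_phonemes) != len(textgrid_phonemes):
        return f"length mismatch: {len(text_phonemes)} vs {len(textgrid_phonemes)}"
    return "ok"

print(check_utterance(["HH", "AH", "L"], ["HH", "AH", "L"]))        # ok
print(check_utterance(["HH", "AH", "L", "OW"], ["HH", "AH", "L"]))  # length mismatch: 4 vs 3
```

Running this over the whole corpus before training would surface exactly the 132-vs-130 kind of disagreement that caused the RuntimeError above.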