GlowTTS with MultiBand Melgan


I am trying to get GlowTTS working with Multiband Melgan, but I am running into many issues with the different MB Melgan models I have tried. I managed to get this working with the normal Melgan (https://github.com/seungwonpark/melgan), but Multiband Melgan seems to expect a different input or some normalization I can’t figure out.

What I Tried

I used the Mozilla-TTS Multiband Melgan and took most of my implementation from https://colab.research.google.com/drive/1u_16ZzHjKYFn1HNVuA4Qf_i2MMFB9olY?usp=sharing#scrollTo=x8IDS6fO8uW2

  • Initially I got a lot of mechanical noise and nothing else.
  • I tried copying the normalization that is done during the melgan synthesis. After that I could hear and understand the words in the synthesis, but with a lot of background noise.
  • I then came across https://github.com/kan-bayashi/ParallelWaveGAN/issues/169, where it was mentioned that the normalization includes decompression and logs. I used @seantempesta's code with some differences (the stats file provided by MozillaTTS gives the standard deviation and not the variance, so I skipped var and imported sigma directly). With this change there was no background noise and it sounded like someone talking, but all the words were garbled.
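
The normalization described in the last bullet can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked issue: it assumes the acoustic model emits natural-log mel magnitudes, that the vocoder was trained on log10 mels standardized per band, and that the stats file provides a per-band mean and standard deviation. The function name `glow_to_vocoder_mel` and its parameters are hypothetical.

```python
import numpy as np

LN10 = np.log(10.0)


def glow_to_vocoder_mel(mel_ln, mean, sigma):
    """Convert a natural-log mel spectrogram to a standardized log10 mel.

    mel_ln: array of shape (n_mels, frames) holding natural-log magnitudes
            (assumed output convention of the acoustic model).
    mean, sigma: arrays of shape (n_mels,), assumed to come from the
            vocoder's stats file. sigma is a standard deviation, not a
            variance.
    """
    mel_ln = np.asarray(mel_ln, dtype=np.float64)
    mean = np.asarray(mean, dtype=np.float64)[:, None]
    sigma = np.asarray(sigma, dtype=np.float64)[:, None]

    # "Decompress" with exp() and re-compress with log10(). The two steps
    # together are equivalent to dividing by ln(10).
    mel_log10 = mel_ln / LN10

    # Standardize every mel band with the vocoder's training statistics.
    return (mel_log10 - mean) / sigma
```

If a stats file stores the variance instead, passing `np.sqrt(var)` as `sigma` gives the same result. Whether a given vocoder checkpoint actually expects this convention has to be checked against its own preprocessing code.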

I tried using the Multiband Melgan from https://github.com/kan-bayashi/ParallelWaveGAN but I kept running into tensor size issues during inference. I couldn’t figure out why, because the tensor size is the same as what I sent to MozillaTTS as well as to the normal Melgan.

I also tried the Multiband Melgan model from https://github.com/TensorSpeech/TensorflowTTS but I ran into similar tensor size issues.
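
One possible cause of tensor size errors like these is that vocoders disagree on axis order, for example (batch, n_mels, frames) versus (frames, n_mels). The thread does not confirm that this was the cause here, so the following is only a diagnostic sketch. The helper `to_layout` and its layout names are hypothetical, and the layout a particular vocoder expects has to be read from its inference code.

```python
import numpy as np


def to_layout(mel, n_mels, layout):
    """Return mel as 'bct' (1, n_mels, frames) or 'tc' (frames, n_mels)."""
    mel = np.asarray(mel)

    # Drop a leading batch axis, but only if it holds a single item.
    if mel.ndim == 3:
        if mel.shape[0] != 1:
            raise ValueError(f"expected batch size 1, got shape {mel.shape}")
        mel = mel[0]
    if mel.ndim != 2:
        raise ValueError(f"expected a 2-D or 3-D array, got shape {mel.shape}")

    # Find the axis that holds the mel bands. If both axes equal n_mels,
    # the input is assumed to already be (n_mels, frames).
    if mel.shape[0] == n_mels:
        channels_first = mel
    elif mel.shape[1] == n_mels:
        channels_first = mel.T
    else:
        raise ValueError(f"no axis of size {n_mels} in shape {mel.shape}")

    if layout == "bct":
        return channels_first[None, :, :]
    if layout == "tc":
        return channels_first.T
    raise ValueError(f"unknown layout: {layout!r}")
```

Printing the shape before and after such a conversion at the boundary between the acoustic model and the vocoder makes a layout mismatch visible before it turns into an error deep inside a convolution layer.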

Question

Has anyone managed to get any Multiband Melgan model working with GlowTTS? Is there a specific repository that is better to use? Is this really down to differences in normalization before the mel spectrogram is sent to Multiband Melgan? What normalization needs to be applied to the mel spectrograms that come out of GlowTTS in order for them to work with Multiband Melgan?

Please let me know if more information is needed from me (I didn’t want to elaborate on every specific error I got, so as not to make this post go in too many directions at once).

Thanks in advance for your time and any help you can provide!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
Zarbuvit commented, Oct 8, 2020

It’s working! @seantempesta thank you for that repo! I ended up mostly using https://github.com/rishikksh20/melgan and edited the denoiser according to what @seantempesta did in his repo: https://github.com/seantempesta/melgan-1

As for the garbled words: that was completely my own problem! I misnamed my models and used a different method of converting to phonemes in training than in inference. I am sorry for any time I caused you to waste on my stupidity.

Thank you for your help!

1 reaction
seantempesta commented, Oct 2, 2020

Hey @Zarbuvit. I feel your frustration. I ended up trying all of those libraries and none of them worked well with glow-tts. Then I came across a forked version of @seungwonpark's melgan written by @rishikksh20 and it worked perfectly!

Multi-band Melgan that works with glow-tts: https://github.com/rishikksh20/melgan

I forked his project and have been re-working it so it can be used as a package for inference: https://github.com/seantempesta/melgan-1

(Note: I may have totally broken the training aspects as I’ve only tested the inference parts since I repackaged it)

