FS2 + MBMelgan Speech sounds much worse when exported to TFLite in Android
Hi,
I have trained an FS2 model and fine-tuned my MBMelgan model. Here is a sample of the speech produced in Python before exporting to TFLite: Normal FS2 and MBMelgan.
Once converted to TFLite and used in Android (based on TensorFlowTTS/examples/android/) the models sound much worse…
1. Using my FS2 model + repo’s pretrained `multiband_melgan.v1_24k`, both exported as TFLite. Speech
2. Using my FS2 model + my MB MelGAN, both exported as TFLite. Speech
- In 1 and 2 the speech sounds lower quality, the voice is less life-like and sounds less like my speaker. It’s also much lower pitch, even though `f0_ratio` 1.0 is passed into both models.
- When using my vocoder in (2) you can hear low-frequency crackling you do not hear before converting to TFLite.
Is it normal to expect the model to sound worse once exported to TFLite?
Why do you think my vocoder adds crackling noise but the repo’s pre-trained one does not?
Thanks.
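One way to put a number on "sounds worse" is to run the same input through the Python model and the TFLite model and compare the two waveforms. The following is a minimal sketch, assuming both outputs are already available as equal-length sequences of float samples; `snr_db` is a hypothetical helper written for illustration and is not part of TensorFlowTTS.

```python
import math


def snr_db(reference, candidate):
    """Signal-to-noise ratio in dB, treating (candidate - reference) as noise.

    reference: samples from the Python (pre-export) model
    candidate: samples from the TFLite model for the same input
    """
    if len(reference) != len(candidate):
        raise ValueError("waveforms must have the same length")

    signal_power = sum(x * x for x in reference)
    if signal_power == 0:
        raise ValueError("reference waveform is silent")

    noise_power = sum((x - y) ** 2 for x, y in zip(reference, candidate))
    if noise_power == 0:
        return math.inf  # the two waveforms are identical

    return 10.0 * math.log10(signal_power / noise_power)
```

A lower value means the exported model drifts further from the Python output. Comparing the value with and without converter optimizations would show how much of the loss they account for.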
Issue created 3 years ago; 30 comments (22 by maintainers).
Hi @OscarVanL, I observed the same thing with a German model I trained. Basically, I obtained good results after removing optimizations with:
Default optimizations do not seriously hurt the performance of Tacotron 2, but the performance of Multi-band MelGAN quickly degrades with them. So I applied them to the former but not to the latter.
Sometimes it’s so easy to narrow your vision on something you miss the obvious…
Thanks for your excellent Android example model, it works really well.
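The approach described in the reply above (default optimizations on the acoustic model, none on the vocoder) could look roughly like the sketch below. It is not the code from the original comment. The role labels, `OPTIMIZE_BY_ROLE`, `wants_default_optimizations` and `convert_to_tflite` are hypothetical names, and enabling `SELECT_TF_OPS` is an assumption about what these models need in order to convert. The reply reports the result for Tacotron 2, so applying the same setting to FS2 is also an assumption.

```python
# Sketch: enable TFLite default optimizations per model, following the
# observation that the acoustic model tolerates them but MB-MelGAN does not.
# The role labels and function names are hypothetical.

OPTIMIZE_BY_ROLE = {
    "acoustic": True,   # Tacotron 2 in the reply; assumed to carry over to FS2
    "vocoder": False,   # Multi-band MelGAN degrades quickly when optimized
}


def wants_default_optimizations(role):
    """Return True if the model in this role should be converted with optimizations."""
    return OPTIMIZE_BY_ROLE[role]


def convert_to_tflite(concrete_function, role):
    """Convert one concrete function to a TFLite flatbuffer and return it as bytes."""
    import tensorflow as tf  # imported here so the helper above works without TF

    converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_function])
    # Assumption: the model uses ops that are not TFLite builtins.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,
    ]
    if wants_default_optimizations(role):
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Otherwise converter.optimizations keeps its empty default (no optimizations).
    return converter.convert()
```

Calling `convert_to_tflite(fn, "vocoder")` on the vocoder's concrete function and writing the returned bytes to a `.tflite` file would give an unoptimized model. The file will be larger, which is the trade-off for keeping the audio quality.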