Changing Processor in Android Example from Character to Phoneme Symbols
Hi,
Sorry to ask for your help once again, @mapledxf.
I am trying to customise the Android example to use a custom FastSpeech2 model with the MB-MelGAN Universal Vocoder. The problem is that the speech sounds different when I run these models on Android than when I run Python inference on my PC…
With the input text: “hello world this is a test of the voice text to speech”
- In a notebook…
This is converted to phoneme symbol IDs: [37, 10, 46, 51, 74, 70, 30, 46, 24, 74, 25, 39, 58, 74, 39, 72, 74, 10, 74, 60, 27, 58, 60, 74, 11, 69, 74, 25, 10, 74, 69, 54, 58, 74, 60, 27, 45, 58, 60, 74, 60, 67, 74, 58, 56, 42, 23]
It sounds like this
- On the Android example…
symbolsToSequence() processes the same text into character IDs:
[45, 42, 49, 49, 52, 11, 60, 52, 55, 49, 41, 11, 57, 45, 46, 56, 11, 46, 56, 11, 38, 11, 57, 42, 56, 57, 11, 52, 43, 11, 57, 45, 42, 11, 59, 52, 46, 40, 42, 11, 57, 42, 61, 57, 11, 57, 52, 11, 56, 53, 42, 42, 40, 45]
https://github.com/TensorSpeech/TensorFlowTTS/blob/e42595abbf21208c81e0fabaa0b1eaeaca2c4053/examples/android/app/src/main/java/com/tensorspeech/tensorflowtts/utils/Processor.java#L176-L189
This sounds garbled when fed to a phoneme-trained FastSpeech2 model…
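To make the mismatch concrete, here is a minimal Java sketch of a character-level lookup. The symbol layout (a pad slot at ID 0, then "-", punctuation, A-Z, a-z) is an assumption inferred from the IDs listed above, where space maps to 11 and "a" to 38. The class name CharSymbolTable is hypothetical and this is not the code from Processor.java.

```java
import java.util.Arrays;

// Hypothetical helper: maps each character to its position in an assumed symbol table.
final class CharSymbolTable {
    // Assumed layout: ID 0 is reserved for padding, so this string starts at ID 1.
    private static final String SYMBOLS =
            "-!'(),.:;? ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    static int[] toIds(String text) {
        int[] buf = new int[text.length()];
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            int idx = SYMBOLS.indexOf(text.charAt(i));
            if (idx >= 0) {
                buf[n++] = idx + 1; // +1 because ID 0 is the pad symbol
            }
            // Characters outside the table are skipped.
        }
        return Arrays.copyOf(buf, n);
    }
}
```

Under that assumption, CharSymbolTable.toIds("hello world") gives the first eleven IDs of the Android sequence above. A phoneme-trained model interprets those same integers as entries in a different table, which is why the output is unintelligible.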
The Processor in the Android example is designed for a character symbol set. I’m asking for advice on how to modify the Android example to use the correct phoneme symbols for LibriTTS…
If I am correct, I just need to port the logic of the LibriTTS processor to Java, using a Java G2P library. Would this work?
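A minimal sketch of what the ID-lookup half of such a port might look like is given here. It assumes the G2P step (not shown) already returns a list of phoneme tokens, and that the symbol list is exported from the Python processor in training order. The class name PhonemeEncoder and the example symbols are hypothetical, not part of TensorFlowTTS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: converts phoneme tokens to the IDs the model was trained on.
final class PhonemeEncoder {
    private final Map<String, Integer> symbolToId = new HashMap<>();

    // 'symbols' must be in exactly the same order as the Python-side symbol list,
    // otherwise the IDs will not match what the model saw during training.
    PhonemeEncoder(List<String> symbols) {
        for (int i = 0; i < symbols.size(); i++) {
            symbolToId.put(symbols.get(i), i);
        }
    }

    int[] encode(List<String> phonemes) {
        List<Integer> ids = new ArrayList<>();
        for (String p : phonemes) {
            Integer id = symbolToId.get(p);
            if (id != null) {
                ids.add(id); // tokens missing from the table are dropped
            }
        }
        int[] out = new int[ids.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = ids.get(i);
        }
        return out;
    }
}
```

The lookup itself is simple; the hard part is the G2P step, since the Java library has to emit the same phoneme inventory (including stress markers, if the model was trained with them) as the Python one.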
@OscarVanL @ronggong I’m also looking for a G2P library that works in both Java and Python =))). I tried training a TensorFlow Seq2Seq model for G2P and converting it to TFLite. It worked, but somehow my TF implementation has poor performance 😄. In my experiments, phonemes with stress work much better than phonemes without stress 😄
BTW, see the g2p_en implementation (https://github.com/Kyubyong/g2p/blob/master/g2p_en/g2p.py#L51-L146). I think we can port it to TensorFlow and convert it to TFLite 😃. It’s just seq2seq without attention 😄
Those are different phoneme sets; the G2P library’s set has more variety because it includes stress markers.
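As an illustration of how the two kinds of set relate: ARPAbet vowels, as emitted by g2p_en, carry a trailing stress digit (0, 1 or 2), so "AH0" and "AH1" are distinct symbols in a stressed set but collapse to "AH" in an unstressed one. The Java sketch below shows that collapse; the class name StressStripper is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: removes ARPAbet stress digits from phoneme tokens.
final class StressStripper {
    // Drops a single trailing 0, 1 or 2; consonants such as "HH" are unchanged.
    static String stripStress(String phoneme) {
        return phoneme.replaceAll("[0-2]$", "");
    }

    static List<String> stripAll(List<String> phonemes) {
        List<String> out = new ArrayList<>();
        for (String p : phonemes) {
            out.add(stripStress(p));
        }
        return out;
    }
}
```

A model trained on stressed phonemes needs the stressed tokens at inference time, so this mapping only helps when going from a stressed G2P output to an unstressed symbol table, not the other way round.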