Mandarin Chinese TTS with the input text in Chinese characters

I’ve reproduced the results in the CSMSC recipe. I’d like to use Chinese characters in the input text. For example, I randomly chose an utterance (009831) from CSMSC, and extracted 3 different annotations/labels as follows:

From CSMSC/ProsodyLabeling/000001-010000.txt:
至于#1当初#1报考#1南科大#3,他也#1只是想#1逃避#1高考#3,随便#1考着#1玩玩#4。
zhi4 yu2 dang1 chu1 bao4 kao3 nan2 ke1 da4 ta1 ye2 zhi3 shi4 xiang3 tao2 bi4 gao1 kao3 sui2 bian4 kao3 zhe5 wan2 wan5

From CSMSC/PhoneLabeling/009831.interval:
zh iii4 v2 d ang1 ch u1 b ao4 k ao3 n an2 k e1 d a4 sp1 t a1 ie2 zh iii3 sh iii4 x iang3 t ao2 b i4 g ao1 k ao3 sp1 s uei2 b ian4 k ao3 zh e5 uan2 uan5

In the recipe, the 3rd annotation (extracted from the .interval file) is used as the phone units in the dict file (data/lang_phn/train_no_dev_units.txt) and in the text files for model training and decoding. However, natural Chinese input text looks like the 1st annotation above without the prosodic marks:
至于当初报考南科大,他也只是想逃避高考,随便考着玩玩。

So I took the idea from the Mandarin demo, and used pypinyin to convert the Chinese characters to Pinyin. The printout after the conversion:

Cleaned text: ['zhi4', 'yu2', 'dang1', 'chu1', 'bao4', 'kao3', 'nan2', 'ke1', 'da4', ',', 'ta1', 'ye3', 'zhi3', 'shi4', 'xiang3', 'tao2', 'bi4', 'gao1', 'kao3', ',', 'sui2', 'bian4', 'kao3', 'zhe', 'wan2', 'wan2', '。']
WARN: ü2 is not included in dict.
WARN: , is not included in dict.
WARN: , is not included in dict.
WARN: ui2 is not included in dict.
WARN: e is not included in dict.
WARN: 。 is not included in dict.
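A check like the one behind these warnings can be run on the token list before decoding. The following is only a minimal sketch: it assumes the dict file holds one unit per line followed by an index (that layout is an assumption here), and `load_units` and `find_oov` are hypothetical helper names, not part of the recipe or the demo.

```python
def load_units(lines):
    """Collect unit symbols from dict lines of the assumed form '<unit> <index>'."""
    units = set()
    for line in lines:
        fields = line.split()
        if fields:
            units.add(fields[0])
    return units


def find_oov(tokens, units):
    """Return tokens missing from the unit set, in input order, without duplicates."""
    seen = set()
    missing = []
    for tok in tokens:
        if tok not in units and tok not in seen:
            seen.add(tok)
            missing.append(tok)
    return missing


# Toy inventory built from a few phone units that appear in the .interval annotation.
# The index values are made up for illustration.
dict_lines = ["zh 1", "iii4 2", "v2 3", "d 4", "ang1 5", "sp1 6"]
units = load_units(dict_lines)

# A token sequence mixing known units with two symbols from the warnings above.
tokens = ["zh", "iii4", "ü2", "d", "ang1", ","]
print(find_oov(tokens, units))  # ['ü2', ',']
```

Any token reported here would need to be remapped to a unit that exists in the dict, or dropped, before the text is passed to the model.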

The converted result is similar to the 2nd annotation above, but it does not match the phone-level input text used for decoding in the recipe. The mismatch triggers the warnings and makes the synthesized voice lower in quality than the one from the recipe. Does anyone have experience with this, or suggestions for fixing it?
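To make the gap concrete, here is a rough sketch of mapping tone-numbered Pinyin syllables onto the initial/final units seen in the .interval annotation. It encodes only the correspondences visible in utterance 009831 (for example zhi4 to "zh iii4", yu2 to "v2", sui2 to "s uei2", and a missing tone digit to tone 5). It is not the recipe's or the demo's frontend, and `to_phones` is a hypothetical name. Other syllables may need rules that are not covered here, and punctuation is left out.

```python
import re

# Two-letter initials come first so that "zh" wins over "z", "sh" over "s", etc.
INITIALS = ["zh", "ch", "sh"] + list("bpmfdtnlgkhjqxrzcs")

# Zero-initial spellings attested in the utterance, mapped to the finals
# used in the .interval annotation.
ZERO_INITIAL = {"yu": "v", "ye": "ie", "wan": "uan"}


def to_phones(syllable):
    """Split one tone-numbered Pinyin syllable into phone units (partial rules)."""
    m = re.fullmatch(r"([a-z]+)([1-5]?)", syllable)
    if m is None:
        raise ValueError(f"not a tone-numbered Pinyin syllable: {syllable!r}")
    body = m.group(1)
    tone = m.group(2) or "5"  # no digit is treated as the neutral tone
    if body in ZERO_INITIAL:
        return [ZERO_INITIAL[body] + tone]
    for ini in INITIALS:
        if body.startswith(ini) and len(body) > len(ini):
            final = body[len(ini):]
            if ini in ("zh", "sh") and final == "i":
                final = "iii"  # attested: zhi4 -> zh iii4, shi4 -> sh iii4
            elif final == "ui":
                final = "uei"  # attested: sui2 -> s uei2
            return [ini, final + tone]
    return [body + tone]  # no rule applies; pass the syllable through


syllables = ["zhi4", "yu2", "dang1", "chu1", "sui2", "zhe", "wan2"]
phones = [p for s in syllables for p in to_phones(s)]
print(" ".join(phones))  # zh iii4 v2 d ang1 ch u1 s uei2 zh e5 uan2
```

Even with such a mapping, tones can still differ from the corpus labels (pypinyin gives ye3 and wan2 where the annotation has ye2 and wan5), so a spelling-level mapping alone does not reproduce the recipe's input exactly.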

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

8 reactions
r9y9 commented, Apr 8, 2020

FYI, I just noticed that kakaobrain released a new g2p package for Mandarin Chinese: https://github.com/kakaobrain/g2pM. I’m not familiar with Mandarin, but the results look promising, and it is worth considering as a replacement for pypinyin. That would solve some of the pronunciation issues.

1 reaction
kan-bayashi commented, Apr 6, 2020

@unilight I updated the text frontend in the notebook using your code. Thank you so much!
