A possible approach to pronunciation customization
Hi, I'm going to re-raise the topic from #12, which is currently closed. I apologize, and I appreciate that this is in some sense bad form.
I also would like the ability to, occasionally, finely control pronunciation, and I believe that fundamentally it's not a machine-solvable problem, thanks to the literal nightmare that is last names. I know six people who have the same last name by codepoint, but none of them say it the same way, and there's nothing your software could ever do to cope with that, because it's unavailable contextual knowledge.
The problem is, if you want to do high-quality rendering, getting names right is a sign of respect, so this genuinely matters, and I believe it needs to be, in some way, droppable to user control.
And so I was going to go bug the ocotillo author. Hm. Guess that works out nicely.
I don’t entirely understand where the English <-> Audio mapping comes from, but on a quick glance, it looks like it might be in jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli.
And so I was wondering.
- How hard would it be to have two of these?
- If the underlying symbolic language was in some way deterministic with regard to end pronunciation - that is, if it's somehow a least-worst case - how hard would it be to adapt the jbetker model to a second syllabary?
The reason being, y’know, the International Phonetic Alphabet is in Unicode, and does a pretty reasonable job with most real world languages. And that would reduce the job to Googling someone’s name once, putting it in a lookup table in IPA, and promptly forgetting about it for eternity.
Which, to me, sounds pretty good.
Or, if you prefer, ask Siobhan and Pádraig Moloughney from Worcester, Massachusetts ("shavon and petrick molockney from wooster mass").
`Let's talk to [ipa:ʃəˈvɔːn] and [ipa:ˈpˠɑːɾˠɪɟː mʌːlɒkːniː] about it`

is nicely unambiguous, and fits with the symbology in the other request.
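To make the lookup-table idea concrete, here is a minimal Python sketch of a preprocessing pass that swaps known names for [ipa:...] tags before the text ever reaches the model. The dictionary entries and the tag syntax are just the illustration above, not anything Tortoise supports today.

```python
# Rough sketch: a per-user lookup table that rewrites known names into
# [ipa:...] tags. The names and the tag syntax are illustrative only,
# not an existing Tortoise API.
import re

PRONUNCIATIONS = {
    "Siobhan": "[ipa:ʃəˈvɔːn]",
    "Pádraig Moloughney": "[ipa:ˈpˠɑːɾˠɪɟ mʌlɒkniː]",
}

def apply_pronunciations(text: str) -> str:
    # Longest names first, so "Pádraig Moloughney" wins over "Pádraig".
    for name in sorted(PRONUNCIATIONS, key=len, reverse=True):
        text = re.sub(re.escape(name), PRONUNCIATIONS[name], text)
    return text

print(apply_pronunciations("Let's talk to Siobhan and Pádraig Moloughney about it"))
# -> Let's talk to [ipa:ʃəˈvɔːn] and [ipa:ˈpˠɑːɾˠɪɟ mʌlɒkniː] about it
```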
Top GitHub Comments
I have been thinking about this over the last two days. In retrospect, I think it would have been absolutely possible to have trained Tortoise to speak both conventional alphabet and phonetic alphabet. There are plenty of datasets out there that use the phonetic alphabet that I could have inserted into training (or I could have trained a wav2vec2 model to transcribe into phonetic AND conventional and then picked one version at random while training Tortoise). So I guess the answer to the question/suggestion here is “yes - I am pretty sure that this is possible”.
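As a rough sketch of the "pick one version at random" idea, a training dataset could simply choose between the conventional and IPA transcript per sample; the field names and the 50/50 split below are assumptions, not the actual Tortoise training code.

```python
# Hedged sketch of mixing conventional and phonetic transcripts at
# training time. Each clip carries both transcriptions; one is chosen
# at random per fetch so the model learns to read either alphabet.
import random
from torch.utils.data import Dataset

class MixedTranscriptDataset(Dataset):
    def __init__(self, clips, ipa_probability=0.5):
        # clips: list of dicts with 'audio', 'text', and 'ipa' entries.
        self.clips = clips
        self.ipa_probability = ipa_probability

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        clip = self.clips[idx]
        use_ipa = random.random() < self.ipa_probability
        transcript = clip["ipa"] if use_ipa else clip["text"]
        return clip["audio"], transcript
```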
As it stands, though, if I wanted to train Tortoise to be able to speak the phonetic alphabet, I’d need to change its symbolic lexicon. I’m a bit nervous that this will involve re-training the autoregressive transformer.
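For illustration, "changing the symbolic lexicon" might amount to something like appending IPA characters to a character-level symbol table; the symbol lists and function below are placeholders, not the real Tortoise tokenizer.

```python
# Illustrative only: extending a character-level symbol table with IPA.
# The base symbol list and the IPA subset here are placeholders.
BASE_SYMBOLS = list("_-!'(),.:;? abcdefghijklmnopqrstuvwxyz")
IPA_SYMBOLS = list("ʃəˈɔːɪɛæɑʊθðŋɾɟʲˠ")  # a small, arbitrary subset

SYMBOLS = BASE_SYMBOLS + [s for s in IPA_SYMBOLS if s not in BASE_SYMBOLS]
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}

def text_to_ids(text: str):
    # Unknown characters are dropped; the new IPA symbols map to newly
    # appended ids, which is why the autoregressive model would need at
    # least some retraining to learn embeddings for them.
    return [SYMBOL_TO_ID[c] for c in text.lower() if c in SYMBOL_TO_ID]
```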
I’m willing to try making this fix, because I agree that this would be a major feature addition, but I cannot currently commit to it. My priority right now is implementing a feature to support the suggestion from #16 because I think the finding there is super cool and it won’t tie up my GPUs, which are currently working on something else. 😃
Let's keep this open, and I will try to get around to it.
I’ve opened up the wandb for this model if anyone is curious to follow along. This project contains all of my training attempts for the autoregressive model. You’ll want to watch the latest runs, titled `unified_large_with_phonetic`. https://wandb.ai/neonbjb/train_gpt_tts