[Question] Implement preprocessing on datasets?
Coming from TensorFlowTTS, I find Coqui to be more functional and well maintained. (I still encounter NaN losses after 50k+ iterations, but I can leave that for later.)
One main issue is that each iteration seems to take about double the time, and memory consumption is higher, compared to TensorFlowTTS. From dataset.py, I can see that collate_fn computes the spectrograms while batching and does not cache them (unlike the phoneme_cache).
I will rewrite some parts to save the preprocessed phonemes and spectrograms, so I can train different models on the same dataset and visually compare the ground-truth spectrograms against the TTS output.
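A minimal sketch of the caching I have in mind, mirroring the existing phoneme_cache (load_or_compute_spectrogram, compute_spectrogram, and the cache directory are placeholders, not Coqui's actual API):

```python
import hashlib
import os

import numpy as np

SPEC_CACHE_DIR = "spec_cache"  # hypothetical location, analogous to phoneme_cache


def load_or_compute_spectrogram(wav_path, compute_spectrogram):
    """Return a cached spectrogram if present, else compute and cache it.

    compute_spectrogram stands in for whatever call dataset.py actually
    uses to turn a wav file into a spectrogram.
    """
    os.makedirs(SPEC_CACHE_DIR, exist_ok=True)
    key = hashlib.md5(wav_path.encode("utf-8")).hexdigest()
    cache_path = os.path.join(SPEC_CACHE_DIR, key + ".npy")
    if os.path.exists(cache_path):
        return np.load(cache_path)
    spec = compute_spectrogram(wav_path)
    np.save(cache_path, spec)
    return spec
```

Keying the cache on a hash of the audio path keeps filenames valid for any dataset layout; invalidating the cache when audio parameters change is left out of the sketch.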
Also, I think LongTensors are not needed, as sequence lengths won't exceed 2 billion (the int32 range).
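For illustration, storing lengths as int32 halves the memory of the default int64 LongTensor:

```python
import torch

# Sequence lengths fit easily in 32 bits, so int32 is enough here.
lengths = torch.tensor([120, 87, 256], dtype=torch.int32)
assert lengths.max().item() < 2**31 - 1  # int32 upper bound (~2.1 billion)
```

Some indexing ops have historically expected int64 tensors, so the saving mainly applies to length bookkeeping rather than token indices.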
Issue Analytics
- Created: 2 years ago
- Comments: 16 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@erogol @vince62s I think I've found the bottleneck. For some reason, creating a phonemizer in gruut is very slow. With 8 text samples per CPU, this would slow down every batch by over 1 second. If we simply create the phonemizer before text_to_phonemes, the bottleneck goes away.
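A minimal sketch of that fix (the constructor arguments and the phonemize call below are assumptions, not gruut's confirmed API):

```python
import gruut

# Build the expensive phonemizer once, at dataset setup time.
PHONEMIZER = gruut.Phonemizer(lang="en-us")  # assumed constructor arguments


def text_to_phonemes(text):
    # Reuse the shared instance instead of constructing a new phonemizer
    # on every call, which is what slowed each batch by over a second.
    return PHONEMIZER.phonemize(text)  # assumed method name
```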
@erogol Thanks for the update! I am rushing a paper for Interspeech 2022, so I might only review the latest version at the end of March… Meanwhile, I have found that gruut.Phonemizer can't be pickled (i.e. I cannot pass it as an argument to _phoneme_worker, so every worker needs to create its own phonemizer). My current hack is to create a global phonemizers list where num_phonemizers = num_workers, then pass the worker_idx to _phoneme_worker.
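A sketch of that hack under the same assumed gruut API: a module-level pool with one phonemizer slot per worker, filled lazily inside each worker process so the unpicklable object never has to cross a process boundary:

```python
import gruut

NUM_WORKERS = 4  # assumed to match the DataLoader's num_workers
phonemizers = [None] * NUM_WORKERS  # one slot per worker; plain Nones pickle fine


def _phoneme_worker(text, worker_idx):
    # Each worker process fills only its own slot on first use, then reuses
    # it, so no Phonemizer instance is ever passed between processes.
    if phonemizers[worker_idx] is None:
        phonemizers[worker_idx] = gruut.Phonemizer(lang="en-us")  # assumed args
    return phonemizers[worker_idx].phonemize(text)  # assumed method name
```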