
[Bug] Phoneme extraction with punctuations is wrongly delimited

Describe the bug

Punctuation marks in the extracted phonemes are delimited incorrectly.

For instance, one sentence is phonemized as "tuː foːɹ paʊndz , bʌt hɛviɚ aɪɚnz ," when it should be "tuː foːɹ paʊndz, bʌt hɛviɚ aɪɚnz,".

In other words, a punctuation mark should not be preceded by a space.
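Until the tokenizer itself is fixed, you can get the expected output by post-processing the phoneme string. The sketch below is not part of Coqui TTS or gruut. The helper name `attach_punctuation` and the set of punctuation marks it handles are assumptions.

```python
import re

# Hypothetical post-processing helper (not a Coqui TTS or gruut API).
# Matches one or more whitespace characters directly before a punctuation
# mark; the punctuation set here is an assumption, extend it as needed.
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:!?])")


def attach_punctuation(phonemes: str) -> str:
    """Remove whitespace before punctuation so it attaches to the previous word."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", phonemes).strip()


print(attach_punctuation("tuː foːɹ paʊndz , bʌt hɛviɚ aɪɚnz ,"))
# prints: tuː foːɹ paʊndz, bʌt hɛviɚ aɪɚnz,
```

A regex like this cannot tell intentional spacing from spacing the tokenizer inserted. Preserving the original whitespace during tokenization is therefore the more robust fix.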

I think these extra spaces cause unnatural silences in the trained models.

Issue Analytics: created 2 years ago · Reactions: 1 · Comments: 9 (3 by maintainers)

Top GitHub Comments

3 reactions

synesthesiam commented, Sep 19, 2021

I’m working on it! I have this pesky job that keeps taking my time 😉

This is taking longer than expected because I'm adding it and preliminary SSML support at the same time. It didn't seem worth it to me to redo the existing gruut tokenizer (to add proper whitespace preservation) only to scrap it later for SSML.

I will be completing the changes to gruut this week, and my goal is to have it integrated and tested with 🐸 TTS by 1 Oct 👍

2 reactions

synesthesiam commented, Sep 1, 2021

I’m reworking parts of gruut’s tokenization pipeline to preserve whitespace. I’ll delay updating the current pull request until these changes are in.
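A whitespace-preserving tokenizer records, for each token, the whitespace that followed it in the input. Punctuation can then be re-emitted exactly where it appeared. The following is a minimal sketch of that idea and is not gruut's actual pipeline. `tokenize_with_ws`, `phonemize_preserving_ws`, and the toy `LEXICON` are hypothetical stand-ins for a real tokenizer and phonemizer, and the English input is illustrative only.

```python
import re
from typing import Callable, List, Tuple

# (token text, whitespace that followed it in the input)
Token = Tuple[str, str]

# A word (\w+) or a single character that is neither a word character nor
# whitespace (i.e. punctuation), followed by any whitespace after it.
_TOKEN_RE = re.compile(r"(\w+|[^\w\s])(\s*)")


def tokenize_with_ws(text: str) -> List[Token]:
    """Split text into words and punctuation, remembering trailing whitespace."""
    return [(m.group(1), m.group(2)) for m in _TOKEN_RE.finditer(text)]


def phonemize_preserving_ws(text: str, phonemize_word: Callable[[str], str]) -> str:
    """Phonemize words only; copy punctuation and the original spacing verbatim."""
    parts = []
    for token, trailing_ws in tokenize_with_ws(text):
        if re.fullmatch(r"\w+", token):
            parts.append(phonemize_word(token))
        else:
            parts.append(token)  # punctuation passes through unchanged
        parts.append(trailing_ws)  # reuse the input's whitespace, never add any
    return "".join(parts).strip()


# Toy lexicon standing in for a real phonemizer (hypothetical).
LEXICON = {
    "two": "tuː",
    "pounds": "paʊndz",
    "but": "bʌt",
    "heavier": "hɛviɚ",
    "irons": "aɪɚnz",
}

if __name__ == "__main__":
    sentence = "Two pounds, but heavier irons,"
    print(phonemize_preserving_ws(sentence, lambda w: LEXICON[w.lower()]))
    # prints: tuː paʊndz, bʌt hɛviɚ aɪɚnz,
```

Because "pounds" is followed by no whitespace in the input, its phonemes are joined directly to the comma. This simple regex splits contractions such as "don't" into three tokens, which a production tokenizer would need to handle.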
