[Bug] Phoneme extraction with punctuations is wrongly delimited

Describe the bug

Punctuation marks in the extracted phonemes are delimited incorrectly.

For instance, the phonemized sentence "tuː foːɹ paʊndz , bʌt hɛviɚ aɪɚnz ," should be "tuː foːɹ paʊndz, bʌt hɛviɚ aɪɚnz,".

In other words, a punctuation mark should not be preceded by a space.

I think the current implementation causes unnatural silences in the trained models.
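The expected output described in this report can be illustrated with a small post-processing sketch. This is not the fix that was applied in 🐸 TTS or gruut; `attach_punctuation` is a hypothetical helper, and the set of punctuation marks it handles is an assumption made for illustration.

```python
import re

# Hypothetical helper for illustration only; not part of 🐸 TTS or gruut.
# Assumption: only these punctuation marks need to be re-attached.
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:!?])")


def attach_punctuation(phonemes: str) -> str:
    """Remove any whitespace that directly precedes a punctuation mark."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", phonemes)


raw = "tuː foːɹ paʊndz , bʌt hɛviɚ aɪɚnz ,"
print(attach_punctuation(raw))  # prints: tuː foːɹ paʊndz, bʌt hɛviɚ aɪɚnz,
```

A regex pass like this only patches the string after phonemization. The maintainer comments below describe a different approach, where the tokenizer itself preserves the original whitespace.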

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

3 reactions
synesthesiam commented, Sep 19, 2021

I’m working on it! I have this pesky job that keeps taking my time 😉

This is taking longer than expected since I’m adding it and preliminary SSML support at the same time. It didn’t seem worth it to me to redo the existing gruut tokenizer (to add proper whitespace preservation) only to scrap it later for SSML.

I will be completing the changes to gruut this week, and my goal is to have it integrated and tested with 🐸 TTS by 1 Oct 👍

2 reactions
synesthesiam commented, Sep 1, 2021

I’m reworking parts of gruut’s tokenization pipeline to preserve whitespace. I’ll delay updating the current pull request until these changes are in.
