
Problem with aligning the last phoneme


Hi, thanks for this great work. I have tried integrating it on top of my ASR module. Most of the phonemes were aligned perfectly except the last one, as can be seen below.

[Figures ctc1 and ctc2: original waveform (top) and alignment result (bottom)]

The top figure is the original waveform, and the bottom one is the alignment result. I found that the waveform near the end was cut off. I think index_duration is correct, since all phonemes except the last one were aligned accurately.

How can I solve this problem? Thanks in advance.
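One common way a truncated last phoneme can arise (an assumption about this symptom, not a confirmed diagnosis of the tool) is computing per-phoneme durations by differencing start frames, which leaves the final phoneme without an explicit end. A minimal sketch, assuming a fixed hop of `hop_length` samples per frame; `frames_for_samples` and `durations_from_starts` are hypothetical helpers, not part of the aligner's API:

```python
from typing import List


def frames_for_samples(num_samples: int, hop_length: int) -> int:
    """Number of frames needed to cover num_samples (hypothetical helper).

    Rounds up, so a waveform tail shorter than one hop still gets a frame
    instead of being dropped.
    """
    return -(-num_samples // hop_length)


def durations_from_starts(starts: List[int], num_frames: int) -> List[int]:
    """Per-phoneme durations in frames, given each phoneme's start frame.

    The last phoneme has no successor, so its end is taken to be
    num_frames. Differencing the start frames alone would yield one
    duration too few and silently drop the tail of the utterance.
    """
    if not starts:
        return []
    if any(b < a for a, b in zip(starts, starts[1:])):
        raise ValueError("start frames must be non-decreasing")
    if starts[-1] >= num_frames:
        raise ValueError("last start frame must lie inside the sequence")
    boundaries = list(starts) + [num_frames]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


# Example: 16000 samples with a hypothetical hop of 256 samples.
num_frames = frames_for_samples(16000, 256)  # 63 frames
durations = durations_from_starts([0, 20, 41], num_frames)
print(durations)  # [20, 21, 22]
```

Passing the total frame count explicitly guarantees the durations sum to the full utterance length, so the last phoneme always runs to the end of the audio. Whether the tool's index_duration is computed this way is not confirmed here.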

Issue Analytics

Created 2 years ago
Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction

taylorlu commented, May 14, 2021

My TTS model has the same architecture as FastSpeech, except that a speaker embedding is injected into the input at every timestep.
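FastSpeech consumes per-phoneme durations through its length regulator, which expands phoneme-level states to frame level, so that is where a wrong last-phoneme duration would show up. A minimal sketch of that step plus a per-timestep speaker embedding, assuming the embedding is injected by element-wise addition (the comment does not say whether it is added or concatenated, or at which stage); both function names are hypothetical, and plain lists stand in for tensors:

```python
from typing import List, Sequence

Vector = List[float]


def length_regulate(phoneme_hidden: Sequence[Vector],
                    durations: Sequence[int]) -> List[Vector]:
    """Repeat each phoneme's hidden vector durations[i] times.

    Turns a phoneme-level sequence into a frame-level one, as the
    FastSpeech length regulator does; len(result) == sum(durations).
    """
    if len(phoneme_hidden) != len(durations):
        raise ValueError("exactly one duration per phoneme is required")
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    frames: List[Vector] = []
    for vec, d in zip(phoneme_hidden, durations):
        frames.extend(list(vec) for _ in range(d))
    return frames


def add_speaker_embedding(inputs: Sequence[Vector],
                          speaker: Vector) -> List[Vector]:
    """Add the same speaker embedding to the input at every timestep.

    Element-wise addition is an assumption; concatenation is another
    common way to inject a speaker embedding.
    """
    for vec in inputs:
        if len(vec) != len(speaker):
            raise ValueError("speaker embedding size must match inputs")
    return [[x + s for x, s in zip(vec, speaker)] for vec in inputs]


hidden = [[1.0, 0.0], [0.0, 1.0]]        # two phonemes, 2-dim states
frames = length_regulate(hidden, [2, 1])  # 3 frames
print(add_speaker_embedding(frames, [0.5, 0.5]))
# [[1.5, 0.5], [1.5, 0.5], [0.5, 1.5]]
```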

I think this tool suits me better because the Montreal Forced Aligner is a bit intricate: it is based on Kaldi, and I want to train all my models in TensorFlow only, without any other framework. This tool solved my problem easily.

Thanks again for your timely response!

0 reactions

Sundy1219 commented, Aug 18, 2021

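I have a project that also needs phoneme alignment, at the level of initials and finals. Could you share a little about how to approach it with this CTC alignment idea? Thanks, @taylorlu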
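For reference, a common way to do CTC-based forced alignment is a Viterbi search over the target label sequence interleaved with blanks. The sketch below is a generic illustration, not this tool's implementation; it assumes per-frame log-probabilities from a CTC acoustic model with blank id 0 and a known sequence of initial/final label ids, and `ctc_forced_align` and `start_frames` are hypothetical names:

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def ctc_forced_align(log_probs: Sequence[Sequence[float]],
                     targets: Sequence[int],
                     blank: int = 0) -> List[int]:
    """Viterbi CTC forced alignment.

    log_probs: T x V per-frame log-probabilities from a CTC model.
    targets:   label ids of the known unit sequence (no blanks).
    Returns, per frame, the index into targets that the frame is aligned
    to, or -1 for a blank frame.
    """
    T = len(log_probs)
    if T == 0 or not targets:
        raise ValueError("need at least one frame and one target")
    ext = [blank]  # extended sequence: blank, l1, blank, l2, ..., blank
    for lab in targets:
        ext += [lab, blank]
    S = len(ext)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[-1] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay, advance by one, or skip a blank that
            # separates two different labels.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG_INF:
                continue
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best
    # A valid path ends on the last label or on the trailing blank.
    s = S - 1 if score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("too few frames to align all targets")
    states = [0] * T
    for t in range(T - 1, -1, -1):
        states[t] = s
        s = back[t][s]
    # Odd extended positions are labels: position 2i+1 is targets[i].
    return [st // 2 if st % 2 == 1 else -1 for st in states]


def start_frames(frame_labels: List[int], num_targets: int) -> List[int]:
    """First frame aligned to each target unit."""
    starts = [-1] * num_targets
    for t, lab in enumerate(frame_labels):
        if lab >= 0 and starts[lab] == -1:
            starts[lab] = t
    return starts


hi, lo = -0.2, -2.3  # rough log-probabilities; vocab: 0=blank, 1, 2
lp = [[lo, hi, lo], [hi, lo, lo], [lo, lo, hi], [lo, lo, hi], [hi, lo, lo]]
labels = ctc_forced_align(lp, [1, 2])
print(labels, start_frames(labels, 2))  # [0, -1, 1, 1, -1] [0, 2]
```

Each unit's duration can then be taken from consecutive start frames, with the last unit running to the final frame.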