
[Bug] Error while testing a trained VITS Model (Gibberish)

See original GitHub issue

Describe the bug

Thanks for the amazing codebase!

When I test my trained VITS model for TTS, it produces gibberish for 3-4 seconds at the beginning of every sentence and at every punctuation mark (full stops, commas). It speaks the rest of the text correctly.

Following is the output generated by the test command:

Using model: vits
Setting up Audio Processor…
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
Text: As company has expanded its datacenters. to meet demands
Text splitted to sentences.
['As company has expanded its datacenters.', 'to meet demands']
['<BLNK>', 'F', '<BLNK>', 'a', '<BLNK>', 'i', '<BLNK>', 'l', '<BLNK>', 'e', '<BLNK>', 'd', '<BLNK>', ' ', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', ' ', '<BLNK>', 'c', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'a', '<BLNK>', 't', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 's', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 'u', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'i', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', 'y', '<BLNK>', ' ', '<BLNK>', ':', '<BLNK>', ' ', '<BLNK>', 'N', '<BLNK>', 'o', '<BLNK>', ' ', '<BLNK>', 's', '<BLNK>', 'u', '<BLNK>', 'c', '<BLNK>', 'h', '<BLNK>', ' ', '<BLNK>', 'f', '<BLNK>', 'i', '<BLNK>', 'l', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'i', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', 'y', '<BLNK>', 'æ', '<BLNK>', 'z', '<BLNK>', ' ', '<BLNK>', 'm', '<BLNK>', 'ˈ', '<BLNK>', 'a', '<BLNK>', 'ɪ', '<BLNK>', 'k', '<BLNK>', 'ɹ', '<BLNK>', 'ə', '<BLNK>', 's', '<BLNK>', 'ˌ', '<BLNK>', 'ɑ', '<BLNK>', 'ː', '<BLNK>', 'f', '<BLNK>', 't', '<BLNK>', ' ', '<BLNK>', 'h', '<BLNK>', 'ɐ', '<BLNK>', 'z', '<BLNK>', ' ', '<BLNK>', 'ɛ', '<BLNK>', 'k', '<BLNK>', 's', '<BLNK>', 'p', '<BLNK>', 'ˈ', '<BLNK>', 'æ', '<BLNK>', 'n', '<BLNK>', 'd', '<BLNK>', 'ᵻ', '<BLNK>', 'd', '<BLNK>', ' ', '<BLNK>', 'ɪ', '<BLNK>', 't', '<BLNK>', 's', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'ˈ', '<BLNK>', 'e', '<BLNK>', 'ɪ', '<BLNK>', 'ɾ', '<BLNK>', 'ə', '<BLNK>', 's', '<BLNK>', 'ˌ', '<BLNK>', 'ɛ', '<BLNK>', 'n', '<BLNK>', 't', '<BLNK>', 'ɚ', '<BLNK>', 'z', '<BLNK>', '.', '<BLNK>']
[!] Character 'F' not found in the vocabulary. Discarding it.
['<BLNK>', 'F', '<BLNK>', 'a', '<BLNK>', 'i', '<BLNK>', 'l', '<BLNK>', 'e', '<BLNK>', 'd', '<BLNK>', ' ', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', ' ', '<BLNK>', 'c', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'a', '<BLNK>', 't', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 's', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 'u', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'i', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', 'y', '<BLNK>', ' ', '<BLNK>', ':', '<BLNK>', ' ', '<BLNK>', 'N', '<BLNK>', 'o', '<BLNK>', ' ', '<BLNK>', 's', '<BLNK>', 'u', '<BLNK>', 'c', '<BLNK>', 'h', '<BLNK>', ' ', '<BLNK>', 'f', '<BLNK>', 'i', '<BLNK>', 'l', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'i', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', 'y', '<BLNK>', 'æ', '<BLNK>', 'z', '<BLNK>', ' ', '<BLNK>', 'm', '<BLNK>', 'ˈ', '<BLNK>', 'a', '<BLNK>', 'ɪ', '<BLNK>', 'k', '<BLNK>', 'ɹ', '<BLNK>', 'ə', '<BLNK>', 's', '<BLNK>', 'ˌ', '<BLNK>', 'ɑ', '<BLNK>', 'ː', '<BLNK>', 'f', '<BLNK>', 't', '<BLNK>', ' ', '<BLNK>', 'h', '<BLNK>', 'ɐ', '<BLNK>', 'z', '<BLNK>', ' ', '<BLNK>', 'ɛ', '<BLNK>', 'k', '<BLNK>', 's', '<BLNK>', 'p', '<BLNK>', 'ˈ', '<BLNK>', 'æ', '<BLNK>', 'n', '<BLNK>', 'd', '<BLNK>', 'ᵻ', '<BLNK>', 'd', '<BLNK>', ' ', '<BLNK>', 'ɪ', '<BLNK>', 't', '<BLNK>', 's', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'ˈ', '<BLNK>', 'e', '<BLNK>', 'ɪ', '<BLNK>', 'ɾ', '<BLNK>', 'ə', '<BLNK>', 's', '<BLNK>', 'ˌ', '<BLNK>', 'ɛ', '<BLNK>', 'n', '<BLNK>', 't', '<BLNK>', 'ɚ', '<BLNK>', 'z', '<BLNK>', '.', '<BLNK>']
[!] Character 'N' not found in the vocabulary. Discarding it.
Processing time: 13.786689043045044
Real-time factor: 1.211566180171308

I am also not sure why the last two lists of letters and <BLNK> tokens are being generated.
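From reading the VITS config, my understanding is that the alternating <BLNK> entries come from the add_blank preprocessing, which intersperses a blank token before, between, and after every input symbol. A rough sketch of what I assume that step does (intersperse_blank is an illustrative helper of mine, not the library's actual code):

```python
def intersperse_blank(symbols, blank="<BLNK>"):
    """Insert a blank token before, between, and after every symbol."""
    out = [blank]
    for s in symbols:
        out += [s, blank]
    return out

print(intersperse_blank(list("as")))
# ['<BLNK>', 'a', '<BLNK>', 's', '<BLNK>']
```

With add_blank enabled, a list of n symbols becomes 2n + 1 tokens, which matches the shape of the dumped lists.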

I would be grateful for any help on this. Thank you very much.

To Reproduce

tts --text 'As company has expanded its datacenters. to meet demands' --model_name ./model.pth --config_path ./config.json --out_path ./output.wav
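For what it's worth, the "Failed to create secure directory: No such file or directory" string in the log above is a PulseAudio warning that commonly appears when XDG_RUNTIME_DIR is unset or points at a missing directory. A possible workaround before re-running the command (an assumption about this environment, not a confirmed fix):

```python
import os
import tempfile

# Point PulseAudio at a private runtime directory it can use (mode 0700).
# Equivalent shell: export XDG_RUNTIME_DIR=/tmp/runtime-$(id -u); mkdir -p "$XDG_RUNTIME_DIR"; chmod 700 "$XDG_RUNTIME_DIR"
runtime_dir = os.path.join(tempfile.gettempdir(), f"runtime-{os.getuid()}")
os.makedirs(runtime_dir, exist_ok=True)
os.chmod(runtime_dir, 0o700)  # PulseAudio insists on owner-only permissions
os.environ["XDG_RUNTIME_DIR"] = runtime_dir
```

Setting the variable in the shell before invoking tts should have the same effect.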

Expected behavior

No response

Logs

No response

Environment

- TTS Version - 0.8.0
- PyTorch - 1.13.0
- Python - 3.8.10
- OS - Linux
- CPU
- TTS and PyTorch installed through pip

Additional context

No response

Issue Analytics

  • State: closed
  • Created 9 months ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
p0p4k commented, Dec 8, 2022

I believe the problem occurs when converting to phonemes. If you ignore the "Failed to create secure directory: No such file or directory" error message, the rest of the appended gibberish is the phonetized input sentence. The error message (when googled) corresponds to something called PulseAudio. Maybe that is the root of the error instead? You could check what the model takes as input after phonetizing during training/inference and then debug from there. Good luck.
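The comment above can be illustrated with a small experiment: if a phonemizer backend is run as a subprocess with stderr merged into stdout, any warning it prints (such as the PulseAudio message) ends up concatenated with the phoneme string and gets synthesized as speech. A minimal sketch, where the sh command is a hypothetical stand-in for an espeak-style backend rather than Coqui's actual code:

```python
import subprocess

# Stand-in backend: writes a PulseAudio-style warning to stderr
# and the phonemes to stdout, like an external phonemizer might.
cmd = ["sh", "-c", 'echo "Failed to create secure directory" >&2; echo "æz"']

# If stderr is merged into stdout, the warning pollutes the phoneme output.
merged = subprocess.run(cmd, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, text=True)

# Keeping the streams separate leaves the phoneme string clean.
clean = subprocess.run(cmd, stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE, text=True)

print(repr(merged.stdout))  # warning + phonemes
print(repr(clean.stdout))   # phonemes only
```

Separating the streams (or filtering known warning lines out of the captured output) would stop the error text from being spoken at the start of each sentence.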

0 reactions
prateek-77 commented, Dec 12, 2022

Thank you very much for your help @p0p4k !

Read more comments on GitHub >

