[Bug] Error while testing a trained VITS Model (Gibberish)
Describe the bug
Thanks for the amazing codebase!
When I try to test my trained VITS model for TTS, the model produces gibberish for 3-4 seconds at the beginning of every sentence and at every punctuation mark (full stops, commas). The rest of the text is spoken correctly.
The following is the output generated by the test command:
Using model: vits
Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:0
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:None
| > fft_size:1024
| > power:None
| > preemphasis:0.0
| > griffin_lim_iters:None
| > signal_norm:None
| > symmetric_norm:None
| > mel_fmin:0
| > mel_fmax:None
| > pitch_fmin:None
| > pitch_fmax:None
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:False
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
Text: As company has expanded its datacenters. to meet demands
Text splitted to sentences.
['As company has expanded its datacenters.', 'to meet demands']
['<BLNK>', 'F', '<BLNK>', 'a', '<BLNK>', 'i', '<BLNK>', 'l', '<BLNK>', 'e', '<BLNK>', 'd', '<BLNK>', ' ', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', ' ', '<BLNK>', 'c', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'a', '<BLNK>', 't', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 's', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 'u', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'i', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', 'y', '<BLNK>', ' ', '<BLNK>', ':', '<BLNK>', ' ', '<BLNK>', 'N', '<BLNK>', 'o', '<BLNK>', ' ', '<BLNK>', 's', '<BLNK>', 'u', '<BLNK>', 'c', '<BLNK>', 'h', '<BLNK>', ' ', '<BLNK>', 'f', '<BLNK>', 'i', '<BLNK>', 'l', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'i', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', 'y', '<BLNK>', 'æ', '<BLNK>', 'z', '<BLNK>', ' ', '<BLNK>', 'm', '<BLNK>', 'Λ', '<BLNK>', 'a', '<BLNK>', 'ɪ', '<BLNK>', 'k', '<BLNK>', 'ɹ', '<BLNK>', 'Ι', '<BLNK>', 's', '<BLNK>', 'Λ', '<BLNK>', 'Ι', '<BLNK>', 'Λ', '<BLNK>', 'f', '<BLNK>', 't', '<BLNK>', ' ', '<BLNK>', 'h', '<BLNK>', 'Ι', '<BLNK>', 'z', '<BLNK>', ' ', '<BLNK>', 'Ι', '<BLNK>', 'k', '<BLNK>', 's', '<BLNK>', 'p', '<BLNK>', 'Λ', '<BLNK>', 'æ', '<BLNK>', 'n', '<BLNK>', 'd', '<BLNK>', 'ᵻ', '<BLNK>', 'd', '<BLNK>', ' ', '<BLNK>', 'ɪ', '<BLNK>', 't', '<BLNK>', 's', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'Λ', '<BLNK>', 'e', '<BLNK>', 'ɪ', '<BLNK>', 'ɾ', '<BLNK>', 'Ι', '<BLNK>', 's', '<BLNK>', 'Λ', '<BLNK>', 'Ι', '<BLNK>', 'n', '<BLNK>', 't', '<BLNK>', 'Ι', '<BLNK>', 'z', '<BLNK>', '.', '<BLNK>']
[!] Character 'F' not found in the vocabulary. Discarding it.
['<BLNK>', 'F', '<BLNK>', 'a', '<BLNK>', 'i', '<BLNK>', 'l', '<BLNK>', 'e', '<BLNK>', 'd', '<BLNK>', ' ', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', ' ', '<BLNK>', 'c', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'a', '<BLNK>', 't', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 's', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 'u', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'i', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', 'y', '<BLNK>', ' ', '<BLNK>', ':', '<BLNK>', ' ', '<BLNK>', 'N', '<BLNK>', 'o', '<BLNK>', ' ', '<BLNK>', 's', '<BLNK>', 'u', '<BLNK>', 'c', '<BLNK>', 'h', '<BLNK>', ' ', '<BLNK>', 'f', '<BLNK>', 'i', '<BLNK>', 'l', '<BLNK>', 'e', '<BLNK>', ' ', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'i', '<BLNK>', 'r', '<BLNK>', 'e', '<BLNK>', 'c', '<BLNK>', 't', '<BLNK>', 'o', '<BLNK>', 'r', '<BLNK>', 'y', '<BLNK>', 'æ', '<BLNK>', 'z', '<BLNK>', ' ', '<BLNK>', 'm', '<BLNK>', 'Λ', '<BLNK>', 'a', '<BLNK>', 'ɪ', '<BLNK>', 'k', '<BLNK>', 'ɹ', '<BLNK>', 'Ι', '<BLNK>', 's', '<BLNK>', 'Λ', '<BLNK>', 'Ι', '<BLNK>', 'Λ', '<BLNK>', 'f', '<BLNK>', 't', '<BLNK>', ' ', '<BLNK>', 'h', '<BLNK>', 'Ι', '<BLNK>', 'z', '<BLNK>', ' ', '<BLNK>', 'Ι', '<BLNK>', 'k', '<BLNK>', 's', '<BLNK>', 'p', '<BLNK>', 'Λ', '<BLNK>', 'æ', '<BLNK>', 'n', '<BLNK>', 'd', '<BLNK>', 'ᵻ', '<BLNK>', 'd', '<BLNK>', ' ', '<BLNK>', 'ɪ', '<BLNK>', 't', '<BLNK>', 's', '<BLNK>', ' ', '<BLNK>', 'd', '<BLNK>', 'Λ', '<BLNK>', 'e', '<BLNK>', 'ɪ', '<BLNK>', 'ɾ', '<BLNK>', 'Ι', '<BLNK>', 's', '<BLNK>', 'Λ', '<BLNK>', 'Ι', '<BLNK>', 'n', '<BLNK>', 't', '<BLNK>', 'Ι', '<BLNK>', 'z', '<BLNK>', '.', '<BLNK>']
[!] Character 'N' not found in the vocabulary. Discarding it.
Processing time: 13.786689043045044
Real-time factor: 1.211566180171308
I am also not sure why the last 2 lists of letters and <BLNK> are being generated.
I would be grateful for any help on this. Thank you very much.
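A note on the <BLNK> entries in the lists above: they are most likely expected rather than a symptom. VITS-style text front ends typically place a blank token before, between, and after the input symbols (in Coqui TTS this appears to be governed by the `add_blank` setting in the config). The sketch below illustrates that interleaving only; the function name and the blank marker are assumptions for illustration, not the library's actual code.

```python
def intersperse(symbols, blank="<BLNK>"):
    """Return a new list with `blank` before, between, and after every symbol."""
    # A sequence of n symbols becomes 2n + 1 tokens.
    result = [blank] * (len(symbols) * 2 + 1)
    # Odd positions hold the original symbols; even positions stay blank.
    result[1::2] = symbols
    return result


tokens = intersperse(list("to"))
print(tokens)  # ['<BLNK>', 't', '<BLNK>', 'o', '<BLNK>']
```

Read this way, the interleaved blanks are normal; the unusual part of the lists is the English error text that precedes the phonemes.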
To Reproduce
tts --text 'As company has expanded its datacenters. to meet demands' --model_name ./model.pth --config_path ./config.json --out_path ./output.wav
Expected behavior
No response
Logs
No response
Environment
- TTS Version - 0.8.0
- PyTorch - 1.13.0
- Python - 3.8.10
- OS - Linux
- CPU
- TTS, PyTorch installed through Pip
Additional context
No response
Issue Analytics
- State:
- Created 9 months ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
I believe the problem occurs during the conversion to phonemes. If you ignore the
Failed to create secure directory: No such file or directory
error message, the rest of the appended gibberish is the phonemized input sentence. The error message, when googled, appears to come from PulseAudio. Maybe that is the root of the error instead? You could check what the model takes as input after phonemization during training and inference, and debug from there. Good luck.
Thank you very much for your help, @p0p4k!
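The suggestion to check what the model actually receives can be tried offline on the dumped token list. The following is a minimal sketch, assuming the dump is available as a Python list: it drops the blank markers, joins the rest back into a string, and lists characters missing from a vocabulary. The helper names and the `VOCAB` set are hypothetical; the real vocabulary comes from the model's config.

```python
BLANK = "<BLNK>"  # blank marker as printed in the log


def tokens_to_text(tokens, blank=BLANK):
    """Join a dumped token list back into a string, dropping blank markers."""
    return "".join(t for t in tokens if t != blank)


def find_unknown_symbols(text, vocabulary):
    """Return characters of `text` missing from `vocabulary`, in order of first appearance."""
    unknown = []
    for ch in text:
        if ch not in vocabulary and ch not in unknown:
            unknown.append(ch)
    return unknown


# Hypothetical vocabulary, for illustration only.
VOCAB = set("abcdefghijklmnopqrstuvwxyz .,:;!?æɪɹᵻɾ")

# First few tokens of the dump from the issue, shortened.
dump = [BLANK, "F", BLANK, "a", BLANK, "i", BLANK, "l", BLANK, "e", BLANK, "d", BLANK]
text = tokens_to_text(dump)
print(text)                               # Failed
print(find_unknown_symbols(text, VOCAB))  # ['F']
```

If the rebuilt string starts with ordinary English such as "Failed to create secure directory", the message reached the model as input text, which would be consistent with the gibberish at the start of each sentence and with only the capitals 'F' and 'N' being discarded.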