[Bug] Strange output results from simple words
Describe the bug
Sometimes I get really strange outputs. Like this one:
` tts --out_path hello.mp3 --text "hello"
tts_models/en/ljspeech/tacotron2-DDC is already downloaded.
vocoder_models/en/ljspeech/hifigan_v2 is already downloaded.
Using model: Tacotron2
Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
Model's reduction rate r is set to: 1
Vocoder Model: hifigan
Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
Generator Model: hifigan_generator
Discriminator Model: hifigan_discriminator
Removing weight norm...
Text: hello
Text splitted to sentences.
['hello']
Decoder stopped with max_decoder_steps 500
Processing time: 2.7834296226501465
Real-time factor: 0.4366435912025878
Saving output to hello.mp3 `
No idea what I’m doing wrong.
To Reproduce
tts --out_path hello.mp3 --text "hello"
Expected behavior
No response
Logs
No response
Environment
{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.11.0+cu102",
        "TTS": "0.6.1",
        "numpy": "1.19.5"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.9.2",
        "version": "#1 SMP Debian 5.10.106-1 (2022-03-17)"
    }
}
Additional context
No response
Issue Analytics
- Created a year ago
- Comments: 13 (7 by maintainers)
Top GitHub Comments
For most of the models, the input text needs to end with punctuation, since they are trained on datasets in which the texts always end with a punctuation mark.
Hi, yes, "hello." works. Is there a reason not to automatically append "." to input sentences if `$input !~ /\.$/`?
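Until something like that happens automatically, the check proposed above can be applied to the text before it is passed to `tts --text`. The following is only a minimal sketch in Python: `ensure_terminal_punctuation` is a hypothetical helper, not part of the TTS package, and the set of characters treated as sentence-ending punctuation (`.`, `!`, `?`) is an assumption that may not match what a given model was trained on.

```python
# Hypothetical helper, not part of the TTS package.
# Assumption: these characters count as sentence-ending punctuation.
TERMINAL_PUNCTUATION = ".!?"


def ensure_terminal_punctuation(text: str, default: str = ".") -> str:
    """Return text with trailing whitespace removed and `default` appended
    if the text does not already end with a terminal punctuation mark."""
    stripped = text.rstrip()
    if not stripped:
        # Nothing to synthesize; leave empty input empty.
        return stripped
    if stripped[-1] in TERMINAL_PUNCTUATION:
        return stripped
    return stripped + default


if __name__ == "__main__":
    for sample in ("hello", "hello.", "How are you?"):
        print(repr(ensure_terminal_punctuation(sample)))
```

The result can then be used as the `--text` argument in place of the raw input, for example `tts --out_path hello.mp3 --text "hello."`.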