[Bug] Strange output results from simple words

Describe the bug

Sometimes I get really strange outputs. Like this one:

` tts --out_path hello.mp3 --text "hello"

tts_models/en/ljspeech/tacotron2-DDC is already downloaded.
vocoder_models/en/ljspeech/hifigan_v2 is already downloaded.
Using model: Tacotron2
Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
Model's reduction rate r is set to: 1
Vocoder Model: hifigan
Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
Generator Model: hifigan_generator
Discriminator Model: hifigan_discriminator
Removing weight norm...
Text: hello
Text splitted to sentences.
['hello']
Decoder stopped with max_decoder_steps 500
Processing time: 2.7834296226501465
Real-time factor: 0.4366435912025878
Saving output to hello.mp3 `

No idea what I’m doing wrong.
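One line in the log above may be relevant: `Decoder stopped with max_decoder_steps 500`. As a rough sanity check, the sketch below estimates how much audio that would correspond to, using only numbers copied from the log. It rests on two assumptions that the log does not confirm: that each decoder step emits `r` mel frames of `hop_length` samples each, and that the reported real-time factor is processing time divided by audio duration. The function and constant names are made up for illustration and are not part of the TTS package.

```python
# Values copied from the log output above.
SAMPLE_RATE = 22050                     # sample_rate
HOP_LENGTH = 256                        # hop_length
REDUCTION_RATE = 1                      # "Model's reduction rate r is set to: 1"
MAX_DECODER_STEPS = 500                 # "Decoder stopped with max_decoder_steps 500"
PROCESSING_TIME = 2.7834296226501465    # "Processing time"
REAL_TIME_FACTOR = 0.4366435912025878   # "Real-time factor"


def max_audio_seconds(steps, r, hop_length, sample_rate):
    """Audio length if the decoder runs for `steps` steps.

    Assumption: each step emits `r` mel frames and each frame
    advances the waveform by `hop_length` samples.
    """
    return steps * r * hop_length / sample_rate


def audio_seconds_from_rtf(processing_time, rtf):
    """Audio length implied by the logged real-time factor.

    Assumption: real-time factor = processing time / audio duration.
    """
    return processing_time / rtf


decoder_cap = max_audio_seconds(
    MAX_DECODER_STEPS, REDUCTION_RATE, HOP_LENGTH, SAMPLE_RATE
)
reported = audio_seconds_from_rtf(PROCESSING_TIME, REAL_TIME_FACTOR)

print(f"Audio at the decoder step limit: {decoder_cap:.2f} s")
print(f"Audio implied by the real-time factor: {reported:.2f} s")
```

Under those assumptions both figures come out at around six seconds, which is far longer than the single word "hello" should take to speak. That would be consistent with the decoder running until it hit its step limit instead of stopping on its own, although the log alone does not prove it.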

To Reproduce

tts --out_path hello.mp3 --text "hello"

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.11.0+cu102",
        "TTS": "0.6.1",
        "numpy": "1.19.5"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.9.2",
        "version": "#1 SMP Debian 5.10.106-1 (2022-03-17)"
    }
}