[Bug] Truncated audio when generating speech
Describe the bug
When generating speech audio from the following text, the generated file contains only truncated speech: the audio stops before all of the sentences have been spoken.
Text:
Multiple debugging information entries may share the same abbreviation table entry. Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.
Generated speech audio: tts_output.zip
To Reproduce
$ tts --text 'Multiple debugging information entries may share the same abbreviation table entry. Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.'
Expected behavior
Generation of speech audio for the full input text.
Logs
$ tts --text 'Multiple debugging information entries may share the same abbreviation table entry. Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.'
> tts_models/en/ljspeech/tacotron2-DDC is already downloaded.
> vocoder_models/en/ljspeech/hifigan_v2 is already downloaded.
> Using model: Tacotron2
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:False
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000.0
| > pitch_fmin:0.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:2.718281828459045
| > hop_length:256
| > win_length:1024
> Model's reduction rate `r` is set to: 1
> Vocoder Model: hifigan
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:False
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000.0
| > pitch_fmin:0.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:False
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:2.718281828459045
| > hop_length:256
| > win_length:1024
> Generator Model: hifigan_generator
> Discriminator Model: hifigan_discriminator
Removing weight norm...
> Text: Multiple debugging information entries may share the same abbreviation table entry. Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.
> Text splitted to sentences.
['Multiple debugging information entries may share the same abbreviation table entry.', 'Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.']
> Decoder stopped with `max_decoder_steps` 500
> Processing time: 5.151856899261475
> Real-time factor: 0.4295162001993176
> Saving output to tts_output.wav
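As a rough check on the `max_decoder_steps` line in the log above: assuming each decoder step emits `r` mel frames and each frame covers `hop_length` audio samples (the usual Tacotron 2 arrangement, not something the log itself states), the step limit translates into a cap on audio length per sentence. The sketch below only redoes that arithmetic with the values printed in the log; the function name is illustrative.

```python
# Values copied from the log output above.
SAMPLE_RATE = 22050       # sample_rate
HOP_LENGTH = 256          # hop_length
REDUCTION_RATE = 1        # Model's reduction rate `r`
MAX_DECODER_STEPS = 500   # limit reported by the decoder


def max_audio_seconds(steps, r, hop_length, sample_rate):
    """Upper bound on audio length, assuming r frames per decoder step
    and hop_length samples per frame."""
    return steps * r * hop_length / sample_rate


limit = max_audio_seconds(MAX_DECODER_STEPS, REDUCTION_RATE, HOP_LENGTH, SAMPLE_RATE)
print(f"{limit:.2f} s per sentence at {MAX_DECODER_STEPS} steps")

# The same bound with the larger limit suggested in the comments below.
raised = max_audio_seconds(5000, REDUCTION_RATE, HOP_LENGTH, SAMPLE_RATE)
print(f"{raised:.2f} s per sentence at 5000 steps")
```

Under that assumption, 500 steps allow a little under 6 seconds per sentence, which is consistent with the long second sentence being cut off.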
Environment
{
"CUDA": {
"GPU": [
"Quadro RTX 5000 with Max-Q Design"
],
"available": true,
"version": "10.2"
},
"Packages": {
"PyTorch_debug": false,
"PyTorch_version": "1.11.0+cu102",
"TTS": "0.6.2",
"numpy": "1.19.5"
},
"System": {
"OS": "Linux",
"architecture": [
"64bit",
"ELF"
],
"processor": "",
"python": "3.9.12",
"version": "#1 SMP PREEMPT Debian 5.16.18-1 (2022-03-29)"
}
}
Additional context
No response
Issue Analytics
- State:
- Created a year ago
- Comments:6 (1 by maintainers)
Top GitHub Comments
I would also like to know the information @koutheir asked for 😃
So it would be nice if the question could be re-opened.
FYI, here is a workaround given in this issue. Add:
"max_decoder_steps": 5000
at the end of the config file of the (default) model.
Note that JSON expects a comma at the end of a line if that line is followed by another property. So you have to add a comma to the line that was previously last, and make sure you don’t add a comma after your own "max_decoder_steps": 5000 line.
TL;DR: the previously last line gains a trailing comma, and "max_decoder_steps": 5000 becomes the last line before the closing brace, with no comma after it.
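The comma bookkeeping described in this comment can be avoided by letting a JSON library rewrite the file. The following is a minimal sketch, not part of the TTS package: `set_max_decoder_steps` is a hypothetical helper, the location of the model's config file is not given here and has to be supplied by the caller, and the sketch assumes the file is strict JSON with `max_decoder_steps` as a top-level key, as the workaround implies.

```python
import json
import tempfile
from pathlib import Path


def set_max_decoder_steps(config_path, steps=5000):
    """Set the top-level "max_decoder_steps" key in a JSON config file.

    Hypothetical helper; assumes the file is strict JSON (no comments).
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["max_decoder_steps"] = steps
    # json.dumps places the commas itself, so none can be misplaced by hand.
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


# Demonstration on a throwaway file; the "model" entry is a made-up placeholder,
# not a real TTS config.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    demo_path.write_text('{"model": "placeholder"}', encoding="utf-8")
    updated = set_max_decoder_steps(demo_path)
    print(updated["max_decoder_steps"])
```

Rewriting the file this way keeps the existing keys in order but normalizes the indentation, so it is worth keeping a backup copy of the original config before trying it on a real model.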