[Bug] Truncated audio when generating speech

Describe the bug

When generating speech from the following text, the output file contains truncated audio: the speech is cut off before all the sentences have been spoken.

Text:

Multiple debugging information entries may share the same abbreviation table entry. Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.

Generated speech audio: tts_output.zip

To Reproduce

$ tts --text 'Multiple debugging information entries may share the same abbreviation table entry. Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.'

Expected behavior

Generation of speech audio for the full input text.

Logs

$ tts --text 'Multiple debugging information entries may share the same abbreviation table entry. Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.'
 > tts_models/en/ljspeech/tacotron2-DDC is already downloaded.
 > vocoder_models/en/ljspeech/hifigan_v2 is already downloaded.
 > Using model: Tacotron2
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
 > Model's reduction rate `r` is set to: 1
 > Vocoder Model: hifigan
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
 > Generator Model: hifigan_generator
 > Discriminator Model: hifigan_discriminator
Removing weight norm...
 > Text: Multiple debugging information entries may share the same abbreviation table entry. Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.
 > Text splitted to sentences.
['Multiple debugging information entries may share the same abbreviation table entry.', 'Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.']
   > Decoder stopped with `max_decoder_steps` 500
 > Processing time: 5.151856899261475
 > Real-time factor: 0.4295162001993176
 > Saving output to tts_output.wav
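
The log above reports enough values for a rough estimate of what the 500-step limit means in audio time. The following is only a back-of-the-envelope sketch. It assumes that each decoder step emits `r` mel frames and that each frame advances the audio by `hop_length` samples; the thread does not confirm how the limit is applied internally, and the function name is hypothetical.

```python
def max_audio_seconds(max_decoder_steps: int, r: int,
                      hop_length: int, sample_rate: int) -> float:
    """Estimate the longest audio one decoder run can produce.

    Assumption (not confirmed by the thread): one decoder step yields
    `r` mel frames, and one mel frame covers `hop_length` samples.
    """
    total_frames = max_decoder_steps * r
    total_samples = total_frames * hop_length
    return total_samples / sample_rate


# Values taken from the log above: max_decoder_steps=500, r=1,
# hop_length=256, sample_rate=22050.
limit = max_audio_seconds(500, 1, 256, 22050)
print(f"Estimated per-sentence limit: {limit:.2f} s")
```

Under these assumptions the limit works out to a little under six seconds of audio per sentence, which would explain why a long sentence is cut off while a short one is not.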

Environment

{
    "CUDA": {
        "GPU": [
            "Quadro RTX 5000 with Max-Q Design"
        ],
        "available": true,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.11.0+cu102",
        "TTS": "0.6.2",
        "numpy": "1.19.5"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.9.12",
        "version": "#1 SMP PREEMPT Debian 5.16.18-1 (2022-03-29)"
    }
}

Additional context

No response

Issue Analytics

State:
Created a year ago
Comments: 6 (1 by maintainers)

Top GitHub Comments

3 reactions

koutheir commented, Apr 21, 2022

There is a limit on the sentence length for the default model.

What is that exact limit?
How can one find this information for a particular model?
How is the length calculated (character count, word count, etc.)?

1 reaction

a-t-0 commented, Jul 9, 2022

I would also like to know the information @koutheir asked for 😃:

* What is that exact limit?

* How can one find this information for a particular model?

* How is the length calculated (character count, word count, etc.)?

So it would be nice if the question could be re-opened.

FYI, here is a workaround given in this issue. Add:

"max_decoder_steps": 5000

at the end of the config file of the (default) model, in:

~/.local/share/tts/vocoder_models--en--ljspeech--hifigan_v2/config.json

Note that JSON requires a comma after every property except the last one. So you have to add a comma to the line that was previously last, and make sure you don't add a comma after your own "max_decoder_steps": 5000 line. TL;DR: make the last lines look like:

    // PATHS
    "output_path": "/home/erogol/gdrive/Trainings/sam/",
    // Custom limit made larger
    "max_decoder_steps": 5000
}
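
For anyone who would rather script this edit than do it by hand, here is a minimal sketch. It assumes the config file is JSON whose only non-standard feature is whole-line `//` comments like the ones shown above; trailing comments after a value are not handled. Both function names are hypothetical, and the rewritten file loses its comments, so a backup copy is kept.

```python
import json
from pathlib import Path


def set_max_decoder_steps(config_text: str, steps: int) -> str:
    """Return the config text with "max_decoder_steps" set to `steps`.

    Assumption: comments appear only as whole lines starting with "//".
    Those lines are dropped so the standard json module can parse the rest.
    """
    kept = [
        line for line in config_text.splitlines()
        if not line.lstrip().startswith("//")
    ]
    config = json.loads("\n".join(kept))
    config["max_decoder_steps"] = steps
    # json.dumps takes care of commas, so none can be misplaced.
    return json.dumps(config, indent=4)


def patch_config_file(path: Path, steps: int = 5000) -> None:
    """Back up the config at `path`, then rewrite it with the new limit."""
    original = path.read_text(encoding="utf-8")
    backup = path.with_name(path.name + ".bak")
    backup.write_text(original, encoding="utf-8")
    path.write_text(set_max_decoder_steps(original, steps), encoding="utf-8")
```

To use it, call `patch_config_file` with the path to the config file mentioned above. Note that the log message about `max_decoder_steps` is printed while the Tacotron2 model is decoding, so the setting may belong in that model's config rather than the vocoder's; the thread does not confirm which file is the right one.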
