LibriTTS out-of-box produces incoherent speech

Training out of the box produces speech that does not form coherent words. I first ran prepare_libri.ipynb with 20 speakers, then followed the MFA instructions. I ran into size mismatches, which I read should be resolved by running tacotron’s extract_duration.py, and that did resolve them.

So I ran:

    bash ttsexamples/mfa_extraction/scripts/prepare_mfa.sh
    python ttsexamples/mfa_extraction/run_mfa.py --corpus_directory ./libritts --output_directory ./mfa/parsed --jobs 8
    python ttsexamples/mfa_extraction/txt_grid_parser.py \
        --yaml_path ttsexamples/fastspeech2_libritts/conf/fastspeech2libritts.yaml \
        --dataset_path ./libritts \
        --text_grid_path ./mfa/parsed \
        --output_durations_path ./libritts/durations \
        --sample_rate 24000

    tensorflow-tts-preprocess --rootdir ./libritts \
        --outdir ./dump_libritts \
        --config preprocess/libritts_preprocess.yaml \
        --dataset libritts

    tensorflow-tts-normalize --rootdir ./dump_libritts \
        --outdir ./dump_libritts \
        --config preprocess/libritts_preprocess.yaml \
        --dataset libritts

I ran the MFA step because it generates the train.txt that is required later.

I then extracted durations (for both train and valid):

    CUDA_VISIBLE_DEVICES=0 python ttsexamples/tacotron2/extract_duration.py \
        --rootdir ./dump_libritts/train/ \
        --outdir ./dump_libritts/train/durations/ \
        --checkpoint ./ttsexamples/tacotron2/exp/train.tacotron2.v1/checkpoints/model-120000.h5 \
        --use-norm 1 \
        --config ./ttsexamples/tacotron2/conf/tacotron2.v1.yaml \
        --batch-size 32 \
        --win-front 3 \
        --win-back 3
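
The size mismatches mentioned above usually mean that, for some utterance, the per-token durations do not add up to the number of mel frames. A minimal sketch of such a check follows. It assumes each mel spectrogram is an array of shape [frames, n_mels] and each duration array holds one integer per input token. The function name and the dictionary-based interface are hypothetical, and loading the arrays from the dump directory is left out because the file layout is not shown in this issue.

```python
import numpy as np


def find_length_mismatches(durations_by_id, mels_by_id):
    """Return {utt_id: (sum_of_durations, n_mel_frames)} for mismatching utterances.

    Hypothetical helper: both arguments map an utterance id to a NumPy array.
    """
    mismatches = {}
    for utt_id, durations in durations_by_id.items():
        mel = mels_by_id.get(utt_id)
        if mel is None:
            # No mel for this utterance, so there is nothing to compare.
            continue
        total = int(np.sum(durations))
        n_frames = int(mel.shape[0])
        if total != n_frames:
            mismatches[utt_id] = (total, n_frames)
    return mismatches


# Toy data standing in for arrays loaded from disk (80 mel bins assumed).
toy_durations = {"utt_a": np.array([3, 4, 5]), "utt_b": np.array([2, 2])}
toy_mels = {"utt_a": np.zeros((12, 80)), "utt_b": np.zeros((5, 80))}
print(find_length_mismatches(toy_durations, toy_mels))
```

An empty result only shows that the lengths agree. It does not show that the alignments are meaningful, for example if the durations came from a model trained on a different dataset.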

and finally running

    bash ttsexamples/fastspeech2_libritts/scripts/train_libri.sh

This ultimately does not generate proper speech, whether I test with the LibriTTS pretrained vocoder or with other vocoders:

    from tensorflow_tts.inference import AutoConfig, TFAutoModel

    # Load the 24 kHz multi-band MelGAN vocoder config and pretrained weights.
    config = AutoConfig.from_pretrained("../pretrained/mbvocs24k/multiband_melgan.v1_24k.yaml")
    mb_melgan = TFAutoModel.from_pretrained(
        config=config,
        pretrained_path='../pretrained/mbvocs24k/libritts_24k.h5', # "../examples/fastspeech2/checkpoints/model-150000.h5",
        name="melgan"
    )

Notes: I’ve changed the hop size to 300 in the yaml configurations according to previous issues.
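
The hop size matters because it sets how many mel frames one second of audio produces: at 24000 Hz, a hop of 300 samples gives 80 frames per second. The sketch below is an illustration, not the logic of txt_grid_parser.py. It converts phoneme end times in seconds into per-phoneme frame counts, and the function name and sample end times are made up for the example.

```python
import numpy as np


def end_times_to_frame_durations(end_times_s, sample_rate=24000, hop_size=300):
    """Convert cumulative phoneme end times (seconds) to per-phoneme frame counts.

    Hypothetical helper for illustration only.
    """
    frames_per_second = sample_rate / hop_size
    # Round the cumulative boundaries first, then take differences,
    # so rounding errors do not accumulate across phonemes.
    boundaries = np.rint(np.asarray(end_times_s) * frames_per_second).astype(np.int64)
    return np.diff(boundaries, prepend=0)


end_times = [0.1, 0.25, 0.5]  # made-up alignment boundaries
print(end_times_to_frame_durations(end_times, hop_size=300))
print(end_times_to_frame_durations(end_times, hop_size=256))
```

The two calls give different totals for the same half second of audio, so durations computed with one hop size will not match mel spectrograms extracted with another. Every config in the pipeline (alignment parsing, preprocessing, acoustic model, vocoder) needs to agree on the value.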

I would appreciate any hint on what is going on or what is wrong. I would love to upload and contribute a trained model at the end.

Top GitHub Comments

ZDisket commented, Sep 14, 2021

@shachar-ug Which Tacotron2 model are you using to extract durations? It has to be one trained on the exact same dataset you’re trying to train on.

stale[bot] commented, Nov 15, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
