ESPnet2-TTS development plan
TODO
Documentation
- README.md
- egs2/TEMPLATE/tts1/README.md
  - joint training
  - vits training
- egs2/jvs/tts1/README.md
  - fastspeech adap
  - vits adap
New Features
- Text embedding frontend (huggingface/transformers?)
- Support from_pretrained function @kan-bayashi
- Support GAN-based training @kan-bayashi #3436
- Support speaker id input @kan-bayashi #3452 #3453 #3490
- Joint training of text2mel and vocoder @kan-bayashi #3501 #3508
- Support language id input @kan-bayashi #3489 #3490
- Integrate the use of parallel_wavegan’s vocoder in inference @kan-bayashi #3513
- Joint trainable vocoders @kan-bayashi
Models
- VITS @kan-bayashi #3436 #3437 #3438 #3439 #3448 #3449
- AdaSpeech https://arxiv.org/abs/2103.00993
- AdaSpeech2 https://arxiv.org/abs/2104.09715
- DenoiSpeech
- Translatotron2
Vocoders
- Pretrained models of HiFiGAN or StyleMelGAN @kan-bayashi
  - Libritts
  - vctk
  - csmsc
  - ljspeech
  - jsut
- HiFi-GAN @kan-bayashi
  - Initial implementation https://github.com/kan-bayashi/ParallelWaveGAN/pull/273 https://github.com/kan-bayashi/ParallelWaveGAN/pull/275 https://github.com/kan-bayashi/ParallelWaveGAN/pull/276 https://github.com/kan-bayashi/ParallelWaveGAN/pull/277
  - Tuning https://github.com/kan-bayashi/ParallelWaveGAN/issues/278
- StyleMelGAN @kan-bayashi
  - Initial implementation https://github.com/kan-bayashi/ParallelWaveGAN/pull/274
  - Tuning https://github.com/kan-bayashi/ParallelWaveGAN/issues/282
Recipe
- Tsukuyomi-chan Corpus (つくよみちゃんコーパス) @kan-bayashi #3552
- CSS10 @kan-bayashi #3464
- RUSLAN @kan-bayashi #3378 #3390
- HUI-audio-corpus-german @kan-bayashi #3375 #3381 #3391
- KKS dataset @kan-bayashi #3383 #3400
- JTubeSpeech @Takaaki-Saeki #3459
- J-MAC
- J-KAC @TanUkkii007 #3468
- JMD @takenori-y #3394
- AISHELL-3 @ftshijt #3473
- SynPaFlex-Corpus
- The SIWIS French Speech Synthesis Database @takenori-y #3460
- CMU INDIC @peter-yh-wu #3401
- Hi-Fi TTS
- THCHS30 @ftshijt #3473
- DiDiSpeech
- IndicSpeech @peter-yh-wu #3435
Functions
- Multi-lingual G2P
  - Korean G2P @kan-bayashi #3383
  - Russian G2P @kan-bayashi #3377
  - German G2P @kan-bayashi #3371
  - Spanish G2P @kan-bayashi #3373
  - French G2P @kan-bayashi #3372
  - Greek G2P @kan-bayashi #3463
  - Finnish G2P @kan-bayashi #3463
  - Hungarian G2P @kan-bayashi #3463
  - Dutch G2P @kan-bayashi #3463
- Enhanced Japanese G2P @kan-bayashi #3558 #3561
- Silence trimming at the beginning and the end of audio @kan-bayashi #3380
- Silence trimming at the middle of audio
- Conversion of MFA alignment to durations file
- Audio quality checker for filtering
- Transcription quality checker for filtering
- Evaluation stage
  - ASR eval @kan-bayashi #3569
  - MOSnet eval
  - MCD eval
  - FDSD eval
- Quantized decoding
Minor functions
- Overwrite the decoding params @kan-bayashi
- Fix the seed in the inference @kan-bayashi
- TTS inference interface modification @kan-bayashi
Any suggestions are welcome.
Top GitHub Comments
Quantization?
Any plans for Fre-GAN? https://arxiv.org/abs/2106.02297 https://github.com/rishikksh20/Fre-GAN-pytorch