🐸 TTS roadmap

These are the main dev plans for 🐸 TTS.

If you want to contribute to 🐸 TTS and don’t know where to start, you can pick one of these items and start with our Contribution Guidelines. We’re also always here to help.

Feel free to pick one or suggest a new one.

Contributions are always welcome 💪

v0.1.0 Milestones

Better model config handling #21
TTS recipes for public datasets.
TTS trainer API to unify all the model training scripts.
TTS, Vocoder and SpeakerEncoder model abstractions and APIs.
Documentation for
- Implementing a new model using 🐸 TTS.
- Training a model on a new dataset from gecko.
- Using Synthesizer interface on CLI or Server.
- Extracting Spectrograms for Vocoder training.
- Contributing a new pre-trained 🐸 TTS model.
- Explanation of the model config parameters.
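Typed config handling, which the first milestone aims for, can be sketched with Python dataclasses. Every parameter gets a name, a type, and a default, invalid values fail early, and the whole config round-trips through JSON. This is a minimal sketch under assumptions: `ModelConfig`, `AudioConfig`, and all field names are hypothetical and are not 🐸 TTS's actual config classes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AudioConfig:
    # Hypothetical audio parameters; the names are illustrative only.
    sample_rate: int = 22050
    num_mels: int = 80


@dataclass
class ModelConfig:
    # Hypothetical top-level config for a TTS model.
    model: str = "tacotron2"
    batch_size: int = 32
    audio: AudioConfig = field(default_factory=AudioConfig)

    def __post_init__(self) -> None:
        # Reject obviously invalid values up front instead of deep inside training.
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "ModelConfig":
        data = json.loads(text)
        # Rebuild the nested dataclass before constructing the parent.
        audio = AudioConfig(**data.pop("audio", {}))
        return cls(audio=audio, **data)
```

For example, `ModelConfig.from_json(ModelConfig(batch_size=16).to_json())` compares equal to the original object, while `ModelConfig(batch_size=0)` raises `ValueError`.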

v0.2.0 Milestones

Grapheme-to-Phoneme in-house conversion. (Thx to gruut 👍)
Implement VITS model.

v0.3.0 Milestones

Implement generic ForwardTTS API.
Implement Fast Speech model.
Implement Fast Pitch model.

v0.4.0 Milestones

Trainer API v2 - join the discussion
Multi-speaker VCTK recipes for all the TTS.tts models.

v0.5.0 Milestones

Support for multi-lingual models
YourTTS release 🚀

v0.6.0 Milestones

Add ESpeak support
New Tokenizer and Phonemizer APIs #937
New Model API #1078
Split the trainer into a separate repo: 👟Trainer
Update VITS model API
Gradient accumulation. #560 (in 👟)
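Gradient accumulation lets a trainer simulate a large batch on limited memory. Gradients from several micro-batches are scaled and summed, and the optimizer then makes a single update. The sketch below shows the idea in plain Python on a one-parameter linear model with mean squared error. `grad_mse` and `accumulated_step` are hypothetical names, and this is not the 👟Trainer implementation.

```python
from typing import Sequence


def grad_mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient w.r.t. w of the mean squared error of the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_step(
    w: float,
    xs: Sequence[float],
    ys: Sequence[float],
    micro_batch_size: int,
    lr: float,
) -> float:
    """One SGD step whose gradient is accumulated over equal-sized micro-batches."""
    if len(xs) % micro_batch_size != 0:
        raise ValueError("batch size must be a multiple of micro_batch_size")
    num_micro = len(xs) // micro_batch_size
    accumulated = 0.0
    for i in range(num_micro):
        start = i * micro_batch_size
        end = start + micro_batch_size
        # Scale each micro-batch gradient so the sum equals the full-batch mean.
        accumulated += grad_mse(w, xs[start:end], ys[start:end]) / num_micro
    # Single parameter update after all micro-batches have been processed.
    return w - lr * accumulated
```

With equal-sized micro-batches, the accumulated gradient matches the full-batch gradient, so the update is the same as a full-batch step. Only one micro-batch has to fit in memory at a time.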

v0.7.0 Milestones

Implement Capacitron 👑 @a-froghyar 👑 @WeberJulian
Release pretrained Capacitron

v0.8.0 Milestones

Separate numpy transforms
Better data sampling for VITS
New Thorsten DE models 👑 @thorstenMueller

🏃‍♀️ Milestones along the way

Implement an end-to-end training API for ForwardTTS models and a vocoder. #1510
Implement a Python voice synthesis API.
Inject phonemes into the input text at inference. #1452
AdaSpeech1/2 https://arxiv.org/pdf/2104.09715 and https://arxiv.org/abs/2103.00993
Let the user pass a custom text cleaner function.
Refactor the text cleaners for a more flexible and transparent API.
Implement HifiGAN2 (not the vocoder)
Implement emotion and style adaptation.
Implement FastSpeech2 (https://arxiv.org/abs/2006.04558).
AutoTTS 🤖 (👑 @loganhart420)
Watermark TTS outputs to guard against deepfakes.
Implement SSML v0.0.1
ONNX and TorchScript model exports.
TensorFlow run-time for training models.
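Letting users pass their own text cleaner amounts to treating a cleaner as any function from string to string and applying a list of them in order. This is a minimal sketch that assumes such a callable-based design. `run_cleaners`, `make_abbreviation_expander`, and the other names are hypothetical and are not part of 🐸 TTS's current API.

```python
import re
from typing import Callable, Dict, Iterable

# A cleaner is any function that maps a string to a cleaned string.
Cleaner = Callable[[str], str]


def lowercase(text: str) -> str:
    return text.lower()


def collapse_whitespace(text: str) -> str:
    # Replace runs of whitespace with a single space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


def make_abbreviation_expander(abbreviations: Dict[str, str]) -> Cleaner:
    """Build a cleaner that expands lowercase abbreviations such as "dr."."""
    if not abbreviations:
        return lambda text: text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, abbreviations)) + r")\.")

    def expand(text: str) -> str:
        # Look up the matched abbreviation (without its dot) in the mapping.
        return pattern.sub(lambda m: abbreviations[m.group(1)], text)

    return expand


def run_cleaners(text: str, cleaners: Iterable[Cleaner]) -> str:
    # Apply user-supplied cleaners in the given order.
    for cleaner in cleaners:
        text = cleaner(text)
    return text
```

Order matters in this design. Here `lowercase` has to run before the expander because the abbreviation keys are lowercase. A user could append a custom function such as `lambda t: t.replace("&", " and ")` without touching any built-in cleaner.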

🤖 New TTS models

AlignTTS (@erogol)
HiFiGAN (#16 👑 @rishikksh20 and @erogol)
UnivNet Vocoder ( 👑 @rishikksh20)
VITS paper
FastPitch source
Alignment Network paper
End2End TTS combining aligner + tts + vocoder.
Multi-Lingual TTS (#11 👑 @WeberJulian )
ParallelTacotron paper (open for contribution)
Efficient TTS paper (open for contribution)
Gaussian length regulator from https://arxiv.org/pdf/2010.04301.pdf (open for contribution)
LightSpeech from https://arxiv.org/pdf/2102.04040.pdf (open for contribution)
AdaSpeech1/2 https://arxiv.org/pdf/2104.09715 and https://arxiv.org/abs/2103.00993
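The Gaussian length regulator from the linked paper (Non-Attentive Tacotron, arXiv 2010.04301) replaces FastSpeech-style frame repetition with Gaussian upsampling. Each token gets a centre c_i from its predicted duration and a spread σ_i. Each output frame is then a normalised, Gaussian-weighted mix of all token vectors. Below is a plain-Python sketch of that step. It assumes durations and σ values have already been predicted. `gaussian_upsample` is a hypothetical name, and evaluating frames at t + 0.5 is a choice made for this sketch.

```python
import math
from typing import List, Sequence


def gaussian_upsample(
    hidden: Sequence[Sequence[float]],
    durations: Sequence[float],
    sigmas: Sequence[float],
) -> List[List[float]]:
    """Map token vectors to frame vectors with Gaussian upsampling.

    Token i is centred at c_i = d_1 + ... + d_{i-1} + d_i / 2. Output frame t is
    sum_i w_ti * h_i with w_ti proportional to N(t + 0.5; c_i, sigma_i^2),
    normalised over tokens.
    """
    centres = []
    total = 0.0
    for d in durations:
        centres.append(total + d / 2)
        total += d
    num_frames = int(round(total))
    dim = len(hidden[0])
    frames = []
    for t in range(num_frames):
        pos = t + 0.5  # evaluate each Gaussian at the centre of frame t
        # Log-densities; the 1/sqrt(2*pi) constant cancels in the normalisation.
        log_w = [-0.5 * ((pos - c) / s) ** 2 - math.log(s) for c, s in zip(centres, sigmas)]
        peak = max(log_w)
        weights = [math.exp(lw - peak) for lw in log_w]  # numerically stable softmax
        norm = sum(weights)
        frames.append([sum(w * h[k] for w, h in zip(weights, hidden)) / norm for k in range(dim)])
    return frames
```

Unlike hard repetition, neighbouring tokens blend smoothly at their boundaries. The whole mapping is also differentiable with respect to the durations and σ values.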


Top GitHub Comments

AndrewBarfield commented, Apr 17, 2021

I’m learning the code/API and performing experiments. I hope to contribute soon.

I’m also wondering: can I donate (money) to Coqui?

kerryeon commented, Feb 23, 2022

Hello, thanks for the great work! I’m a fan of Coqui TTS.

I’m porting some parts of the project to Rust for the following reasons:

Predictable Performance
Static-typed Metadata & Model Management
Multithreaded Server Implementation
I just love Rust

Voice conversion (VC) from YourTTS has been implemented successfully. For this purpose, an example of saving/loading a pretrained VITS model has been added to the repo. I’m posting this on the milestones thread because I think my work could be helpful to others 😃

Repository (RusTTS): https://github.com/kerryeon/rustts
