# <img src="https://github.com/coqui-ai/TTS/blob/master/images/coqui-log-green-TTS.png" height="56"/>
TTS is a deep-learning-based text-to-speech solution. It favors simplicity over large and complex models, yet it aims to achieve state-of-the-art results.

According to a [user study](https://ttschoice.github.io/), TTS achieves results on par with or better than other commercial and open-source text-to-speech solutions. It also supports multiple languages and has already been applied to more than 13 of them.

The general architecture we use comprises two separate deep neural networks. The first network computes acoustic features (for example, a mel spectrogram) from the given text input. The second network generates the waveform from those acoustic features. We call the first model "text2feat" and the second "vocoder".
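
As a rough illustration, a minimal inference sketch of this two-stage pipeline is below. The names `text2feat`, `vocoder`, their `inference()` methods, and `text_to_sequence` are hypothetical placeholders for the real models and text frontend, not the actual TTS API.

```python
import torch

def synthesize(text, text2feat, vocoder, text_to_sequence):
    # Stage 1: text -> acoustic features (e.g. a mel spectrogram).
    token_ids = torch.LongTensor(text_to_sequence(text)).unsqueeze(0)
    with torch.no_grad():
        mel = text2feat.inference(token_ids)  # [1, T, n_mel_channels]
        # Stage 2: acoustic features -> waveform samples.
        wav = vocoder.inference(mel)          # [1, T * hop_length]
    return wav.squeeze(0).cpu().numpy()
```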

TTS also provides a Speaker Encoder model that computes speaker embedding vectors for various purposes, including speaker verification, speaker identification, and multi-speaker text-to-speech models.
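
For instance, speaker verification can be done by comparing two such embeddings with cosine similarity. This is a minimal sketch; the embeddings are assumed to come from the Speaker Encoder, and the decision threshold is an arbitrary illustrative value.

```python
import numpy as np

def is_same_speaker(emb_a, emb_b, threshold=0.75):
    # Cosine similarity between the two speaker embeddings;
    # values near 1.0 suggest the same speaker.
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b)) >= threshold
```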

So far, we have implemented the following methods and models:

### Text-to-Feat Models
- Tacotron: [paper](https://arxiv.org/abs/1703.10135)
- Tacotron2: [paper](https://arxiv.org/abs/1712.05884)
- Glow-TTS: [paper](https://arxiv.org/abs/2005.11129)
- Speedy-Speech: [paper](https://arxiv.org/abs/2008.03802)

### Tricks for more efficient Tacotron learning
- Gradual Training: [blog post](https://erogol.com/gradual-training-with-tacotron-for-faster-convergence/) (see the sketch after this list)
- Global Style Tokens: [paper](https://arxiv.org/pdf/1803.09017v1.pdf)
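
Gradual training lowers the decoder's reduction factor r (frames predicted per decoder step) as training progresses, as described in the blog post above. A minimal sketch of a schedule lookup is below; the schedule values and the function are illustrative, not the actual TTS config.

```python
# Hypothetical schedule entries: [start_step, reduction_factor_r, batch_size].
GRADUAL_TRAINING = [[0, 7, 64], [10_000, 5, 64], [50_000, 3, 32], [130_000, 2, 32]]

def current_phase(global_step, schedule=GRADUAL_TRAINING):
    """Return the (r, batch_size) pair active at this training step."""
    r, batch_size = schedule[0][1], schedule[0][2]
    for start_step, new_r, new_batch_size in schedule:
        if global_step >= start_step:
            r, batch_size = new_r, new_batch_size
    return r, batch_size
```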

### Attention methods for Tacotron Models
- Guided Attention: [paper](https://arxiv.org/abs/1710.08969) (see the sketch after this list)
- Forward Backward Decoding: [paper](https://arxiv.org/abs/1907.09006)
- Graves Attention: [paper](https://arxiv.org/abs/1308.0850)
- Double Decoder Consistency: [blog](https://erogol.com/solving-attention-problems-of-tts-models-with-double-decoder-consistency/)
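
Guided attention adds a loss that penalizes attention mass far from the diagonal alignment path, using the soft diagonal mask W[t, n] = 1 - exp(-(n/N - t/T)² / (2g²)) from the cited paper. A minimal PyTorch sketch, assuming batch-first alignment tensors (the tensor layout is an assumption, not the actual TTS internals):

```python
import torch

def guided_attention_loss(alignments, input_lens, output_lens, g=0.2):
    """alignments: [B, T_out, T_in] decoder attention weights.
    Penalizes attention weights far from the diagonal path n/N ~ t/T."""
    B = alignments.shape[0]
    loss = 0.0
    for b in range(B):
        N, T = int(input_lens[b]), int(output_lens[b])
        n = torch.arange(N, dtype=torch.float32) / N   # encoder positions
        t = torch.arange(T, dtype=torch.float32) / T   # decoder positions
        # W[t, n] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)); near zero on the diagonal.
        w = 1.0 - torch.exp(-((n[None, :] - t[:, None]) ** 2) / (2 * g * g))
        loss = loss + (alignments[b, :T, :N] * w).mean()
    return loss / B
```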

### Speaker Encoder
- GE2E: [paper](https://arxiv.org/abs/1710.10467) (see the loss sketch after this list)
- Angular Loss: [paper](https://arxiv.org/pdf/2003.11982.pdf)
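
The GE2E loss pulls each utterance embedding toward its own speaker's centroid and pushes it away from other speakers' centroids, scored by a cosine similarity with a learned scale and bias. A minimal sketch of the softmax variant, assuming a batch shaped [speakers, utterances per speaker, embedding dim]; this is an illustration of the paper's formulation, not the TTS implementation.

```python
import torch
import torch.nn.functional as F

def ge2e_softmax_loss(emb, w, b):
    """emb: [n_spk, n_utt, dim] utterance embeddings (n_utt > 1 per speaker).
    w, b: learnable scalar scale and bias (w is kept positive)."""
    n_spk, n_utt, _ = emb.shape
    centroids = emb.mean(dim=1)                                # [n_spk, dim]
    # Leave-one-out centroids for the positive (same-speaker) term.
    loo = (emb.sum(dim=1, keepdim=True) - emb) / (n_utt - 1)   # [n_spk, n_utt, dim]
    # Cosine similarity of every utterance to every speaker centroid.
    sim = F.cosine_similarity(
        emb.unsqueeze(2),            # [n_spk, n_utt, 1, dim]
        centroids[None, None],       # [1, 1, n_spk, dim]
        dim=-1,
    )                                # [n_spk, n_utt, n_spk]
    # For the own-speaker column, use the leave-one-out centroid instead.
    idx = torch.arange(n_spk)
    sim[idx, :, idx] = F.cosine_similarity(emb, loo, dim=-1)
    sim = torch.clamp(w, min=1e-6) * sim + b
    # Each utterance should be most similar to its own speaker's centroid.
    labels = idx.repeat_interleave(n_utt)
    return F.cross_entropy(sim.reshape(n_spk * n_utt, n_spk), labels)
```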

### Vocoders
- MelGAN: [paper](https://arxiv.org/abs/1910.06711)
- MultiBandMelGAN: [paper](https://arxiv.org/abs/2005.05106)
- ParallelWaveGAN: [paper](https://arxiv.org/abs/1910.11480)
- GAN-TTS discriminators: [paper](https://arxiv.org/abs/1909.11646)
- WaveRNN: [origin](https://github.com/fatchord/WaveRNN/)
- WaveGrad: [paper](https://arxiv.org/abs/2009.00713)