Add 👑YourTTS docs

VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) is an End-to-End
(encoder -> vocoder together) TTS model that takes advantage of SOTA DL techniques like GANs, VAEs, and Normalizing Flows.
It does not require external alignment annotations and learns the text-to-audio alignment using MAS, as explained in the paper.
The model architecture is a combination of the GlowTTS encoder and the HiFiGAN vocoder. It is a feed-forward model with a
x67.12 real-time factor on a GPU.
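
Because the model is end-to-end, inference is a single call: text goes in, a waveform comes out, with no separate vocoder step. Below is a minimal inference sketch; it assumes a 🐸TTS release that ships the `TTS.api.TTS` Python interface and the released `tts_models/en/ljspeech/vits` model name, both of which may differ in your installed version.

```python
# Minimal VITS inference sketch (assumes a recent 🐸TTS release with the
# TTS.api interface; the model name may vary, see `tts --list_models`).
from TTS.api import TTS

# Download (on first use) and load a pretrained single-speaker English VITS model.
tts = TTS(model_name="tts_models/en/ljspeech/vits")

# End-to-end synthesis: text -> waveform in one pass, no external vocoder.
tts.tts_to_file(text="VITS synthesizes speech end to end.", file_path="vits_output.wav")
```
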
🐸 YourTTS is a multi-speaker and multi-lingual TTS model that can perform voice conversion and zero-shot speaker adaptation.
It can also learn a new language or voice from a ~1 minute long audio clip, which opens the door to training TTS models for
low-resource languages. 🐸 YourTTS uses VITS as the backbone architecture, coupled with a speaker encoder model.
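
Zero-shot speaker adaptation is easiest to see from the inference side: the speaker encoder embeds a short reference clip and the VITS backbone is conditioned on that embedding, so a new voice needs no fine-tuning. A minimal sketch follows; it assumes the `TTS.api.TTS` interface and the published `tts_models/multilingual/multi-dataset/your_tts` model name (both may differ across versions), and `target_speaker.wav` is a hypothetical reference clip of the target voice.

```python
# Zero-shot voice cloning sketch with 🐸 YourTTS (assumes a recent 🐸TTS release
# with the TTS.api interface; "target_speaker.wav" is a hypothetical ~1 minute
# reference clip of the target voice).
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

# The speaker encoder turns the reference clip into an embedding and the VITS
# backbone is conditioned on it, so the voice is cloned without fine-tuning.
tts.tts_to_file(
    text="This voice was never seen during training.",
    speaker_wav="target_speaker.wav",
    language="en",
    file_path="yourtts_output.wav",
)
```
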
## Important resources & papers
- 🐸 YourTTS: https://arxiv.org/abs/2112.02418
- VITS: https://arxiv.org/pdf/2106.06103.pdf
- Neural Spline Flows: https://arxiv.org/abs/1906.04032
- Variational Autoencoder: https://arxiv.org/pdf/1312.6114.pdf