mirror of https://github.com/coqui-ai/TTS.git
README.md update
parent 664f42df33
commit c9e2df1451
<p align="center"><img src="https://user-images.githubusercontent.com/1402048/52643646-c2102980-2edd-11e9-8c37-b72f3c89a640.png" data-canonical-src="" width="320" height="95" /></p>

<center>
<img src="https://travis-ci.org/mozilla/TTS.svg?branch=dev"/>
[](https://discourse.mozilla.org/c/tts)
</center>

This project is a part of [Mozilla Common Voice](https://voice.mozilla.org/en). TTS aims to be a deep-learning-based Text2Speech engine, low in cost and high in quality.

You can also help us implement more models. Some TTS related work can be found [here](https://github.com/erogol/TTS-papers).

## Features
- High performance Deep Learning models for Text2Speech tasks.
- Text2Spec models (Tacotron, Tacotron2).
- Speaker Encoder to compute speaker embeddings efficiently.
- Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN).
- Fast and efficient model training.
- Detailed training logs on console and Tensorboard.
- Support for multi-speaker TTS.
- Efficient multi-GPU training.
- Ability to convert PyTorch models to Tensorflow 2.0 and TFLite for inference.
- Released models in PyTorch, Tensorflow and TFLite.
- Tools to curate Text2Speech datasets under ```dataset_analysis```.
- Demo server for model testing.
- Notebooks for extensive model benchmarking.
- Modular (but not too much) code base enabling easy testing for new ideas.

## Main Requirements and Installation
It is highly recommended to use [miniconda](https://conda.io/miniconda.html) for easier installation.
* python>=3.6
* pytorch>=1.4.1
* tensorflow>=2.2
* librosa
* tensorboard
* tensorboardX
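As a quick sanity check, the version floors listed above can be verified from Python. This is a minimal sketch under stated assumptions: the helper `meets_minimum` is ours, not part of TTS, and the PyTorch/Tensorflow checks run only if those packages are installed.

```python
import sys

def meets_minimum(installed, minimum):
    """Numerically compare dotted version strings, e.g. '1.10.0' >= '1.4.1'."""
    def as_parts(version):
        # Keep only purely numeric components ('2.2.0rc1' -> [2, 2]).
        return [int(p) for p in version.split(".") if p.isdigit()]
    return as_parts(installed) >= as_parts(minimum)

# python>=3.6, from the list above
assert sys.version_info >= (3, 6), "TTS needs python>=3.6"

# pytorch>=1.4.1 and tensorflow>=2.2, checked only when importable
for module_name, minimum in [("torch", "1.4.1"), ("tensorflow", "2.2")]:
    try:
        module = __import__(module_name)
    except ImportError:
        print(module_name, "not installed")
        continue
    print(module_name, "ok" if meets_minimum(module.__version__, minimum) else "too old")
```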
<img src="images/example_model_output.png?raw=true" alt="example_output" width="400"/>

## [Mozilla TTS Tutorials and Notebooks](https://github.com/mozilla/TTS/wiki/TTS-Notebooks-and-Tutorials)

## Datasets and Data-Loading
TTS provides a generic dataloader that is easy to use for new datasets. You need to write a preprocessor function to integrate your own dataset. Check ```datasets/preprocess.py``` to see some examples. After writing the function, you need to set the ```dataset``` field in ```config.json```. Do not forget the other data-related fields too.
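As an illustration, a preprocessor is just a function that turns your metadata file into text/wav-path pairs. The sketch below is hypothetical, not one of the shipped preprocessors: the function name `my_dataset`, the `wavs/` directory layout, and the pipe-separated metadata format are all assumptions.

```python
import os

def my_dataset(root_path, meta_file):
    """Hypothetical preprocessor: expects one 'wav_id|transcription' pair per line."""
    items = []
    with open(os.path.join(root_path, meta_file), encoding="utf-8") as f:
        for line in f:
            # Split only on the first '|' so transcriptions may contain pipes.
            wav_id, text = line.strip().split("|", 1)
            wav_path = os.path.join(root_path, "wavs", wav_id + ".wav")
            items.append([text, wav_path])
    return items
```

With a function like this in place, the ```dataset``` field in ```config.json``` would name it (here ```"dataset": "my_dataset"```, matching the function name).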