mirror of https://github.com/coqui-ai/TTS.git

README update
commit b78aa5d322 (parent 14d33662ea)

README.md
<img src="https://user-images.githubusercontent.com/1402048/52643646-c2102980-2edd-11e9-8c37-b72f3c89a640.png" width="300" height="90" align="right" />

<br/>

# TTS: Text-to-Speech for all.
<p align='center'>
<img src='https://circleci.com/gh/mozilla/TTS/tree/dev.svg?style=svg' alt="CircleCI build status"/>
<a href='https://discourse.mozilla.org/c/tts'><img src="https://img.shields.io/badge/discourse-online-green.svg"/></a>
<a href='https://opensource.org/licenses/MPL-2.0'><img src="https://img.shields.io/badge/License-MPL%202.0-brightgreen.svg"/></a>
</p>

TTS is a library for advanced Text-to-Speech generation. It is built on the latest research and designed to achieve the best trade-off among ease of training, speed, and quality.

TTS comes with [pretrained models](https://github.com/mozilla/TTS/wiki/Released-Models) and tools for measuring dataset quality, and it is already used in **20+ languages** for products and research projects.

<br/>

:loudspeaker: [English Voice Samples](https://erogol.github.io/ddc-samples/) and [SoundCloud playlist](https://soundcloud.com/user-565970875/pocket-article-wavernn-and-tacotron2)

:man_cook: [TTS training recipes](https://github.com/erogol/TTS_recipes)

:page_facing_up: [Text-to-Speech paper collection](https://github.com/erogol/TTS-papers)

## 💬 Where to ask questions

Please use our dedicated channels for questions and discussion. Help is much more valuable when it is shared publicly, so that more people can benefit from it.

| Type                            | Platforms                              |
| ------------------------------- | -------------------------------------- |
| 🚨 **Bug Reports**              | [GitHub Issue Tracker]                 |
| 🎁 **Feature Requests & Ideas** | [GitHub Issue Tracker]                 |
| 👩💻 **Usage Questions**          | [Discourse Forum]                      |
| 🗯 **General Discussion**       | [Discourse Forum] and [Matrix Channel] |

[github issue tracker]: https://github.com/mozilla/tts/issues
[discourse forum]: https://discourse.mozilla.org/c/tts/
[matrix channel]: https://matrix.to/#/!KTePhNahjgiVumkqca:matrix.org?via=matrix.org

## 🥇 TTS Performance

<p align="center"><img src="https://discourse-prod-uploads-81679984178418.s3.dualstack.us-west-2.amazonaws.com/optimized/3X/6/4/6428f980e9ec751c248e591460895f7881aec0c6_2_1035x591.png" width="800" /></p>

"Mozilla*" and "Judy*" are our models.

[Details...](https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results)

## Provided Models and Methods

Text-to-Spectrogram:
- Tacotron: [paper](https://arxiv.org/abs/1703.10135)
- Tacotron2: [paper](https://arxiv.org/abs/1712.05884)
- Glow-TTS: [paper](https://arxiv.org/abs/2005.11129)
- Speedy-Speech: [paper](https://arxiv.org/abs/2008.03802)

Attention Methods:
- Guided Attention: [paper](https://arxiv.org/abs/1710.08969)
- Forward Backward Decoding: [paper](https://arxiv.org/abs/1907.09006)
- Graves Attention: [paper](https://arxiv.org/abs/1308.0850)
- Double Decoder Consistency: [blog](https://erogol.com/solving-attention-problems-of-tts-models-with-double-decoder-consistency/)
- Dynamic Convolutional Attention: [paper](https://arxiv.org/abs/1910.10288)
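Guided attention, for instance, penalizes attention weights that stray from the roughly diagonal text-to-frame alignment expected in TTS. A small NumPy sketch of the soft diagonal mask from the paper linked above (function and parameter names are ours, not the library's):

```python
import numpy as np


def guided_attention_mask(n_text, n_frames, g=0.2):
    """Soft diagonal penalty mask from the guided attention paper:
    W[t, n] = 1 - exp(-((n / N - t / T)^2) / (2 g^2)).
    Near-diagonal positions get ~0 penalty; far-off positions approach 1."""
    t = np.arange(n_frames)[:, None] / n_frames
    n = np.arange(n_text)[None, :] / n_text
    return 1.0 - np.exp(-((n - t) ** 2) / (2 * g * g))


def guided_attention_loss(alignment, g=0.2):
    """Mean penalty for one (frames x text) attention matrix."""
    n_frames, n_text = alignment.shape
    return float(np.mean(alignment * guided_attention_mask(n_text, n_frames, g)))
```

A clean diagonal alignment therefore scores a lower loss than a scrambled one, which is what pushes the model toward monotonic attention early in training.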

Speaker Encoder:
- GE2E: [paper](https://arxiv.org/abs/1710.10467)
- Angular-Prototypical: [paper](https://arxiv.org/abs/2003.11982)
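GE2E trains the speaker encoder by comparing each utterance embedding against per-speaker centroids, using a leave-one-out centroid for the utterance's own speaker. A NumPy sketch of that similarity computation (the learned scale and bias of the full loss are omitted; names are illustrative):

```python
import numpy as np


def ge2e_similarity(emb):
    """emb: (n_speakers, n_utts, dim) batch of utterance embeddings.
    Returns (n_speakers, n_utts, n_speakers) cosine similarities between
    each utterance and each speaker centroid. The centroid of an
    utterance's own speaker excludes that utterance (leave-one-out),
    as in the GE2E paper. Requires n_utts >= 2."""
    n_spk, n_utt, _ = emb.shape
    emb = emb / np.linalg.norm(emb, axis=-1, keepdims=True)
    centroids = emb.mean(axis=1)
    centroids = centroids / np.linalg.norm(centroids, axis=-1, keepdims=True)
    sim = np.einsum("sud,kd->suk", emb, centroids)  # vs. every centroid
    for s in range(n_spk):
        for u in range(n_utt):
            # leave-one-out centroid for the utterance's own speaker
            loo = (emb[s].sum(axis=0) - emb[s, u]) / (n_utt - 1)
            loo = loo / np.linalg.norm(loo)
            sim[s, u, s] = emb[s, u] @ loo
    return sim
```

The training loss then pushes each utterance toward its own centroid and away from all others.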

Vocoders:
- MelGAN: [paper](https://arxiv.org/abs/1910.06711)
- MultiBandMelGAN: [paper](https://arxiv.org/abs/2005.05106)
- ParallelWaveGAN: [paper](https://arxiv.org/abs/1910.11480)
- GAN-TTS discriminators: [paper](https://arxiv.org/abs/1909.11646)
- WaveRNN: [origin](https://github.com/fatchord/WaveRNN/)
- WaveGrad: [paper](https://arxiv.org/abs/2009.00713)
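Sample-level models in the WaveRNN family typically predict mu-law quantized audio rather than raw floats. A small NumPy sketch of the standard mu-law companding (this is the generic transform, not this library's specific implementation):

```python
import numpy as np


def mulaw_encode(x, mu=255):
    """Mu-law compand a waveform in [-1, 1] into integer classes 0..mu,
    the discretization commonly predicted by WaveRNN-style models."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1.0) / 2.0 * mu + 0.5).astype(np.int64)


def mulaw_decode(q, mu=255):
    """Invert the companding back to floats in [-1, 1]."""
    y = 2.0 * q.astype(np.float64) / mu - 1.0
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu
```

Compared with uniform 8-bit quantization, mu-law spends more of its 256 levels on small amplitudes, where speech carries most of its perceptual detail.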

You can also help us implement more models. Some TTS-related work can be found [here](https://github.com/erogol/TTS-papers).

## Features

- High-performance Deep Learning models for Text2Speech tasks.
- Text2Spec models (Tacotron, Tacotron2).
- …
- Notebooks for extensive model benchmarking.
- Modular (but not too much) code base enabling easy testing for new ideas.

## Install TTS

TTS supports **python >= 3.6**.

Run ```python setup.py install```, or ```python setup.py develop``` to keep your installation editable in your working directory.

### Directory Structure

```
…
```
### Docker

A docker image created by [@synesthesiam](https://github.com/synesthesiam) is shared in a separate [repository](https://github.com/synesthesiam/docker-mozillatts) together with the latest LJSpeech models.

## Released Models

Please visit [our wiki.](https://github.com/mozilla/TTS/wiki/Released-Models)
## Sample Model Output

Audio examples: [soundcloud](https://soundcloud.com/user-565970875/pocket-article-wavernn-and-tacotron2)

<img src="images/example_model_output.png?raw=true" alt="example_output" width="400"/>

## [TTS Tutorials and Notebooks](https://github.com/mozilla/TTS/wiki/TTS-Notebooks-and-Tutorials)

## Datasets and Data-Loading

TTS provides a generic dataloader that is easy to use with your custom dataset.
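For example, LJSpeech-style corpora store transcriptions in a pipe-separated `metadata.csv` next to a `wavs/` folder. A minimal sketch of parsing that layout into `(wav_path, text)` pairs (the helper name is ours for illustration; the library's dataloader API may differ):

```python
import csv
from pathlib import Path


def load_ljspeech_metadata(root):
    """Parse an LJSpeech-style metadata.csv whose rows look like
    `wav_id|raw transcription|normalized transcription`, returning
    (wav_path, text) pairs a Dataset class could consume.
    The layout mirrors the public LJSpeech corpus; adapt for your data."""
    root = Path(root)
    items = []
    with open(root / "metadata.csv", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            wav_id, text = row[0], row[-1]  # prefer the normalized column
            items.append((root / "wavs" / f"{wav_id}.wav", text))
    return items
```

A custom dataset only needs a formatter like this that yields audio paths and texts; the generic loader handles batching and padding.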

…

You can also use Tensorboard by pointing its ```--logdir``` argument to the experiment folder.

## Contribution guidelines

This repository is governed by Mozilla's code of conduct and etiquette guidelines. For more details, please read the [Mozilla Community Participation Guidelines.](https://www.mozilla.org/about/governance/policies/participation/)

Please send your Pull Request to the ```dev``` branch. Before making a Pull Request, check your changes for basic mistakes and style problems by using a linter. We have cardboardlinter set up in this repository, so, for example, if you have made some changes and would like to run the linter on just the changed code, you can use the following command:

…

If you like to use TTS to try a new idea and like to share your experiments with the community:

(If you have an idea for better collaboration, let us know)
- Create a new branch.
- Open an issue pointing to your branch.
- Explain your idea and experiment.
- Share your results regularly. (Tensorboard log files, audio results, visuals etc.)
- Use the LJSpeech dataset (for English) if you like to compare your results with the released models. (It is the most open and scalable dataset for quick experimentation.)

## [Contact/Getting Help](https://github.com/mozilla/TTS/wiki/Contact-and-Getting-Help)

…

- [x] Multi-speaker embedding.
- [x] Model optimization (model export, model pruning etc.)
<!--## References
- [Efficient Neural Audio Synthesis](https://arxiv.org/pdf/1802.08435.pdf)
- [Attention-Based Models for Speech Recognition](https://arxiv.org/pdf/1506.07503.pdf)
- [Generating Sequences With Recurrent Neural Networks](https://arxiv.org/pdf/1308.0850.pdf)
- [Char2Wav: End-to-End Speech Synthesis](https://openreview.net/pdf?id=B1VWyySKx)
- [VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop](https://arxiv.org/pdf/1707.06588.pdf)
- [WaveRNN](https://arxiv.org/pdf/1802.08435.pdf)
- [Faster WaveNet](https://arxiv.org/abs/1611.09482)
- [Parallel WaveNet](https://arxiv.org/abs/1711.10433)
-->

### Acknowledgement

- https://github.com/keithito/tacotron (Dataset pre-processing)
- https://github.com/r9y9/tacotron_pytorch (Initial Tacotron architecture)