Tacotron-pytorch

A PyTorch implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model.

Requirements

  • Install Python 3
  • Install PyTorch == 0.2.0
  • Install the remaining requirements:
    pip install -r requirements.txt
    

Data

I used the LJSpeech dataset, which consists of pairs of text transcripts and wav files. The complete dataset (13,100 pairs) can be downloaded here. I referred to https://github.com/keithito/tacotron for the preprocessing code.
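LJSpeech ships a metadata.csv with pipe-separated fields (file ID, raw transcript, normalized transcript). A minimal sketch of pairing each transcript with its wav file follows; the file layout is standard LJSpeech, but the parsing itself is illustrative and not this repo's actual loader:

```python
import csv
import os

def load_metadata(data_path):
    """Parse LJSpeech's metadata.csv into (wav_path, transcript) pairs.

    Each line looks like: LJ001-0001|raw text|normalized text
    """
    pairs = []
    with open(os.path.join(data_path, "metadata.csv"), encoding="utf-8") as f:
        # QUOTE_NONE because transcripts may contain literal quote characters.
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            file_id, _raw, normalized = row[0], row[1], row[2]
            wav_path = os.path.join(data_path, "wavs", file_id + ".wav")
            pairs.append((wav_path, normalized))
    return pairs
```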

File description

  • hyperparams.py includes all hyperparameters that are needed.
  • data.py loads the training data and preprocesses text into index sequences and wav files into spectrograms. The text-preprocessing code is in the text/ directory.
  • module.py contains all module definitions, including CBHG, highway networks, the prenet, and so on.
  • network.py contains the networks: the encoder, the decoder, and the post-processing network.
  • train.py is for training.
  • synthesis.py is for generating TTS samples.

Training the network

  • STEP 1. Download and extract the LJSpeech data to any directory you want.
  • STEP 2. Adjust the hyperparameters in hyperparams.py, especially 'data_path', which should point to the directory where you extracted the files, and the others if necessary.
  • STEP 3. Run train.py.
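For STEP 2, hyperparams.py holds module-level settings you edit before training. The sketch below shows the kind of values involved; these names and defaults are hypothetical (the LJSpeech sample rate of 22050 Hz is real), so check the actual file for the real ones:

```python
# Hypothetical sketch of the settings hyperparams.py might hold;
# the real file defines its own names and defaults.
class Hyperparams:
    data_path = "./LJSpeech-1.0"  # directory where you extracted the dataset
    sample_rate = 22050           # LJSpeech audio sample rate
    num_mels = 80                 # mel-spectrogram channels
    batch_size = 32
    lr = 0.001
```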

Generate TTS wav file

  • STEP 1. Run synthesis.py. Make sure the restore step (i.e. which saved checkpoint to load) is set correctly.
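Setting the restore step means pointing synthesis at a saved training checkpoint. A hedged sketch of picking the most recent checkpoint by step number follows; the file-naming pattern is an assumption, not this repo's actual scheme:

```python
import os
import re

def latest_checkpoint(checkpoint_dir):
    """Return the checkpoint path with the highest training step.

    Assumes files named like 'checkpoint_60000.pth' -- a hypothetical
    pattern; adjust the regex to match the actual saved file names.
    """
    best_step, best_path = -1, None
    for name in os.listdir(checkpoint_dir):
        m = re.fullmatch(r"checkpoint_(\d+)\.pth", name)
        if m and int(m.group(1)) > best_step:
            best_step = int(m.group(1))
            best_path = os.path.join(checkpoint_dir, name)
    return best_path
```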

Samples

  • You can check the generated samples in the 'samples/' directory. The model was trained for only 60K steps, so the quality is not good yet.

Comments

  • Any comments on the code are always welcome.