mirror of https://github.com/coqui-ai/TTS.git

docs: add notes about xtts fine-tuning

parent a425ba599d
commit 0df04cc259
@@ -34,7 +34,7 @@ You can either use your trained model or choose a model from the provided list.
 tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
 ```
 
-#### Single Speaker Models
+#### Single speaker models
 
 - Run TTS with the default model (`tts_models/en/ljspeech/tacotron2-DDC`):
 
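For context (not part of the commit): the single-speaker section touched by this hunk covers running the default model. A typical invocation looks roughly like the following; the output path is only illustrative.

```sh
# Synthesize speech with the default model (tts_models/en/ljspeech/tacotron2-DDC);
# the output path below is an arbitrary example.
tts --text "Text for TTS" --out_path output/path/speech.wav
```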
@@ -102,7 +102,7 @@ You can either use your trained model or choose a model from the provided list.
 --vocoder_config_path path/to/vocoder_config.json
 ```
 
-#### Multi-speaker Models
+#### Multi-speaker models
 
 - List the available speakers and choose a `<speaker_id>` among them:
 
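For context (not part of the commit): the multi-speaker section starts by listing speaker IDs. A sketch of that step, assuming the `--list_speaker_idxs` flag of the `tts` CLI and the placeholder model name used throughout the docs:

```sh
# Print the speaker IDs a multi-speaker model knows about; the model name is a placeholder.
tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
```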
@@ -125,7 +125,7 @@ You can either use your trained model or choose a model from the provided list.
 --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
 ```
 
-#### Voice Conversion Models
+#### Voice conversion models
 
 ```sh
 tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" \
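For context (not part of the commit): the voice-conversion command is cut off at the hunk boundary. A sketch of a complete call, assuming the `--source_wav`/`--target_wav` flags and placeholder paths:

```sh
# Convert the voice in source.wav into the voice of target.wav; all paths are placeholders.
tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" \
    --source_wav path/to/source.wav --target_wav path/to/target.wav
```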
@@ -16,13 +16,19 @@ We tried to collect common issues and questions we receive about 🐸TTS. It is
 - If you need faster models, consider SpeedySpeech, GlowTTS or AlignTTS. Keep in mind that SpeedySpeech requires a pre-trained Tacotron or Tacotron2 model to compute text-to-speech alignments.
 
 ## How can I train my own `tts` model?
+
+```{note} XTTS has separate fine-tuning scripts, see [here](models/xtts.md#training).
+```
+
 0. Check your dataset with notebooks in [dataset_analysis](https://github.com/idiap/coqui-ai-TTS/tree/main/notebooks/dataset_analysis) folder. Use [this notebook](https://github.com/idiap/coqui-ai-TTS/blob/main/notebooks/dataset_analysis/CheckSpectrograms.ipynb) to find the right audio processing parameters. A better set of parameters results in a better audio synthesis.
 
 1. Write your own dataset `formatter` in `datasets/formatters.py` or [format](datasets/formatting_your_dataset) your dataset as one of the supported datasets, like LJSpeech.
    A `formatter` parses the metadata file and converts a list of training samples.
 
 2. If you have a dataset with a different alphabet than English, you need to set your own character list in the ```config.json```.
-   - If you use phonemes for training and your language is supported [here](https://github.com/rhasspy/gruut#supported-languages), you don't need to set your character list.
+   - If you use phonemes for training and your language is supported by
+     [Espeak](https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md)
+     or [Gruut](https://github.com/rhasspy/gruut#supported-languages), you don't need to set your character list.
   - You can use `TTS/bin/find_unique_chars.py` to get characters used in your dataset.
 
 3. Write your own text cleaner in ```utils.text.cleaners```. It is not always necessary, except when you have a different alphabet or language-specific requirements.
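For context (not part of the commit): step 2 of the FAQ points at `TTS/bin/find_unique_chars.py`. A sketch of how it is typically invoked, assuming it takes the training config via `--config_path` (check the script's help for the exact arguments in your version):

```sh
# List the characters that actually occur in the dataset's transcripts,
# so the character list in config.json can be set accordingly.
python TTS/bin/find_unique_chars.py --config_path path/to/config.json
```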
@@ -29,6 +29,9 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
 
 ## Steps to fine-tune a 🐸 TTS model
 
+```{note} XTTS has separate fine-tuning scripts, see [here](../models/xtts.md#training).
+```
+
 1. Setup your dataset.
 
    You need to format your target dataset in a certain way so that 🐸TTS data loader will be able to load it for the
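For context (not part of the commit): fine-tuning a non-XTTS 🐸TTS model is done by restoring a downloaded checkpoint when launching a training script. A minimal sketch, assuming a GlowTTS recipe script and a placeholder checkpoint path:

```sh
# Continue training from a released checkpoint instead of starting from scratch.
# Both the recipe path and the checkpoint path are placeholders.
CUDA_VISIBLE_DEVICES="0" python recipes/ljspeech/glow_tts/train_glowtts.py \
    --restore_path path/to/downloaded/model_file.pth
```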
@@ -8,3 +8,6 @@ The following pages show you how to train and fine-tune Coqui models:
 training_a_model
 finetuning
 ```
+
+Also see the [XTTS page](../models/xtts.md#training) if you want to fine-tune
+that model.
@@ -29,6 +29,9 @@ CLI, server or Python API.
 
 ## Training a `tts` Model
 
+```{note} XTTS has separate fine-tuning scripts, see [here](models/xtts.md#training).
+```
+
 A breakdown of a simple script that trains a GlowTTS model on the LJspeech
 dataset. For a more in-depth guide to training and fine-tuning also see [this
 page](training/index.md).
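For context (not part of the commit): besides the recipe scripts, a `tts` model can also be trained from a config file with the generic entry point. A sketch, assuming `TTS/bin/train_tts.py` and a placeholder config path:

```sh
# Train a tts model described entirely by a config file; the path is a placeholder.
python TTS/bin/train_tts.py --config_path path/to/config.json
```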