mirror of https://github.com/coqui-ai/TTS.git
Speaker Encoder
Eren Gölge edited this page 2022-11-02 14:57:21 +01:00
🐸 TTS has a subproject called Speaker Encoder. It is an implementation of https://arxiv.org/abs/1710.10467 . There is also a released model, trained on the LibriTTS dataset with ~1000 speakers, on the Released Models page.
You can use this model for various purposes:
- Training a multi-speaker model using voice embeddings as speaker features.
- Computing embedding vectors with `compute_embedding.py` and feeding them to your TTS network. (The TTS side still needs to be implemented, but it should be straightforward.)
- Pruning bad examples from your TTS dataset.
- Computing embedding vectors and plotting them with the provided notebook. Thanks to @nmstoker for this!
- Using it as a speaker classification or verification system.
- Speaker diarization for ASR systems.
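The verification and pruning use cases above boil down to comparing embedding vectors. A minimal sketch, assuming you have already extracted embeddings (e.g. with `compute_embedding.py`), scores two utterances by cosine similarity; the `threshold` value here is a hypothetical example and should be tuned on your own data:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_speaker(emb1, emb2, threshold=0.75):
    """Decide whether two utterance embeddings belong to the same speaker.

    The threshold is an illustrative value, not one recommended by the
    project; pick it by validating on held-out speaker pairs.
    """
    return cosine_similarity(emb1, emb2) >= threshold
```

The same score can be used to prune a TTS dataset: utterances whose embedding is far from their speaker's centroid are likely mislabeled or noisy.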
The model provided here is half the size of the baseline model. I found it easier to train, and its final performance does not differ much from the larger version.