mirror of https://github.com/coqui-ai/TTS.git
Speaker Encoder
Eren Gölge edited this page 2022-11-02 14:57:21 +01:00
🐸 TTS has a subproject called Speaker Encoder. It is an implementation of https://arxiv.org/abs/1710.10467 . There is also a released model, trained on the LibriTTS dataset with ~1000 speakers, on the Released Models page.
You can use this model for various purposes:
- Training a multi-speaker model using voice embeddings as speaker features.
- Computing embedding vectors with `compute_embedding.py` and feeding them to your TTS network. (The TTS side still needs to be implemented, but it should be straightforward.)
- Pruning bad examples from your TTS dataset.
- Computing embedding vectors and plotting them with the provided notebook. Thanks to @nmstoker for this!
- Using it as a speaker classification or verification system.
- Speaker diarization for ASR systems.
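The verification and pruning use cases above boil down to comparing embedding vectors. A minimal sketch, assuming you have already extracted embeddings (e.g. with `compute_embedding.py`), scores two utterances by cosine similarity; the `threshold` value here is a hypothetical example and should be tuned on your own data:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_speaker(emb1, emb2, threshold=0.75):
    """Decide whether two utterance embeddings belong to the same speaker.

    The threshold is an illustrative value, not one recommended by the
    project; pick it by validating on held-out speaker pairs.
    """
    return cosine_similarity(emb1, emb2) >= threshold
```

The same score can be used to prune a TTS dataset: utterances whose embedding is far from their speaker's centroid are likely mislabeled or noisy.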
The model provided here is half the size of the baseline model. I found it easier to train, and its final performance does not differ much from the larger version.