mirror of https://github.com/coqui-ai/TTS.git
273 lines
7.3 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "45ea3ef5",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Easy Inferencing with 🐸 TTS ⚡\n",
    "\n",
    "#### Want to quickly synthesize speech with a Coqui 🐸 TTS model?\n",
    "\n",
    "💡: Grab a pre-trained model and use it to synthesize speech using any speaker voice, including yours! ⚡\n",
    "\n",
    "🐸 TTS comes with a list of pretrained models and speaker voices. You can even start a local demo server, open it in your favorite web browser, and 🗣️.\n",
    "\n",
    "In this notebook, we will:\n",
    "```\n",
    "1. List available pre-trained 🐸 TTS models\n",
    "2. Run a 🐸 TTS model\n",
    "3. Listen to the synthesized wave 📣\n",
    "4. Run a multispeaker 🐸 TTS model\n",
    "```\n",
    "So, let's jump right in!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1e5c2a5-46eb-42fd-b550-2a052546857e",
   "metadata": {},
   "source": [
    "## Install 🐸 TTS ⬇️"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa2aec77",
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install -U pip\n",
    "! pip install TTS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c07a273",
   "metadata": {},
   "source": [
    "## ✅ List available pre-trained 🐸 TTS models\n",
    "\n",
    "Coqui 🐸TTS comes with a list of pretrained models for different model types (e.g., TTS, vocoder), languages, training datasets, and architectures.\n",
    "\n",
    "You can use either your own model or one of the released 🐸TTS models.\n",
    "\n",
    "Use `tts --list_models` to see the available models.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "608d203f",
   "metadata": {},
   "outputs": [],
   "source": [
    "! tts --list_models"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed9dd7ab",
   "metadata": {},
   "source": [
    "## ✅ Run a 🐸 TTS model\n",
    "\n",
    "#### **First things first**: Using a released model and the default vocoder:\n",
    "\n",
    "You can simply copy the full model name from the list above and pass it to `--model_name`:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc9e4608-16ec-4dcd-bd6b-bd10d62286f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "!tts --text \"hello world\" \\\n",
    "--model_name \"tts_models/en/ljspeech/glow-tts\" \\\n",
    "--out_path output.wav\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ca2cb14-1aba-400e-a219-8ce44d9410be",
   "metadata": {},
   "source": [
    "## 📣 Listen to the synthesized wave 📣"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fe63ef4-9284-4461-9dda-1ca7483a8f9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import IPython\n",
    "IPython.display.Audio(\"output.wav\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e67d178-1ebe-49c7-9a47-0593251bdb96",
   "metadata": {},
   "source": [
    "### **Second things second**:\n",
    "\n",
    "🔶 A TTS model can be trained on either a single speaker's voice or multiple speakers' voices. This training choice determines which speaker voices are available for synthesis at inference time.\n",
    "\n",
    "🔶 To run a multispeaker model from the released models list, first list its speaker IDs with the `--list_speaker_idxs` flag, then pick one of those speakers to synthesize speech."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87b18839-f750-4a61-bbb0-c964acaecab2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# List the possible speaker IDs.\n",
    "!tts --model_name \"tts_models/en/vctk/vits\" \\\n",
    "--list_speaker_idxs\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4365a9d-f922-4b14-88b0-d2b22a245b2e",
   "metadata": {},
   "source": [
    "## 💬 Synthesize speech using speaker ID 💬"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52be0403-d13e-4d9b-99c2-c10b85154063",
   "metadata": {},
   "outputs": [],
   "source": [
    "!tts --text \"Trying out specific speaker voice\" \\\n",
    "--out_path spkr-out.wav --model_name \"tts_models/en/vctk/vits\" \\\n",
    "--speaker_idx \"p341\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "894a560a-f9c8-48ce-aaa6-afdf516c01f6",
   "metadata": {},
   "source": [
    "## 📣 Listen to the synthesized speaker-specific wave 📣"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed485b0a-dfd5-4a7e-a571-ebf74bdfc41d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import IPython\n",
    "IPython.display.Audio(\"spkr-out.wav\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84636a38-097e-4dad-933b-0aeaee650e92",
   "metadata": {},
   "source": [
    "🔶 If you want to use an external speaker to synthesize speech, you need to supply the `--speaker_wav` flag along with an external speaker encoder path and config file, as follows:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbdb15fa-123a-4282-a127-87b50dc70365",
   "metadata": {},
   "source": [
    "First, we need to get the speaker encoder model, its config, and a reference `speaker_wav`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e54f1b13-560c-4fed-bafd-e38ec9712359",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/config_se.json\n",
    "!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/model_se.pth.tar\n",
    "!wget https://github.com/coqui-ai/TTS/raw/speaker_encoder_model/tests/data/ljspeech/wavs/LJ001-0001.wav"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6dac1912-5054-4a68-8357-6d20fd99cb10",
   "metadata": {},
   "outputs": [],
   "source": [
    "!tts --model_name tts_models/multilingual/multi-dataset/your_tts \\\n",
    "--encoder_path model_se.pth.tar \\\n",
    "--encoder_config config_se.json \\\n",
    "--speaker_wav LJ001-0001.wav \\\n",
    "--text \"Are we not allowed to dim the lights so people can see that a bit better?\" \\\n",
    "--out_path spkr-out.wav \\\n",
    "--language_idx \"en\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92ddce58-8aca-4f69-84c3-645ae1b12e7d",
   "metadata": {},
   "source": [
    "## 📣 Listen to the synthesized speaker-specific wave 📣"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc889adc-9c71-4232-8e85-bfc8f76476f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import IPython\n",
    "IPython.display.Audio(\"spkr-out.wav\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29101d01-0b01-4153-a216-5dae415a5dd6",
   "metadata": {},
   "source": [
    "## 🎉 Congratulations! 🎉 You now know how to use a TTS model to synthesize speech! \n",
    "Follow up with the next tutorials to learn more advanced material."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}