TTS/recipes/ljspeech
logan hart 6fdb88f8e2
Add Delightful-TTS implementation (#2095)
2023-07-24 13:41:26 +02:00
align_tts d-vector handling (#1945) 2022-09-13 14:10:33 +02:00
delightful_tts Add Delightful-TTS implementation (#2095) 2023-07-24 13:41:26 +02:00
fast_pitch d-vector handling (#1945) 2022-09-13 14:10:33 +02:00
fast_speech d-vector handling (#1945) 2022-09-13 14:10:33 +02:00
fastspeech2 Fastspeech2 (#2073) 2023-01-15 22:39:22 +01:00
glow_tts d-vector handling (#1945) 2022-09-13 14:10:33 +02:00
hifigan Make style (#1405) 2022-03-16 12:13:55 +01:00
multiband_melgan Make style (#1405) 2022-03-16 12:13:55 +01:00
neuralhmm_tts Adding neural HMM TTS Model (#2272) 2023-01-23 11:53:04 +01:00
overflow Adding OverFlow (#2183) 2022-12-12 12:44:15 +01:00
speedy_speech d-vector handling (#1945) 2022-09-13 14:10:33 +02:00
tacotron2-Capacitron d-vector handling (#1945) 2022-09-13 14:10:33 +02:00
tacotron2-DCA d-vector handling (#1945) 2022-09-13 14:10:33 +02:00
tacotron2-DDC d-vector handling (#1945) 2022-09-13 14:10:33 +02:00
univnet Make style (#1405) 2022-03-16 12:13:55 +01:00
vits_tts d-vector handling (#1945) 2022-09-13 14:10:33 +02:00
wavegrad Make style 2022-02-25 11:26:59 +01:00
wavernn Make style 2022-02-25 11:26:59 +01:00
README.md Create LJSpeech recipes for all the models 2021-06-22 16:21:11 +02:00
download_ljspeech.sh Update ljspeech download 2022-02-25 11:12:44 +01:00

README.md

🐸💬 TTS LJSpeech Recipes

To run the recipes:

  1. Download the LJSpeech dataset, either manually from its official website (https://keithito.com/LJ-Speech-Dataset/) or by running download_ljspeech.sh.

  2. Go to your desired model folder and run the training.

    Running Python files (choose the desired GPU ID for your run and set CUDA_VISIBLE_DEVICES accordingly):

    CUDA_VISIBLE_DEVICES="0" python train_modelX.py
    

    Running bash scripts:

    bash run.sh
    

💡 Note that these runs are just templates to help you start training your first model. They are not optimized for the best results. Double-check the configurations, and feel free to share your experiments so we can find better parameters together 💪.
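
For reference, every train_modelX.py in these folders follows roughly the same pattern: define a dataset config, define a model config, load the samples, build the model, and hand everything to the Trainer. Below is a condensed sketch modeled on the GlowTTS recipe; exact import paths, config fields, and default values differ between models and 🐸TTS versions, so treat it as an outline rather than a drop-in script.

    import os

    from trainer import Trainer, TrainerArgs

    from TTS.tts.configs.glow_tts_config import GlowTTSConfig
    from TTS.tts.configs.shared_configs import BaseDatasetConfig
    from TTS.tts.datasets import load_tts_samples
    from TTS.tts.models.glow_tts import GlowTTS
    from TTS.tts.utils.text.tokenizer import TTSTokenizer
    from TTS.utils.audio import AudioProcessor

    # Use the recipe folder as the training output folder.
    output_path = os.path.dirname(os.path.abspath(__file__))

    # Point the dataset config at the extracted LJSpeech-1.1 folder
    # (the path below is an assumption; adjust it to where you unpacked the dataset).
    dataset_config = BaseDatasetConfig(
        formatter="ljspeech",
        meta_file_train="metadata.csv",
        path=os.path.join(output_path, "../LJSpeech-1.1/"),
    )

    # Model and training configuration; each recipe swaps in its own *Config class.
    config = GlowTTSConfig(
        batch_size=32,
        eval_batch_size=16,
        num_loader_workers=4,
        run_eval=True,
        epochs=1000,
        text_cleaner="phoneme_cleaners",
        use_phonemes=True,
        phoneme_language="en-us",
        phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
        print_step=25,
        mixed_precision=True,
        output_path=output_path,
        datasets=[dataset_config],
    )

    # Audio processor and tokenizer are built from the config.
    ap = AudioProcessor.init_from_config(config)
    tokenizer, config = TTSTokenizer.init_from_config(config)

    # Load the training and evaluation samples listed in metadata.csv.
    train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

    # Build the model and start training.
    model = GlowTTS(config, ap, tokenizer, speaker_manager=None)
    trainer = Trainer(
        TrainerArgs(),
        config,
        output_path,
        model=model,
        train_samples=train_samples,
        eval_samples=eval_samples,
    )
    trainer.fit()

Once adapted to a given model, such a script is launched exactly like the commands above, e.g. CUDA_VISIBLE_DEVICES="0" python train_glowtts.py.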