122 Commits (dev)

Author SHA1 Message Date
Edresson Casanova cbdbc44e0f
Fix XTTS v2.0 training recipe (#3154)
* Fix XTTS v2.0 training recipe

* Update XTTS v2 model hash
2023-11-07 14:16:44 +01:00
Edresson Casanova 905900afc9 Update XTTS v1.1 recipe 2023-11-06 19:14:50 -03:00
Edresson Casanova cabff9f323 Update XTTS v2.0 recipe 2023-11-06 17:47:14 -03:00
Edresson Casanova 1b6f8d0e46 Update unit tests and recipes 2023-11-06 20:25:06 +01:00
Edresson Casanova 9942000c50 Update XTTS v2 recipe model files 2023-11-06 20:20:28 +01:00
Eren Gölge f0cb19ecca
Drop diffusion from XTTS (#3150)
* Drop diffusion for XTTS

* Make style

* Drop diffusion deps in code

* Restore thrashed
2023-11-06 20:15:49 +01:00
Edresson Casanova e45227d9ff
XTTS v2.0 (#3137)
* Implement most similar ref training approach

* Use non-enhanced hifigan for test samples

* Add Perceiver

* Update GPT Trainer for perceiver support

* Update XTTS docs

* Bug fix masking with XTTS perceiver

* Bug fix on gpt forward

* Bug Fix on XTTS v2.0 training

* Add XTTS v2.0 unit tests

* Add XTTS v2.0 inference unit tests

* Bug Fix on diffusion inference

* Add XTTS v2.0 training recipe

* Placeholder model entry

* Add cloning params to config

* Make prompt embedding configurable

* Make cloning configurable

* Cheap fix for a cheaper fix

* Prevent resampling

* Update model entry

* Update docs

* Update requirements

* Code linting

* Add xtts v2 to sep tests

* Bug fix on XTTS get_gpt_cond_latents

* Bug fix on rebase

* Make style

* Bug fix in Japanese tokenizer

* Add num2words to deps

* Remove unused kwarg and added num_beams=1 as default

---------

Co-authored-by: Eren Gölge <egolge@coqui.ai>
2023-11-06 14:58:18 +01:00
Aarni Koskela 38f6f8f0bb
Run `make style` & re-enable it in CI (#3127) 2023-11-06 11:36:37 +01:00
Edresson Casanova 8af3d2dbcd Add a dedicated workflow for XTTS tests 2023-10-24 09:52:44 -03:00
Edresson Casanova 8853e1c3ec Update XTTS recipe to only download checkpoint if it is needed 2023-10-23 10:45:41 -03:00
Edresson Casanova 653f2e75ef Update xtts trainer recipe 2023-10-23 09:58:16 -03:00
Edresson Casanova ec7f54768a Rebase bug fix and update recipe 2023-10-21 17:37:51 -03:00
Edresson Casanova affaf11148 Add XTTS training unit test 2023-10-21 13:41:12 -03:00
Edresson Casanova 94dcf84979 Rename XTTS recipe 2023-10-21 13:37:21 -03:00
Edresson Casanova 5f98dbeec9 Update Ljspeech XTTS recipe 2023-10-21 13:37:21 -03:00
Edresson Casanova 469d624615 Update LJspeech XTTS recipe 2023-10-21 13:37:21 -03:00
Edresson Casanova 9e3598c3b7 Bug Fix on inference using XTTS trainer checkpoint 2023-10-21 13:37:21 -03:00
Edresson Casanova c4ceaabe2c Add test sentences during the training 2023-10-21 13:33:56 -03:00
Edresson Casanova bafab049c2 Add prompting masking 2023-10-21 13:33:56 -03:00
Edresson Casanova 47d613df3a Add reproducible evaluation 2023-10-21 13:33:56 -03:00
Edresson Casanova a32961bcb4 Add XTTS base training code 2023-10-21 13:33:56 -03:00
Eren Gölge 623ea41634
Fix model tests (#2943) 2023-09-14 15:21:48 +02:00
Eren Gölge 4033db5f4b 🔥 XTTS implementation 2023-09-13 17:51:24 +02:00
Edresson Casanova 4d3f23b5d3
Add CML-TTS dataset YourTTS training recipe (#2934) 2023-09-12 11:49:14 +02:00
Aleś Bułojčyk fead04f779
Add phonemizer for Belarusian language (#2856) 2023-08-28 11:20:45 +02:00
Eren Gölge 69f080eb47
Fix DelightfulTTS (#2823)
* Fix tests

* Make style
2023-07-31 13:52:45 +02:00
AWAS666 9e74b51aa6
Delightful TTS VCTK recipe fixes (#2808)
* fix: wrong import class

* fix: formatter name missing

* feat: get rid of clearml
2023-07-31 10:27:42 +02:00
Aleś Bułojčyk d124f78430
Recipe for Belarusian TTS (#2756)
* Changes from jhlfrfufyfn <jhlfrfufyfn@gmail.com>

* Recipe for Belarusian TTS

---------

Co-authored-by: jhlfrfufyfn <jhlfrfufyfn@gmail.com>
2023-07-31 10:26:21 +02:00
logan hart 6fdb88f8e2
Add Delightful-TTS implementation (#2095)
* add configs

* Update config file

* Add model configs

* Add model layers

* Add layer files

* Add layer modules

* change config names

* Add emotion manager

* fIX missing ap bug

* Fix missing ap bug

* Add base TTS e2e class

* Fix wrong variable name in load_tts_samples

* Add training script

* Remove range predictor and gaussian upsampling

* Add helper function

* Add vctk recipe

* Add conformer docs

* Fix linting in conformer.py

* Add Docs

* remove duplicate import

* refactor args

* Fix bugs

* Remove emotion embedding

* remove unused arg

* Remove emotion embedding arg

* Remove emotion embedding arg

* fix style issues

* Fix bugs

* Fix bugs

* Add unittests

* make style

* fix formatter bug

* fix test

* Add pyworld compute pitch func

* Update requirments.txt

* Fix dataset Bug

* Change layer norm to instance norm

* Add missing import

* Remove emotions.py

* remove ssim loss

* Add init layers func to aligner

* refactor model layers

* remove audio_config arg

* Rename loss func

* Rename to delightful-tts

* Rename loss func

* Remove unused modules

* refactor imports

* replace audio config with audio processor

* Add change sample rate option

* remove broken resample func

* update recipe

* fix style, add config docs

* fix tests and multispeaker embd dim

* remove pyworld

* Make style and fix inference

* Split tts tests

* Fixup

* Fixup

* Fixup

* Add argument names

* Set "random" speaker in the model Tortoise/Bark

* Use a diff f0_cache path for delightfull tts

* Fix delightful speaker handling

* Fix lint

* Make style

---------

Co-authored-by: loganhart420 <loganartpersonal@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-07-24 13:41:26 +02:00
PiaoYang 630327c4e6
Update compute_embeddings.py (#2668)
* [Typo] Fix variable name. More readable description.

Update train_yourtts.py

Reformat.

Reformat using black again.

* Add `old_append`. Fix bool argparse.

* Reformat.
2023-07-04 11:37:47 +02:00
prakharpbuf c1875f68df
typos and minor fixes (#2508)
* Update tacotron1-2.md

* Update README.md

* Update Tutorial_2_train_your_first_TTS_model.ipynb

* Update synthesizer.py

There is no arg called --speaker_name

* Update formatting_your_dataset.md

* Update AnalyzeDataset.ipynb

* Update AnalyzeDataset.ipynb

* Update AnalyzeDataset.ipynb

* Update finetuning.md

* Update train_yourtts.py

* Update train_yourtts.py

* Update train_yourtts.py

* Update finetuning.md
2023-04-26 15:22:57 +02:00
Shivam Mehta d83ee8fe45
Adding neural HMM TTS Model (#2272)
* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow\s decoder parameters from the config

* Update the Trainer requirement version for a compatible one (#2276)

* Bump up to v0.10.2

* Adding neural HMM TTS

* Adding tests

* Adding neural hmm on readme

* renaming training recipe

* Removing overflow\s decoder parameters from the config

* fixing documentation

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2023-01-23 11:53:04 +01:00
manmay nakhashi bc422f2f3c
Fastspeech2 (#2073)
* added EnergyDataset

* add energy to Dataset

* add comupte_energy

* added energy params

* added energy to forward_tts

* added plot_avg_energy for visualisation

* Update forward_tts.py

* create file

* added fastspeech2 recipe

* add fastspeech2 config

* removed energy from fast pitch

* add energy loss to forward tts

* Update fastspeech2_config.py

* change run_name

* Update numpy_transforms.py

* fix typo

* fix typo

* fix typo

* linting issues

* use_energy default value --> False

* Update numpy_transforms.py

* linting fixes

* fix typo

* liniting_fix

* liniting_fix

* fix

* fixes

* fixes

* lint fix

* lint fixws

* added training test

* wrong import

* wrong import

* trailing whitespace

* style fix

* changed class name because of error

* class name change

* class name change

* change class name

* fixed styles
2023-01-15 22:39:22 +01:00
Khalid Bashir 42afad5e79
Fixed bug related to yourtts speaker embeddings issue (#2234)
* Fixed bug related to yourtts speaker embeddings issue

* Reverted code for base_tts

* Bug fix on VITS d_vector_file type

* Ignore the test speakers on YourTTS recipe

* Add speaker encoder model and config on YourTTS recipe to easily do zero-shot inference

* Update YourTTS config file

* Update ModelManager._update_path to deal with list attributes

* Fix lint checks

* Remove unused code

* Fix unit tests

* Reset name_to_id to get the right speaker ids on load_embeddings_from_list_of_files

* Set weighted_sampler_multipliers as an empty dict to prevent users' mistakes

Co-authored-by: Edresson Casanova <edresson1@gmail.com>
2023-01-02 14:20:02 +01:00
Julian Weber a07397733b
Multilingual tokenizer (#2229)
* Implement multilingual tokenizer

* Add multi_phonemizer recipe

* Fix lint

* Add TestMultiPhonemizer

* Fix lint

* make style
2023-01-02 10:03:19 +01:00
Edresson Casanova 061ac43187
Add Original YourTTS vocabulary for full transfer learning (#2206) 2022-12-13 09:02:10 +01:00
Edresson Casanova 3b1a28fa95
Add YourTTS VCTK recipe (#2198)
* Add YourTTS VCTK recipe

* Fix lint

* Add compute_embeddings and resample_files functions to be able to reuse it

* Add automatic download and speaker embedding computation for YourTTS VCTK recipe

* Add parameter for eval metadata file on compute embeddings function
2022-12-12 16:14:25 +01:00
Shivam Mehta 3b8b105b0d
Adding OverFlow (#2183)
* Adding encoder

* currently modifying hmm

* Adding hmm

* Adding overflow

* Adding overflow setting up flat start

* Removing runs

* adding normalization parameters

* Fixing models on same device

* Training overflow and plotting evaluations

* Adding inference

* At the end of epoch the test sentences are coming on cpu instead of gpu

* Adding figures from model during training to monitor

* reverting tacotron2 training recipe

* fixing inference on gpu for test sentences on config

* moving helpers and texts within overflows source code

* renaming to overflow

* moving loss to the model file

* Fixing the rename

* Model training but not plotting the test config sentences's audios

* Formatting logs

* Changing model name to camelcase

* Fixing test log

* Fixing plotting bug

* Adding some tests

* Adding more tests to overflow

* Adding all tests for overflow

* making changes to camel case in config

* Adding information about parameters and docstring

* removing compute_mel_statistics moved statistic computation to the model instead

* Added overflow in readme

* Adding more test cases, now it doesn't saves transition_p like tensor and can be dumped as json
2022-12-12 12:44:15 +01:00
Eren Gölge 9e5a469c64
d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Change name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
Edresson Casanova 096b35f639
Add VCTK speaker encoder recipe (#1912) 2022-08-26 16:19:03 +02:00
Tsai Meng-Ting 9d32cbc3db
Fix type in download_vctk.sh (#1739)
typo in comment
2022-07-20 12:27:42 +02:00
Eren Gölge 49bac724c0
Implement VitsAudioConfig (#1556)
* Implement VitsAudioConfig

* Update VITS LJSpeech recipe

* Update VITS VCTK recipe

* Make style

* Add missing decorator

* Add missing param

* Make style

* Update recipes

* Fix test

* Bug fix

* Exclude tests folder

* Make linter

* Make style
2022-07-12 18:49:58 +02:00
a-froghyar 34b80e0280
feat: updated recipes and lr fix (#1718)
- updated the recipes activating more losses for more stable training
- re-enabling guided attention loss
- fixed a bug about not the correct lr fetched for logging
2022-07-12 15:00:53 +02:00
Eren Gölge f1e35596e8 Remove redundant config field 2022-07-11 13:39:41 +02:00
Noran Raskin a790df4e94
Training recipes for thorsten dataset (#1020)
* Fix style

* Fix isort

* Remove tensorboardX from requirements

Co-authored-by: logan hart <72301874+loganhart420@users.noreply.github.com>
Co-authored-by: Eren Gölge <egolge@coqui.ai>
2022-05-30 12:07:31 +02:00
a-froghyar 8be21ec387
Capacitron (#977)
* new CI config

* initial Capacitron implementation

* delete old unused file

* fix empty formatting changes

* update losses and training script

* fix previous commit

* fix commit

* Add Capacitron test and first round of test fixes

* revert formatter change

* add changes to the synthesizer

* add stepwise gradual lr scheduler and changes to the recipe

* add inference script for dev use

* feat: add posterior inference arguments to synth methods
- added reference wav and text args for posterior inference
- some formatting

* fix: add espeak flag to base_tts and dataset APIs
- use_espeak_phonemes flag was not implemented in those APIs
- espeak is now able to be utilised for phoneme generation
- necessary phonemizer for the Capacitron model

* chore: update training script and style
- training script includes the espeak flag and other hyperparams
- made style

* chore: fix linting

* feat: add Tacotron 2 support

* leftover from dev

* chore:rename parser args

* feat: extract optimizers
- created a separate optimizer class to merge the two optimizers

* chore: revert arbitrary trainer changes

* fmt: revert formatting bug

* formatting again

* formatting fixed

* fix: log func

* fix: update optimizer
- Implemented load_state_dict for continuing training

* fix: clean optimizer init for standard models

* improvement: purge espeak flags and add training scripts

* Delete capacitronT2.py

delete old training script, new one is pushed

* feat: capacitron trainer methods
- extracted capacitron specific training  operations from the trainer into custom
methods in taco1 and taco2 models

* chore: renaming and merging capacitron and gst style args

* fix: bug fixes from the previous commit

* fix: implement state_dict method on CapacitronOptimizer

* fix: call method

* fix: inference naming

* Delete train_capacitron.py

* fix: synthesize

* feat: update tests

* chore: fix style

* Delete capacitron_inference.py

* fix: fix train tts t2 capacitron tests

* fix: double forward in T2 train step

* fix: double forward in T1 train step

* fix: run make style

* fix: remove unused import

* fix: test for T1 capacitron

* fix: make lint

* feat: add blizzard2013 recipes

* make style

* fix: update recipes

* chore: make style

* Plot test sentences in Tacotron

* chore: make style and fix import

* fix: call forward first before problematic floordiv op

* fix: update recipes

* feat: add min_audio_len to recipes

* aux_input["style_mel"]

* chore: make style

* Make capacitron T2 recipe more stable

* Remove T1 capacitron Ljspeech

* feat: implement new grad clipping routine and update configs

* make style

* Add pretrained checkpoints

* Add default vocoder

* Change trainer package

* Fix grad clip issue for tacotron

* Fix scheduler issue with tacotron

Co-authored-by: Eren Gölge <egolge@coqui.ai>
Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-05-20 16:17:11 +02:00
Edresson Casanova 060e0f9368
Add EmbeddingManager and BaseIDManager (#1374) 2022-03-31 13:41:16 +02:00
Eren Gölge 1c3623af33
Fix model manager (#1436)
* Fix manager

* Make style
2022-03-23 12:57:14 +01:00
Edresson Casanova ccdc2300dc
Add eval_split and eval_split_size in the call of load_tts_samples for all recipes (#1424) 2022-03-22 12:54:41 +01:00
Eren Gölge 0870a4faa2
Make style (#1405) 2022-03-16 12:13:55 +01:00