Commit Graph

257 Commits (1d3c0c88467d01e114063b54279693b464ac656a)

Author SHA1 Message Date
erogol f75b0a6439 linter updates 2020-05-18 18:46:13 +02:00
erogol df8fd3823d Merge branch 'tf-convert2' into dev 2020-05-18 13:13:21 +02:00
erogol d99fda8e42 init batch norm explicit initial values 2020-05-12 16:23:32 +02:00
erogol 6f5c8773d6 enable encoder lstm bias 2020-05-12 16:23:32 +02:00
erogol 9504b71f79 fix lstm biases True 2020-05-12 16:23:32 +02:00
erogol de2918c85b bug fixes 2020-05-12 16:23:32 +02:00
erogol 736f169cc9 tf lstm does not match torch lstm wrt bias vectors. So I avoid bias in LSTM as an easy solution. 2020-05-12 16:23:32 +02:00
erogol d282222553 renaming layers to be converted to TF counterpart 2020-05-12 16:23:32 +02:00
Edresson Casanova cce13ee245
Fix bug in Graves Attn
On my machine, in the Graves attention layer, the variable self.J (self.J = torch.arange(0, inputs.shape[1]+2).to(inputs.device) + 0.5) is a LongTensor, but it must be a float tensor, so I get the following error:

Traceback (most recent call last):
  File "train.py", line 704, in <module>
    main(args)
  File "train.py", line 619, in main
    global_step, epoch)
  File "train.py", line 170, in train
    text_input, text_lengths, mel_input, speaker_embeddings=speaker_embeddings)
  File "/home/edresson/anaconda3/envs/TTS2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/edresson/DD/TTS/voice-clonning/TTS/tts_namespace/TTS/models/tacotron.py", line 121, in forward
    self.speaker_embeddings_projected)
  File "/home/edresson/anaconda3/envs/TTS2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/edresson/DD/TTS/voice-clonning/TTS/tts_namespace/TTS/layers/tacotron.py", line 435, in forward
    output, stop_token, attention = self.decode(inputs, mask)
  File "/mnt/edresson/DD/TTS/voice-clonning/TTS/tts_namespace/TTS/layers/tacotron.py", line 367, in decode
    self.attention_rnn_hidden, inputs, self.processed_inputs, mask)
  File "/home/edresson/anaconda3/envs/TTS2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/edresson/DD/TTS/voice-clonning/TTS/tts_namespace/TTS/layers/common_layers.py", line 180, in forward
    phi_t = g_t.unsqueeze(-1) * (1.0 / (1.0 + torch.sigmoid((mu_t.unsqueeze(-1) - j) / sig_t.unsqueeze(-1))))
RuntimeError: expected type torch.cuda.FloatTensor but got torch.cuda.LongTensor


In addition, the + 0.5 offset is silently dropped when the tensor is a LongTensor.
Test: 
>>> torch.arange(0, 10) 
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> torch.arange(0, 10) + 0.5
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> torch.arange(0, 10.0) + 0.5
tensor([0.5000, 1.5000, 2.5000, 3.5000, 4.5000, 5.5000, 6.5000, 7.5000, 8.5000,
        9.5000])

To resolve this I forced the arange end point to float:
self.J = torch.arange(0, inputs.shape[1]+2.0).to(inputs.device) + 0.5
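For context, here is a minimal sketch of the dtype behaviour described above (seq_len is a hypothetical stand-in for inputs.shape[1]; the explicit dtype= variant is an alternative assumption, not the committed fix):

import torch

seq_len = 4  # hypothetical stand-in for inputs.shape[1]

# Integer end point: torch.arange returns a LongTensor, and on the PyTorch
# versions in use at the time the + 0.5 offset was silently truncated away.
j_int = torch.arange(0, seq_len + 2)
print(j_int.dtype)   # torch.int64

# Float end point: torch.arange returns a FloatTensor, so the + 0.5 is kept.
j_float = torch.arange(0, seq_len + 2.0) + 0.5
print(j_float)       # tensor([0.5000, 1.5000, 2.5000, 3.5000, 4.5000, 5.5000])

# Equivalent, more explicit alternative (an assumption, not the committed fix):
j_explicit = torch.arange(0, seq_len + 2, dtype=torch.float32) + 0.5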
2020-05-04 17:52:58 -03:00
erogol 201f04d3b3 dropout graves attention heads to decorrelate and prevent overpowering of a single head 2020-03-10 13:53:04 +01:00
erogol 669a2e1d73 linter fixes 2020-03-10 11:30:13 +01:00
erogol d83b58e35d TTS Loss aggregated all loss functions 2020-03-09 21:03:18 +01:00
erogol a68012aec2 BCE masked loss and padding stop_tokens with 0s not 1s 2020-02-25 14:17:20 +01:00
erogol 2b1fb6cb12 add mozilla german 2020-02-19 18:30:25 +01:00
root 1ef6278d2d tacotron2 stop condition 2020-02-19 18:27:02 +01:00
root ca33336ae0 testing seq_len_norm 2020-02-19 18:27:02 +01:00
root 0d17019d22 remove old graves 2020-02-19 18:27:02 +01:00
root bb1117ff32 stop dividing g_t with sig_t and commenting 2020-02-19 18:27:02 +01:00
root 678d56cdef bug fix for losses 2020-02-19 18:27:02 +01:00
root 9921d682c3 seq_len_norm for imbalanced datasets 2020-02-19 18:27:02 +01:00
root 72817438db graves v2 2020-02-19 18:27:02 +01:00
root cf7d968f57 graves attention as in melnet paper 2020-02-19 18:27:01 +01:00
root dc0e6c8019 simpler gmm attention implementation 2020-02-19 18:27:01 +01:00
root 0e8881114b efficient GMM attention with native broadcasting 2020-01-10 13:45:09 +01:00
root f2b6d00c45 graves attention config update 2020-01-07 18:47:02 +01:00
geneing 748cbbc403 Change to GMMv2b 2020-01-05 18:34:01 -08:00
geneing 34e0291ba7 Change to GMMv2b 2020-01-05 18:32:49 -08:00
geneing 20b4211af5 Change to GMMv2b 2020-01-05 18:32:35 -08:00
Eren Golge 79cca4ac80 more loss tests 2019-11-15 14:30:28 +01:00
Eren Golge cd06a4c1e5 linter fix 2019-11-12 13:51:22 +01:00
Eren Golge df1b8b3ec7 linter and test updates for speaker_encoder, gmm_Attention 2019-11-12 12:42:42 +01:00
Eren Golge 1401a0db6b update GMM attention clamp max min 2019-11-12 11:20:53 +01:00
Eren Golge 6f3dd1b6ae change gmm activations 2019-11-12 11:20:53 +01:00
Eren Golge 015f7780f4 Decoder shape comments for Tacotron2, decoupled grad clip for stopnet and the rest of the network. Some variable renaming and bug fix for alignment score logging 2019-11-12 11:20:53 +01:00
Eren Golge 2966e3f2d1 use ReLU for GMM 2019-11-12 11:20:53 +01:00
Eren Golge b904bc02d6 config update and initial bias for graves attention 2019-11-12 11:19:57 +01:00
Eren Golge 926a4d36ce change tanh layer size for graves attention 2019-11-12 11:19:16 +01:00
Eren Golge fb34c7b272 config and bug fix 2019-11-12 11:19:16 +01:00
Eren Golge 695bf1a1f6 bug fix for illegal memory reach 2019-11-12 11:19:16 +01:00
Eren Golge b9e0faca98 config update and bug fixes 2019-11-12 11:19:16 +01:00
Eren Golge adf9ebd629 Graves attention and setting attn type by config.json 2019-11-12 11:18:57 +01:00
Eren Golge 84d81b6579 graves attention [WIP] 2019-11-12 11:17:35 +01:00
Eren Golge ec579d02a1 bug fix argparser 2019-10-31 15:13:39 +01:00
Eren Golge 60b6ec18fe bug fix for synthesis.py 2019-10-29 17:38:59 +01:00
Eren Golge 002991ca15 bug fixes, linter update and test updates 2019-10-29 14:28:49 +01:00
Eren Golge 89ef71ead8 bug fix tacotron2, decoder return order fixed 2019-10-29 13:32:20 +01:00
Eren Golge 5a56a2c096 bug fixes for forward backward training and load_data for parsing data_loader 2019-10-29 02:58:42 +01:00
Eren Golge e83a4b07d2 commenting model outputs for tacotron, align output shapes of tacotron and tacotron2, merge bidirectional decoder 2019-10-28 14:51:19 +01:00
Eren Golge 9d5a5b0764 linter 2019-10-24 14:34:31 +02:00
Eren Golge ea32f2368d linter fix 2019-10-24 14:11:07 +02:00