# TTS/utils/visual.py

import torch
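import librosa
import librosa.display  # specshow lives in this submodule; older librosa releases need the explicit import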
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from TTS.utils.text import phoneme_to_sequence, sequence_to_phoneme


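def plot_alignment(alignment, info=None, fig_size=(16, 10), title=None):
    """Plot an attention alignment and return the matplotlib figure.

    `alignment` may be a torch.Tensor or a numpy array. It is transposed for
    display so that decoder steps run along x and encoder steps along y.
    """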
    if isinstance(alignment, torch.Tensor):
        alignment_ = alignment.detach().cpu().numpy().squeeze()
    else:
        alignment_ = alignment
    fig, ax = plt.subplots(figsize=fig_size)
    im = ax.imshow(
        alignment_.T, aspect='auto', origin='lower', interpolation='none')
    fig.colorbar(im, ax=ax)
    xlabel = 'Decoder timestep'
    if info is not None:
        xlabel += '\n\n' + info
    plt.xlabel(xlabel)
    plt.ylabel('Encoder timestep')
    # plt.yticks(range(len(text)), list(text))
    plt.tight_layout()
    if title is not None:
        plt.title(title)
    return fig


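def plot_spectrogram(linear_output, audio, fig_size=(16, 10)):
    """Plot a denormalized spectrogram and return the matplotlib figure.

    `linear_output` may be a torch.Tensor or a numpy array; `audio` must
    provide the `_denormalize` method called below.
    """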
    if isinstance(linear_output, torch.Tensor):
        linear_output_ = linear_output.detach().cpu().numpy().squeeze()
    else:
        linear_output_ = linear_output
    spectrogram = audio._denormalize(linear_output_.T)  # pylint: disable=protected-access
    fig = plt.figure(figsize=fig_size)
    plt.imshow(spectrogram, aspect="auto", origin="lower")
    plt.colorbar()
    plt.tight_layout()
    return fig


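def visualize(alignment, postnet_output, stop_tokens, text, hop_length, CONFIG, decoder_output=None, output_path=None, figsize=(8, 24)):
    """Plot alignment, stopnet predictions and spectrogram(s) in one figure.

    A fourth panel showing `decoder_output` is added when it is given. If
    `output_path` is set, the figure is saved there and then closed.
    """
    num_plot = 4 if decoder_output is not None else 3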

    label_fontsize = 16
    fig = plt.figure(figsize=figsize)

    plt.subplot(num_plot, 1, 1)
    plt.imshow(alignment.T, aspect="auto", origin="lower", interpolation="none")
    plt.xlabel("Decoder timestep", fontsize=label_fontsize)
    plt.ylabel("Encoder timestep", fontsize=label_fontsize)
    # compute phoneme representation and back
    if CONFIG.use_phonemes:
        # use the custom character set when the config defines one
        tp = CONFIG.characters if 'characters' in CONFIG.keys() else None
        seq = phoneme_to_sequence(text, [CONFIG.text_cleaner], CONFIG.phoneme_language,
                                  CONFIG.enable_eos_bos_chars, tp=tp)
        text = sequence_to_phoneme(seq, tp=tp)
        print(text)
    plt.yticks(range(len(text)), list(text))
    plt.colorbar()
    # plot stopnet predictions
    plt.subplot(num_plot, 1, 2)
    plt.plot(range(len(stop_tokens)), list(stop_tokens))
    # plot postnet spectrogram
    plt.subplot(num_plot, 1, 3)
    librosa.display.specshow(postnet_output.T, sr=CONFIG.audio['sample_rate'],
                             hop_length=hop_length, x_axis="time", y_axis="linear",
                             fmin=CONFIG.audio['mel_fmin'],
                             fmax=CONFIG.audio['mel_fmax'])

    plt.xlabel("Time", fontsize=label_fontsize)
    plt.ylabel("Hz", fontsize=label_fontsize)
    plt.tight_layout()
    plt.colorbar()

    if decoder_output is not None:
        plt.subplot(num_plot, 1, 4)
        librosa.display.specshow(decoder_output.T, sr=CONFIG.audio['sample_rate'],
                                 hop_length=hop_length, x_axis="time", y_axis="linear",
                                 fmin=CONFIG.audio['mel_fmin'],
                                 fmax=CONFIG.audio['mel_fmax'])
        plt.xlabel("Time", fontsize=label_fontsize)
        plt.ylabel("Hz", fontsize=label_fontsize)
        plt.tight_layout()
        plt.colorbar()

    if output_path:
        print(output_path)
        fig.savefig(output_path)
        plt.close(fig)