
Christos Angelopoulos 0ecb7dd084 update Sapo.sh(delimitation, play audio correction)		2022-03-22 01:25:38 +02:00
screenshots	update screenshots	2022-03-20 20:08:11 +02:00
LICENSE	update files	2022-03-07 01:09:48 +02:00
README.md	update README.md	2022-03-20 20:08:59 +02:00
Sapo.sh	update Sapo.sh(delimitation, play audio correction)	2022-03-22 01:25:38 +02:00
sapo-fix.png	upload files	2022-03-08 15:15:49 +02:00
sapo-fix.sh	update files	2022-03-10 22:00:56 +02:00
sapo.png	upload files	2022-03-08 15:15:49 +02:00
sapo_progress.png	upload files	2022-03-08 15:15:49 +02:00
sapofonetix.sed	update sapofonetix.sed (442 lines)	2022-03-22 01:24:17 +02:00


Sapo

A bash script that can convert txt to wav using the all powerful https://github.com/coqui-ai/TTS

TTS

https://github.com/coqui-ai/TTS

INSTALL TTS

pip install TTS

FIX LONG UTTERANCES PROBLEM

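Very long lines can hit the model's maximum number of decoder steps and get cut off. This post describes how to raise that limit:

https://dirk.net/2021/10/31/tts-fix-max-decoder-steps/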

OTHER DEPENDENCIES

sudo apt install sed yad sox jq

I use xed as my text editor. If you prefer another default editor (gedit, geany, mousepad, etc.), replace xed on line 93 of Sapo.sh with the command of your preferred editor.
Likewise, instead of the celluloid audio player you can use any other player you prefer, such as xplayer, mplayer, smplayer, vlc or mpv. Just replace celluloid with your preferred player on lines 197 and 223 of Sapo.sh.
The same applies to Audacity, or any other preferred wave editor, on line 222 of Sapo.sh. Audacity is not a strict dependency of the script, but having a wave editor installed can be useful, which is why editing the audio is offered as one of the options when fixing errors.
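If you prefer not to edit Sapo.sh by hand, these substitutions can be scripted. The following is a minimal sketch using a hypothetical helper, swap_command. It assumes GNU sed, and it assumes the line numbers quoted above still match your copy of Sapo.sh; they may shift between versions, so check them first.

```bash
# swap_command FILE LINE OLD NEW
# Hypothetical helper: replace the whole word OLD with NEW on line LINE of
# FILE, editing in place. A FILE.bak copy of the untouched original is made
# on the first call only. Requires GNU sed (for -i and \b).
# OLD and NEW must not contain "/" (it is the sed delimiter).
swap_command() {
  local file=$1 line=$2 old=$3 new=$4
  [[ -e "$file.bak" ]] || cp -- "$file" "$file.bak"
  sed -i "${line}s/\\b${old}\\b/${new}/g" "$file"
}

# Example (line numbers as given in this README; verify them first):
#   swap_command Sapo.sh 93 xed gedit
#   swap_command Sapo.sh 197 celluloid mpv
#   swap_command Sapo.sh 223 celluloid mpv
```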

SCREENSHOTS

File selection dialogue

The file is split into lines of fewer characters each, so that excessively long lines do not cause problems in the text-to-speech conversion. The user can still edit the file further before the conversion starts.
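To illustrate the idea, here is a rough sketch of such a delimitation, not necessarily the method Sapo.sh uses: put each sentence on its own line, then wrap anything longer than a chosen maximum at word boundaries with fold. The default of 200 characters is an arbitrary assumption, and the sketch assumes GNU sed.

```bash
# split_for_tts FILE [MAXCHARS]
# Rough sketch: one sentence per line, then wrap any line longer than
# MAXCHARS at spaces. Abbreviations such as "e.g. " will also trigger a
# split. GNU fold counts bytes, so non-ASCII lines may come out shorter.
# Requires GNU sed (for \n in the replacement).
split_for_tts() {
  local file=$1 max=${2:-200}
  sed -E 's/([.!?;:]) +/\1\n/g' "$file" \
    | fold -s -w "$max" \
    | sed -E 's/^ +//; s/ +$//; /^$/d'   # trim spaces, drop empty lines
}

# Example: split_for_tts book.txt 200 > book_split.txt
```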

Progress bar and a rough estimate of the time left (the speed probably depends on hardware)
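A time-left estimate of this kind can be as simple as a linear extrapolation: if the finished lines took a given time, each remaining line is assumed to take the same average time. The helper below, eta_seconds, is hypothetical and only shows that arithmetic; it is not taken from Sapo.sh.

```bash
# eta_seconds START_EPOCH DONE TOTAL [NOW_EPOCH]
# Hypothetical helper: linear time-left estimate in whole seconds.
# If DONE lines took (NOW - START) seconds, the remaining (TOTAL - DONE)
# lines are assumed to take the same average time each.
# Prints nothing and fails if no line has finished yet.
eta_seconds() {
  local start=$1 finished=$2 total=$3 now=${4:-$(date +%s)}
  (( finished > 0 )) || return 1
  echo $(( (now - start) * (total - finished) / finished ))
}

# Example loop body feeding "yad --progress", which reads a percentage per
# line and treats lines starting with "#" as the status text:
#   echo $(( 100 * i / total ))
#   echo "# About $(eta_seconds "$start" "$i" "$total") s left"
```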

DETECTING ERRORS

I. CLUTTER IN AUDIO OUTPUT

Sometimes the output wav file of a text file line is longer than necessary, containing hissing sounds, unrecognisable utterances and clutter at the end of it.

To detect which wav files have this problem, the ratio of the line's character count to the duration of its audio file is calculated. Clutter makes the audio longer without adding any text, so an unusually low ratio gives a rough indication of which lines were rendered with errors.
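A minimal sketch of such a ratio check, assuming awk is available. chars_per_second and is_suspect are hypothetical helpers, and the threshold passed to is_suspect is an assumption that would need tuning for a given voice model. The tab-separated layout used for errors.tsv in the example comment is also an assumption.

```bash
# chars_per_second TEXT DURATION_SECONDS
# Characters of text per second of audio, printed with two decimals.
# Fails for a zero or negative duration.
chars_per_second() {
  awk -v c="${#1}" -v d="$2" 'BEGIN { if (d <= 0) exit 1; printf "%.2f\n", c / d }'
}

# is_suspect RATIO MIN_RATIO
# True (exit 0) when RATIO is below MIN_RATIO, i.e. the audio is
# suspiciously long for its text. MIN_RATIO is a hand-tuned threshold.
is_suspect() {
  awk -v r="$1" -v m="$2" 'BEGIN { exit !(r < m) }'
}

# Example for one line (soxi ships with sox; -D prints the duration in s):
#   ratio=$(chars_per_second "$text" "$(soxi -D 000042.wav)")
#   is_suspect "$ratio" 8 && printf '%s\t%s\n' 42 000042 >> errors.tsv
```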

The lines that possibly have this problem are recorded in the generated errors.tsv file. Once all lines have been rendered, the lines listed in errors.tsv are rendered again.

Often this alone is enough.
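The re-rendering pass might look roughly like this. The sketch assumes errors.tsv is tab-separated with the line number in the first column and the wav number in the second, and it calls the Coqui tts command-line tool with its default model. rerender_errors and render are hypothetical names; Sapo.sh may use a different model or file layout.

```bash
# render TEXT OUT_WAV: synthesize one line with the Coqui TTS command-line
# tool and its default model (swap in the model you actually use).
render() {
  tts --text "$1" --out_path "$2"
}

# rerender_errors TEXT_FILE ERRORS_TSV
# Hypothetical sketch: re-render every line listed in ERRORS_TSV.
# Assumed layout: column 1 = line number in TEXT_FILE, column 2 = wav number;
# output goes to "<wav number>.wav" in the current directory.
rerender_errors() {
  local text=$1 tsv=$2 line_no wav_no sentence
  while IFS=$'\t' read -r line_no wav_no _; do
    [[ $line_no =~ ^[0-9]+$ ]] || continue          # skip blank/header rows
    sentence=$(sed -n "${line_no}p" "$text")        # fetch that line's text
    render "$sentence" "$wav_no.wav" < /dev/null    # keep the loop's stdin intact
  done < "$tsv"
}
```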

After that, each line can be examined one by one. For each line the user is presented with the following options:

⯈Play the respective audio file
🗘Re-render the line, making minor changes (e.g. adding a full stop at the end of the line)
✀Trim the clutter at the end of the audio file, i.e. anything after half a second of detected silence
🗡Split render: render the line text in two batches, which are concatenated afterwards (useful for long sentences)
🎜Edit the respective audio file with a wave editor (e.g. Audacity)
✗Remove the respective audio file directly
By hitting 😀Keep/Next, the user accepts the audio file as is (or after correcting it) and proceeds to the next line.
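Two of these options boil down to simple text transformations. The sketch below shows hypothetical helpers for them: add_full_stop appends a full stop to a line that lacks final punctuation, and split_in_two breaks a line at the space nearest its middle so that each half can be rendered separately. Sapo.sh may choose the split point differently.

```bash
# add_full_stop TEXT: append "." unless TEXT already ends in . ! or ?
add_full_stop() {
  local text=$1 re='[.!?]$'
  [[ $text =~ $re ]] || text+="."
  printf '%s\n' "$text"
}

# split_in_two TEXT: print TEXT as two lines, breaking at the space closest
# to the middle; a text without spaces is printed unchanged on one line.
split_in_two() {
  local text=$1 len=${#1} mid=$(( ${#1} / 2 )) d pos
  for (( d = 0; d <= mid; d++ )); do
    for pos in $(( mid - d )) $(( mid + d )); do     # search outwards
      if (( pos > 0 && pos < len )) && [[ ${text:pos:1} == " " ]]; then
        printf '%s\n%s\n' "${text:0:pos}" "${text:pos+1}"
        return 0
      fi
    done
  done
  printf '%s\n' "$text"
}

# Each half can then be rendered separately and joined with sox, which
# concatenates its input files in order:  sox half1.wav half2.wav line.wav
```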

After that, the audio files from all the lines will be concatenated into one.

II. SED SCRIPT

sapofonetix.sed is a sed script that replaces words the TTS model mispronounces with alternative spellings that produce the right pronunciation, e.g.

s/biscuit/biskit/g;s/Biscuit/biskit/g

will replace the word biscuit (or Biscuit, capitalized) with biskit, whose pronunciation sounds more accurate.
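New entries can be added in the same style as the example above. add_fonetix_rule is a hypothetical helper that writes one rule covering the lowercase and capitalized forms of a word; it needs bash 4 or later for the ${word^} capitalization.

```bash
# add_fonetix_rule SED_FILE WORD RESPELLING
# Hypothetical helper: append a rule covering WORD and its capitalized form.
# WORD and RESPELLING must not contain "/" or other sed special characters.
add_fonetix_rule() {
  local file=$1 word=$2 respell=$3
  printf 's/%s/%s/g;s/%s/%s/g\n' \
    "$word" "$respell" "${word^}" "$respell" >> "$file"
}

# A sed script like this can be applied on its own with:
#   sed -f sapofonetix.sed input.txt > output.txt
```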

The list of words grows as the script gets used more; this is an ongoing task.

FEEL FREE TO CONTRIBUTE!

It would be really helpful if you sent me a file containing all the mispronounced words you have encountered so far. Better pronunciations would be found and recorded in the sapofonetix.sed database, steadily reducing the share of mispronounced words.

Once the process is complete, the final wav file, named filename.wav, is inside the created Sapo_filename folder.

If there are too many wav files (one for each line of the text file), the final wav file will not be produced. In that case, use the sox command to concatenate the wav files in smaller batches (e.g. every 500 files), and then concatenate those into the final sound file, for example:

cd Sapo_1_1.txt

sox {000001..000500}.wav ~/Desktop/1f.wav

sox {000501..001000}.wav ~/Desktop/2f.wav

sox {001001..001500}.wav ~/Desktop/3f.wav

cd ~/Desktop

sox {1..3}f.wav final.wav
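The batching above can also be automated. The following sketch, built around a hypothetical function concat_in_batches, assumes the zero-padded file names shown in the example (000001.wav, 000002.wav, ...) so that a plain sort keeps them in order, and relies on sox concatenating its input files in the order given.

```bash
# concat_in_batches DIR OUT_WAV [BATCH]
# Sketch: join the numbered wav files in DIR in groups of BATCH (default
# 500) into temporary parts, then join the parts into OUT_WAV.
# Assumes zero-padded names (000001.wav ...), so a plain sort keeps order.
concat_in_batches() {
  local dir=$1 out=$2 batch=${3:-500} tmp i
  local -a wavs parts=()
  mapfile -t wavs < <(find "$dir" -maxdepth 1 -name '[0-9]*.wav' | sort)
  (( ${#wavs[@]} > 0 )) || { echo "no wav files in $dir" >&2; return 1; }
  tmp=$(mktemp -d)
  for (( i = 0; i < ${#wavs[@]}; i += batch )); do
    parts+=("$tmp/part_$(printf '%05d' $(( i / batch ))).wav")
    sox "${wavs[@]:i:batch}" "${parts[-1]}"    # one batch -> one part
  done
  sox "${parts[@]}" "$out"                     # all parts -> final file
  rm -r -- "$tmp"
}

# Example: concat_in_batches Sapo_1_1.txt ~/Desktop/final.wav
```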

sapo-fix.sh

sapo-fix.sh is the error-correcting routine included in Sapo.sh. It can also be run on its own when the user wants to correct the lines detected and recorded in errors.tsv.

Users can also revisit any other line: enter its line number and wav number as a new row of errors.tsv, then run sapo-fix.sh.
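Adding such a row can be done from the shell as well. This sketch assumes errors.tsv holds one tab-separated pair of line number and wav number per row, which is an inference from the description above; queue_line_for_fix is a hypothetical helper.

```bash
# queue_line_for_fix ERRORS_TSV LINE_NO WAV_NO
# Hypothetical helper: append a "line number <TAB> wav number" row for
# sapo-fix.sh to pick up. Rejects non-numeric arguments.
queue_line_for_fix() {
  local tsv=$1 line_no=$2 wav_no=$3
  [[ $line_no =~ ^[0-9]+$ && $wav_no =~ ^[0-9]+$ ]] \
    || { echo "line and wav numbers must be numeric" >&2; return 1; }
  printf '%s\t%s\n' "$line_no" "$wav_no" >> "$tsv"
}

# Example: queue_line_for_fix errors.tsv 42 000042
```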