mirror of https://gitlab.com/christosangel/sapo
# Sapo
A bash script that converts txt files to wav audio using the all-powerful https://github.com/coqui-ai/TTS
## TTS
https://github.com/coqui-ai/TTS

### INSTALL TTS

> pip install TTS
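### FIX LONG UTTERANCES PROBLEM

Long lines can exceed the maximum number of decoder steps in TTS, which cuts the generated audio short. A fix (raising `max_decoder_steps`) is described here:

https://dirk.net/2021/10/31/tts-fix-max-decoder-steps/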
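Once installed, TTS provides a `tts` command that renders a piece of text to a wav file (`tts --text "…" --out_path file.wav`, with its default model unless another is chosen). As a rough sketch of the idea behind Sapo, and not its actual code, the hypothetical function `render_lines` below renders a text file one line at a time into zero-padded, numbered wav files (`000001.wav`, `000002.wav`, …), the naming that the sox example further down assumes. Skipping blank lines is an assumption of this sketch. Example call: `render_lines book.txt Sapo_book.txt`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not Sapo.sh itself): render each line of a text file
# to its own numbered wav file with the Coqui TTS command-line tool.
render_lines() {
  local txt="$1" outdir="$2" n=0 line
  mkdir -p "$outdir"
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline
  while IFS= read -r line || [[ -n "$line" ]]; do
    n=$((n + 1))
    # Assumption: blank lines are skipped but still counted, so that
    # 000005.wav always belongs to line 5 of the text file
    [[ -z "${line//[[:space:]]/}" ]] && continue
    # </dev/null keeps tts from reading the rest of the text file from stdin
    tts --text "$line" --out_path "$outdir/$(printf '%06d' "$n").wav" </dev/null
  done < "$txt"
}
```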
### OTHER DEPENDENCIES

> sudo apt install sed yad sox jq
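Before the first run, a quick way to confirm that these programs, plus the `tts` command installed by pip and the editor and player mentioned below, are on your `PATH` is a check like the following. `check_deps` is a hypothetical helper, not part of Sapo. Example call with the defaults named in this README: `check_deps tts sed yad sox jq xed celluloid || echo "install the missing programs first"`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands cannot be found on PATH.
# Returns 0 when all are present, 1 otherwise (the missing names go to stderr).
check_deps() {
  local cmd
  local -a missing=()
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  if (( ${#missing[@]} > 0 )); then
    printf 'missing: %s\n' "${missing[*]}" >&2
    return 1
  fi
  return 0
}
```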
* As a text editor I use _xed_. If you prefer another default text editor (gedit, geany, mousepad etc.), please substitute __xed__ in _line 93_ of __Sapo.sh__ with the command of your preferred editor.
* Likewise, instead of the _celluloid_ audio player, you can use any other player you prefer, like _xplayer, mplayer, smplayer, vlc, mpv etc._ Just make sure to substitute celluloid with your preferred player in line 223 of __Sapo.sh__.
* The same applies to _Audacity_, or any other preferred wave editor, in line 222 of __Sapo.sh__. While _Audacity_ is not strictly required for the script to work, having a wave editor installed can be useful, as it offers one more way of fixing potential errors.
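Such a substitution can be made with GNU sed, after first printing the line to check that it is the one you expect, since the line numbers above refer to the version of Sapo.sh this README describes and may shift in later versions. The helper name `swap_command_on_line` is hypothetical; it edits only the given line and keeps a `.bak` backup copy. For example, `sed -n '93p' Sapo.sh` shows line 93, and `swap_command_on_line Sapo.sh 93 xed gedit` then swaps the editor.

```bash
#!/usr/bin/env bash
# Hypothetical helper (GNU sed): on line LINE of FILE, replace the command OLD
# with NEW, keeping a backup copy of the original file as FILE.bak.
swap_command_on_line() {
  local file="$1" line="$2" old="$3" new="$4"
  # "\b" (a GNU extension) stops "xed" from matching inside a longer word;
  # OLD and NEW are assumed to be plain command names without "/" or regex characters
  sed -i.bak "${line}s/\b${old}\b/${new}/" "$file"
}
```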
### SCREENSHOTS

* File selection dialogue:

---

![0.png](screenshots/0.png)

---

* The file is split into lines with fewer characters each, so that excessively long lines cause no problems in the text-to-speech conversion. The user can still edit the file further before the conversion starts.

![1.png](screenshots/1.png)

---

![2.png](screenshots/2.png)

---
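The splitting into shorter lines described above can be approximated, as a sketch only (Sapo.sh may split the text differently), by starting a new line after every sentence end and then wrapping whatever is still too long at the last space before a limit. The function name `shorten_lines` and the default limit of 200 characters are assumptions for illustration, and GNU sed is assumed for the `\n` in the replacement. The output can be saved and opened in the editor before rendering, e.g. `shorten_lines book.txt > book_lines.txt`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: put each sentence on its own line, then wrap any line
# still longer than MAX characters at a space (a single word longer than MAX
# is still cut by fold).
shorten_lines() {
  local file="$1" max="${2:-200}"   # 200 is an arbitrary default
  sed -E 's/([.!?]) +/\1\n/g' "$file" |  # newline after ". ", "! " or "? "
    fold -s -w "$max" |                  # wrap at the last space within MAX
    sed -E 's/[[:space:]]+$//'           # drop the trailing space fold leaves
}
```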
* Progress bar, with a rough estimate of the time left (rendering speed depends on the hardware):

![3.png](screenshots/3.png)

---
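A rough time-left estimate like the one in the progress dialog can be computed by assuming that every remaining line takes, on average, as long as the lines rendered so far. The sketch below uses only bash integer arithmetic; the function names are hypothetical and this is not necessarily how Sapo.sh computes its estimate. The elapsed time can come from bash's built-in `SECONDS` counter, e.g. `start=$SECONDS` before the loop and `$((SECONDS - start))` inside it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a "time left" estimate:
#   remaining = elapsed * (total - finished) / finished
eta_seconds() {
  local finished="$1" total="$2" elapsed="$3"
  if (( finished <= 0 )); then
    echo 0          # nothing rendered yet, so no basis for an estimate
    return
  fi
  echo $(( elapsed * (total - finished) / finished ))
}

# Format a number of seconds as H:MM:SS for display
fmt_hms() {
  local s="$1"
  printf '%d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}
```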
### DETECTING ERRORS

### I. CLUTTER IN AUDIO OUTPUT

Sometimes the output wav file of a text line is longer than necessary, containing hissing sounds, unrecognisable utterances and clutter at its end.

To detect which wav files have this problem, the ratio _character count of the line / duration of the audio file_ is calculated. An unusually low ratio (too much audio for too little text) roughly indicates which lines were rendered with errors.

The lines that _possibly_ have this problem are written to the generated errors.tsv file. After all the lines have been rendered, the lines listed in the tsv file are rendered again.

Many times this alone is enough.

---
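The character-count/duration check described above can be sketched as follows. It assumes the duration is read with `soxi -D` (installed with sox), which prints a file's length in seconds; the threshold `MIN_CPS` (characters per second) is a made-up value, and the real threshold and exact tsv layout in Sapo.sh may differ. The two tab-separated columns written here, line number and wav number, follow the description of errors.tsv in the Sapo-fix.sh section.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: flag lines whose audio is suspiciously long for their text.
MIN_CPS=8   # made-up threshold, in characters per second of audio

# is_suspect CHARS SECONDS: succeeds when CHARS / SECONDS < MIN_CPS
# (a duration of 0 is not judged here)
is_suspect() {
  awk -v c="$1" -v d="$2" -v min="$MIN_CPS" 'BEGIN { exit !(d > 0 && c / d < min) }'
}

# check_line LINE_NO TEXT WAV: append "LINE_NO<TAB>WAV_NO" to errors.tsv if suspect
check_line() {
  local n="$1" text="$2" wav="$3" secs
  secs=$(soxi -D "$wav") || return 1
  if is_suspect "${#text}" "$secs"; then
    printf '%s\t%s\n' "$n" "$(basename "$wav" .wav)" >> errors.tsv
  fi
}
```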
![8.png](screenshots/8.png)

---

At this point the user is prompted to choose whether to edit:

+ all the lines of the file, one by one, making any change they wish to any word of any line, or
+ just the lines that were reported with an error during rendering. These errors have to do with the length of the line, not with mispronounced words.

---
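The two editing choices above amount to two different lists of line numbers to step through. A minimal sketch, assuming errors.tsv holds the line number in its first tab-separated column (as described in the Sapo-fix.sh section); the function `lines_to_review` is hypothetical. Example: `for n in $(lines_to_review errors book.txt); do echo "review line $n"; done`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print the line numbers the review loop should visit.
#   all    -> every line of the text file, in order
#   errors -> first column of errors.tsv, de-duplicated, in numeric order
lines_to_review() {
  local mode="$1" txt="$2" tsv="${3:-errors.tsv}"
  case "$mode" in
    all)    seq 1 "$(awk 'END { print NR }' "$txt")" ;;  # NR also counts a last line without newline
    errors) cut -f1 "$tsv" | sort -n -u ;;
    *)      echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
```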
![9.png](screenshots/9.png)

---

Either way, the user is presented with *a few options* for each line:

---

![5.png](screenshots/5.png)

---

These options include:

* **⯈Play** the respective audio file.
* __🗘Re-render__ the line after making minor changes (e.g. putting a full stop at the end of the line).

---

![6.png](screenshots/6.png)

---
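Re-rendering after a minor change such as adding a full stop to a line that lacks final punctuation can be prepared in pure bash. The helper `add_fullstop` is hypothetical; the re-render itself would then be something like `tts --text "$(add_fullstop "$line")" --out_path 000042.wav`, where the file name is only an example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip trailing whitespace and append "." unless the line
# already ends in ".", "!" or "?".
add_fullstop() {
  local text="$1"
  text="${text%"${text##*[![:space:]]}"}"   # remove trailing whitespace
  [[ "$text" =~ [.!?]$ ]] || text+="."
  printf '%s\n' "$text"
}
```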
* __✀Trim the clutter__ at the end of the audio file, i.e. anything after half a second of detected silence.
* __🗡Split render__ the line text in two batches, which are then concatenated (useful for long sentences).

---

![7.png](screenshots/7.png)

---
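The trim and split options above can be sketched in a few lines; these are illustrations under stated assumptions, not the code of Sapo.sh. `trim_clutter` uses the `silence` effect of sox: the first group (`1 0.1 1%`) trims silence at the very start, and the second (`1 0.5 1%`) drops everything after the first half second below 1% amplitude; the 1% threshold is an assumed value. `split_half` breaks a line at the space nearest its middle; each half could then be rendered separately and the two wav files joined with `sox first.wav second.wav 000042.wav`, since sox concatenates several input files in order (the file names are examples).

```bash
#!/usr/bin/env bash
# Hypothetical sketches of the trim and split fixes (not Sapo.sh's own code).

# trim_clutter IN OUT
#   "1 0.1 1%": trim leading audio until 0.1 s above 1% amplitude is found
#   "1 0.5 1%": stop the output at the first 0.5 s stretch below 1%
trim_clutter() {
  sox "$1" "$2" silence 1 0.1 1% 1 0.5 1%
}

# split_half TEXT: print TEXT as two lines, broken at the space nearest the middle
split_half() {
  local text="$1" mid=$(( ${#1} / 2 )) i
  for (( i = 0; i < mid; i++ )); do
    # look right of the middle first, then left, for a space to break at
    if [[ "${text:mid+i:1}" == " " ]]; then
      printf '%s\n%s\n' "${text:0:mid+i}" "${text:mid+i+1}"; return
    fi
    if [[ "${text:mid-i:1}" == " " ]]; then
      printf '%s\n%s\n' "${text:0:mid-i}" "${text:mid-i+1}"; return
    fi
  done
  printf '%s\n' "$text"   # no space found: leave the line whole
}
```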
* __🎜Edit__ the respective audio file with a wave editor (e.g. _Audacity_).
* __✗Remove__ the respective audio file directly.
* __⬅️Previous Line__ takes the user back to the previous line.
* By hitting __➡️Next Line__ the user accepts the audio file, as is or after correcting it, and proceeds to the next line.

**After that, the audio files from all the lines are concatenated into one.**
### II. SED SCRIPT

sapofonetix.sed is a sed script that replaces words that get mispronounced with other letter combinations that produce the right pronunciation, e.g.

> s/biscuit/biskit/g;s/Biscuit/biskit/g

will substitute the word _biscuit_ (or the capitalised _Biscuit_) with _biskit_, whose pronunciation sounds more correct.
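Because the file is an ordinary sed script, its effect can be tried on any text with `sed -f`. In the sketch below, a temporary rules file holding only the example line above stands in for sapofonetix.sed (how Sapo.sh itself applies the file may differ); in practice the real file would be passed, e.g. `sed -f sapofonetix.sed book.txt > book_fixed.txt`. Since sed matching is case-sensitive, each word needs a separate rule for its capitalised form.

```bash
#!/usr/bin/env bash
# Stand-in for sapofonetix.sed: a temporary file with a single example rule line.
rules=$(mktemp)
cat > "$rules" <<'EOF'
s/biscuit/biskit/g;s/Biscuit/biskit/g
EOF

# Apply every rule in the file to the input text
printf 'A Biscuit with tea, and another biscuit.\n' | sed -f "$rules"
# prints: A biskit with tea, and another biskit.
```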
The list of words grows as the script gets used more; this will be an ongoing task.

### <u>FEEL FREE TO CONTRIBUTE!</u>

__It would be really, really helpful if you sent me a file containing all the mispronounced words you have encountered so far. A better pronunciation would be found and recorded in the sapofonetix.sed database. Thus, the percentage of mispronounced words would keep shrinking.__

---
* Process complete: the final wav file is inside the created **Sapo_filename** folder, named **filename.wav**.

If there are too many wav files (one for each line of the text file), the final wav file will not be produced. In this case, concatenate the wav files in smaller batches (e.g. 500 files each), and then concatenate _those_ into the final sound file, using the **sox** command, for example:

> cd Sapo_1_1.txt
>
> sox {000001..000500}.wav ~/Desktop/1f.wav
>
> sox {000501..001000}.wav ~/Desktop/2f.wav
>
> sox {001001..001500}.wav ~/Desktop/3f.wav
>
> cd ~/Desktop
>
> sox {1..3}f.wav final.wav

---
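The batching above can be automated. A sketch, assuming the per-line files use the six-digit names shown above (`000001.wav`, …) and that it is run inside the **Sapo_filename** folder; `concat_in_batches` and the temporary `part_NNNN.wav` names are hypothetical. Zero-padded names sort correctly, so the glob returns the files in line order, and sox concatenates its input files in the order given. Example: `cd Sapo_1_1.txt && concat_in_batches ~/Desktop/final.wav 500`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: join the numbered wav files BATCH at a time into
# part_NNNN.wav files, then join those parts into OUT and delete them.
# Note: any other six-character .wav name in the folder would be picked up too.
concat_in_batches() {
  local out="$1" batch="${2:-500}" i
  local -a files=(??????.wav) parts=()
  # With no match, the glob stays literal and the file test fails
  [[ -e "${files[0]}" ]] || { echo "no numbered wav files here" >&2; return 1; }
  for (( i = 0; i < ${#files[@]}; i += batch )); do
    parts+=("part_$(printf '%04d' $(( i / batch + 1 ))).wav")
    sox "${files[@]:i:batch}" "${parts[-1]}" || return 1
  done
  sox "${parts[@]}" "$out" && rm -f "${parts[@]}"
}
```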
![4.png](screenshots/4.png)

---

### Sapo-fix.sh

Sapo-fix.sh is the error-correcting routine included in Sapo.sh. It can also be run on its own, when the user wants to correct the lines detected and written to errors.tsv.

The user can also edit any other line they wish, simply by adding a line to errors.tsv with the respective line number and wav number, and then running Sapo-fix.sh.
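Adding such entries by hand is a matter of appending tab-separated rows. The helper below is hypothetical and assumes, as with the numbered files in the sox example above, that line N was rendered to the six-digit file N (e.g. `000042.wav` for line 42), so the wav number is written zero-padded; check the format of a row already in errors.tsv before relying on this. Example: `queue_for_fix errors.tsv 42 57`, then run Sapo-fix.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "LINE<TAB>WAV" rows to an errors.tsv file so that
# Sapo-fix.sh also visits those lines. Assumes wav number = line number, zero-padded.
queue_for_fix() {
  local tsv="$1" n
  shift
  for n in "$@"; do
    # 10# forces base 10, so an input like "08" is not read as octal
    printf '%s\t%06d\n' "$n" "$(( 10#$n ))" >> "$tsv"
  done
}
```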