# Sapo
---
### The audio book generator
---
A bash script that converts .txt files to .wav audio using the all-powerful https://github.com/coqui-ai/TTS
## TTS
---
https://github.com/coqui-ai/TTS
### INSTALL TTS
---
To install TTS, open a terminal and type the following command:
> $ pip install TTS
### FIX LONG UTTERANCES PROBLEM
---
To process long sentences, follow the instructions at this link:
https://dirk.net/2021/10/31/tts-fix-max-decoder-steps/
### OTHER DEPENDENCIES
---
> $ sudo apt install sed yad sox jq mplayer audacity xed
* I use _xed_ as the text editor. If you prefer another editor (gedit, geany, mousepad etc.), replace __xed__ in __line 23 of Sapo.sh__ with the command of your preferred editor:
> EDITOR="xed"
* The same applies to _Audacity_ in __line 24 of Sapo.sh__. _Audacity_ is not strictly required for the script to work, but a wave editor can be useful for fixing errors in individual audio files, so the option is offered:
> AUDIO_EDITOR="audacity"
### SCREENSHOTS
---
* File selection dialogue:
---
![0.png ](screenshots/0.png )
---
* The text is split into shorter lines, so that excessively long lines do not cause problems during the text-to-speech conversion. The user can still edit the file further before the conversion starts.
![1.png ](screenshots/1.png )
---
![2.png ](screenshots/2.png )
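
The idea of splitting text into shorter lines can be tried out by hand. The following is only a minimal sketch, not the method used in Sapo.sh: it assumes that wrapping at spaces with the standard `fold` tool is good enough, and both the function name `split_long_lines` and the 200-character limit are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT taken from Sapo.sh.
# Wrap every line of a text file to at most MAX_CHARS characters,
# breaking at spaces where possible (fold -s).
MAX_CHARS=200   # assumed limit, chosen for illustration only

split_long_lines() {
    # $1: input text file; the wrapped text is written to stdout
    fold -s -w "$MAX_CHARS" "$1"
}
```

For example, `split_long_lines book.txt > book_split.txt` would write the wrapped text to a new file. Note that `fold` knows nothing about sentences, so a line may end in the middle of one.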
---
* Progress bar and a rough estimate of the time left (this probably depends on the hardware):
![3.png ](screenshots/3.png )
---
### DETECTING ERRORS
---
### I. CLUTTER IN AUDIO OUTPUT
---
Sometimes the wav file generated for a line of text is longer than necessary and ends with hissing sounds, unrecognisable utterances and other clutter.
In order to detect which wav files have this problem, the ratio _character count of line / duration of audio file_ is calculated. This ratio gives a rough indication of which lines were rendered with errors.
The lines that _possibly_ have this problem are recorded in the generated errors.tsv file. Once all the lines have been rendered, the lines recorded in errors.tsv are re-rendered.
Many times this alone is enough.
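
The ratio check described above can be sketched in a few lines of bash. This is an illustration under stated assumptions, not the code from Sapo.sh: the function names and the `MIN_RATIO` threshold are hypothetical, and the duration is passed in as a number of seconds.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the check, NOT the code used in Sapo.sh.

# Characters of text per second of audio, to two decimals.
# $1: the text of the line, $2: duration of its wav file in seconds
chars_per_second() {
    awk -v n="${#1}" -v d="$2" \
        'BEGIN { if (d > 0) printf "%.2f\n", n / d; else print "0.00" }'
}

# Assumed threshold: below this many characters per second the audio is
# suspiciously long for the amount of text. Tune it by listening.
MIN_RATIO=8

# Succeeds (exit status 0) when the line looks suspect.
is_suspect() {
    awk -v r="$(chars_per_second "$1" "$2")" -v m="$MIN_RATIO" \
        'BEGIN { exit !(r < m) }'
}
```

With sox installed, the duration of a file can be read with `soxi -D 000001.wav`, so a call could look like `is_suspect "$line" "$(soxi -D 000001.wav)" && echo "check this line"`.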
---
![8.png ](screenshots/8.png )
---
At this point the user is prompted to choose what to edit:
+ All the lines of the file, one by one, where the user can change any word of any line, or
+ Only the lines that were reported with an error during rendering. These errors relate to the length of the line, not to mispronounced words.
---
![9.png ](screenshots/9.png )
---
Either way, the user is presented with *a few options* for each line:
---
![5.png ](screenshots/5.png )
---
These options include:
* **⯈ Play** the respective audio file
* __🗘 Re-render__ the line after making minor changes (e.g. putting a full stop at the end of the line)
---
![6.png ](screenshots/6.png )
---
* __✀ Trim the clutter__ at the end of the audio file: anything that follows half a second of detected silence is removed.
* __🗡 Split render__ the line text in two batches, which are concatenated afterwards (useful for long sentences)
---
![7.png ](screenshots/7.png )
---
* __🛠️ Edit__ the respective audio file with a wave editor (e.g. _Audacity_)
* __✗ Remove__ the respective audio file directly.
* __⬅️ Previous__ takes the user back to the previous line
* __➡️ Next__ takes the user to the next line
* __👉 Go To__ takes the user to a specific line number for editing.
* __⏩ Browse__ goes to the next line and directly plays its audio file.

**After that, the audio files from all the lines will be concatenated into one.**
### II. SED SCRIPTS
---
_letters.sed, abbreviations.sed and fonetix.sed_ are sed scripts that replace letters, abbreviations and commonly mispronounced words with alternative spellings that produce the right pronunciation, e.g.
> s/biscuit/biskit/g
will substitute the word _biscuit_ with _biskit_, which is pronounced more correctly.
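
To see what such a rule does, a sed script can be applied to a piece of text directly. The wrapper below is a minimal sketch: the function name `fix_pronunciation` is made up for illustration, and how Sapo.sh itself calls the scripts may differ.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, NOT taken from Sapo.sh.
# Apply a sed script of pronunciation fixes to the text read from stdin.
# $1: path to a sed script such as fonetix.sed
fix_pronunciation() {
    sed -f "$1"
}
```

For example, `fix_pronunciation fonetix.sed < book.txt > book_fixed.txt` would write the respelled text to a new file. Because the rule has no word boundaries, it also turns _biscuits_ into _biskits_.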
The list of words grows as the script gets used more; this is an ongoing task.
---
### <u>FEEL FREE TO CONTRIBUTE!</u>
---
It would be ___really, really helpful___ if you sent me a file containing all the mispronounced words you have encountered so far.
A better pronunciation will be found for each and recorded in the _fonetix.sed_ database.
This way, the share of mispronounced words will keep shrinking.
---
* When the process is complete, the final wav file, named **filename.wav**, is inside the newly created **Sapo_filename** folder.

If there are too many wav files (one for each line of the text file), the final wav file will not be produced. In this case, concatenate the wav files in smaller batches (e.g. 500 files each), and then concatenate _those_ into the final sound file using the **sox** command, for example:
> $ cd Sapo_1_1.txt
> $ sox {000001..000500}.wav ~/Desktop/1f.wav
> $ sox {000501..001000}.wav ~/Desktop/2f.wav
> $ sox {001001..001500}.wav ~/Desktop/3f.wav
> $ cd ~/Desktop
> $ sox {1..3}f.wav final.wav
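
For a different number of files, the batch commands can be generated instead of typed by hand. The sketch below only prints the commands (a dry run) and assumes the six-digit, zero-padded file names shown above; the function name and the `part_NNN.wav` names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of Sapo.
# Print one sox command per batch of wav files, then a final merge command.
# $1: total number of wav files, $2: batch size (default 500)
print_batch_commands() {
    local total=$1 size=${2:-500} start=1 end batch=0 parts=()
    (( total > 0 )) || return 1
    while (( start <= total )); do
        end=$(( start + size - 1 ))
        (( end > total )) && end=$total          # last batch may be shorter
        batch=$(( batch + 1 ))
        printf 'sox {%06d..%06d}.wav part_%03d.wav\n' "$start" "$end" "$batch"
        parts+=("$(printf 'part_%03d.wav' "$batch")")
        start=$(( end + 1 ))
    done
    # Merge the intermediate files, in order, into the final file.
    printf 'sox %s final.wav\n' "${parts[*]}"
}
```

Run `print_batch_commands 1200` inside the folder with the wav files to review the commands. Once they look right, they can be executed with `print_batch_commands 1200 | bash`.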
---
![4.png ](screenshots/4.png )
---
### Sapo-fix.sh
---
Sapo-fix.sh is the error-correcting routine included in Sapo.sh. It can also be run on its own, when the user wants to correct the lines detected and recorded in errors.tsv.
The user can also edit any line they wish, by entering the respective line number and wav number in a line of errors.tsv and then running Sapo-fix.sh.
## DOWNLOAD AND INSTALL
---
* On the page https://gitlab.com/christosangel/sapo click the __Download__ button
---
![20.png ](screenshots/20.png )
---
* and select __zip__ :
---
![21.png ](screenshots/21.png )
---
* Download the .zip file to your computer (for instance to the Downloads directory):
---
![23.png ](screenshots/23.png )
---
* Navigate to this directory, right-click on sapo-main.zip, and select __Extract Here__ :
---
![24.png ](screenshots/24.png )
---
* Open a terminal and run the following commands to:
* navigate to the home folder,
> $ cd
* create the __~/git/sapo__ directory and copy the contents of the unzipped __sapo-main__ folder into it,
> $ mkdir -p ~/git/sapo/ && cp -r ~/Downloads/sapo-main/* ~/git/sapo/
* make Sapo.sh and sapo-fix.sh executable:
> $ chmod +x ~/git/sapo/{Sapo.sh,sapo-fix.sh}
* Finally, to run the script, either
* navigate to __~/git/sapo/__ and double-click on __Sapo.sh__, or
* from the terminal, run the command:
> $ ~/git/sapo/Sapo.sh