mirror of https://gitlab.com/christosangel/sapo
222 lines
5.4 KiB
Markdown
# Sapo

---

### The audio book generator

---

A bash script that converts .txt files to .wav using the all-powerful https://github.com/coqui-ai/TTS
## TTS
---

https://github.com/coqui-ai/TTS
### INSTALL TTS
---

> pip install TTS
### FIX LONG UTTERANCES PROBLEM
---

https://dirk.net/2021/10/31/tts-fix-max-decoder-steps/
### OTHER DEPENDENCIES
---

> sudo apt install sed yad sox jq mplayer audacity xed

* As a text editor I use _xed_. If you prefer another text editor (gedit, geany, mousepad etc.), substitute __xed__ in __line 23 of Sapo.sh__ with the command of your preferred editor:

> EDITOR="xed"

* The same applies to _Audacity_ in __line 24 of Sapo.sh__. _Audacity_ is not strictly required for the script to work, but a wave editor can be useful when fixing errors in the audio, so the option is there:

> AUDIO_EDITOR="audacity"
### SCREENSHOTS
---

* File selection dialogue:

---
![0.png](screenshots/0.png)

---
* The file is split into lines with fewer characters each, so the text-to-speech conversion does not run into problems with excessively long lines. The user can edit the file further before the speech conversion.

![1.png](screenshots/1.png)

---
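Splitting a text file into shorter lines can be sketched with `fold` from coreutils. This is only an illustration and not necessarily how Sapo.sh does it; the function name `wrap_text` and the default width of 200 characters are assumptions.

```shell
# Hypothetical helper, not taken from Sapo.sh.
# Breaks every line of the input file at the last blank before MAX_WIDTH
# characters, so that no output line is longer than MAX_WIDTH.
wrap_text() {
    local input="$1" max_width="${2:-200}"
    fold -s -w "$max_width" "$input"
}

# Example: wrap_text book.txt 200 > book_wrapped.txt
```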
![2.png](screenshots/2.png)

---
* Progress bar and a rough estimate of the time left (probably depends on hardware)

![3.png](screenshots/3.png)

---
### DETECTING ERRORS
---
### I. CLUTTER IN AUDIO OUTPUT
---

Sometimes the wav file rendered from a line of text is longer than necessary: it ends with hissing sounds, unrecognisable utterances and other clutter.

To detect which wav files have this problem, the ratio _character count of line / duration of audio file_ is calculated. A line whose audio runs unusually long for its character count has a low ratio, which gives a rough estimate of which lines were rendered with errors.

The lines that _possibly_ present this problem are written to the generated errors.tsv. Once all the lines have been rendered, the lines listed in the tsv file are re-rendered.

Many times this alone is enough.

---
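The ratio check described above can be sketched as follows. The function names and the threshold of 8 characters per second are assumptions made for illustration; the values Sapo.sh actually uses may differ.

```shell
# Hypothetical sketch, not taken from Sapo.sh.
# chars_per_second TEXT DURATION_SECONDS
# Prints the number of characters in TEXT divided by the audio duration.
chars_per_second() {
    awk -v n="${#1}" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

# is_suspect TEXT DURATION_SECONDS [MIN_RATIO]
# Succeeds when the audio is long compared to the text (ratio below MIN_RATIO).
# The default MIN_RATIO of 8 is an arbitrary placeholder.
is_suspect() {
    local ratio
    ratio=$(chars_per_second "$1" "$2")
    awk -v r="$ratio" -v min="${3:-8}" 'BEGIN { exit !(r < min) }'
}

# With sox installed, the duration in seconds can be read with:
#   soxi -D 000001.wav
```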
![8.png](screenshots/8.png)

---
At this point the user is prompted to choose what to edit:

+ All the lines of the file, one by one, where the user can change any word of any line, or

+ Just the lines that were reported with an error during rendering. These errors have to do with the length of the line, not with mispronounced words.

---
![9.png](screenshots/9.png)

---
Either way, the user is presented with *a few options* for each line:

---
![5.png](screenshots/5.png)

---
These options include:

* **⯈Play** the respective audio file

* __🗘Re-render__ the line after making minor changes (e.g. putting a full stop at the end of the line)

---
![6.png](screenshots/6.png)

---
* __✀Trim the clutter__ at the end of the audio file: anything that comes after half a second of detected silence is removed.

* __🗡Split render__ the line text in two batches that are concatenated afterwards (useful for long sentences)

---
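Cutting off everything after half a second of silence can be sketched with the `silence` effect of sox. The function name `trim_clutter`, the 1% volume threshold and the `SOX` override variable are assumptions for illustration; the exact command used by Sapo.sh may differ.

```shell
# Hypothetical sketch, not taken from Sapo.sh.
# trim_clutter INPUT.wav OUTPUT.wav
# First triple (1 0.1 1%): drop leading silence until 0.1 s of sound above 1% volume.
# Second triple (1 0.5 1%): stop the output at the first 0.5 s stretch below 1% volume.
# Note that a pause of 0.5 s or more inside the sentence would also end the output.
# Set SOX=echo to print the arguments instead of running sox.
trim_clutter() {
    "${SOX:-sox}" "$1" "$2" silence 1 0.1 1% 1 0.5 1%
}

# Example: trim_clutter 000042.wav 000042_trimmed.wav
```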
![7.png](screenshots/7.png)

---
* __🛠️Edit__ the respective audio file with a wave editor (e.g. _Audacity_)

* __✗Remove__ the respective audio file directly.

* __⬅️Previous__ takes the user back to the previous line

* __➡️Next__ takes the user to the next line

* __👉 Go To__ takes the user to a specific line number for editing.

* __⏩ Browse__ goes to the next line and directly plays the audio file.

**After that, the audio files from all the lines will be concatenated into one.**
### II. SED SCRIPTS
---

_letters.sed, abbreviations.sed and fonetix.sed_ are scripts that substitute letters, abbreviations and words that get mispronounced with other letter combinations that produce the right pronunciation, e.g.

> s/biscuit/biskit/g

will substitute the word _biscuit_ with _biskit_, whose pronunciation sounds more proper.

The list of words grows as the script gets used more; this is an ongoing task.

---
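A quick way to try a substitution rule before adding it to a sed script is sketched below. The temporary rule file and the function name `respell` are made up for illustration; the real rules live in the sed scripts that ship with Sapo.

```shell
# Hypothetical rule file for illustration only.
rules=$(mktemp)
printf '%s\n' 's/biscuit/biskit/g' > "$rules"

# Apply every rule in the file to one line of text.
respell() {
    printf '%s\n' "$1" | sed -f "$rules"
}

respell 'Have a biscuit.'   # prints: Have a biskit.
```

Because the pattern is not anchored to word boundaries, it also rewrites _biscuits_ to _biskits_.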
### <u>FEEL FREE TO CONTRIBUTE!</u>
---

__It would be really helpful if you sent me a file containing all the mispronounced words that you have encountered so far.
A better pronunciation would be found and recorded in the fonetix.sed database.
Thus, the percentage of mispronounced words would get smaller and smaller.__

---
* Process complete: the final wav file is inside the created **Sapo_filename** folder, named **filename.wav**.

If there are too many wav files (one for each line of the text file), the final wav file will not be produced. In this case, concatenate the wav files in smaller batches (e.g. every 500 files), and then concatenate _those_ into the final sound file using the **sox** command, for example:

> cd Sapo_1_1.txt
>
> sox {000001..000500}.wav ~/Desktop/1f.wav
>
> sox {000501..001000}.wav ~/Desktop/2f.wav
>
> sox {001001..001500}.wav ~/Desktop/3f.wav
>
> cd ~/Desktop
>
> sox {1..3}f.wav final.wav

---
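The manual batching above can be generalised to any number of files. The following is a sketch only: the function `concat_batches`, the `batch_N.wav` intermediate names and the `SOX` override variable are assumptions, and it expects bash 4 or later, GNU `seq`, and files named 000001.wav, 000002.wav and so on in the current directory.

```shell
# Hypothetical helper, not part of Sapo.
# concat_batches TOTAL_FILES BATCH_SIZE OUTPUT.wav
# Set SOX=echo to print the sox arguments instead of running sox.
concat_batches() {
    local total="$1" size="$2" output="$3"
    local sox_cmd="${SOX:-sox}"
    local start end n=0 files parts=()
    for ((start = 1; start <= total; start += size)); do
        end=$((start + size - 1))
        ((end > total)) && end=$total
        n=$((n + 1))
        # seq -f pads each number to six digits, e.g. 000001.wav
        mapfile -t files < <(seq -f '%06g.wav' "$start" "$end")
        # Concatenate one batch into an intermediate file.
        "$sox_cmd" "${files[@]}" "batch_$n.wav" || return 1
        parts+=("batch_$n.wav")
    done
    # Concatenate the intermediate files into the final output.
    "$sox_cmd" "${parts[@]}" "$output"
}

# Example: concat_batches 1500 500 final.wav
```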
![4.png](screenshots/4.png)

---
### Sapo-fix.sh
---

Sapo-fix.sh is the error-correcting routine included in Sapo.sh. It can also be run on its own, when the user wants to correct the lines detected and written in errors.tsv.

The user can also edit any line they wish, by entering the respective line number and wav number in a line of errors.tsv and then running Sapo-fix.sh.