
# Sapo

---

### The audio book generator

---

A bash script that converts .txt files to .wav audio using the all-powerful https://github.com/coqui-ai/TTS

## TTS

---

https://github.com/coqui-ai/TTS

### INSTALL TTS

---

To install TTS, open a terminal and type the following command:

> $ pip install TTS

### FIX LONG UTTERANCES PROBLEM

---
To process long sentences, follow the instructions at this link:

https://dirk.net/2021/10/31/tts-fix-max-decoder-steps/

### OTHER DEPENDENCIES

---

> sudo apt install sed yad sox jq mplayer audacity xed

  * As a text editor I use _xed_. If you prefer another text editor (gedit, geany, mousepad, etc.), replace __xed__ in __line 23 of Sapo.sh__ with the command of your preferred editor:

> EDITOR="xed"

  * The same applies to _Audacity_ in __line 24 of Sapo.sh__. _Audacity_ is not strictly required for the script to work, but a wave editor can be useful when fixing potential errors, so the option is there.

> AUDIO_EDITOR="audacity"


### SCREENSHOTS

---

  * File selection dialogue:

---

![0.png](screenshots/0.png)

---

  * The file is split into shorter lines, so that excessively long lines do not cause problems during the text-to-speech conversion. The user can still edit the file further before the conversion.

![1.png](screenshots/1.png)

---

![2.png](screenshots/2.png)
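
The line-splitting step described above could look roughly like the sketch below. It is not taken from Sapo.sh: the helper name `split_long_lines` and the 200-character default are assumptions for illustration. It relies on `fold` to break lines at spaces.

```shell
#!/usr/bin/env bash
# Sketch only: Sapo.sh may split lines differently.
# split_long_lines is a hypothetical helper; the 200-character default is an assumption.
split_long_lines() {
  local max="${1:-200}"
  # fold -s breaks at the last space that fits within the width;
  # sed then strips the trailing space that fold leaves at each break.
  fold -s -w "$max" | sed 's/[[:space:]]*$//'
}

# Example: wrap a sentence to at most 20 characters per line
printf '%s\n' "The quick brown fox jumps over the lazy dog" | split_long_lines 20
```

Breaking only at spaces keeps words intact, which matters because a word cut in half would be read as two separate words.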

---

  * Progress bar and a rough estimate of the time left (probably depends on hardware)

![3.png](screenshots/3.png)

---

### DETECTING ERRORS

---

### I. CLUTTER IN AUDIO OUTPUT

---

Sometimes the output wav file for a line of text is longer than necessary, containing hissing sounds, unrecognisable utterances and clutter at the end.

To detect which wav files have this problem, the ratio _character count of line / duration of audio file_ is calculated. This ratio gives a rough estimate of which lines were rendered with errors.

The lines that _possibly_ present this problem are recorded in the generated errors.tsv. Once all the lines have been rendered, the lines recorded in the tsv file get re-rendered.

Many times this alone is enough.
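
The ratio check can be sketched as follows. This is a minimal illustration, not the code from Sapo.sh: the helper name `is_suspect` and the default threshold of 8 characters per second are assumptions. A low ratio means the audio is long relative to the text, which is what trailing clutter produces.

```shell
#!/usr/bin/env bash
# Sketch only: the threshold and the function name are assumptions, not values from Sapo.sh.

# Succeeds (exit status 0) when the characters-per-second ratio of a rendered
# line falls below the threshold, i.e. the audio is suspiciously long.
# A duration of zero or less is also treated as suspect.
is_suspect() {
  local text="$1" duration="$2" threshold="${3:-8}"
  awk -v n="${#text}" -v d="$duration" -v t="$threshold" \
    'BEGIN { exit !(d <= 0 || n / d < t) }'
}

# Example with made-up numbers: 11 characters rendered as 4 seconds of audio
if is_suspect "hello world" 4; then
  echo "suspect"
fi

# With sox installed, the duration of a real file can be read with soxi:
#   duration=$(soxi -D 000001.wav)
```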

---

![8.png](screenshots/8.png)


---
At this point the user is prompted to choose what to edit:

+ All the lines of the file, one by one, where the user can change any word of any line, or

+ Just the lines that were reported with an error during rendering. These errors have to do with the length of the line, not with mispronounced words.

---


![9.png](screenshots/9.png)


---


Either way, the user is presented with *a few options* for each line:

---


![5.png](screenshots/5.png)


---

These options include:

  *    **⯈Play** the respective audio file


  *    __🗘Re-render__ the line after making minor changes (e.g. adding a full stop at the end of the line)


---

![6.png](screenshots/6.png)


---

  *   __✀Trim the clutter__ at the end of the audio file: anything that comes after half a second of detected silence is removed.


  *   __🗡Split render__ the line text in two batches that are concatenated afterwards (useful for long sentences)

---

![7.png](screenshots/7.png)


---


  *   __🛠️Edit__ the respective audio file with a wave editor (e.g. _Audacity_)

  *   __✗Remove__ the respective audio file directly.

  * __⬅️Previous__ takes the user back to the previous line

  *   __➡️Next__ takes the user to the next line

  * __👉 Go To__ can take the user to a specific line number for editing.

  * __⏩ Browse__ will go to the next line and directly play the audio file.


**After that, the audio files from all the lines will be concatenated into one.**
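
Two of the options above, trimming and split rendering, can be approximated outside the script with the sketch below. Neither helper is taken from Sapo.sh. `trim_effect` and `split_in_two` are hypothetical names, and the 0.1 second / 1% values passed to the sox `silence` effect, as well as the choice to cut near the midpoint of the line, are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: neither helper is taken from Sapo.sh.

# Print sox effect arguments that keep audio up to the first pause of at
# least $1 seconds (default 0.5); the 0.1 s / 1% values are assumptions.
trim_effect() {
  local pause="${1:-0.5}" level="${2:-1%}"
  printf 'silence 1 0.1 %s 1 %s %s\n' "$level" "$pause" "$level"
}

# Split a line of text in two near its midpoint, cutting at a space when possible.
# Prints the first half on one line and the second half on the next.
split_in_two() {
  local text="$1"
  local mid=$(( ${#text} / 2 ))
  local left="${text:0:mid}" right="${text:mid}"
  if [[ "$left" == *" "* ]]; then
    # Move the partial word at the end of the left half over to the right half
    right="${left##* }${right}"
    left="${left% *}"
  fi
  printf '%s\n%s\n' "$left" "$right"
}

trim_effect
split_in_two "one two three four"

# With sox installed (file names are placeholders):
#   sox 000012.wav 000012_trimmed.wav $(trim_effect)
#   sox first_half.wav second_half.wav 000012.wav
```

Each half of the text would be rendered to its own wav file, and the last commented command joins the two results back into a single file for that line.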

### II. SED SCRIPTS

---

_letters.sed, abbreviations.sed and fonetix.sed_ are scripts that replace letters, abbreviations and words that get mispronounced with other letter combinations that produce the right pronunciation, e.g.

> s/biscuit/biskit/g

will substitute the word _biscuit_ with the word _biskit_, which is pronounced more correctly.

The list of words grows as the script gets used more; this is an ongoing task.
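
To try a rule before adding it to one of the sed files, something like the following sketch can be used. The helper name `apply_sed_rules` and the temporary rule file are assumptions for illustration; the order in which Sapo.sh applies its own sed files is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: apply_sed_rules is a hypothetical helper, not part of Sapo.sh.

# Apply one or more sed rule files, in the order given, to standard input.
apply_sed_rules() {
  local args=() f
  for f in "$@"; do
    args+=(-f "$f")
  done
  sed "${args[@]}"
}

# A throwaway rule file in the style of fonetix.sed
rules="$(mktemp)"
printf '%s\n' 's/biscuit/biskit/g' > "$rules"

echo "A biscuit with tea" | apply_sed_rules "$rules"   # prints: A biskit with tea
rm -f "$rules"
```

Note that a plain rule like this one also rewrites longer words containing the pattern, so _biscuits_ becomes _biskits_.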

---

### <u>FEEL FREE TO CONTRIBUTE!</u>

---

 It would be ___really really helpful___ if you sent me a file containing all the mispronounced words that you have encountered so far.
 A better pronunciation would be found and recorded in the _fonetix.sed_ database.
Thus, the percentage of mispronounced words would keep decreasing.


---

  * Process complete: the final wav file is inside the created **Sapo_filename** folder, named **filename.wav**.

    If there are too many wav files (one for each line of the text file), the final wav file will not be produced. In this case, concatenate the wav files in smaller batches (every 500 files), and then concatenate _those_ into the final sound file using the **sox** command, for example:

> $ cd Sapo_1_1.txt

> $ sox {000001..000500}.wav ~/Desktop/1f.wav

> $ sox {000501..001000}.wav ~/Desktop/2f.wav

> $ sox {001001..001500}.wav ~/Desktop/3f.wav

> $ cd ~/Desktop

> $ sox {1..3}f.wav final.wav
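
For an arbitrary number of files, the batch commands can be generated instead of typed by hand. The sketch below only prints the sox commands (a dry run) so they can be reviewed first. The helper name `plan_batches` and the `batch_NNN.wav` output names are assumptions; the six-digit input names follow the example above.

```shell
#!/usr/bin/env bash
# Sketch only: plan_batches is a hypothetical helper, not part of Sapo.sh.

# Print one sox command per batch of SIZE files (default 500) for wav files
# numbered 000001.wav .. TOTAL. Nothing is executed.
plan_batches() {
  local total="$1" size="${2:-500}" start end n=1
  for (( start = 1; start <= total; start += size )); do
    end=$(( start + size - 1 ))
    if (( end > total )); then
      end=$total
    fi
    printf 'sox'
    # Word splitting of the seq output is intentional: one argument per number
    printf ' %06d.wav' $(seq "$start" "$end")
    printf ' batch_%03d.wav\n' "$n"
    n=$(( n + 1 ))
  done
}

# Example: 5 files in batches of 2
plan_batches 5 2

# To run the plan and join the batches (requires sox):
#   plan_batches 1500 | bash
#   sox batch_*.wav final.wav
```

The batch numbers are zero-padded so that the `batch_*.wav` glob expands in numeric order.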

---

![4.png](screenshots/4.png)

---

### Sapo-fix.sh

---

Sapo-fix.sh is the error-correcting routine included in Sapo.sh. It can also be run on its own when the user wants to correct the lines detected and recorded in errors.tsv.

The user can also edit any line they wish, just by entering the respective line number and wav number in a line of errors.tsv and then running Sapo-fix.sh.


## DOWNLOAD AND INSTALL


---


  *  From the page https://gitlab.com/christosangel/sapo click on the __Download__ button

---

![20.png](screenshots/20.png)

---

  *  and select __zip__:

---

![21.png](screenshots/21.png)

---

  *  Download the .zip file to your computer (for instance, to the Downloads directory):

---

![23.png](screenshots/23.png)

---

  *  Navigate to this directory, right-click on sapo-main.zip, and select __Extract Here__:

---

![24.png](screenshots/24.png)

---

  *  Open a terminal. With the following commands you will:

  * navigate to the home folder,

> $ cd

  *  create __~/git/sapo__ directory and copy the contents of the unzipped __sapo-main__ folder in there,

> $ mkdir -p ~/git/sapo/ && cp -r ~/Downloads/sapo-main/* ~/git/sapo/

  *  make Sapo.sh and sapo-fix.sh executable:

> $ chmod +x ~/git/sapo/{Sapo.sh,sapo-fix.sh}


  *  Finally, to run the script, either

      * Navigate to __~/git/sapo/__ and double-click on __Sapo.sh__, or

      *  from the terminal, run the command:

> $ ~/git/sapo/Sapo.sh