
# Sapo

A bash script that converts txt files to wav using the all-powerful https://github.com/coqui-ai/TTS

## TTS

https://github.com/coqui-ai/TTS

### INSTALL TTS

> pip install TTS

### FIX LONG UTTERANCES PROBLEM

https://dirk.net/2021/10/31/tts-fix-max-decoder-steps/

### OTHER DEPENDENCIES

> sudo apt install sed yad sox jq

  * As a text editor I use _xed_. If you prefer another text editor (gedit, geany, mousepad etc.), please replace __xed__ in _line 82_ of __Sapo.sh__ with the command of your preferred editor.
  * Likewise, instead of the _celluloid_ audio player you can use any other player you prefer, like _xplayer, mplayer, smplayer, vlc, mpv etc._ Just make sure to replace celluloid with your preferred player in line 211 of __Sapo.sh__.
  * The same applies to _Audacity_ and any other preferred wave editor in line 222 of Sapo.sh. _Audacity_ is not a strict dependency of the script, but a wave editor can be useful for fixing errors, so the option is offered.
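
As a rough sketch of the substitutions described above, a line-addressed `sed` command can swap one program name for another. The helper name `swap_program` is hypothetical, and the line numbers (82, 211, 222) are the ones quoted above; they may have moved in your copy of Sapo.sh, so inspect the line first.

```shell
#!/usr/bin/env bash
# swap_program FILE LINE OLD NEW
# Replace OLD with NEW on one specific line of FILE (hypothetical helper).
swap_program() {
    local file=$1 line=$2 old=$3 new=$4 tmp
    tmp=$(mktemp) || return 1
    # The address "${line}" limits the substitution to that single line.
    # Writing back with cat keeps the file's permissions (e.g. the executable bit).
    sed "${line}s/${old}/${new}/g" "$file" > "$tmp" && cat "$tmp" > "$file"
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# Example usage (check what is on the line before changing it):
#   sed -n '82p' Sapo.sh
#   swap_program Sapo.sh 82 xed gedit
#   swap_program Sapo.sh 211 celluloid mpv
```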

---

### DETECTING ERRORS

### I. CLUTTER IN AUDIO OUTPUT

Sometimes the output wav file for a line of text is longer than necessary and contains hissing sounds, unrecognisable utterances and clutter at its end.

To detect which wav files have this problem, the script calculates the ratio _character count of line / duration of audio file_. This ratio gives a rough estimate of which lines were rendered with errors.
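
The ratio check can be sketched roughly as follows. This is not the code from Sapo.sh: the function names and the threshold of 8 characters per second are made up for illustration, and the commented usage assumes `soxi -D` (shipped with sox) to read the duration in seconds.

```shell
#!/usr/bin/env bash
# Hypothetical threshold: lines rendered at fewer characters per second are flagged.
MIN_CHARS_PER_SEC=8

# Print the ratio "character count / duration in seconds" with two decimals.
chars_per_second() {
    local text=$1 duration=$2
    awk -v n="${#text}" -v d="$duration" \
        'BEGIN { if (d > 0) printf "%.2f\n", n / d; else print "0.00" }'
}

# Succeed (exit status 0) when the ratio is below the threshold,
# i.e. the audio is suspiciously long for the amount of text.
is_suspect() {
    local ratio
    ratio=$(chars_per_second "$1" "$2")
    awk -v r="$ratio" -v t="$MIN_CHARS_PER_SEC" 'BEGIN { exit !(r + 0 < t + 0) }'
}

# Example with a real file (requires sox):
#   duration=$(soxi -D 000001.wav)
#   is_suspect "$line" "$duration" && echo "line looks suspicious"
```

A low ratio means a lot of audio for little text, which is what trailing clutter produces. A usable threshold depends on the voice model and speaking rate.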

The lines that _possibly_ have this problem are recorded in the generated errors.tsv. Once all lines have been rendered, the lines listed in the tsv file are re-rendered.

Often this alone is enough.

After that, each line can be examined one by one. The user is presented with *a few options* for each line:

---

![5.png](screenshots/5.png)


---

These options include:

  *    **⯈Play** the respective audio file


  *    __🗘Re-render__ the line after making minor changes (e.g. putting a full stop at the end of the line)


---

![6.png](screenshots/6.png)


---

  *   __✀Trim the clutter__ at the end of the audio file, i.e. anything that follows half a second of detected silence.
  

  *   __🗡Split render__ the line text in two batches that are concatenated afterwards (useful for long sentences)

---

![7.png](screenshots/7.png)


---


  *   __🎜Edit__ the respective audio file with a wave editor (e.g. _Audacity_)

  *   __✗Remove__ the respective audio file directly.

  *   By hitting __😀Keep__ the user can accept the audio file as is, or after correcting it, and proceed to the next.
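
The trim and split options above can be approximated on the command line as in the sketch below. It is an illustration, not the code from Sapo.sh: the function names, the 1% silence threshold and the "split at the space nearest the middle" rule are assumptions. It relies on the sox `silence` effect and on the `tts --text … --out_path …` command-line form of Coqui TTS with its default model.

```shell
#!/usr/bin/env bash
# Drop everything after the first 0.5 s of silence (requires sox).
trim_clutter() {
    local src=$1 dst=$2
    # "1 0.1 1%" skips leading silence; "1 0.5 1%" stops at the first 0.5 s pause.
    sox "$src" "$dst" silence 1 0.1 1% 1 0.5 1%
}

# Split TEXT at the space nearest to its midpoint; prints the two halves on two lines.
split_text() {
    local text=$1 mid i
    mid=$(( ${#text} / 2 ))
    for (( i = 0; i <= mid; i++ )); do
        if [[ ${text:mid+i:1} == " " ]]; then
            printf '%s\n%s\n' "${text:0:mid+i}" "${text:mid+i+1}"; return
        fi
        if [[ ${text:mid-i:1} == " " ]]; then
            printf '%s\n%s\n' "${text:0:mid-i}" "${text:mid-i+1}"; return
        fi
    done
    printf '%s\n\n' "$text"   # no space found: the second half stays empty
}

# Render both halves separately and join them (requires tts and sox).
split_render() {
    local text=$1 out=$2 first second tmp
    { read -r first; read -r second; } < <(split_text "$text")
    tmp=$(mktemp -d) || return 1
    tts --text "$first" --out_path "$tmp/a.wav" &&
        tts --text "$second" --out_path "$tmp/b.wav" &&
        sox "$tmp/a.wav" "$tmp/b.wav" "$out"
    local status=$?
    rm -rf "$tmp"
    return "$status"
}

# Example usage:
#   trim_clutter 000012.wav 000012_trimmed.wav
#   split_render "a very long sentence that renders badly in one go" 000012.wav
```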


**After that, the audio files from all the lines will be concatenated into one.**

### II. SED SCRIPT

sapofonetix.sed is a script that replaces words that get mispronounced with other letter combinations that produce the right pronunciation, e.g.
> s/biscuit/biskit/g;s/Biscuit/biskit/g

will substitute the word _biscuit_ (or the capitalised _Biscuit_) with _biskit_, which is pronounced more naturally.
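
To try rules like this before adding them, a small rules file can be applied with `sed -f`. In this sketch the temporary rules file, the `respell` function and the `queue` rule are made up for illustration; only the biscuit rule comes from the example above.

```shell
#!/usr/bin/env bash
# Write a tiny rules file in the same style as sapofonetix.sed.
rules=$(mktemp)
cat > "$rules" <<'EOF'
s/biscuit/biskit/g;s/Biscuit/biskit/g
s/queue/kyoo/g
EOF

# Apply every rule in the file to the text read from standard input.
respell() {
    sed -f "$rules"
}

printf '%s\n' "Biscuit lovers queue for a biscuit." | respell
# prints: biskit lovers kyoo for a biskit.
```

Plain rules like these also match inside longer words (_biscuits_ becomes _biskits_). With GNU sed, `\b` can anchor a rule to whole words if that is not wanted.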

The list of words grows as the script gets used more; this is an ongoing task:

 ___<u>feel free to chime in!</u>___

___

### SCREENSHOTS

  * File selection dialog

![0.png](screenshots/0.png)

---

  * The file is split into lines with fewer characters each, so that excessively long lines cause no problems in the text-to-speech conversion. The user can still edit the file further before the speech conversion.

![1.png](screenshots/1.png)

---

![2.png](screenshots/2.png)

---

  * Progress bar and a rough estimate of the time left (probably depends on hardware)

![3.png](screenshots/3.png)

---

  * Process complete. The final wav file is inside the created **Sapo_filename** folder, named **filename.wav**.

    If there are too many wav files (one for each line of the text file), the final wav file will not be produced. In this case concatenate the wav files in smaller batches (e.g. every 500 files), and then concatenate _those_ into the final sound file using the **sox** command, for example:

> cd Sapo_1_1.txt
> 
> sox {000001..000500}.wav ~/Desktop/1f.wav
> 
> sox {000501..001000}.wav ~/Desktop/2f.wav
> 
> sox {001001..001500}.wav ~/Desktop/3f.wav
> 
> cd ~/Desktop
> 
> sox {1..3}f.wav final.wav
> 
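
For an arbitrary number of files, the same batching can be written as a loop. This is a sketch under assumptions: the function name `concat_in_batches` and the overridable `CONCAT` variable are hypothetical, and it relies only on sox joining its input files, in order, into the last file named.

```shell
#!/usr/bin/env bash
# CONCAT is the joining command (inputs first, output last); it defaults to sox.
CONCAT=${CONCAT:-sox}

# concat_in_batches OUT SIZE FILE...
# Join FILEs in groups of SIZE, then join the intermediate parts into OUT.
concat_in_batches() {
    local out=$1 size=$2 tmp part i n=0 status
    shift 2
    [ "$#" -gt 0 ] || return 1
    local files=("$@") parts=()
    tmp=$(mktemp -d) || return 1
    for (( i = 0; i < ${#files[@]}; i += size )); do
        n=$(( n + 1 ))
        part="$tmp/$(printf '%06d' "$n").wav"
        # Slice up to SIZE files starting at index i and join them into one part.
        "$CONCAT" "${files[@]:i:size}" "$part" || { rm -rf "$tmp"; return 1; }
        parts+=("$part")
    done
    "$CONCAT" "${parts[@]}" "$out"
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Example usage inside the Sapo_filename folder (the glob expands in sorted order):
#   concat_in_batches ~/Desktop/final.wav 500 [0-9]*.wav
```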

![4.png](screenshots/4.png)

### Sapo-fix.sh

Sapo-fix.sh is the error-correcting routine included in Sapo.sh. It can also be run on its own when the user wants to correct the lines detected and written to errors.tsv.

The user can also fix any other line by entering its line number and wav number in a line of errors.tsv and then running Sapo-fix.sh.
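
Adding such an entry by hand could look like the sketch below. The column order (line number, then wav number) follows the description above, but the tab separator, the zero-padded wav number and the helper name `add_error_entry` are assumptions. Compare with an errors.tsv generated by Sapo.sh before relying on it.

```shell
#!/usr/bin/env bash
# add_error_entry TSV LINE_NUMBER WAV_NUMBER
# Append one tab-separated record to the TSV file (assumed layout).
add_error_entry() {
    local tsv=$1 line_number=$2 wav_number=$3
    printf '%s\t%s\n' "$line_number" "$wav_number" >> "$tsv"
}

# Example usage, followed by a normal run of Sapo-fix.sh:
#   add_error_entry errors.tsv 42 000042
```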