update README.md

main
Christos Angelopoulos 2022-03-09 03:53:42 +02:00
parent 6f7df5492c
commit 24f3dc7a65
1 changed files with 20 additions and 10 deletions


@ -22,37 +22,44 @@ https://dirk.net/2021/10/31/tts-fix-max-decoder-steps/
* Likewise, instead of the _celluloid_ audio player, you can use any other player you prefer, such as _xplayer, mplayer, smplayer, vlc, mpv_ etc. Just make sure to replace celluloid with your preferred player in line 211 of __Sapo.sh__.
* The same applies to _Audacity_: any other preferred wave editor can be set in line 222 of __Sapo.sh__. _Audacity_ is not a hard dependency of the script, but having a wave editor installed can be useful, which is why editing the audio file is offered as one of the options for fixing potential errors.
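
If you prefer to script this substitution, GNU sed can edit a single numbered line in place. The sketch below is only an illustration under stated assumptions: it assumes that line 211 of __Sapo.sh__ literally contains `celluloid` and line 222 contains `audacity`, as the notes above suggest. Line numbers can drift between versions of the script, so check them with `grep -n` first. The `swap_apps` helper is hypothetical, and `sed -i` without a suffix argument is GNU-specific.

```shell
#!/usr/bin/env bash
# swap_apps (hypothetical helper): swap the default player and wave editor in
# Sapo.sh. Assumes GNU sed and that the commands sit on lines 211 and 222, as
# stated in this README; verify first with:
#   grep -n -e celluloid -e audacity Sapo.sh
swap_apps() {
  local script="$1" player="$2" editor="$3"
  # "211s/..." restricts the substitution to line 211 only (likewise 222);
  # -i edits the file in place
  sed -i -e "211s/celluloid/${player}/" -e "222s/audacity/${editor}/" "$script"
}

# Example: use mpv as the player and keep Audacity as the wave editor
# swap_apps Sapo.sh mpv audacity
```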
---
### DETECTING ERRORS
### I. CLATTER IN AUDIO OUTPUT
Sometimes the output wav file for a line of the text file is longer than necessary, containing hissing sounds, unrecognisable utterances and clatter at its end.
To detect which wave files have this problem, the ratio _character count of line / duration of audio file_ is calculated. A file padded with clatter lasts longer than its text warrants, so its ratio comes out unusually low; this gives a rough estimate of which lines were rendered with errors.
The lines that _possibly_ have this problem are written to the generated error.tsv file. After all lines have been rendered,
* the lines listed in the TSV file are re-rendered. Often this alone is enough.
* After that, each line can be examined one by one. The user can:
1. __Play__ the respective audio file.
2. __Re-render__ the line with minor changes (e.g. adding a full stop at the end of the line).
3. __Trim the clutter__ at the end of the audio file, i.e. anything that comes after half a second of detected silence.
4. __Split__ the line text into two batches that are rendered separately and then concatenated (useful for long sentences).
5. __Edit__ the respective audio file with a wave editor (e.g. _Audacity_).
6. __Remove__ the respective audio file directly.
7. By hitting __OK__, the user can accept the audio file as is, or after correcting it, and proceed to the next one.
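
The characters-per-second heuristic described at the start of this section can be sketched in a few lines of Bash. This is a sketch, not the code in __Sapo.sh__. The threshold of 10 characters per second, the two-column layout of error.tsv and the helper names `cps`, `is_suspect` and `check_line` are all hypothetical. The audio duration is read with `soxi -D` from SoX, which is assumed to be installed. Any real threshold would need tuning to the voice and speaking rate of the TTS model in use.

```shell
#!/usr/bin/env bash
# Sketch of the error heuristic: characters per second of rendered audio.
# Hypothetical: the default threshold and the error.tsv column layout.
THRESHOLD=${THRESHOLD:-10}   # chars/second; lower means suspiciously long audio

# cps TEXT DURATION: characters per second, printed with two decimals.
# ${#1} is the character count of TEXT (multibyte-aware in a UTF-8 locale).
cps() {
  awk -v n="${#1}" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

# is_suspect TEXT DURATION LIMIT: succeeds (exit 0) when chars/second < LIMIT,
# i.e. the audio is much longer than the text would need.
is_suspect() {
  awk -v n="${#1}" -v d="$2" -v t="$3" 'BEGIN { exit !(n / d < t) }'
}

# check_line LINE_NO TEXT WAV TSV: append the line to TSV if it looks broken.
check_line() {
  local dur
  dur=$(soxi -D "$3") || return 1   # duration in seconds, via SoX
  if is_suspect "$2" "$dur" "$THRESHOLD"; then
    printf '%s\t%s\n' "$1" "$2" >> "$4"
  fi
}
```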
---
![5.png](screenshots/5.png)

After that, the audio files of all the lines are concatenated into one.
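
One minimal way to do this concatenation is to list the per-line files in natural order and pass them to `sox`, which concatenates multiple input files by default. The sketch assumes that SoX and GNU `sort -V` are available, that all files share the same sample rate and channel count, and that the per-line files are named like `line_1.wav`, `line_2.wav`, and so on. That naming scheme and the helper names are hypothetical and are not taken from __Sapo.sh__. Natural ordering matters here because a plain lexical sort would put `line_10.wav` before `line_2.wav`.

```shell
#!/usr/bin/env bash
# sorted_wavs DIR: print the per-line WAV files of DIR in natural order
# (GNU sort -V puts line_2.wav before line_10.wav).
# The line_N.wav naming scheme is hypothetical.
sorted_wavs() {
  find "$1" -maxdepth 1 -name 'line_*.wav' | sort -V
}

# concat_wavs DIR OUT: join all per-line files into one WAV with SoX,
# which concatenates its input files by default.
concat_wavs() {
  local files
  mapfile -t files < <(sorted_wavs "$1")
  (( ${#files[@]} > 0 )) || return 1   # nothing to join
  sox "${files[@]}" "$2"
}
```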

---
### II. SED SCRIPT
@ -61,8 +68,11 @@ sapofonetix.sed is a script that substitutes words that get mispelled with othe
will substitute the word _biscuit_ (or _Biscuit_ when capitalized) with _biskeet_ (_Biskeet_), whose pronunciation sounds closer to the intended one.
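
For illustration, a rule of this kind could look like the one below. It is a hypothetical example, not necessarily the rule used in sapofonetix.sed. It relies on GNU sed's `\b` word boundary and captures the first letter, so _biscuit_ becomes _biskeet_, _Biscuit_ becomes _Biskeet_, and plurals such as _biscuits_ follow along. The `make_demo_sed` helper and the `demo.sed` file name are also hypothetical. A rules file like this is applied to a text with `sed -f`.

```shell
#!/usr/bin/env bash
# make_demo_sed (hypothetical helper): writes a one-rule sed script in the
# style of sapofonetix.sed to the file given as $1.
make_demo_sed() {
  cat > "$1" <<'EOF'
# \b is a word boundary (GNU sed); \1 re-inserts the captured B or b
s/\b\([Bb]\)iscuit/\1iskeet/g
EOF
}

# Usage: build the rules file, then run a text through it before TTS rendering
make_demo_sed demo.sed
printf 'Biscuits and a biscuit\n' | sed -f demo.sed   # Biskeets and a biskeet
```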
The list of words grows as the script gets used more; this will be an ongoing task:
___<u>feel free to chime in!</u>___
___
### SCREENSHOTS