PulseAudio allows corking/ducking for streams with prioritized roles.
This sets the role for the Mycroft speech stream to "phone" and all others to
"music" if the config tts->pulse_duck is set to true.
The old "__metaclass__" attribute has been ignored since the switch to Python 3;
this restores the metaclass functionality by updating it to the new
class keyword-argument syntax.
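For illustration (class and method names assumed), the change amounts to:

    from abc import ABCMeta, abstractmethod

    # Python 2 style -- silently ignored under Python 3:
    #   class TTS(object):
    #       __metaclass__ = ABCMeta

    # Python 3 class-kwarg style used after this change:
    class TTS(metaclass=ABCMeta):
        @abstractmethod
        def execute(self, sentence):
            """Synthesize and play back a sentence."""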
When the mimic2 TTS instance is created, the phrases from mycroft-core and mycroft-wifi setup (if available) are generated and stored locally (defaults to /opt/mycroft/preloaded_cache but can be changed with the mimic2 config parameter "preloaded_cache").
On startup the cache will be copied into the cache directory to speed up default interactions.
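A minimal sketch of the startup copy (helper name hypothetical):

    import os
    import shutil

    PRELOADED = '/opt/mycroft/preloaded_cache'  # the "preloaded_cache" value

    def copy_preloaded_cache(cache_dir):
        """Seed the runtime cache with the pre-generated audio files."""
        if not os.path.isdir(PRELOADED):
            return
        for name in os.listdir(PRELOADED):
            dest = os.path.join(cache_dir, name)
            if not os.path.exists(dest):
                shutil.copy2(os.path.join(PRELOADED, name), dest)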
* Fix mimic 2 long sentences
Fixes a bug where the second and third chunking passes incorrectly
concatenated strings with lists, resulting in chunks of single
characters (illustrated below).
* Handle mimic2 chunking correctly
- Move preprocessing from get_tts() to a method called from the TTS execute();
this allows all parts to be spoken and the caching to work correctly
- Remove duplicate of phonetic spelling in mimic2_tts
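The failure mode, in short: `+=` on a list with a string on the right extends the list with the string's characters:

    chunks = ['hello world']
    chunks += 'goodbye'        # bug: iterates the string, char by char
    # chunks == ['hello world', 'g', 'o', 'o', 'd', 'b', 'y', 'e']

    chunks = ['hello world']
    chunks.append('goodbye')   # fix: append the chunk as one element
    # chunks == ['hello world', 'goodbye']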
The Mimic2 voice had several issues:
* Numeric generation was all goofed up for decimal numbers. It was
independently doing number to speech for the values before and after
the period on a decimal number. E.g. 1.100 became "one.one hundred".
Removed local normalization altogether since it is handled
correctly on the Mimic2 server now.
* Sentence splitting was being applied in the middle of things like
numbers or abbreviations, e.g. "I.B.M." (see the sketch after this list)
* Chunker algorithms were making unnecessary copies of string arrays
While in there I also:
* Removed several unused imports
* Cleaned up docstrings
* Prefixed private helpers with _
* Added debug logging of mimic2 request text
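A simplified illustration of abbreviation-safe splitting (not the actual mimic2 chunker): skip split points where the period belongs to a single capital letter, as in "I.B.M.".

    import re

    # Split on end-of-sentence punctuation followed by whitespace, unless
    # the two characters before the split are "<capital>." -- i.e. part of
    # an abbreviation. (Simplified; lowercase abbreviations like "Mr."
    # would still split.)
    _SENTENCE_SPLIT = re.compile(r'(?<![A-Z]\.)(?<=[.!?])\s+')

    _SENTENCE_SPLIT.split('I.B.M. was founded in 1911. It is big.')
    # -> ['I.B.M. was founded in 1911.', 'It is big.']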
* Remove the old per-viseme messages
Instead a single viseme-list message is sent
* Correct the spelling of viseme.
Ref: https://en.wikipedia.org/wiki/Viseme
* Remove debug print.
This was probably left in by mistake, removing to clean up the audio log somewhat.
The settings code worked, but was noisy and generally messy about
a few exceptional but common situations:
* When the .mycroft/skills/<SkillName> folder didn't already exist
* When network timeouts and such occurred
I also slipped in a couple of trivial code cleanups for an unused variable
and a log message.
* Refactor mimic2 to use the shared tts architecture
* Make sure the queue is cleared
- Add a convenience method grouping clear_queue and clear_visemes
- The start time is now set before the lock to allow multiple speech requests queued before the stop signal to also be cancelled
- Make sure any pending TTS generation is cleared from the queue by calling tts.clear() when breaking from the chunking loop.
* Add new api command to send visemes as single list. This allows more efficient use of the messagebus and gives implementors flexibility in how they handle the visualization.
* Switch mark1 to use viseme_list
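A rough sketch of the new style; the message type and payload shape here are assumptions for illustration, given a connected messagebus client `bus`:

    import time
    from mycroft.messagebus.message import Message

    # One message carrying the whole viseme sequence instead of one
    # message per mouth shape; each pair is (shape_code, end_time).
    bus.emit(Message('enclosure.mouth.viseme_list', {
        'start': time.time(),
        'visemes': [('3', 0.07), ('2', 0.14), ('0', 0.25)],
    }))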
* Fix mimic2 negative numbers
Make the regex extracting numbers also match negative numbers when pre-parsing phrases sent to the mimic2 service (see the sketch below)
* Update pronounce_number to use "minus" for negatives
After discussion in the chat it was suggested to use "minus" for negatives as the default.
When scientific notation is used the term "negative" is still used.
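A hedged sketch of the extraction change (the pattern is illustrative, not the actual regex):

    import re

    # Allow an optional leading minus when pulling numbers out of a phrase.
    NUMBER_RE = re.compile(r'-?\d+(?:\.\d+)?')

    NUMBER_RE.findall('set it to -3.5 degrees')  # -> ['-3.5']
    # pronounce_number(-3.5) now reads as "minus three point five";
    # scientific notation still uses "negative".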
The "phonetic_spellings.txt" mechanism converts odd words to strings that
represent what they should sound like when spoken. For example, "mycroftai"
to "Mycroft A.I.". This provides an easy mechanism to provide hints to
lots of Text to Speech engines.
This adds it to Mimic2, along with a spelling of "corgi".
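A minimal sketch of the mechanism (helper and table shape assumed; the real hints live in phonetic_spellings.txt):

    def apply_phonetic_spellings(utterance, spellings):
        """Swap hard-to-pronounce words for strings that sound right."""
        return ' '.join(spellings.get(word.lower(), word)
                        for word in utterance.split())

    spellings = {'mycroftai': 'Mycroft A.I.'}  # from phonetic_spellings.txt
    apply_phonetic_spellings('ask mycroftai', spellings)
    # -> 'ask Mycroft A.I.'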
This is a step towards abstracting the idea of an Enclosure which ties Mycroft to the hardware that is running Mycroft.
- Move the enclosure API to mycroft.enclosure.api (previously was mycroft.client.enclosure.api)
- Move display_manager out of enclosure client to mycroft.enclosure.display_manager
- Merge EnclosureWeather into EnclosureMouth
- Wrap display manager in a class
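After the move, imports change along these lines:

    # Previously: from mycroft.client.enclosure.api import EnclosureAPI
    from mycroft.enclosure.api import EnclosureAPI
    from mycroft.enclosure.display_manager import DisplayManager  # now a class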
Numbers are now normalized to their text equivalent. This solves problems where mimic2 would produce sped-up speech when saying number values.
* added asynchronous request to mimic2 backend to break up audio into chunks
* add character threshold
* refactored split_by_punctuation
* add punctuation function to sentence chunks
* fix spelling
* some more spelling
* added mimic2 to mycroft.conf
* removed unused imports
The Mark 1 button press can now be "consumed" when a skill handles
the Stop command. When this happens, the button press will not
trigger listening mode. An additional press would be needed to
trigger listening.
This introduces the "mycroft.stop.handled" messagebus message. It
carries a data field called "by" which identifies who handled it.
Currently the values are "TTS" for when speaking ends or the name
of a skill which implements Stop and returns True from the call.
Also fixed a potential bug when the flag to clear queued visemes
was left set after a button press.
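In sketch form, given a connected messagebus client `bus` (the handler wiring is illustrative; the message name and "by" field are as described above):

    from mycroft.messagebus.message import Message

    # A skill that handles Stop (or the TTS when speech is interrupted)
    # reports that it consumed the command:
    bus.emit(Message('mycroft.stop.handled', {'by': 'TTS'}))

    # The enclosure's button logic can then swallow the press:
    def on_stop_handled(message):
        print('Stop was handled by:', message.data.get('by'))

    bus.on('mycroft.stop.handled', on_stop_handled)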
- Engines now specify whether they support SSML, rather than this being set in the configuration
- The text client strips out SSML tags
- Engines can modify tags via the `self.modify_tag` method
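Tag stripping for clients that can't render SSML can be as simple as this sketch:

    import re

    def remove_ssml(text):
        """Drop SSML tags, keeping only the spoken text."""
        return re.sub(r'<[^>]*>', '', text).replace('  ', ' ')

    remove_ssml('I can <prosody rate="slow">talk slowly</prosody>')
    # -> 'I can talk slowly'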
Add Python 2/3 compatibility
==== Tech Notes ====
This allows the main bus, skills and cli to be run in both Python 2.7 and
3.5+.
Mainly trivial changes
- syntax for exceptions
- logic for importing correct Queue module
- .iteritems -> future.utils.iteritems when accessing a dict's key/value
pairs (patterns sketched after this list)
* Allow audio service to be run in python 3
* Make speech client work with python 3
* Importing of Queue is version dependent
* Exception syntax corrected
* Creating the sound buffer is version dependent
- Adapt context to use range from builtins
- Use compatible next() instead of .next() when walking the skill
directory
* Make CLI Python 3 Compatible
- Use compatible BytesIO instead of StringIO
- Open files as text instead of binary
- Make sure integer divisions are used
* Make messagebus send compatible
* Fix failing travis
Re-add future 0.16.0
* Make string checks compatible
* basestring doesn't exist in Python 3, so it's imported from the "past" package
* Fix latest compatibility issues in speech client
- handle urllib
- handle encoding before calling md5
* Make Api.build_json() python 2/3 compatible
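The recurring compatibility patterns, roughly (using the future package mentioned above; the settings dict is a stand-in):

    from future.utils import iteritems      # dict.iteritems() replacement
    from past.builtins import basestring    # basestring is gone in Python 3

    try:
        from queue import Queue             # Python 3
    except ImportError:
        from Queue import Queue             # Python 2

    config = {'lang': 'en-us'}              # stand-in for a settings dict
    for key, value in iteritems(config):
        if isinstance(value, basestring):
            pass                            # string-valued setting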
The isSpeaking signal would only be generated when the actual audio playback
started, but this could take several seconds for TTS engines like Mimic, which
take some time to generate the audio file for playback. This changes the
creation of the "isSpeaking" signal to the start of the execute() method,
which should queue up audio and leave the signal set until the queue has
eventually been cleared.
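In outline (a sketch; the signal helper is the one in mycroft.util):

    from mycroft.util import create_signal

    def execute(sentence):
        """TTS execute() sketch: raise isSpeaking up front."""
        create_signal('isSpeaking')  # set as soon as audio is queued, not
                                     # when playback actually starts
        # ...synthesize and queue audio for the playback thread; the
        # signal is cleared only once the queue has fully drained.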
When the TTS engine provides visemes to the faceplate, the information
passed along consists of the mouth shape and the duration to display it.
When the system gets backed up for some reason (e.g. the CPU is briefly
overloaded), the code would attempt to catch up the animation but would
still send the 'expired' viseme across the serial port to the faceplate
with no wait-time. This resulted in a fast-moving mouth catching up,
which isn't very pleasing.
Now the viseme is passed along with an expiration date, so if the time
to display it has already passed then the viseme code gets thrown away
instead of being sent across the (relatively slow) serial port. This
allows better catch-up.
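A sketch of the catch-up logic (the faceplate write is a hypothetical callable):

    import time

    def show_visemes(pairs, write_to_faceplate):
        """pairs: (shape_code, expiration) tuples; expiration is epoch time."""
        for code, expiration in pairs:
            remaining = expiration - time.time()
            if remaining <= 0:
                continue              # already expired: drop it rather than
                                      # rushing it over the slow serial port
            write_to_faceplate(code)
            time.sleep(remaining)     # hold the shape until it expires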
==== Tech Notes ====
Since the playback is now performed in a thread, curate_cache could
clean out generated speech before or in the middle of playing back the
queue.
==== Tech Notes ====
Since the voice is quite a large download, a stalled download is a real
possibility. Using wget allows resuming and retrying the download in a
simple way.
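E.g., resuming is just wget's -c flag (the invocation, URL and path are placeholders):

    import subprocess

    voice_url = 'https://example.com/voice.flitevox'    # placeholder URL
    destination = '/opt/mycroft/voices/voice.flitevox'  # placeholder path

    # -c continues a partial download; --tries retries dropped connections.
    subprocess.call(['wget', '-c', '--tries=20', voice_url, '-O', destination])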
==== Tech Notes ====
- Using download utility to download voice binary
- reverts to default voice if not premium
- uses default voice during download and switches over when done
This commit officially switches the mycroft-core repository from
GPLv3.0 licensing to Apache 2.0. All dependencies on GPL'ed code
have been removed and we have contacted all previous contributors
with still-existing code in the repository to agree to this change.
Going forward, all contributors will sign a Contributor License
Agreement (CLA) by visiting https://mycroft.ai/cla, then they will
be included in the Mycroft Project's overall Contributor list,
found at: https://github.com/MycroftAI/contributors. This cleanly
protects the project, the contributors and all who build upon
the technology.
Further discussion can be found at this blog post:
https://mycroft.ai/blog/right-license/
This commit also removes all __author__="" lines from the code. These
lines are painful to maintain and the etiquette surrounding their
maintenance is unclear. Do you remove a name from the list if the
last line of code they wrote gets replaced? Etc. Now all
contributors are publicly acknowledged in the aforementioned repo,
and actual authorship is tracked by GitHub in a much more
effective and elegant way!
Finally, a few references to "Mycroft AI" were changed to the correct
legal entity name "Mycroft AI Inc."
==== Fixed Issues ====
#403 Update License.md and file headers to Apache 2.0
#400 Update LICENSE.md
==== Documentation Notes ====
Deprecated the ScheduledSkill and ScheduledCRUDSkill classes.
These capabilities have been superseded by the more flexible MycroftSkill
class methods schedule_event(), schedule_repeating_event(), update_event(),
and cancel_event().
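Typical replacements look like this (the skill body is illustrative; signatures are the MycroftSkill methods named above):

    from datetime import datetime, timedelta

    from mycroft.skills.core import MycroftSkill


    class ReminderSkill(MycroftSkill):
        def initialize(self):
            # one-shot event, 60 seconds from now
            self.schedule_event(self.on_reminder,
                                datetime.now() + timedelta(seconds=60),
                                name='reminder')
            # repeating event every 600 seconds, starting now
            self.schedule_repeating_event(self.on_tick, None, 600,
                                          name='tick')

        def on_reminder(self, message):
            self.speak('Reminder!')

        def on_tick(self, message):
            pass  # update_event()/cancel_event() can adjust or remove 'tick'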
==== Tech Notes ====
The isSpeaking signal was lowered as soon as the TTS had synthesized the
audio, not when the output finished. This commit moves the signal
raising/lowering to the TTS instead of the 'mycroft.speak' handler.
==== Fixed Issues ====
#958
==== Tech Notes ====
Adds a clear_visimes() method to the voice playback thread to stop the
viseme stream instead of having the viseme stream check for signals.
* Check queue empty with self.queue.empty() instead of len()
* Add error logging of exceptions in tts thread.
* Limit the number of audio_output_start messages.
recognizer_loop:audio_output_start message will only be sent if the
queue has been empty since the last loop.
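Roughly, in the playback thread (a sketch, not the exact code; the queue, bus and player are passed in):

    from mycroft.messagebus.message import Message
    from mycroft.util.log import LOG

    def playback_loop(q, bus, play, keep_running):
        """Drain the TTS queue; announce starts only after it was empty."""
        queue_was_empty = True
        while keep_running():
            snippet = q.get()
            try:
                if queue_was_empty:
                    # announce only on the empty -> busy transition
                    bus.emit(Message('recognizer_loop:audio_output_start'))
                play(snippet)
            except Exception as exc:
                LOG.exception(exc)   # log the error; keep the thread alive
            queue_was_empty = q.empty()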
* BUGFIX: The big bug was calling is_paired() during wake_word_in_audio(). When not paired, that call hit the server, taking about a second. Since it happened multiple times a second, the audio buffers got backed up hugely. This resulted in weird behavior later as the buffers got cleared out.
* Added mycroft.api.has_been_paired(), which just looks for the pairing key (it does not validate it is still active with the server, like is_paired() does); see the sketch after this list
* The enclosure now checks for internet connectivity and kicks off the wifisetup process, not the wifisetup client itself.
* During the "onboarding" process, the microphone is muted using the new "mycroft.mic.mute" message. After pairing completes, the "mycroft.mic.unmute" is expected to be sent from the pairing skill. Unmuting again after a re-pairing is harmless.
* mute_and_speak() is smart enough to not unmute itself when complete if muted before
* util.check_for_signal() now accepts -1 as the lifetime. This means it never times out.
* util.stop_speaking() is more intelligent about shutting down the spoken text (including text that has been split at periods) and visemes
* Added a mycroft.api.is_paired() method
* Added mycroft.util.is_speaking and mycroft.util.wait_while_speaking() methods
* RESET now waits for the spoken notice to complete
* Stopped the "Checking for updates" and "Skills updated" prompts (commented out for now, probably will eliminate)
* Wifi setup filters out hidden ("\x00") networks
* Visemes should keep up better if they get behind (will skip)
* Mimic is now searched for on the user's PATH
* Onboarding process:
- wifi setup starts automatically
- User is walked through the process
- wake word and button pressing are ignored
- At end, a short tutorial is given
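The pairing-check distinction, in sketch form:

    from mycroft.api import is_paired, has_been_paired

    # has_been_paired() only looks for a local pairing key, so it is
    # cheap and safe inside the audio loop; is_paired() validates with
    # the server and can block for around a second.
    if has_been_paired():
        pass  # proceed without hitting the server on every audio chunk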
The TTS audio is now cached. If the same TTS is requested again, the cached WAV and phoneme sequence are reused.
Major points:
* Created mycroft.util.get_cache_directory(). You can give this a domain, also. The mycroft.conf can define where this directory resides, so enclosures can have this reside on a ramdisk, for instance.
* Created mycroft.util.curate_cache(). This keeps a percentage of the disk space free by removing old cache files.
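Usage, roughly (keyword names as in mycroft.util; the exact defaults may differ):

    from mycroft.util import curate_cache, get_cache_directory

    cache_dir = get_cache_directory(domain='tts')  # per-domain subdir;
                                                   # location set in
                                                   # mycroft.conf
    # ...write generated WAV/phoneme files into cache_dir, then
    # periodically:
    curate_cache(cache_dir, min_free_percent=5.0)  # drop oldest files
                                                   # until enough disk
                                                   # is free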
* upgrading gTTS
- using play_mp3 to play mp3 files
* Update gTTS version
* default to 'us-en' if lang is omitted
* Fix multiple sentence speech.
Wait until audio has been played before exiting `GoogleTTS.execute()`
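The fix amounts to blocking on the player process (a sketch; play_mp3 returns the player subprocess):

    from mycroft.util import play_mp3

    process = play_mp3('/tmp/speech.mp3')  # returns the player subprocess
    process.communicate()                  # block until playback finishes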
- Initialize tts ws and enclosure at the main process
Note:
- This is a minimal change to fix the problem.
- The ultimate goal is to have a totally isolated TTS process which requires its own main and ws initialization to be developed soon.
The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses.
We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech.
This is an important development. Some might even call it revolutionary.
Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart.
Not anymore.
Meet Mycroft – the world's first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways.
Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine.
This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction.
Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone.
Joshua W Montgomery – May 17, 2016