Commit Graph

52 Commits (0d401c8757d6138628b45778eba9a928034404ad)

Author SHA1 Message Date
penrods eafcc1abfb Several extensions to text normalization:
* intent_failure message now carries along the utterance's lang code
* the query sent to Wolfram Alpha is now normalized
* added normalization of "whats" to "what is".  Strictly speaking this is incorrect ("whats" is the plural of "what", as in "the whats and whys of open source"), but that usage is rare.  Unfortunately, several STT engines output "whats" where "what's" is meant, producing ungrammatical text like "whats 8 + 4".  So we handle the common case and accept that we may mishandle the uncommon one.
* more parsing test cases, including a few corrections
2017-03-14 13:43:45 -05:00
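A minimal sketch of the "whats" rule, assuming it can be expressed as a plain regex substitution. The helper name expand_whats is hypothetical and not taken from mycroft.util.parse. The word boundaries keep longer words such as "whatsoever" intact, and the last line shows the trade-off the commit accepts for phrases like "the whats and whys".

```python
import re

# Hypothetical rule: replace the standalone word "whats" with "what is".
# The \b word boundaries leave longer words such as "whatsoever" untouched.
WHATS_PATTERN = re.compile(r"\bwhats\b", re.IGNORECASE)


def expand_whats(utterance):
    """Rewrite "whats" as "what is" (the replacement is always lowercase)."""
    return WHATS_PATTERN.sub("what is", utterance)


print(expand_whats("whats 8 + 4"))         # what is 8 + 4
print(expand_whats("whatsoever"))          # whatsoever
print(expand_whats("the whats and whys"))  # the what is and whys
```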
penrods cfa79e03a2 Fixes issue #539
The utterance is now placed on the bus along with its language code.  If no language is specified, it defaults to "en-us".
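A sketch of the idea, assuming the bus payload is a plain dict with "utterances" and "lang" keys. Both function names and the key names are hypothetical; Mycroft's actual message classes may structure this differently. The second helper shows how an intent_failure payload could carry the utterance's lang code along, as the later commit describes.

```python
DEFAULT_LANG = "en-us"


def utterance_payload(utterances, lang=None):
    """Hypothetical bus payload: the utterances plus their language code."""
    return {"utterances": list(utterances), "lang": lang or DEFAULT_LANG}


def intent_failure_payload(payload):
    """Hypothetical intent_failure payload that keeps the utterance's lang."""
    return {"utterance": payload["utterances"][0], "lang": payload["lang"]}


msg = utterance_payload(["what time is it"])
print(msg)                          # {'utterances': ['what time is it'], 'lang': 'en-us'}
print(intent_failure_payload(msg))  # {'utterance': 'what time is it', 'lang': 'en-us'}
```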

Added a new mycroft.util.parse module.  It contains the normalize() function.  Normalization currently does three things:
  * Expands contractions ("they're" -> "they are", etc)
  * Removes articles ("a", "an", "the") by default; this removal is optional.
  * Textual numbers become digits, up to 20.  E.g. "What is the weather in four days" becomes "What is weather in 4 days".
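A simplified sketch of the three steps, not the actual normalize() code. The contraction table is a small illustrative subset, normalize_sketch is a hypothetical name, and the tokenizer splits on whitespace only, so punctuation attached to a word (e.g. "days?") is not handled.

```python
# Hypothetical sketch of the normalization steps; not mycroft.util.parse itself.

CONTRACTIONS = {  # small illustrative subset
    "they're": "they are",
    "it's": "it is",
    "don't": "do not",
    "i'm": "i am",
}

ARTICLES = {"a", "an", "the"}

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
    "twenty": 20,
}


def normalize_sketch(text, remove_articles=True):
    """Expand contractions, optionally drop articles, map number words 1-20 to digits."""
    words = []
    for word in text.split():
        lowered = word.lower()
        if lowered in CONTRACTIONS:
            words.extend(CONTRACTIONS[lowered].split())
        elif remove_articles and lowered in ARTICLES:
            continue  # skip "a", "an", "the"
        elif lowered in NUMBER_WORDS:
            words.append(str(NUMBER_WORDS[lowered]))
        else:
            words.append(word)
    return " ".join(words)


print(normalize_sketch("What is the weather in four days"))  # What is weather in 4 days
print(normalize_sketch("they're here"))                      # they are here
```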

NOTE:  This is potentially a breaking change!  Remove "the", "a" and "an" from your .voc files!
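Because normalize() strips articles from the utterance, a vocabulary entry such as "what is the time" would no longer match the normalized "what is time". The sketch below assumes each .voc line holds one phrase with optional "|"-separated alternatives and removes articles from every phrase; clean_voc_lines is a hypothetical helper, not part of Mycroft.

```python
ARTICLES = {"a", "an", "the"}


def strip_articles(phrase):
    """Drop article words from a single phrase."""
    return " ".join(w for w in phrase.split() if w.lower() not in ARTICLES)


def clean_voc_lines(lines):
    """Strip articles from each "|"-separated alternative; drop lines left empty."""
    cleaned = []
    for line in lines:
        alternatives = [strip_articles(alt) for alt in line.strip().split("|")]
        joined = "|".join(alt for alt in alternatives if alt)
        if joined:
            cleaned.append(joined)
    return cleaned


print(clean_voc_lines(["what is the time", "tell me the time|the time", "the"]))
# ['what is time', 'tell me time|time']
```

To apply it to a file, the lines could be read with pathlib.Path(...).read_text().splitlines() and the cleaned list written back joined with newlines.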

Skill changes:
  * I cleaned up the .voc files for the default Skills.
  * Split the date_time keyword into a separate entity.  A new "QueryKeyword.voc" now holds "what|tell" instead of combining it into "what is time" in TimeKeyword.voc.
  * Volume skill now accepts 1-11, e.g. "turn volume to 11"
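A sketch of how a level might be pulled from an already-normalized utterance and checked against the 1-11 range; parse_volume_level is hypothetical and the actual Volume skill may parse differently. Since normalization turns "eleven" into "11", a single digit pattern covers both spoken forms.

```python
import re

MIN_LEVEL, MAX_LEVEL = 1, 11  # range accepted per the commit message


def parse_volume_level(utterance):
    """Return the first number in the utterance if it lies in 1-11, else None."""
    match = re.search(r"\b(\d+)\b", utterance)
    if match is None:
        return None
    level = int(match.group(1))
    return level if MIN_LEVEL <= level <= MAX_LEVEL else None


print(parse_volume_level("turn volume to 11"))  # 11
print(parse_volume_level("turn volume to 12"))  # None
print(parse_volume_level("turn volume up"))     # None
```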
2017-03-14 13:43:45 -05:00