This improves the utility of the _ReplaceableNumber class, and updates
most of the number parsing functions to take tokens rather than text.
This simplifies the interactions between many of the functions, as there
is no need to convert back and forth between text and tokens.
This also adds some tests. Note that there are a few regressions that
will be fixed in a subsequent commit.
This was calling convert_words_to_numbers and parsing out the resuling
numbers, which was a simple way of getting the numebrs in order, but it
choked on anything that didn't match the regex being used to parse
numbers, in particular numbers of the form '6e18'. The better solution
is to directly use extract_numbers_with_text (which now sorts by
start_index) and get the values from there directly.
This is in support of issues-1959.
"Five hours seven and a half minutes" was parsing as 5.5. This is
resolved. Multiple fractions/decimals still cause problems, e.g.
convert_words_to_numbers("seven and a half and nine and a half")
Out[5]: '7 and a 0.5 and 9 and a 0.5'
This is in support of issues-1959.
A small bug caused things like "two hundred twenty" to return only the
"hundred tenty" for the text. This has been fixed.
extract_numbers_with_text was updated to deal with the new return types
of the functions it depends on. Specifically, it accounts for the start
and end index values.
This is in support of issues-1959.
Previously it was assumed that the orgiginal text would be enough to
determine where in a string a number should go, however, in some
scenarios, that does not work, and results in the wrong values being
parsed.
A different, and smarter approach is being taken now, in which the
original string is initially split into a list of tuples of
(index, word) where index is the index of the word within the string.
All subsequent processing is done on these tuples, meaning we always
know exactly where the words were in the orginal string. This should
make text replacement perfect, as we can always sub out the exact,
correct words, based on their indicies.
extract_number_with_text_en now returns the number parsed, the text that
represents the number, the start index, and the end index.
Things are not yet working perfectly. Here is roughly the current state
of the world:
from mycroft.util.lang.parse_en import *
extract_number_with_text_en("this is some two hundred thousand twenty
two hours")
Out[3]: (200022, 'hundred thousand twenty two', 4, 7)
extract_number_with_text_en("this is some twenty two hours")
Out[4]: (22, 'twenty two', 3, 4)
extract_number_with_text_en("this is some twenty hours")
Out[5]: (20, 'twenty', 3, 3)
extract_number_with_text_en("this is some two and a half hours")
Out[6]: (2, 'two', 3, 3)
extract_number_with_text_en("this is some two point five hours")
Out[7]: (2, 'two', 3, 3)
The list of tuples is a bit of a hassle to deal with. In a future
commite the will be replaced with dictionaries, or even better, Token
objects, that contain the word and it's index. This would make the
code easier to reason about (removing lots things like words[0][1]
which has no meaning without deep understanding of the code).
This is in support of issues-1959.
Articles (a, an, the) that appeared immediately before the number were
included in the returned text. This fixes those issues.
The solution isn't super clean - it craetes a new function to wrap the
old one (unavoidable, since the articles can be needed for fractions),
and splits the returned string, the strips leading artivles.
Since strings are immutable, this is probably not super efficient. Might
be better to eventually use lists as much as possible, and only create a
string at the end (though the lists will come with their own problems,
so that could turn out to be a wash). In any case, this works for now.
Issues fixed:
Lists, e.g. "some words one two three" would return (3, "one two three")
Negaitve words were not included in output, e.g. "negative five" would
return (-5, "five").
This is in support of issues-1959.
Any articles (a, an, the) appearing before the number would be included
in the text, e.g. "set a timer for eight minutes" returned:
(8, 'a eight')
This fix clears the number words list if no number words have been
found. Note that articles can't simple be filtered, as they are
often a part of the numebr (e.g. three and a half).
This is in support of issues-1959.
Methods implemented include:
extract_number_with_text
extract_numbers_with_text
convert_words_to_numbers
extract_duration
This is in support of issues-1959. This continues the work of
returning the relevant text that corresponds to a number
parsed from a string.
This is in support of issues-1959. This begins the work of returning the
relevant text that corresponds to a number parsed from a string. Here's
a sample showing the basic functionality/state of the world (showing
some of the remaining cases to handle)
>>> from mycroft.util.lang.parse_en import *
>>> extractnumber_en_with_text("three hours twenty minutes")
(3, 'three')
>>> extractnumber_en_with_text("twenty minutes")
(20, 'twenty')
>>> extractnumber_en_with_text("twenty five minutes")
(25, 'twenty five')
>>> extractnumber_en_with_text("two hundred twenty five minutes")
(225, 'two hundred twenty five')
>>> extractnumber_en_with_text("three and a half minutes")
(3.5, 'three and a half minutes')
>>> extractnumber_en_with_text("three point five minutes")
(3.5, 'three point five minutes')
==== Tech Notes ====
Checks if the word being parsed is relevant to number parsing. If it is
not, and we have already found number words, we return what we have. If
it is, we add it to a list of words representing the current number
being parsed.
==== Documentation Notes ====
The old implementation of extractnumber_en seems to generally return the
last number in the text. This change will cause the first number to be
returned.
This is part of a refactor of extractnumber_en, with the ultimate
goal of making it easier to maintain and extend (should also
improve perf). This is in support of issues-1959.
This is part of a refactor of extractnumber_en, with the ultimate
goal of making it easier to maintain and extend (should also
improve perf). This is in support of issues-1959.
All tests (minus extract_duration, which has not yet been implemented)
are passing at this stage.
This is part of a refactor of extractnumber_en, with the ultimate
goal of making it easier to maintain and extend (should also
improve perf). This is in support of issues-1959.
All tests (minus extract_duration, which has not yet been implemented)
are passing at this stage.
This is part of a refactor of extractnumber_en, with the ultimate goal
of making it easier to maintain and extend (should also improve perf).
This is in support of issues-1959.
Begins a refactor of extractnumber_en, with the ultimate goal of making
it easier to maintain and extend (should also improve perf). This is
in support of issues-1959.
* Add communication from GUI to skills
- "set" events from Qt will set/update a variable in the skills .gui member
- It's possible to add general event handlers using self.gui.register_handler()
- Moved registration of skill_id to just after skill init
* Ensure that simultaneously writes doesn't occur
Wrap WebSocketHandler.write_message() with a lock in an attempt to handle "buffererror: existing exports of data: object cannot be re-sized."
* Add better logging to help debug disconnect issue
* Allow overriding the idle page
SkillGUI.show_page() and SkillGUI.show_pages() now takes an optional
override_idle parameter. This is used as a hint by the mark-2 skill
and if possible the idle screen will not be shown.
* Improve debugging using Logger
* Raise exception when sending a non-existing gui page
* Restore running state to new connections
When a GUI is connected data and running namespaces are synchronised and
shown.
This refactors the code quite a bit moving the GUI state from the GUIConnection
object to the Enclosure.
The GUIConnection object does the handles the sync in the on_connection_open()
method.
* Add gui.page_interaction message
Currently triggered on page change on the display.
* Handle message when gui changes sessionData
* Check if socket exists on gui before sending data
* Increase port on each failure and retry
If we are missing the ".mycroft_cli.conf" file we print a message to the
user informing them that we are ignoring the missing file along with a stack trace. The stack trace is really not necessary so this removes it.
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
To avoid UnicodeDecodeError's when opening files tell Python that we
want to open the file as a utf8 encoded file.
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Ctrl+C was not being properly handled by the CLI, resulting in an unclean
shutdown and other unexpected consequences -- e.g. a
"./start-mycroft.sh debug" would kill the background processes on exit. The
KeyboardException was never being triggered.
Now the SIGINT is being captured and Ctrl+C behaves (as intended) just like
pressing Ctrl+X.