127 Commits (0d401c8757d6138628b45778eba9a928034404ad)

Author SHA1 Message Date
jarbasal 64a58f3290 tonight remainder 2019-03-27 11:15:24 +00:00
Ruthvicp b28011681e handling tonight in date & time extraction (#2066)
* handling tonight in date & time extraction
2019-03-27 01:41:50 -05:00
jarbasal 42e258610e feature/format_pt 2019-03-26 13:45:59 +01:00
Steve Penrod 44f60ec6f3 Change default lang to None, not English
Much of the code used "en-us" as the default value when not specified.
This limited the internationalization potential.  This commit changes the
default to None and adds the ability to define the default lang code from
other locations in code.  E.g.

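```python
from datetime import datetime

from mycroft.util.format import nice_date
from mycroft.util.lang import set_default_lang

dt = datetime(2019, 3, 14)  # any datetime works here

set_default_lang("en-us")
print("English date: " + nice_date(dt))

set_default_lang("de-de")
print("German date: " + nice_date(dt))
```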

This allows easier localization of Skills by having the framework set the default without any changes necessary by the Skill writers.

Other minor changes:
* Changed the default return value of get_gender*() to None instead of False
2019-03-14 10:57:31 +01:00
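A minimal sketch of the mechanism 44f60ec6f3 describes, assuming the default is kept in a module-level variable: an explicit `lang` argument wins, otherwise the configured default is used. `resolve_lang` and `_default_lang` are hypothetical names, and the body of `set_default_lang` here is an assumption for illustration, not the actual mycroft code.

```python
# Assumed module-level storage; None means "no default configured yet".
_default_lang = None


def set_default_lang(lang_code):
    """Record the language code to use when a caller omits ``lang``."""
    global _default_lang
    _default_lang = lang_code


def resolve_lang(lang=None):
    """Return the explicit ``lang`` if given, else the configured default."""
    return lang if lang is not None else _default_lang
```

With this shape, a Skill can call formatting helpers without passing `lang`, and the framework decides the language once at startup.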
Andreas Lorensen 6c0cd8d427 Add danish 2 - initial work to get danish to core (#2033)
* Danish formatting and parsing functions
2019-03-04 21:40:12 +01:00
Angel Docampo 9c22ced394 Initial Castilian format translations
This commit adds the initial translations of the core functions for
formatting numbers: nice_number_es, pronounce_number_es and nice_time_es.

==== Localization Notes ====
NONE - Castilian (Spain's Spanish)
2019-03-04 15:37:10 +01:00
Ruthvicp ec7ed25ed5 date and time - for tonight, weekdays (#2023)
* date and time - for tonight, weekdays

* Updating the previous commit test_extractdatetime_en

* Editing comments for extract_date_time_en

* Generalized as a marker

Expanded this from just handling "weekends" to any day or plural of the day, "weekend", "weekday" or "weekdays".
2019-02-28 01:15:55 -06:00
Ale df7dfaa006 Update parse_it.py to 18.8.12 (#1990)
* update functions in parse_it.py
* update tests
* translate docstrings to english
2019-02-27 08:00:50 +01:00
Steve Penrod 631875d9c2 Tweak English extract_datetime() parsing
Ambiguous times previously used a generally unexpected rule about
when to jump 12 hours when parsing times.  Now it follows the rule:
If a time is spoken without am/pm indicator, assume the next time
that hasn't passed was intended.

For example:
  "at 7 o'clock"
Would mean 7:00am if spoken at 6:59am, but would mean 7:00pm
if spoken at 7:01am.
2019-02-20 01:56:20 -06:00
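The rule in 631875d9c2 ("assume the next time that hasn't passed") can be sketched roughly as below. This is an illustration under stated assumptions, not the code from `extract_datetime()`: `resolve_ambiguous_hour` is a hypothetical helper, it works on naive datetimes, and it treats a time exactly equal to `now` as already passed.

```python
from datetime import datetime, timedelta


def resolve_ambiguous_hour(hour, minute, now):
    """Return the next datetime matching a 12-hour clock reading.

    ``hour`` is 1-12 and carries no am/pm indicator.
    """
    base = now.replace(hour=hour % 12, minute=minute, second=0, microsecond=0)
    # Candidates in order: today's am reading, today's pm reading,
    # then tomorrow's am reading.
    candidates = [base, base + timedelta(hours=12), base + timedelta(days=1)]
    return next(c for c in candidates if c > now)


print(resolve_ambiguous_hour(7, 0, datetime(2019, 2, 20, 6, 59)))
print(resolve_ambiguous_hour(7, 0, datetime(2019, 2, 20, 7, 1)))
```

The first call resolves to 7:00 the same morning and the second to 19:00, matching the "at 7 o'clock" example in the commit message.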
Åke 65a7197519
Merge pull request #1977 from ChristopherRogers1991/feature/issue-1959
Add extract_duration() method
2019-02-13 16:08:49 +01:00
Chris Rogers 113352339d Fix pep8 issues.
This is in support of issues-1959.
2019-02-12 17:29:21 -05:00
Chris Rogers acbe46aede Fix documentation
Fix typo and add an explanation for
_extract_number_with_text_en_helper.
2019-02-12 16:36:11 -05:00
Chris Rogers cc0d3da62c Change _Token to a namedtuple.
This is in support of issues-1959.
2019-02-12 16:23:53 -05:00
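For context on cc0d3da62c, a namedtuple gives tuple-like tokens with named accessors. The sketch below assumes two fields, `word` and `index`; the commit does not list the real field names, and `tokenize` is a hypothetical helper.

```python
from collections import namedtuple

# Assumed field names, chosen for illustration only.
Token = namedtuple("Token", ["word", "index"])


def tokenize(text):
    """Split text on whitespace, pairing each word with its position."""
    return [Token(word, index) for index, word in enumerate(text.split())]


tokens = tokenize("set a timer")
print(tokens[1].word, tokens[1].index)
```

A namedtuple stays immutable and still compares equal to a plain tuple, so code that unpacks tokens positionally keeps working.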
Ale 307910e53e update format_it.py test_format_it.py (#1984)
* Update format_it.py and tests
2019-02-11 18:07:14 +01:00
Chris Rogers 1bb74f5c79 Use isinstance instead of type.
This is in support of issues-1959.
2019-02-03 12:58:31 -05:00
Chris Rogers cdf7dc3756 Use datetime.timedelta for extract_duration_en.
This is in support of issues-1959.
2019-02-03 12:16:36 -05:00
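As an illustration of why `datetime.timedelta` suits cdf7dc3756, the sketch below combines already-parsed (value, unit) pairs into a single duration. The unit table and `duration_from_pairs` are assumptions made for this example and are not taken from `extract_duration_en`.

```python
from datetime import timedelta

# Assumed mapping from singular unit names to timedelta keyword arguments.
_UNITS = {
    "second": "seconds",
    "minute": "minutes",
    "hour": "hours",
    "day": "days",
    "week": "weeks",
}


def duration_from_pairs(pairs):
    """Combine (value, unit) pairs such as (5, "hours") into one timedelta."""
    total = timedelta()
    for value, unit in pairs:
        keyword = _UNITS[unit.rstrip("s")]  # accept "hour" and "hours"
        total += timedelta(**{keyword: value})
    return total


print(duration_from_pairs([(5, "hours"), (7.5, "minutes")]))
```

Because timedelta accepts fractional values and normalizes them, "seven and a half minutes" needs no special handling once it has been parsed to 7.5.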
Chris Rogers d50cb00de8 Fix pep8 issues.
This is in support of issues-1959.
2019-02-02 14:21:37 -05:00
Chris Rogers 9e193c74e0 Make constant and functions private.
Making things weakly private, to limit the surface area of support. As
things become increasingly stable/tested/useful, it may make sense to open
them up, but for now, keeping them private will limit risk.

This is in support of issue-1959.
2019-02-02 14:07:48 -05:00
Chris Rogers 351381bca2 Fix pep8 issues.
This is in support of issues-1959.
2019-02-02 14:03:19 -05:00
Chris Rogers 7049e65cbe Minor shuffling + update docs.
After many changes, things had gotten a little disorganized, and the
docs were a little out of date. This brings them up to date.

This is in support of issues-1959.
2019-02-02 13:45:22 -05:00
Chris Rogers 534ca2aff9 All regressions in number parsing fixed.
This is in support of issues-1959.
2019-02-02 13:12:25 -05:00
Chris Rogers 6da1ec5c6e Fix regression in number parsing.
Fix regression that caused "X and one half" to parse as just X.

This is in support of issues-1959.
2019-02-01 23:24:45 -05:00
Chris Rogers f4eee8726a Refactor many methods in parse_en.
This improves the utility of the _ReplaceableNumber class, and updates
most of the number parsing functions to take tokens rather than text.
This simplifies the interactions between many of the functions, as there
is no need to convert back and forth between text and tokens.

This also adds some tests. Note that there are a few regressions that
will be fixed in a subsequent commit.
2019-02-01 23:04:54 -05:00
Chris Rogers 95aca10294 Fix _extract_decimal
Actually use the short_scale and ordinals values.
2019-02-01 18:57:28 -05:00
Chris Rogers 2ce632389f Fix and simplify extract_numbers_en
This was calling convert_words_to_numbers and parsing out the resulting
numbers, which was a simple way of getting the numbers in order, but it
choked on anything that didn't match the regex being used to parse
numbers, in particular numbers of the form '6e18'. The better solution
is to directly use extract_numbers_with_text (which now sorts by
start_index) and get the values from there directly.

This is in support of issues-1959.
2019-02-01 18:51:03 -05:00
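The approach in 2ce632389f, reading values straight from matches sorted by start index instead of re-parsing converted text, might look like the sketch below. `NumberMatch` and `values_in_order` are hypothetical; the real return type of `extract_numbers_with_text` is not shown in this log.

```python
from collections import namedtuple

# Hypothetical result record; the field names are assumptions.
NumberMatch = namedtuple(
    "NumberMatch", ["value", "text", "start_index", "end_index"]
)


def values_in_order(matches):
    """Return numeric values ordered by where they appear in the text."""
    ordered = sorted(matches, key=lambda match: match.start_index)
    return [match.value for match in ordered]


matches = [
    NumberMatch(6e18, "6e18", 4, 4),
    NumberMatch(2, "two", 1, 1),
]
print(values_in_order(matches))
```

No regex over the converted string is involved, so a value such as 6e18 passes through untouched.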
Chris Rogers 6645ab6bfe Add short_scale and ordinal args to helpers.
This is in support of issues-1959.
2019-02-01 18:28:40 -05:00
Chris Rogers 5c74789c2d Lower text before parsing.
This is in support of issues-1959.
2019-02-01 18:21:13 -05:00
Chris Rogers 23edb9eb00 Fix decimal/fraction parsing with leading numbers.
"Five hours seven and a half minutes" was parsing as 5.5. This is
resolved. Multiple fractions/decimals still cause problems, e.g.

convert_words_to_numbers("seven and a half and nine and a half")
Out[5]: '7 and a 0.5 and 9 and a 0.5'

This is in support of issues-1959.
2019-02-01 17:39:01 -05:00
Chris Rogers a3e94bcbc6 Add numbers, e.g. '20', '30' to sums
This is in support of issues-1959.
2019-01-30 22:53:22 -05:00
Chris Rogers 4732feab41 Fix indices and substitution logic.
Placeholders are inserted into the text to maintain accurate
indices relative to the original string.

This is in support of issues-1959.
2019-01-30 22:06:02 -05:00
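One way to picture the placeholder idea from 4732feab41: when a run of words is consumed, fill the gap with placeholders so that every later word keeps its original position. `replace_span` and the `"#"` placeholder are assumptions for this sketch, not names from the commit.

```python
def replace_span(words, start, end, placeholder="#"):
    """Blank out words[start..end] (inclusive) without changing list length."""
    filler = [placeholder] * (end - start + 1)
    return words[:start] + filler + words[end + 1:]


words = "set a timer for twenty two minutes".split()
masked = replace_span(words, 4, 5)
print(masked)
```

Since the list length is unchanged, an index computed against the original string still points at the same word after the substitution.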
Chris Rogers 49274493d9 Update convert_words_to_numbers logic.
The logic has been updated for start/end indexes.

This is in support of issues-1959.
2019-01-30 21:18:48 -05:00
Chris Rogers 03a445991c Fix multiplies and extract_numbers_with_text
A small bug caused things like "two hundred twenty" to return only the
"hundred twenty" for the text. This has been fixed.

extract_numbers_with_text was updated to deal with the new return types
of the functions it depends on. Specifically, it accounts for the start
and end index values.

This is in support of issues-1959.
2019-01-30 18:24:18 -05:00
Chris Rogers 71836b61ec Fix decimal and fraction parsing.
This updates the _extract_fraction and _extract_decimal functions to
handle the new token format.
2019-01-30 18:03:25 -05:00
Chris Rogers 48214ca66a Introduce tokens for number parsing.
Replace use of tuples with a dedicated class. This improves clarity by
giving named accessors.

This is in support of issues-1959.
2019-01-30 16:48:59 -05:00
Chris Rogers 9db9b6107b Change approach to number/text replacement.
Previously it was assumed that the original text would be enough to
determine where in a string a number should go. However, in some
scenarios that does not work, and results in the wrong values being
parsed.

A different, smarter approach is being taken now, in which the
original string is initially split into a list of tuples of
(index, word) where index is the index of the word within the string.
All subsequent processing is done on these tuples, meaning we always
know exactly where the words were in the original string. This should
make text replacement perfect, as we can always sub out the exact,
correct words, based on their indices.

extract_number_with_text_en now returns the number parsed, the text that
represents the number, the start index, and the end index.

Things are not yet working perfectly. Here is roughly the current state
of the world:

from mycroft.util.lang.parse_en import *
extract_number_with_text_en("this is some two hundred thousand twenty two hours")
Out[3]: (200022, 'hundred thousand twenty two', 4, 7)
extract_number_with_text_en("this is some twenty two hours")
Out[4]: (22, 'twenty two', 3, 4)
extract_number_with_text_en("this is some twenty hours")
Out[5]: (20, 'twenty', 3, 3)
extract_number_with_text_en("this is some two and a half hours")
Out[6]: (2, 'two', 3, 3)
extract_number_with_text_en("this is some two point five hours")
Out[7]: (2, 'two', 3, 3)

The list of tuples is a bit of a hassle to deal with. In a future
commit they will be replaced with dictionaries, or even better, Token
objects, that contain the word and its index. This would make the
code easier to reason about (removing lots of things like words[0][1]
which has no meaning without deep understanding of the code).

This is in support of issues-1959.
2019-01-29 22:33:23 -05:00
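The index-based replacement that 9db9b6107b describes can be sketched as follows, assuming indices are word positions as in the sample output of that commit. `tokenize` and `substitute` are hypothetical helpers written for this illustration, and the (start, end, new_text) argument shape is an assumption.

```python
def tokenize(text):
    """Return a list of (index, word) tuples for a whitespace-split string."""
    return list(enumerate(text.split()))


def substitute(text, replacements):
    """Replace word spans, given as (start, end, new_text) with inclusive ends."""
    by_start = {start: (end, new_text) for start, end, new_text in replacements}
    result = []
    skip_until = -1
    for index, word in tokenize(text):
        if index <= skip_until:
            continue  # this word belongs to a span already replaced
        if index in by_start:
            skip_until, new_text = by_start[index]
            result.append(new_text)
        else:
            result.append(word)
    return " ".join(result)


print(substitute("this is some twenty two hours", [(3, 4, "22")]))
```

Because every span is addressed by position, repeated words elsewhere in the sentence cannot be replaced by mistake.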
Chris Rogers 8a5bf49651 Handle lists of summation numbers.
Phrases like "twenty thirty forty" would return
(40, "twenty thirty forty"). This changes that so
(40, "forty") is returned.
2019-01-29 21:25:06 -05:00
Chris Rogers 690df0b1d3 Update extract_numbers_en to use the new functions.
This is in support of issues-1959.
2019-01-29 20:57:16 -05:00
Chris Rogers f4723b1026 Cleanup parse_en.
Rename some functions, and fix docs/pep8 issues.
2019-01-29 20:50:13 -05:00
Chris Rogers 9aa02587b3 Fix an issue with extractnumber_en_with_text
Articles (a, an, the) that appeared immediately before the number were
included in the returned text. This fixes those issues.

The solution isn't super clean - it creates a new function to wrap the
old one (unavoidable, since the articles can be needed for fractions),
and splits the returned string, then strips leading articles.

Since strings are immutable, this is probably not super efficient. Might
be better to eventually use lists as much as possible, and only create a
string at the end (though the lists will come with their own problems,
so that could turn out to be a wash). In any case, this works for now.
2019-01-29 20:25:38 -05:00
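A sketch of the split-then-strip step 9aa02587b3 describes, assuming the wrapped function hands back the matched text as a plain string. `strip_leading_articles` is a hypothetical name; only articles at the front are removed, so an interior article such as the one in "three and a half" survives.

```python
_ARTICLES = {"a", "an", "the"}


def strip_leading_articles(text):
    """Drop articles from the front of the text, leaving interior ones alone."""
    words = text.split()
    first = 0
    while first < len(words) and words[first] in _ARTICLES:
        first += 1
    # Slice once and join once, rather than rebuilding the string per word.
    return " ".join(words[first:])


print(strip_leading_articles("a eight"))
print(strip_leading_articles("three and a half"))
```

Working on the word list and joining only at the end is the kind of list-based handling the commit message floats as a possible later improvement.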
Chris Rogers 0b8e88a325 Fix docs for _initialize_number_data.
This is in support of issues-1959.
2019-01-29 20:03:41 -05:00
Chris Rogers 9ff7fd5452 Fix issues with extractnumber_en_with_text
Issues fixed:
* Lists, e.g. "some words one two three", would return (3, "one two three").
* Negative words were not included in the output, e.g. "negative five" would
  return (-5, "five").

This is in support of issues-1959.
2019-01-29 19:53:30 -05:00
Chris Rogers b5ffbcc549 Fix edge case in extractnumber_en_with_text
Any articles (a, an, the) appearing before the number would be included
in the text, e.g. "set a timer for eight minutes" returned:
(8, 'a eight')

This fix clears the number words list if no number words have been
found. Note that articles can't simply be filtered, as they are
often a part of the number (e.g. three and a half).

This is in support of issues-1959.
2019-01-28 17:28:27 -05:00
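The fix in b5ffbcc549 can be pictured with the sketch below: articles are collected tentatively because they may belong to a number, and the list is cleared whenever an unrelated word arrives before any real number word. The word sets and `collect_number_words` are assumptions for illustration and are far smaller than a real parser's tables.

```python
_NUMBERS = {"eight", "three", "half"}
_JOINERS = {"a", "an", "the", "and"}  # may or may not be part of a number


def collect_number_words(text):
    """Return the words of the first number phrase, or [] if there is none."""
    collected = []
    found_number = False
    for word in text.lower().split():
        if word in _NUMBERS:
            collected.append(word)
            found_number = True
        elif word in _JOINERS:
            collected.append(word)  # keep tentatively
        elif found_number:
            break  # an unrelated word ends the number phrase
        else:
            collected.clear()  # joiners seen so far were not part of a number
    return collected if found_number else []


print(collect_number_words("set a timer for eight minutes"))
print(collect_number_words("three and a half minutes"))
```

The article in "set a timer" is discarded when "timer" arrives, while the one in "three and a half" is kept because a number word precedes it.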
Chris Rogers 12e5fd603a Implement methods useful for extract_duration
Methods implemented include:

extract_number_with_text
extract_numbers_with_text
convert_words_to_numbers
extract_duration

This is in support of issues-1959. This continues the work of
returning the relevant text that corresponds to a number
parsed from a string.
2019-01-28 17:04:12 -05:00
Chris Rogers 5252e710b5 Prevent the conversion of ints to floats.
This is in support of issues-1959. This continues the work of
returning the relevant text that corresponds to a number
parsed from a string.
2019-01-26 18:06:11 -05:00
Chris Rogers 05046c7390 Add ordinal and fraction support.
This is in support of issues-1959. This continues the work of returning
the relevant text that corresponds to a number parsed from a string.
2019-01-26 17:39:56 -05:00
Chris Rogers 49c344c1d7 Return correct text for decimals and fractions
This is in support of issues-1959. This continues the work of returning
the relevant text that corresponds to a number parsed from a string.
2019-01-26 14:36:01 -05:00
Chris Rogers 5a3d809e68 Begin returning text with parsed numbers
This is in support of issues-1959. This begins the work of returning the
relevant text that corresponds to a number parsed from a string. Here's
a sample showing the basic functionality/state of the world (showing
some of the remaining cases to handle)

>>> from mycroft.util.lang.parse_en import *
>>> extractnumber_en_with_text("three hours twenty minutes")
(3, 'three')
>>> extractnumber_en_with_text("twenty minutes")
(20, 'twenty')
>>> extractnumber_en_with_text("twenty five minutes")
(25, 'twenty five')
>>> extractnumber_en_with_text("two hundred twenty five minutes")
(225, 'two hundred twenty five')
>>> extractnumber_en_with_text("three and a half minutes")
(3.5, 'three and a half minutes')
>>> extractnumber_en_with_text("three point five minutes")
(3.5, 'three point five minutes')

====  Tech Notes ====
Checks if the word being parsed is relevant to number parsing. If it is
not, and we have already found number words, we return what we have. If
it is, we add it to a list of words representing the current number
being parsed.

====  Documentation Notes ====
The old implementation of extractnumber_en seems to generally return the
last number in the text. This change will cause the first number to be
returned.
2019-01-26 14:26:00 -05:00
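The Tech Notes in 5a3d809e68 describe a single left-to-right scan that stops at the first unrelated word after a number has started. A rough sketch under simplifying assumptions: the vocabulary is tiny, values are simply summed, and `first_number_with_text` is a hypothetical name rather than the function in `parse_en`.

```python
# Tiny illustrative vocabulary; a real parser needs far more entries.
_NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "five": 5, "twenty": 20}


def first_number_with_text(text):
    """Return (value, matched_text) for the first run of number words."""
    found = []
    for word in text.lower().split():
        if word in _NUMBER_WORDS:
            found.append(word)
        elif found:
            break  # first unrelated word after a run: return what we have
    if not found:
        return None
    value = sum(_NUMBER_WORDS[word] for word in found)
    return value, " ".join(found)


print(first_number_with_text("three hours twenty minutes"))
print(first_number_with_text("twenty five minutes"))
```

Stopping at the first unrelated word is what makes the first number in the text win, which is the behavior change called out in the Documentation Notes.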
Chris Rogers 6d9447128b Cleanup unnecessary comments/improve docs.
This is part of a refactor of extractnumber_en, with the ultimate
goal of making it easier to maintain and extend (should also
improve perf).  This is in support of issues-1959.
2019-01-25 21:06:29 -05:00
Chris Rogers 8d588743d0 Extract fraction and decimal methods.
This is part of a refactor of extractnumber_en, with the ultimate
goal of making it easier to maintain and extend (should also
improve perf).  This is in support of issues-1959.

All tests (minus extract_duration, which has not yet been implemented)
are passing at this stage.
2019-01-25 21:02:34 -05:00
Chris Rogers 1a176da6b6 Deal with scale numbers and plurals appropriately.
This is part of a refactor of extractnumber_en, with the ultimate
goal of making it easier to maintain and extend (should also
improve perf).  This is in support of issues-1959.

All tests (minus extract_duration, which has not yet been implemented)
are passing at this stage.
2019-01-25 20:44:59 -05:00