Commit Graph

1100 Commits (master)

Author SHA1 Message Date
Jonathan A. Sternberg 78a32cba0e Queries for `bottom()` with no tags got messed up while changing the implementation
It didn't properly pass the variable reference when creating the
variable iterator so a null iterator got passed back instead.

Duplicate the `top()` tests in TopInt to also test `bottom()` with the
same queries so `bottom()` stops getting neglected so often.
2017-05-30 11:28:41 -05:00
Stuart Carnie 47f97ea134 use parsed measurement and models.Tags 2017-05-26 13:21:59 -07:00
Joe LeGasse 815f740f4c initial fga work
wip

wip

fix tests / build
2017-05-26 13:16:27 -07:00
Jonathan A. Sternberg 9edf236cc8 Maintain the tags of points selected by top() or bottom() when writing the results
When a `SELECT ... INTO ...` is used with `top()` or `bottom()` used
with tags, the points will be written with the tags still intact instead
of converted to fields.
2017-05-23 15:00:21 -05:00
Jonathan A. Sternberg 7b9b55bfc0 Optimize top() and bottom() using an incremental aggregator
The previous version of `top()` and `bottom()` would gather all of the
points to use in a slice, filter them (if necessary), then use a
slightly modified heap sort to retrieve the top or bottom values.

This performed horrendously from the standpoint of memory. Since it
consumed so much memory and spent so much time in allocations (along
with sorting a potentially very large slice), this affected speed too.

These calls have now been modified so they keep the top or bottom points
in a min or max heap. For `top()`, a new point will read the minimum
value from the heap. If the new point is greater than the minimum point,
it will replace the minimum point and fix the heap with the new value.
If the new point is smaller, it discards that point. For `bottom()`, the
process is the opposite.

It will then sort the final result to ensure the correct ordering of the
selected points.

When `top()` or `bottom()` contain a tag to select, they have now been
modified so this query:

    SELECT top(value, host, 2) FROM cpu

Essentially becomes this query:

    SELECT top(value, 2), host FROM (
        SELECT max(value) FROM cpu GROUP BY host
    )

This should drastically increase the performance of all `top()` and
`bottom()` queries.
2017-05-19 11:56:46 -05:00
Jonathan A. Sternberg 7d043dbc61 Add nanosecond duration literal support 2017-05-19 10:44:11 -05:00
rw-influxdata 67279ccc64 Fix AST rewriting panic due to a nil Condition. 2017-05-04 14:51:53 -07:00
Jonathan A. Sternberg df30a4d9c9 Refactor the subquery code and fix outer condition queries
This change refactors the subquery code into a separate builder class to
help allow for more reuse and make the functions smaller and easier to
read.

The previous function that handled most of the code was too big and
impossible to reason through.

This also goes and replaces the complicated logic of aggregates that had
a subquery source with the simpler IteratorMapper. I think the overhead
from the IteratorMapper will be more, but I also believe that the actual
code is simpler and more robust to produce more accurate answers. It
might be a future project to optimize that section of code, but I don't
have any actual numbers for the efficiency of one method and I believe
accuracy and code clarity may be more important at the moment since I am
otherwise incapable of reading my own code.
2017-04-28 17:12:32 -05:00
Jonathan A. Sternberg addc12561f Fix LIMIT and OFFSET for certain aggregate queries
When LIMIT and OFFSET were used with any functions that were not handled
directly by the query engine (anything other than count, max, min, mean,
first, or last), the input to the function would be limited instead of
receiving the full stream of values it was supposed to receive.

This also fixes a bug that caused the server to hang when LIMIT and
OFFSET were used with a selector. When using a selector, the limit and
offset should be handled before the points go to the auxiliary iterator
to be split into different iterators. Limiting happened afterwards which
caused the auxiliary iterator to hang forever.
2017-04-28 15:55:06 -05:00
Ben Johnson 3a46e5dd9e
Remove default upper time bound for DELETE queries. 2017-04-28 12:26:26 -06:00
Jonathan A. Sternberg be3bce5212 top() and bottom() now returns the time for every point
`top()` and `bottom()` will now organize the points by time and also
keep the points original time even when a time grouping is used. At the
same time, `top()` and `bottom()` will no longer honor any fill options
that are present since they don't really make sense for these specific
functions.

This also fixes the aggregate and selectors to honor the ordered
iterator option so iterator remain ordered and to also respect the
buckets that are created by the final dimensions of the query so that
two buckets don't overlap each other within the same reducer. A test has
been added for this situation. This should clarify and encourage the use
of the ordered attribute within the query engine.
2017-04-26 15:07:10 -05:00
Jonathan A. Sternberg 57a2abbc87 Restrict top() and bottom() selectors to be used with no other functions 2017-04-14 10:23:07 -05:00
Cory LaNou 9060a2a5ff
Add HasDefaultDatabase interface to several statements 2017-04-12 13:41:28 -05:00
Jonathan A. Sternberg d653b76b5b Merge pull request #8282 from influxdata/js-8281-influxql-select-tests
Fix influxql select tests
2017-04-11 15:05:08 -05:00
Jonathan A. Sternberg c6e1b83906 Fix influxql select tests
The inputs are now sent to the tested iterator in the correct order so
we can more accurately test each individual select statement.
2017-04-10 20:51:14 -05:00
Jonathan A. Sternberg a550d323c4 Restrict fill(none) and fill(linear) to be usable only with aggregate queries 2017-04-10 15:58:05 -05:00
Jonathan A. Sternberg 0a5e4bd92b Implicitly cast null to false in binary expressions with a boolean
Also more consistently treat a binary expression with strings so it
produces the same value no matter the direction of the expression.
2017-04-06 12:26:04 -05:00
Jason Wilder 8da84e6144 Merge branch 'master' into tsi 2017-04-03 11:21:02 -06:00
Tom Young d2fd3f50aa Add bitwise AND, OR and XOR operators to InfluxQL. 2017-03-31 21:02:02 +01:00
Jonathan A. Sternberg 211e7ea65d Merge pull request #8234 from influxdata/js-8230-fix-window-computation-overflow
Prevent overflowing or underflowing during window computation
2017-03-31 11:09:30 -05:00
zhexuany 232fdae6dd introduce a new function non_negative_difference 2017-03-31 23:08:36 +08:00
Jonathan A. Sternberg 64fb1db5f5 Prevent overflowing or underflowing during window computation
The Window function will now check before it adjusts the offset whether
it is going to overflow or underflow. If it is going to do either, it
sets the start or end time to MinTime or MaxTime.
2017-03-30 16:35:22 -05:00
Jonathan A. Sternberg 2ea805c928 Interpolate between different intervals to find the whole area under the curve 2017-03-30 12:51:52 -05:00
Tom Young cac94a1fc7 Add "integral" function to InfluxQL 2017-03-30 12:07:26 -05:00
Edd Robinson fddaff2cc8 Merge master in 2017-03-29 18:00:28 +01:00
Jonathan A. Sternberg 7e0ed1f5e5 Ensure the input for certain functions in the query engine are ordered
The following functions require ordered input but were not guaranteed to
received ordered input:

* `distinct()`
* `sample()`
* `holt_winters()`
* `holt_winters_with_fit()`
* `derivative()`
* `non_negative_derivative()`
* `difference()`
* `moving_average()`
* `elapsed()`
* `cumulative_sum()`
* `top()`
* `bottom()`

These function calls have now been modified to request that their input
be ordered by the query engine. This will prevent the improper output
that could have been caused by multiple series being merged together or
multiple shards being merged together potentially incorrectly when no
time grouping was specified.

Two additional functions were already correct to begin with (so there
are no bugs with these two, but I'm including their names for
completeness).

* `median()`
* `percentile()`
2017-03-28 13:55:37 -05:00
Jonathan A. Sternberg 24109468c3 Merge pull request #8168 from influxdata/js-8167-math-with-multiple-selectors
Fix a regression when math was used with selectors
2017-03-28 13:31:57 -05:00
Jonathan A. Sternberg 3e52ec7ca2 Merge pull request #7762 from influxdata/js-6541-timezone-support
Support timezone offsets for queries
2017-03-28 10:39:07 -05:00
Jonathan A. Sternberg b14c292cba Fix a regression when math was used with selectors
If there were multiple selectors and math, the query engine would
mistakenly think it was the only selector in the query and would not
match their timestamps.

Fixed the query engine to pass whether the selector should be treated as
a selector so queries like `max(value) * 1, min(value) * 1` will match
the timestamps of the result.
2017-03-27 14:12:15 -05:00
Jonathan A. Sternberg ccf0cb8371 Fix query parser when using addition and subtraction without spaces
Additionally, support unary addition and subtraction for variables,
calls, and parenthesis expressions. Doing `-value` will be the
equivalent of doing `-1 * value` now.
2017-03-24 12:52:19 -05:00
Jonathan A. Sternberg 347b01814e Support timezone offsets for queries
The timezone for a query can now be added to the end with something like
`TZ("America/Los_Angeles")` and it will localize the results of the
query to be in that timezone. The offset will automatically be set to
the offset for that timezone and offsets will automatically adjust for
daylight savings time so grouping by a day will result in a 25 hour day
once a year and a 23 hour day another day of the year.

The automatic adjustment of intervals for timezone offsets changing will
only happen if the group by period is greater than the timezone offset
would be. That means grouping by an hour or less will not be affected by
daylight savings time, but a 2 hour or 1 day interval will be.

The default timezone is UTC and existing queries are unaffected by this
change.

When times are returned as strings (when `epoch=1` is not used), the
results will be returned using the requested timezone format in RFC3339
format.
2017-03-22 15:09:41 -05:00
Jonathan A. Sternberg 33981277bc Fix the time range when an exact timestamp is selected
There is a lot of confusion in the code if the range is [start, end) or
[start, end]. This is not made easier because it is acts one way in some
areas and in another way in some other areas, but it is usually [start,
end]. The `time = ?` syntax assumed that it was [start, end) and added
an extra nanosecond to the end time to accomodate for that, but the
range was actually [start, end] and that caused it to include one extra
nanosecond when it shouldn't have.

This change fixes it so exactly one timestamp is selected when `time = ?`
is used.
2017-03-21 14:55:31 -05:00
Jonathan A. Sternberg a6c09e58a0 Return an error when an invalid duration literal is parsed 2017-03-21 12:10:41 -05:00
Jason Wilder 8f7b251afd Merge branch 'master' into jw-tsi 2017-03-20 17:17:26 -06:00
Jason Wilder 86ad0a45b6 Ensure iterators are closed when query is killed
The underlying iterators were not closed when a query was kill so
although the client would receive an error, the query would continue
on until completion.
2017-03-17 16:00:39 -06:00
Jonathan A. Sternberg 41c8370bbc Fix fill(linear) when multiple series exist and there are null values
When there were multiple series and anything other than the last series
had any null values, the series would start using the first point from
the next series to interpolate points.

Interpolation should not cross between series. Now, the linear fill
checks to make sure the next point is within the same series before
using it to perform interpolation.
2017-03-16 15:54:20 -05:00
Jonathan A. Sternberg 5072db40c2 Forbid wildcards in binary expressions
When rewriting fields, wildcards within binary expressions were skipped.
This now throws an error whenever it finds a wildcard within a binary
expression in order to prevent the panic that occurs.
2017-03-16 14:26:10 -05:00
Jonathan A. Sternberg 208d8507f1 Implement both single and multiline comments in influxql
A single line comment will read until the end of a line and is started
with `--` (just like SQL). A multiline comment is with `/* */`. You
cannot nest multiline comments.
2017-03-15 14:24:09 -05:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Jason Wilder b9e5375043 Merge branch '1.2' into jw-merge-12 2017-03-08 13:16:50 -07:00
Jonathan A. Sternberg 83cf8893e1 Include IsRawQuery in the rewritten statement for meta queries 2017-03-06 14:46:33 -06:00
Jason Wilder 675d7c9d65 Merge branch '1.2' into jw-merge12 2017-03-06 11:09:05 -07:00
Jonathan A. Sternberg c5970b59b4 Map types correctly when selecting a field with multiple measurements where one of the measurements is empty 2017-03-01 11:47:26 -06:00
Jonathan A. Sternberg 1fb34e3eef Dividing aggregate functions with different outputs doesn't panic 2017-02-23 18:38:29 -06:00
Jonathan A. Sternberg 72e4dd01b9 Properly select a tag within a subquery
Previously, subqueries would only be able to select tag fields within a
subquery if the subquery used a selector. But, it didn't filter out
aggregates like `mean()` so it would panic instead.

Now it is possible to select the tag directly instead of rewriting the
query in an invalid way.

Some queries in this form will still not work though. For example, the
following still does not function (but also doesn't panic):

    SELECT count(host) FROM (SELECT mean, host FROM (SELECT mean(value) FROM cpu GROUP BY host))
2017-02-23 11:16:22 -06:00
Jonathan A. Sternberg 5a2b458180 Reduce the expression in a subquery to avoid a panic
The builder used for subqueries does not handle parenthesis, but a set
of parenthesis wrapping a field would cause it to panic. This code now
reduces the expression so the parenthesis are removed before being
processed.
2017-02-23 10:14:05 -06:00
Mark Rushakoff 601cbcd084 Merge branch '1.2' into mr-merge-12 2017-02-17 16:14:22 -08:00
Jonathan A. Sternberg 2fe48d6781 Rename zap import back to github.com/uber-go/zap
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
2017-02-17 17:17:22 -06:00
Jonathan A. Sternberg 71f62d33e6 Map types correctly when using a regex and one of the measurements is empty 2017-02-13 18:14:29 -06:00
Mark Rushakoff c762ab49ee Merge pull request #7974 from influxdata/mr-4785-show-databases
Allow non-admin users to execute SHOW DATABASES
2017-02-13 15:04:00 -08:00
Jason Wilder f45a58937c Merge pull request #7998 from influxdata/jw-merge-12
Merge 1.2.1-rc3 to master
2017-02-13 14:24:17 -07:00
Jason Wilder c3de210ded Merge branch '1.2' into jw-merge-12 2017-02-13 11:45:27 -07:00
Mark Rushakoff 53699aa24f Allow non-admin users to execute SHOW DATABASES
This commit introduces a new interface type, influxql.Authorizer, that
is passed as part of a statement's execution context and determines
whether the context is permitted to access a given database. In the
future, the Authorizer interface may be expanded to other resources
besides databases. In this commit, the Authorizer interface is
specifically used to determine which databases are returned when
executing SHOW DATABASES.

When HTTP authentication is enabled, the existing meta.UserInfo struct
implements Authorizer, meaning admin users can SHOW every database, and
non-admin users can SHOW only databases for which they have read and/or
write permission.

When HTTP authentication is disabled, all databases are visible through
SHOW DATABASES.

This addresses a long-standing issue where Chronograf or Grafana would
be unable to list databases if the logged-in user did not have admin
privileges.

Fixes #4785.
2017-02-13 08:59:16 -08:00
Jonathan A. Sternberg 55e64e1edd Fixed String() output for MOD operator and added MOD op precedence 2017-02-10 16:48:05 -06:00
Jason Wilder 8d0f2c3ca9 Merge pull request #7983 from influxdata/jw-parallel-buffers
Increase parallel iterator buffers to improve group by query speed
2017-02-10 11:22:59 -07:00
Jason Wilder 5e42ac411a Increase buffer to improve group by query speed 2017-02-10 11:07:49 -07:00
Jonathan A. Sternberg a0d8c1ca9f Add modulo operator to the query language 2017-02-10 10:16:37 -06:00
Jonathan A. Sternberg 2ad1668c2a Prevent a panic when aggregates are used in an inner query with a raw query
The following types of queries will panic:

    SELECT mean, host FROM (SELECT mean(value) FROM cpu GROUP BY host)
    SELECT top(sum, host, 3) FROM (SELECT sum(value) FROM cpu GROUP BY host)

These queries _should_ work, but due to a current limitation with
aggregate functions, the aggregate functions won't return any auxiliary
fields. So even if a tag is not an auxiliary field, it is treated that
way by the query engine and this query will fail.

Fixing this properly will take a longer period of time. This fix just
prevents the panic from killing the server while we fix this for real.
2017-02-08 11:44:56 -06:00
Jason Wilder 1bc0f68490 Merge branch '1.2' into jw-merge-12 2017-02-07 12:48:36 -07:00
Jonathan A. Sternberg e1fa48d0dd Fix ORDER BY time DESC with ordering series keys
The order of series keys is in ascending alphabetical order, not
descending alphabetical order, when it is ordered by descending time.
This fixes the ordering so points are returned in descending order. The
emitter also had the conditions for choosing which iterator to use in
the wrong direction (which only affects aggregates with `FILL(none)`).
2017-02-06 15:49:12 -06:00
Jonathan A. Sternberg caaad60dcf Fix authentication when subqueries are present
The code that checked if a query was authorized did not account for
sources that were subqueries. Now, the check for the required privileges
will descend into the subquery and add the subqueries required
privileges to the list of required privileges for the entire query.
2017-02-06 09:43:14 -06:00
Jason Wilder 2e95b4043c Merge branch '1.2' into jw-merge-12 2017-02-02 16:40:36 -07:00
Jonathan A. Sternberg e49ba016fa Fix incorrect math when aggregates that emit different times are used
When using `non_negative_derivative()` and `last()` in a math aggregate
with each other, the math would not be matched with each other because
one of those aggregates would emit one fewer point than the others. The
math iterators have been modified so they now track the name and tags of
a point and match based on those.

This isn't necessarily ideal and may come to bite us in the future. We
don't necessarily have a defined structure for all iterators so it can
be difficult to know which of two points is supposed to come first in
the ordering. This uses the common ordering that usually makes sense,
but the query engine is getting complicated enough where I am not 100%
certain that this is correct in all circumstances.
2017-02-02 14:40:41 -06:00
Joe LeGasse dd9278a098 regex: don't use exact match for case insensitive expression
Fixes #7906

In an attempt to reduce the overhead of using regex for exact matches,
the query parser will replace `=~ /^thing$/` with `== 'thing'`, but the
conditions being checked would ignore if any flags were set on the
expression, so `=~ /(?i)^THING$/` was replaced with `== 'THING'`, which
will fail unless the case was already exact. This change ensures that no
flags have been changed from those defaulted by the parser.
2017-02-02 10:49:12 -05:00
Joe LeGasse 93d18d42a6 regex: don't use exact match for case insensitive expression
Fixes #7906

In an attempt to reduce the overhead of using regex for exact matches,
the query parser will replace `=~ /^thing$/` with `== 'thing'`, but the
conditions being checked would ignore if any flags were set on the
expression, so `=~ /(?i)^THING$/` was replaced with `== 'THING'`, which
will fail unless the case was already exact. This change ensures that no
flags have been changed from those defaulted by the parser.
2017-02-02 10:25:08 -05:00
Cory LaNou 8e14f0173b Merge pull request #7924 from influxdata/cjl-7880-elapsed-panic
fix panic in query execution
2017-02-01 12:03:39 -06:00
Cory LaNou e3e319a176 fix panic in query execution 2017-02-01 10:51:26 -06:00
Jonathan A. Sternberg e060fd0aa3 Fix EvalType when a parenthesis expression is used
It did not descend into the expression within the parenthesis correctly
and would just recurse infinitely on itself instead.
2017-01-31 10:35:21 -06:00
Jonathan A. Sternberg e8719c90ab Fix EvalType when a parenthesis expression is used
It did not descend into the expression within the parenthesis correctly
and would just recurse infinitely on itself instead.
2017-01-31 10:19:43 -06:00
Paul Dix a801c9dea6 Merge pull request #7889 from influxdata/js-subquery-fixes
Cherry-pick 1.2 fixes for subqueries into master
2017-01-26 10:49:37 -05:00
Edd Robinson 91ee34b111 Merge pull request #7837 from influxdata/er-tidy
General tidy up and subtle bug fixes
2017-01-26 13:43:07 +00:00
Jonathan A. Sternberg ce54856e3d Expand query dimensions from the subquery
During development, I, at some point, decided that the dimensions should
be expanded based on what was available rather than what was present in
the subquery. I don't really know the rationale for this because I
forgot, but it doesn't make sense or seem to be particularly useful.

Expanding dimensions now just uses the values specified in the subquery
rather than expanding to all available dimensions of the measurement in
the subquery.
2017-01-25 16:33:03 -06:00
Jonathan A. Sternberg 92c5d336b4 Expand query dimensions from the subquery
During development, I, at some point, decided that the dimensions should
be expanded based on what was available rather than what was present in
the subquery. I don't really know the rationale for this because I
forgot, but it doesn't make sense or seem to be particularly useful.

Expanding dimensions now just uses the values specified in the subquery
rather than expanding to all available dimensions of the measurement in
the subquery.
2017-01-25 16:02:37 -06:00
Ben Johnson 047c21f4d9
Merge remote-tracking branch 'upstream/master' into tsi 2017-01-24 09:28:58 -07:00
Jonathan A. Sternberg 83c6d53294 Support the WHERE clause in outer queries with subqueries 2017-01-23 15:01:32 -06:00
Jonathan A. Sternberg 3d4d9062a0 Update subqueries so groupings are propagated to inner queries
Previously, only time expressions got propagated inwards. The reason for
this was simple. If the outer query was going to filter to a specific
time range, then it would be unnecessary for the inner query to output
points within that time frame. It started as an optimization, but became
a feature because there was no reason to have the user repeat the same
time clause for the inner query as the outer query. So we allowed an
aggregate query with an interval to pass validation in the subquery if
the outer query had a time range. But `GROUP BY` clauses were not
propagated because that same logic didn't apply to them. It's not an
optimization there. So while grouping by a tag in the outer query
without grouping by it in the inner query was useless, there wasn't any
particular reason to care.

Then a bug was found where wildcards would propagate the dimensions
correctly, but the outer query containing a group by with the inner
query omitting it wouldn't correctly filter out the outer group by. We
could fix that filtering, but on further review, I had been seeing
people make that same mistake a lot. People seem to just believe that
the grouping should be propagated inwards. Instead of trying to fight
what the user wanted and explicitly erase groupings that weren't
propagated manually, we might as well just propagate them for the user
to make their lives easier. There is no useful situation where you would
want to group into buckets that can't physically exist so we might as
well do _something_ useful.

This will also now propagate time intervals to inner queries since the
same applies there. But, while the interval propagates, the following
query will not pass validation since it is still not possible to use a
grouping interval with a raw query (even if the inner query is an
aggregate):

    SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m)

This also means wildcards will behave a bit differently. They will
retrieve dimensions from the sources in the inner query rather than just
using the dimensions in the group by.

Fixing top() and bottom() to return the correct auxiliary fields.
Unfortunately, we were not copying the buffer with the auxiliary fields
so those values would be overwritten by a later point.
2017-01-23 15:01:19 -06:00
Jonathan A. Sternberg 6cd5b690d1 Support the WHERE clause in outer queries with subqueries 2017-01-23 14:49:04 -06:00
Jonathan A. Sternberg f628b4a198 Update subqueries so groupings are propagated to inner queries
Previously, only time expressions got propagated inwards. The reason for
this was simple. If the outer query was going to filter to a specific
time range, then it would be unnecessary for the inner query to output
points within that time frame. It started as an optimization, but became
a feature because there was no reason to have the user repeat the same
time clause for the inner query as the outer query. So we allowed an
aggregate query with an interval to pass validation in the subquery if
the outer query had a time range. But `GROUP BY` clauses were not
propagated because that same logic didn't apply to them. It's not an
optimization there. So while grouping by a tag in the outer query
without grouping by it in the inner query was useless, there wasn't any
particular reason to care.

Then a bug was found where wildcards would propagate the dimensions
correctly, but the outer query containing a group by with the inner
query omitting it wouldn't correctly filter out the outer group by. We
could fix that filtering, but on further review, I had been seeing
people make that same mistake a lot. People seem to just believe that
the grouping should be propagated inwards. Instead of trying to fight
what the user wanted and explicitly erase groupings that weren't
propagated manually, we might as well just propagate them for the user
to make their lives easier. There is no useful situation where you would
want to group into buckets that can't physically exist so we might as
well do _something_ useful.

This will also now propagate time intervals to inner queries since the
same applies there. But, while the interval propagates, the following
query will not pass validation since it is still not possible to use a
grouping interval with a raw query (even if the inner query is an
aggregate):

    SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m)

This also means wildcards will behave a bit differently. They will
retrieve dimensions from the sources in the inner query rather than just
using the dimensions in the group by.

Fixing top() and bottom() to return the correct auxiliary fields.
Unfortunately, we were not copying the buffer with the auxiliary fields
so those values would be overwritten by a later point.
2017-01-23 12:38:10 -06:00
Edd Robinson 7374e48999 Remove dead code from influxql 2017-01-17 09:47:34 -08:00
Jonathan A. Sternberg 3ba950b029 Fix for subqueries to use the parallel iterator correctly
Also, fix the `Iterators.Merge(IteratorOptions)` function so it consults
the `Ordered` attribute to determine which iterator it should use to
merge the input iterators.
2017-01-11 10:47:18 -06:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
2017-01-07 13:00:48 -06:00
Ben Johnson 409b0165f5
shared in-memory index 2017-01-05 10:09:57 -07:00
Mark Rushakoff 6a94d200c8 Merge remote-tracking branch 'influx/master' into mr-godoc 2017-01-04 13:27:36 -08:00
Mark Rushakoff 61b5a15227 Prefer built-in string -> []rune conversion 2017-01-03 14:45:33 -08:00
Mark Rushakoff 88b8bd2465 Update godoc for package influxql
I did not look at any of the .gen.go files.
2016-12-30 18:02:52 -08:00
Gustav Westling 26b33307ae
Resolved PR comments on test files 2016-12-30 11:42:38 +01:00
Gustav Westling 56d98325da
Removed ineffective assignments, and added checks for errors that previsouly was not checked 2016-12-29 20:26:15 +01:00
Cory LaNou 572da8985c enforce minimum shard duration when creating retention policies 2016-12-20 09:11:43 -06:00
Mark Rushakoff a29781286b Use local RNG in SampleReducer
The reducers already had a local RNG but mistakenly did not use it when
sampling points.

Because the local RNG is not protected by a mutex, there is a slight
speedup as a result of this change:

benchmark                          old ns/op     new ns/op     delta
BenchmarkSampleIterator_1k-4       418           418           +0.00%
BenchmarkSampleIterator_100k-4     434           422           -2.76%
BenchmarkSampleIterator_1M-4       449           439           -2.23%

benchmark                          old allocs     new allocs     delta
BenchmarkSampleIterator_1k-4       3              3              +0.00%
BenchmarkSampleIterator_100k-4     3              3              +0.00%
BenchmarkSampleIterator_1M-4       3              3              +0.00%

benchmark                          old bytes     new bytes     delta
BenchmarkSampleIterator_1k-4       304           304           +0.00%
BenchmarkSampleIterator_100k-4     304           304           +0.00%
BenchmarkSampleIterator_1M-4       304           304           +0.00%

The speedup would presumably increase when multiple sample iterators are
used concurrently.
2016-12-15 12:33:19 -08:00
Jonathan A. Sternberg ec57108520 Use proper uber-go/zap import path
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
2016-12-15 08:54:14 -06:00
Jonathan A. Sternberg 21502a39e8 Switch logging to use structured logging everywhere
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
2016-12-14 10:45:15 -06:00
Jonathan A. Sternberg bffc759cf9 Return the time from a percentile call on an integer
`percentile()` is supposed to be a selector and return the time of the
point, but that only got changed when the input was a float. Updating
the integer processor to also return the time of the point rather than
the beginning of the interval.
2016-12-01 12:34:48 -06:00
Jonathan A. Sternberg e0c1908683 Merge pull request #7644 from influxdata/js-fix-empty-variable-serialization
Quote the empty string as an ident
2016-11-29 12:16:35 -06:00
Jonathan A. Sternberg b4db76cee2 Introduce syntax for marking a partial response with chunking
The `partial` tag has been added to the JSON response of a series and
the result so that a client knows when more of the series or result will
be sent in a future JSON chunk.

This helps interactive clients who don't want to wait for all of the
data to know if it is done processing the current series or the current
result. Previously, the client had to guess if the next chunk would
refer to the same result or a new result and it had to match the name
and tags of the two series to know if they were the same series. Now,
the client just needs to check the `partial` field included with the
response to know if it should expect more.

Fixed `max-row-limit` so it counts rows instead of results and it
truncates the response when the `max-row-limit` is reached.
2016-11-22 11:16:22 -06:00
Jonathan A. Sternberg c957bf7f99 Quote the empty string as an ident
Without this quoting, the function `max("")` turns into `max()` and will
not be reparsed correctly.
2016-11-18 16:25:39 -06:00
Jonathan A. Sternberg e885fe5117 Expand string and boolean fields when using a wildcard with sample() 2016-11-15 15:56:47 -06:00
Jonathan A. Sternberg 64c2d704da Avoid deadlock when max-row-limit is hit
When the `max-row-limit` was hit, the goroutine reading from the results
channel would stop reading from the channel, but it didn't signal to the
sender that it was no longer reading from the results. This caused the
sender to continue trying to send results even though nobody would ever
read it and this created a deadlock.

Include an `AbortCh` on the `ExecutionContext` that will signal when
results are no longer desired so the sender can abort instead of
deadlocking.
2016-11-08 13:12:28 -06:00
Tom Young 24fa1ac1c0 Remove old function which is no longer used. 2016-11-06 13:38:59 +00:00
Jonathan A. Sternberg 1b2fa645ee Fix incorrect grouping when multiple aggregates are used with sparse data
When a query would use a grouping with two different aggregates, it was
possible for one of the aggregates to return a value from a different
series key than the second aggregate. When these series keys didn't
match, the returned grouping would be screwed up because it sorted by
time before checking for name and tags.

This did not happen when the aggregates returned values for the same
series keys because then the iterators were aligned with each other.
2016-11-02 13:35:22 -05:00
Jonathan A. Sternberg 83e998fbed Support the ON syntax in SHOW TAG VALUES
The parser was updated previously in #7295 and the functionality was
supposed to be there, but the wiring in the query engine for that to
happen was never written.
2016-11-01 15:54:45 -05:00
Jonathan A. Sternberg ce1831160d Fix output duration units for SHOW QUERIES
The previous version was showing the microseconds unit when it was
outputting nanoseconds. Now we correctly identify which sub-second unit
to use (milliseconds, microseconds, or nanoseconds) and use the correct
unit while dividing the duration unit correctly to produce the correct
output.

Also updated to use the default duration string instead of our own
custom formatters. It turns out that the string method for
`time.Duration` does the correct thing as long as we truncate the value
first.
2016-10-31 12:48:01 -05:00
Jason Wilder 0b6f5441b9 Add config option to messages when limits exceeded
When a limit is exceeded, we return errors and sometimes log (if appropriate)
that a limit was exceeded.  The messages don't always provide an indication
as to where or how they are configured.

Instead, return the config option (easily searchable for) as well as the limit
currently set and the value that exceeded it when possible.
2016-10-28 14:54:45 -06:00
Jason Wilder af72d9b0e4 Merge pull request #7515 from influxdata/jw-7053
Return parse error from delete/drop when db or rp is specified
2016-10-25 12:05:56 -06:00
Jason Wilder c68b7a192f Return parse error from delete/drop when db or rp is specified
The delete and drop statements apply to the measurement within a db.
The parser allowed a db or rp to be specified and these values were
silently ignored.  This could cause data loss as someone would think
they are only deleting the series within a rp, but they are actually
deleting all their data.

Instead, we return a parse error if a db or rp is specified in the
delete or drop statements.  Ideally, we'd be able to respect the
db and rp, but that requires significant work in the query engine
and tsdb store to make that work.

Fixes #7053
2016-10-25 11:43:15 -06:00
Edd Robinson b12b0d12fb Add regex benchmarks and fix existing approach 2016-10-25 11:10:03 +01:00
Edd Robinson 06d1226b9a Rewrite exact match regexes to use tsdb index
This commit adds support for replacing regexes with non-regex conditions
when possible. Currently the following regexes are supported:

 - host =~ /^foo$/ will be converted into host = 'foo'
 - host !~ /^foo$/ will be converted into host != 'foo'

Note: if the regex expression contains character classes, grouping,
repetition or similar, it may not be rewritten.

For example, the condition: name =~ /^foo|bar$/ will not be rewritten.
Support for this may arrive in the future.

Regexes that can be converted into simpler expression will be able to
take advantage of the tsdb index, making them significantly faster.
2016-10-25 11:10:03 +01:00
Jonathan A. Sternberg 19a61dbb44 Align binary math expression streams by time
Also fills in missing values using the fill expression for any binary
aggregation.
2016-10-18 13:31:13 -05:00
Mark Rushakoff 0ddb7ad842 Disallow derivative call with non-duration 2nd arg
Previously, calling derivative with a non-duration second argument was
allowed during parsing but would panic during execution due to a failed
type conversion. This change ensures the second argument is a duration
literal.
2016-10-17 16:20:53 -07:00
Jonathan A. Sternberg 3496c5b85f Merge pull request #7442 from influxdata/js-5955-make-regex-work-on-field-keys-in-select
Support using regexes to select fields and dimensions
2016-10-17 11:37:47 -05:00
Jonathan A. Sternberg b60b4b371e Support using regexes to select fields and dimensions
The functionality works the same as wildcards, but this time, you can
specify a regular expression.

One limitation is that you can't specify whether you only want to select
fields or tags. Since the regex can be changed to suit the person's
needs, I don't currently think this is an issue.
2016-10-13 22:17:14 -05:00
Jonathan A. Sternberg 95859b8ab4 Remove accidentally added string support for the stddev call
Strings would always return an empty string and stddev is meaningless
when it comes to strings. This removes that functionality so strings
don't automatically get picked up when using a wildcard.
2016-10-10 14:58:28 -05:00
Jonathan A. Sternberg 6afc2a77a5 Implement cumulative_sum() function
The `cumulative_sum()` function can be used to sum each new point and
output the current total. For the following points:

    cpu value=2 0
    cpu value=4 10
    cpu value=6 20

This would output the following points:

    > SELECT cumulative_sum(value) FROM cpu
    time    value
    ----    -----
    0       2
    10      6
    20      12

As can be seen, each new point adds to the sum of the previous point and
outputs the value with the same timestamp.

The function can also be used with an aggregate like `derivative()`.

    > SELECT cumulative_sum(mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)
2016-10-07 10:11:53 -05:00
Michael Desa f9b8129770 Add sample function to query language
First Pass at implementing sample

Add sample iterators for all types

Remove size from sample struct

Fix off by one error when generating random number

Add benchmarks for sample iterator

Add test and associated fixes for off by one error

Add test for sample function

Remove NumericLiteral from sample function call

Make clear that the counter is incr w/ each call

Rename IsRandom to AllSamplesSeen

Add a rng for each reducer that is created

The default rng that comes with math/rand has a global lock. To avoid
having to worry about any contention on the lock, each reducer now has
its own time seeded rng.

Add sample function to changelog
2016-10-06 09:41:42 -07:00
Michael Desa 966e5503bf Add fill(linear) to query language
Clean up template for fill average

Change fill(average) to fill(linear)

Update average to linear in infuxql spec

Add Integer Tests and associated fixes

Update CHANGELOG for fill(linear)
2016-10-04 14:27:04 -07:00
Jason Wilder a3fd12198e Avoid extra allocations when evalating binary expressions 2016-09-29 13:18:38 -06:00
Jonathan A. Sternberg 3afdf3cd94 Merge tag 'v1.0.1' 2016-09-27 17:53:33 -05:00
Jonathan A. Sternberg dbc4a9150f Prevent manual use of system queries
Manual use of system queries could result in a user using the query
incorrect. Rather than check to make sure the query was used correctly,
we're just going to prevent users from using those sources so they can't
use them incorrectly.
2016-09-23 10:00:18 -05:00
Cory LaNou acbf193640
add test to prevent future parsing regressions for time durations 2016-09-16 11:44:05 -05:00
Jason Wilder a6d3e46893 Fix panic when parsing ms durations 2016-09-16 08:47:18 -06:00
Jonathan A. Sternberg 635ce337f0 Merge pull request #7304 from influxdata/js-remove-substatement-method
Remove defunct `Substatement()` call
2016-09-15 08:32:40 -05:00
Jonathan A. Sternberg c11cbc5f05 Merge pull request #7309 from influxdata/js-go-vet-for-1.7
Update source files to pass vet checks for go 1.7
2016-09-15 08:32:30 -05:00
Jonathan A. Sternberg 477d6231db Update source files to pass vet checks for go 1.7
The vet checks for some files did not pass for go 1.7. As part of a
preliminary start to making go 1.7 work with this software, go vet
should pass.

Also updated the gogo/protobuf dependency which fixed the code generator
to work with go 1.7 too. Ran `go generate` on the entire repository to
ensure every file was up to date.
2016-09-14 15:01:22 -05:00
Cory LaNou 71f0c7e1e9 return appropriate error if overflowing duration when parsing 2016-09-14 09:27:38 -05:00
Jonathan A. Sternberg 0b94f5dc1a Skip past points at the same time in derivative call within a merged series
The derivative() call would panic if it received two points at the same
time because it tried to divide by zero. The derivative call now skips
past these points. To avoid skipping past these points, use `GROUP BY *`
so that each series is kept separated into their own series.

The difference() call has also been modified to skip past these points.
Even though difference doesn't divide by the time, difference is
supposed to perform the same as derivative, but without dividing by the
time.
2016-09-13 16:57:36 -05:00
Jonathan A. Sternberg dbb8c5570c Duplicate parsing bug in ALTER RETENTION POLICY
Return an error when we encounter the same option twice in ALTER
RETENTION POLICY and remove the `maxNumOptions` number from the parsing
loop. The `maxNumOptions` number would need to be modified if another
option was added to the parsing loop and it didn't correctly prevent
duplicate options from being reported as an error anyway.
2016-09-13 15:56:13 -05:00
Jonathan A. Sternberg aae88fc3c3 Support ON and use default database for SHOW commands
Normalize all of the SHOW commands so they allow both using ON to
specify the database and using the default database. Some commands would
require one and some would require the other and it was confusing when
using the query language.

Affected commands:

* SHOW RETENTION POLICIES
* SHOW MEASUREMENTS
* SHOW SERIES
* SHOW TAG KEYS
* SHOW TAG VALUES
* SHOW FIELD KEYS
2016-09-13 15:36:59 -05:00
Jonathan A. Sternberg 394c13870b Remove defunct `Substatement()` call 2016-09-13 14:17:31 -05:00
Jonathan A. Sternberg 4326da0820 Implement time math for lazy time literals
When attempting to reduce the WHERE clause, the time literals had not
been converted from string literals yet. This adds the functionality to
have it handle the same time math when the time literal is still a
string literal.
2016-09-09 13:34:56 -05:00
Jonathan A. Sternberg 04c59b8941 Fix the dollar sign so it properly handles reserved keywords
The dollar sign would sometimes be accepted as whitespace if it was
immediately followed by a reserved keyword or an invalid character. It
now reads these properly as a bound parameter rather than ignoring the
dollar sign.
2016-09-02 15:32:46 -05:00
Jonathan A. Sternberg 4ff0b10210 Merge pull request #7139 from influxdata/js-7137-show-tag-values-string-method
Properly output the SHOW TAG VALUES command so it can be reparsed
2016-09-01 10:19:19 -05:00
Jonathan A. Sternberg dc2527ce86 Merge branch '1.0' 2016-08-31 14:45:57 -05:00
Jonathan A. Sternberg 23f2d50ecb Use defaults from `meta` package for `CREATE DATABASE`
Instead of having the parser set the defaults, the command will set the
defaults so that the constants for that are actually used. This way we
can also identify which things the user provided and which ones we are
filling with default values.

This allows the meta client to be able to make smarter decisions when
determining if the user requested a conflict or if the requested
capabilities match with what is currently available. If you just say
`CREATE DATABASE WITH NAME myrp`, the user doesn't really care what the
duration of the retention policy is and just wants to use the default.
Now, we can use that information to determine if an existing retention
policy would conflict with what the user requested rather than returning
an error if a default value ever gets changed since the meta client
command can communicate intent more easily.
2016-08-30 13:23:49 -05:00
Nathaniel Cook 888dc8cbd2 Merge pull request #7234 from influxdata/nc-influxql-readme
Update Influxql Readme
2016-08-29 13:09:34 -06:00
Jonathan A. Sternberg f67558c2a7 Merge pull request #7236 from influxdata/js-7220-revert-limit-shard-concurrency
Revert "limit shard concurrency"
2016-08-29 13:41:46 -05:00
Nathaniel Cook 3ab4e9fa1d update InfluxQL readme to reflect current code 2016-08-29 12:33:55 -06:00
Jonathan A. Sternberg c05c7f6360 Revert "limit shard concurrency"
This reverts commit 6c7d56d4bc.
2016-08-29 12:39:52 -05:00
Jonathan A. Sternberg b8a70105aa Fix alter retention policy when all options are used
We added `SHARD DURATION` as an extra option, but forgot to increase the
maximum number of allowable options from 3 to 4. So if 4 options were
used, the last one was ignored. This was commonly `DEFAULT`, but it
could have been any of the options.
2016-08-26 11:25:18 -05:00
Jonathan A. Sternberg 8b234546a8 Merge pull request #7204 from influxdata/1.0
Merge 1.0 branch to master
2016-08-25 15:20:30 -05:00
Jonathan A. Sternberg 10029caf2f Support negative timestamps in the query engine
Negative timestamps are now supported. We also now refuse two
nanoseconds that are at the edge of the minimum time window. One of the
nanoseconds we do not accept is because we need MinInt64 to be used for
some internal comparisons in the TSM engine and it was causing an
underflow when we subtracted one from the minimum time. The second is so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell the difference between if the
user provided it or if it was implicit without those values being
different.

If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
2016-08-25 12:52:41 -05:00
Jonathan A. Sternberg 993ac1ca2e Remove confusing comment and unnecessary continue 2016-08-23 19:43:18 -05:00
Ashish Gaurav 4e17f9bb13 add mode() function & tests 2016-08-23 19:31:41 -05:00
Edd Robinson 90ff713f21 Fix base64 encoding issue in stats
Fixes #7177.
2016-08-22 15:21:31 +01:00
Ben Johnson 8aa224b22d
reduce memory allocations in index
This commit changes the index to point to index data in the shards
instead of keeping it in-memory on the heap.
2016-08-16 14:09:00 -06:00
Jonathan A. Sternberg f0f7d91d6c Properly output all commands so they can be reparsed
The commands fixed:
* SHOW TAG VALUES
* SHOW STATS
* SHOW DIAGNOSTICS
2016-08-15 15:04:51 -05:00
Jonathan A. Sternberg 87f7c66b8a Merge pull request #7119 from influxdata/js-create-database-use-defaults
Use defaults from `meta` package for `CREATE DATABASE`
2016-08-11 10:34:22 -05:00
Jonathan A. Sternberg 32d10de94f Check in between query statements to see if the query was interrupted
This allows a long series of uninterruptible statements to still be
interrupted for a long running query that might do something like create
or drop many databases.
2016-08-10 15:36:02 -05:00
Jonathan A. Sternberg ab049d7f0a Support mixed duration units
It is now possible to use a mixed duration unit like `1h30m`. The
duration units can be in whatever order as long as they are connected to
each other.

There is a change to the scanner. A token such as `10x` will be scanned
as a duration literal, but will then fail to parse as an invalid
duration. This should not be a breaking change as there is no situation
where `10m10` was a valid order of tokens for the parser.

Fixes #3634.
2016-08-10 13:34:19 -05:00
Jonathan A. Sternberg 3959656968 Add additional statistics to query executor
The query executor would only store the number of active queries and the
query duration so it was impossible to determine how many queries were
actually executed during that timeframe because quick queries would be
gone before the call to gather statistics was made.

This adds two new statistics so track when queries start and when
queries finish and doesn't decrement the counter so the number of
executed queries can be obtained using `derivative()` and
`difference()`.
2016-08-10 11:35:06 -05:00
Jonathan A. Sternberg 530b00bd76 Use defaults from `meta` package for `CREATE DATABASE`
Instead of having the parser set the defaults, the command will set the
defaults so that the constants for that are actually used. This way we
can also identify which things the user provided and which ones we are
filling with default values.

This allows the meta client to be able to make smarter decisions when
determining if the user requested a conflict or if the requested
capabilities match with what is currently available. If you just say
`CREATE DATABASE WITH NAME myrp`, the user doesn't really care what the
duration of the retention policy is and just wants to use the default.
Now, we can use that information to determine if an existing retention
policy would conflict with what the user requested rather than returning
an error if a default value ever gets changed since the meta client
command can communicate intent more easily.
2016-08-09 12:00:06 -05:00
Ben Johnson 55b3e63ced
concurrent series limit
This commit fixes the `MaxSelectSeriesN` limit which was broken by
the implementation of lazy iterators. The setting previously limited
the total number of series but the new implementation limits the
concurrent number of series being processed.
2016-08-09 08:58:01 -06:00