Commit Graph

366 Commits (19f331a450d7542a0b471423adac7b949baa9628)

Author SHA1 Message Date
Jonathan A. Sternberg a0d8c1ca9f Add modulo operator to the query language 2017-02-10 10:16:37 -06:00
Jason Wilder 1bc0f68490 Merge branch '1.2' into jw-merge-12 2017-02-07 12:48:36 -07:00
Jonathan A. Sternberg caaad60dcf Fix authentication when subqueries are present
The code that checked if a query was authorized did not account for
sources that were subqueries. Now, the check for the required privileges
will descend into the subquery and add the subqueries required
privileges to the list of required privileges for the entire query.
2017-02-06 09:43:14 -06:00
Joe LeGasse dd9278a098 regex: don't use exact match for case insensitive expression
Fixes #7906

In an attempt to reduce the overhead of using regex for exact matches,
the query parser will replace `=~ /^thing$/` with `== 'thing'`, but the
conditions being checked would ignore if any flags were set on the
expression, so `=~ /(?i)^THING$/` was replaced with `== 'THING'`, which
will fail unless the case was already exact. This change ensures that no
flags have been changed from those defaulted by the parser.
2017-02-02 10:49:12 -05:00
Joe LeGasse 93d18d42a6 regex: don't use exact match for case insensitive expression
Fixes #7906

In an attempt to reduce the overhead of using regex for exact matches,
the query parser will replace `=~ /^thing$/` with `== 'thing'`, but the
conditions being checked would ignore if any flags were set on the
expression, so `=~ /(?i)^THING$/` was replaced with `== 'THING'`, which
will fail unless the case was already exact. This change ensures that no
flags have been changed from those defaulted by the parser.
2017-02-02 10:25:08 -05:00
Jonathan A. Sternberg e060fd0aa3 Fix EvalType when a parenthesis expression is used
It did not descend into the expression within the parenthesis correctly
and would just recurse infinitely on itself instead.
2017-01-31 10:35:21 -06:00
Jonathan A. Sternberg e8719c90ab Fix EvalType when a parenthesis expression is used
It did not descend into the expression within the parenthesis correctly
and would just recurse infinitely on itself instead.
2017-01-31 10:19:43 -06:00
Paul Dix a801c9dea6 Merge pull request #7889 from influxdata/js-subquery-fixes
Cherry-pick 1.2 fixes for subqueries into master
2017-01-26 10:49:37 -05:00
Edd Robinson 91ee34b111 Merge pull request #7837 from influxdata/er-tidy
General tidy up and subtle bug fixes
2017-01-26 13:43:07 +00:00
Jonathan A. Sternberg ce54856e3d Expand query dimensions from the subquery
During development, I, at some point, decided that the dimensions should
be expanded based on what was available rather than what was present in
the subquery. I don't really know the rationale for this because I
forgot, but it doesn't make sense or seem to be particularly useful.

Expanding dimensions now just uses the values specified in the subquery
rather than expanding to all available dimensions of the measurement in
the subquery.
2017-01-25 16:33:03 -06:00
Jonathan A. Sternberg 92c5d336b4 Expand query dimensions from the subquery
During development, I, at some point, decided that the dimensions should
be expanded based on what was available rather than what was present in
the subquery. I don't really know the rationale for this because I
forgot, but it doesn't make sense or seem to be particularly useful.

Expanding dimensions now just uses the values specified in the subquery
rather than expanding to all available dimensions of the measurement in
the subquery.
2017-01-25 16:02:37 -06:00
Jonathan A. Sternberg 3d4d9062a0 Update subqueries so groupings are propagated to inner queries
Previously, only time expressions got propagated inwards. The reason for
this was simple. If the outer query was going to filter to a specific
time range, then it would be unnecessary for the inner query to output
points within that time frame. It started as an optimization, but became
a feature because there was no reason to have the user repeat the same
time clause for the inner query as the outer query. So we allowed an
aggregate query with an interval to pass validation in the subquery if
the outer query had a time range. But `GROUP BY` clauses were not
propagated because that same logic didn't apply to them. It's not an
optimization there. So while grouping by a tag in the outer query
without grouping by it in the inner query was useless, there wasn't any
particular reason to care.

Then a bug was found where wildcards would propagate the dimensions
correctly, but the outer query containing a group by with the inner
query omitting it wouldn't correctly filter out the outer group by. We
could fix that filtering, but on further review, I had been seeing
people make that same mistake a lot. People seem to just believe that
the grouping should be propagated inwards. Instead of trying to fight
what the user wanted and explicitly erase groupings that weren't
propagated manually, we might as well just propagate them for the user
to make their lives easier. There is no useful situation where you would
want to group into buckets that can't physically exist so we might as
well do _something_ useful.

This will also now propagate time intervals to inner queries since the
same applies there. But, while the interval propagates, the following
query will not pass validation since it is still not possible to use a
grouping interval with a raw query (even if the inner query is an
aggregate):

    SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m)

This also means wildcards will behave a bit differently. They will
retrieve dimensions from the sources in the inner query rather than just
using the dimensions in the group by.

Fixing top() and bottom() to return the correct auxiliary fields.
Unfortunately, we were not copying the buffer with the auxiliary fields
so those values would be overwritten by a later point.
2017-01-23 15:01:19 -06:00
Jonathan A. Sternberg f628b4a198 Update subqueries so groupings are propagated to inner queries
Previously, only time expressions got propagated inwards. The reason for
this was simple. If the outer query was going to filter to a specific
time range, then it would be unnecessary for the inner query to output
points within that time frame. It started as an optimization, but became
a feature because there was no reason to have the user repeat the same
time clause for the inner query as the outer query. So we allowed an
aggregate query with an interval to pass validation in the subquery if
the outer query had a time range. But `GROUP BY` clauses were not
propagated because that same logic didn't apply to them. It's not an
optimization there. So while grouping by a tag in the outer query
without grouping by it in the inner query was useless, there wasn't any
particular reason to care.

Then a bug was found where wildcards would propagate the dimensions
correctly, but the outer query containing a group by with the inner
query omitting it wouldn't correctly filter out the outer group by. We
could fix that filtering, but on further review, I had been seeing
people make that same mistake a lot. People seem to just believe that
the grouping should be propagated inwards. Instead of trying to fight
what the user wanted and explicitly erase groupings that weren't
propagated manually, we might as well just propagate them for the user
to make their lives easier. There is no useful situation where you would
want to group into buckets that can't physically exist so we might as
well do _something_ useful.

This will also now propagate time intervals to inner queries since the
same applies there. But, while the interval propagates, the following
query will not pass validation since it is still not possible to use a
grouping interval with a raw query (even if the inner query is an
aggregate):

    SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m)

This also means wildcards will behave a bit differently. They will
retrieve dimensions from the sources in the inner query rather than just
using the dimensions in the group by.

Fixing top() and bottom() to return the correct auxiliary fields.
Unfortunately, we were not copying the buffer with the auxiliary fields
so those values would be overwritten by a later point.
2017-01-23 12:38:10 -06:00
Edd Robinson 7374e48999 Remove dead code from influxql 2017-01-17 09:47:34 -08:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
2017-01-07 13:00:48 -06:00
Mark Rushakoff 88b8bd2465 Update godoc for package influxql
I did not look at any of the .gen.go files.
2016-12-30 18:02:52 -08:00
Jonathan A. Sternberg e885fe5117 Expand string and boolean fields when using a wildcard with sample() 2016-11-15 15:56:47 -06:00
Tom Young 24fa1ac1c0 Remove old function which is no longer used. 2016-11-06 13:38:59 +00:00
Edd Robinson b12b0d12fb Add regex benchmarks and fix existing approach 2016-10-25 11:10:03 +01:00
Edd Robinson 06d1226b9a Rewrite exact match regexes to use tsdb index
This commit adds support for replacing regexes with non-regex conditions
when possible. Currently the following regexes are supported:

 - host =~ /^foo$/ will be converted into host = 'foo'
 - host !~ /^foo$/ will be converted into host != 'foo'

Note: if the regex expression contains character classes, grouping,
repetition or similar, it may not be rewritten.

For example, the condition: name =~ /^foo|bar$/ will not be rewritten.
Support for this may arrive in the future.

Regexes that can be converted into simpler expression will be able to
take advantage of the tsdb index, making them significantly faster.
2016-10-25 11:10:03 +01:00
Mark Rushakoff 0ddb7ad842 Disallow derivative call with non-duration 2nd arg
Previously, calling derivative with a non-duration second argument was
allowed during parsing but would panic during execution due to a failed
type conversion. This change ensures the second argument is a duration
literal.
2016-10-17 16:20:53 -07:00
Jonathan A. Sternberg 3496c5b85f Merge pull request #7442 from influxdata/js-5955-make-regex-work-on-field-keys-in-select
Support using regexes to select fields and dimensions
2016-10-17 11:37:47 -05:00
Jonathan A. Sternberg b60b4b371e Support using regexes to select fields and dimensions
The functionality works the same as wildcards, but this time, you can
specify a regular expression.

One limitation is that you can't specify whether you only want to select
fields or tags. Since the regex can be changed to suit the person's
needs, I don't currently think this is an issue.
2016-10-13 22:17:14 -05:00
Jonathan A. Sternberg 95859b8ab4 Remove accidentally added string support for the stddev call
Strings would always return an empty string and stddev is meaningless
when it comes to strings. This removes that functionality so strings
don't automatically get picked up when using a wildcard.
2016-10-10 14:58:28 -05:00
Jonathan A. Sternberg 6afc2a77a5 Implement cumulative_sum() function
The `cumulative_sum()` function can be used to sum each new point and
output the current total. For the following points:

    cpu value=2 0
    cpu value=4 10
    cpu value=6 20

This would output the following points:

    > SELECT cumulative_sum(value) FROM cpu
    time    value
    ----    -----
    0       2
    10      6
    20      12

As can be seen, each new point adds to the sum of the previous point and
outputs the value with the same timestamp.

The function can also be used with an aggregate like `derivative()`.

    > SELECT cumulative_sum(mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)
2016-10-07 10:11:53 -05:00
Michael Desa f9b8129770 Add sample function to query language
First Pass at implementing sample

Add sample iterators for all types

Remove size from sample struct

Fix off by one error when generating random number

Add benchmarks for sample iterator

Add test and associated fixes for off by one error

Add test for sample function

Remove NumericLiteral from sample function call

Make clear that the counter is incr w/ each call

Rename IsRandom to AllSamplesSeen

Add a rng for each reducer that is created

The default rng that comes with math/rand has a global lock. To avoid
having to worry about any contention on the lock, each reducer now has
its own time seeded rng.

Add sample function to changelog
2016-10-06 09:41:42 -07:00
Michael Desa 966e5503bf Add fill(linear) to query language
Clean up template for fill average

Change fill(average) to fill(linear)

Update average to linear in infuxql spec

Add Integer Tests and associated fixes

Update CHANGELOG for fill(linear)
2016-10-04 14:27:04 -07:00
Jason Wilder a3fd12198e Avoid extra allocations when evalating binary expressions 2016-09-29 13:18:38 -06:00
Jonathan A. Sternberg 3afdf3cd94 Merge tag 'v1.0.1' 2016-09-27 17:53:33 -05:00
Jonathan A. Sternberg dbc4a9150f Prevent manual use of system queries
Manual use of system queries could result in a user using the query
incorrect. Rather than check to make sure the query was used correctly,
we're just going to prevent users from using those sources so they can't
use them incorrectly.
2016-09-23 10:00:18 -05:00
Jonathan A. Sternberg 635ce337f0 Merge pull request #7304 from influxdata/js-remove-substatement-method
Remove defunct `Substatement()` call
2016-09-15 08:32:40 -05:00
Jonathan A. Sternberg aae88fc3c3 Support ON and use default database for SHOW commands
Normalize all of the SHOW commands so they allow both using ON to
specify the database and using the default database. Some commands would
require one and some would require the other and it was confusing when
using the query language.

Affected commands:

* SHOW RETENTION POLICIES
* SHOW MEASUREMENTS
* SHOW SERIES
* SHOW TAG KEYS
* SHOW TAG VALUES
* SHOW FIELD KEYS
2016-09-13 15:36:59 -05:00
Jonathan A. Sternberg 394c13870b Remove defunct `Substatement()` call 2016-09-13 14:17:31 -05:00
Jonathan A. Sternberg 4326da0820 Implement time math for lazy time literals
When attempting to reduce the WHERE clause, the time literals had not
been converted from string literals yet. This adds the functionality to
have it handle the same time math when the time literal is still a
string literal.
2016-09-09 13:34:56 -05:00
Jonathan A. Sternberg 4ff0b10210 Merge pull request #7139 from influxdata/js-7137-show-tag-values-string-method
Properly output the SHOW TAG VALUES command so it can be reparsed
2016-09-01 10:19:19 -05:00
Jonathan A. Sternberg 23f2d50ecb Use defaults from `meta` package for `CREATE DATABASE`
Instead of having the parser set the defaults, the command will set the
defaults so that the constants for that are actually used. This way we
can also identify which things the user provided and which ones we are
filling with default values.

This allows the meta client to be able to make smarter decisions when
determining if the user requested a conflict or if the requested
capabilities match with what is currently available. If you just say
`CREATE DATABASE WITH NAME myrp`, the user doesn't really care what the
duration of the retention policy is and just wants to use the default.
Now, we can use that information to determine if an existing retention
policy would conflict with what the user requested rather than returning
an error if a default value ever gets changed since the meta client
command can communicate intent more easily.
2016-08-30 13:23:49 -05:00
Jonathan A. Sternberg 8b234546a8 Merge pull request #7204 from influxdata/1.0
Merge 1.0 branch to master
2016-08-25 15:20:30 -05:00
Jonathan A. Sternberg 10029caf2f Support negative timestamps in the query engine
Negative timestamps are now supported. We also now refuse two
nanoseconds that are at the edge of the minimum time window. One of the
nanoseconds we do not accept is because we need MinInt64 to be used for
some internal comparisons in the TSM engine and it was causing an
underflow when we subtracted one from the minimum time. The second is so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell the difference between if the
user provided it or if it was implicit without those values being
different.

If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
2016-08-25 12:52:41 -05:00
Ashish Gaurav 4e17f9bb13 add mode() function & tests 2016-08-23 19:31:41 -05:00
Jonathan A. Sternberg f0f7d91d6c Properly output all commands so they can be reparsed
The commands fixed:
* SHOW TAG VALUES
* SHOW STATS
* SHOW DIAGNOSTICS
2016-08-15 15:04:51 -05:00
Jonathan A. Sternberg 530b00bd76 Use defaults from `meta` package for `CREATE DATABASE`
Instead of having the parser set the defaults, the command will set the
defaults so that the constants for that are actually used. This way we
can also identify which things the user provided and which ones we are
filling with default values.

This allows the meta client to be able to make smarter decisions when
determining if the user requested a conflict or if the requested
capabilities match with what is currently available. If you just say
`CREATE DATABASE WITH NAME myrp`, the user doesn't really care what the
duration of the retention policy is and just wants to use the default.
Now, we can use that information to determine if an existing retention
policy would conflict with what the user requested rather than returning
an error if a default value ever gets changed since the meta client
command can communicate intent more easily.
2016-08-09 12:00:06 -05:00
Jonathan A. Sternberg 4cdfc3280d Move the CQ interval by the group by offset
This will make the period selected by the CQ system work correctly for a
query with an offset.
2016-08-05 14:39:52 -05:00
Cory LaNou 1117526873 remove IF EXISTS/IF NOT EXISTS from influxql language 2016-07-29 12:58:05 -05:00
Jonathan A. Sternberg 23ef9484a4 Support wildcards in aggregate functions 2016-07-28 17:56:32 -05:00
Jonathan A. Sternberg 837a9804cf Refactoring the monitor service to avoid expvar
Truncate the time interval output of the monitor service to be on even
time intervals rather than on every minute based on the start time. This
normalizes the output from the monitor service.
2016-07-07 11:13:58 -05:00
Jonathan A. Sternberg bb060a60c6 Fix regex binary encoding for a measurement
Previously, it encoded the text representation of the regex literal
which included the surrounding slashes used in the query language. The
binary encoding should only include the exact string used to create the
regular expression.
2016-07-05 11:39:41 -05:00
Jonathan A. Sternberg 252cde1e81 Fix golint errors for the influxql package 2016-06-20 08:51:02 -05:00
Jonathan A. Sternberg 9837de793c Support regex and other operations for selecting the key in SHOW TAG VALUES
This adds support for using regex expressions in SHOW TAG VALUES when
selecting the key. Also supporting the `!=` operation for the
comparison. Now you can do any of the following:

    SHOW TAG VALUES WITH KEY != "region"
    SHOW TAG VALUES WITH KEY =~ /region/
    SHOW TAG VALUES WITH KEY !~ /region/

It also adds a new SetLiteral AST node that will potentially be used in
the future to allow set operations for other comparisons in the future.

Fixes #4532.
2016-06-13 10:03:14 -05:00
Jonathan A. Sternberg 2fa6d306c2 Add option to KILL QUERY to kill on a specific host
Option only applies to clustering.
2016-06-07 16:48:07 -05:00
Joe LeGasse f2fd988ab9 Delay parsing of date/time strings until needed
The current code would compare every string literal it crossed and tried
to coerce them to time literals if the _looked_ like date/time strings.

The only time the TimeLiteral was used is when comparing to the the
'time' value in a where clause. This change moves the string parsing
code until we attempt to compare 'time' to a string, at which point we
know we need/want a TimeLiteral, and not just an ordinary string.

Fixes #6727
2016-05-27 09:43:45 -04:00