Commit Graph

28 Commits (933a14e16f0232c5ec883644b7363a59da7bcd90)

Author SHA1 Message Date
Sam Arnold 98361e2073
fix: error instead of panic for statement rewrite failure (#21792) 2021-07-06 11:05:21 -04:00
Sam Arnold 903b8cd0ea
feat(query): Hyper log log operators in influxql (#20603)
* feat(query): hyper log log counting in query engine

In addition to helping with normal queries, this can improve the 'SHOW CARDINALITY'
meta-queries:

time influx -database mydb -execute 'select count_hll(sum_hll(_seriesKey)) from big'
name: big
time count_hll
---- ---------
0    200767781
influx -database mydb -execute   0.06s user 0.12s system 0% cpu 8:49.99 total
2021-02-08 08:38:14 -05:00
Ayan George 2529799abb
refactor: Change ToLower comparisons to EqualFold (#18147)
When comparing strings in a case-insensitive way, strings.EqualFold() is
(almost?) always faster than comparing the results of strings.ToLower().

In addition, strings.EqualFold() never causes an allocation.

This patch replaces case-insensitive string comparisons that use
strings.ToLower() with a strings.EqualFold() call.
2020-05-18 19:46:59 -04:00
Adam ec32b3a4bb fix(query/compile.go): time range was exceeding min/max bounds under certain conditions
added test for missing lower bound
2019-08-14 09:59:17 -04:00
Jonathan A. Sternberg 728bdd6562 Fix the derivative and others time ranges for aggregate data
The derivative function and others similar to it would preload
themselves with data so that the first interval would be the start of
the time range. That meant reading data outside of the time range.

One change to the shard mapper back in v1.4.0 caused the shard mapper to
constrict queries to the intervals given to the shard mapper. This was
correct because the shard mapper can only deal with times it has mapped,
but this broke the functionality of looking back into the past for
the derivative and other functions that used that functionality.

The compiler has been updated with an additional attribute that records
how many intervals in the past will need to be read so that the shard
mapper can include extra times that it may not necessarily read from,
but may be queried because of the above described functionality.
2018-09-10 09:59:33 -05:00
Jonathan A. Sternberg c6811f4725 Fix the inherited interval for derivative and others
The compiler was too strict. The inherited interval from an outer query
should not have caused the inner query to fail because inherited
intervals are only implicitly passed to inner queries that support group
by time functionality. Since the inner query with a derivative doesn't
support grouping by time and the inner query itself doesn't specify a
time, the outer query shouldn't have invalidated the inner query.
2018-08-30 13:01:18 -05:00
Patrick Hemmer 7dc7efd501 rename "triple_exponential_average" -> "triple_exponential_derivative" 2018-05-16 19:40:12 -04:00
Jonathan A. Sternberg d42062def2 Add technical analysis algorithms
This adds numerous technical analysis algorithms:

* exponential_moving_average
* double_exponential_moving_average
* triple_exponential_moving_average
* relative_strength_index
* triple_exponential_average
* kaufmans_efficiency_ratio (commonly referred to as just "Efficiency Ratio")
* kaufmans_adaptive_moving_average
* chande_momentum_oscillator (both the common 'smoothed' version, and the ta-lib version)
2018-04-23 22:27:21 -04:00
Jonathan A. Sternberg 58bcc6fdc9 Fix the validation for multiple nested distinct calls 2018-04-23 14:38:02 -05:00
Tom Young 42581c7432 Add new math functions:
- abs
- asin, acos, atan, atan2
- exp, ln, log, log2, log10
- pow, sqrt
2018-04-17 12:56:36 -05:00
Jonathan A. Sternberg 92dd6ea978 Fix regression to allow now() to be used as the group by offset again 2018-03-26 10:52:32 -05:00
Jonathan A. Sternberg 6e627cfdbf Implement basic trigonometry functions
This adds support for math functions into the query language. Math
functions are special because they are transformations and do not access
the filesystem in the same way aggregate functions do. A transformation
takes one point and always outputs one point making it more similar to
binary expressions so these math functions follow the same rules as
binary expressions.

This also supports using math literals (so you can do `sin(1)`) and the
math functions can be used anywhere such as in a field or an expression.
Both of the following are supported:

    SELECT sin(value) FROM cpu
    SELECT value FROM cpu WHERE sin(value) > 0.5

Arguments are in radians. Degrees is not supported.
2018-03-20 14:13:52 -05:00
Jonathan A. Sternberg f8d60a881d Refactor the math engine to compile the query and use eval
This change makes it so that we simplify the math engine so it doesn't
use a complicated set of nested iterators. That way, we have to change
math in one fewer place.

It also greatly simplifies the query engine as now we can create the
necessary iterators, join them by time, name, and tags, and then use the
cursor interface to read them and use eval to compute the result. It
makes it so the auxiliary iterators and all of their complexity can be
removed.

This also makes use of the new eval functionality that was recently
added to the influxql package.

No math functions have been added, but the scaffolding has been included
so things like trigonometry functions are just a single commit away.

This also introduces a small breaking change. Because of the call
optimization, it is now possible to use the same selector multiple times
as a selector. So if you do this:

    SELECT max(value) * 2, max(value) / 2 FROM cpu

This will now return the timestamp of the max value rather than zero
since this query is considered to have only a single selector rather
than multiple separate selectors. If any aspect of the selector is
different, such as different selector functions or different arguments,
it will consider the selectors to be aggregates like the old behavior.
2018-03-19 15:01:15 -05:00
Jonathan A. Sternberg c8b0c6e166 Update influxql to include the function type evaluators in the query package 2018-03-14 15:42:28 -05:00
Jonathan A. Sternberg 733d842812 Turn the ExecutionContext into a context.Context
Along with modifying ExecutionContext to be a context and have the
TaskManager return the context itself, this also creates a Monitor
interface and exposes the Monitor through the Context. This way, we can
access the monitor from within the query.Select method and keep all of
the limits inside of the query package instead of leaking them into the
statement executor.

An eventual goal is to remove the InterruptCh from the IteratorOptions
and use the Context instead, but for now, we'll just assign the done
channel from the Context to the IteratorOptions so at least they refer
to the same channel.
2018-03-08 14:03:20 -06:00
Jonathan A. Sternberg 9e122eb1a4 Fix the implicit time range in a subquery
The implicit time range for an interval is supposed to be now when no
end is specified. In a subquery though, the interval doesn't exist and
so it doesn't set the end time to now, but to the max time. Since the
subquery qualifies as something that should have the implicit end time
apply, this results in a query that runs slowly because it is filling in
a bunch of unasked for intervals if a fill is specified.

This hack adds the implicit end time if it sees the parent query's end
time is set to the maximum available time.

This is a temporary fix for this problem. The query compilation should
perform these time range calculations in the compilation stage and the
subqueries should use the compilation stage during execution instead of
ignoring it. That work takes a lot more effort though and is more prone
to running into unforeseen bugs.

This fix introduces a subtle, but likely rare to run into bug. If the
top level query specifies the maximum time as the end time and the
subquery has an interval, the subquery should use the end time rather
than now as the time range. With this hack, it will interpret it as an
implicit time rather than an explicit one. This is unlikely to matter
though.
2018-02-27 17:10:10 -06:00
Jonathan A. Sternberg ca471f7d0f Fix regression when math between literals is used in a field 2018-02-14 14:34:34 -05:00
Jonathan A. Sternberg db60a83d5a Fix query compilation so multiple nested distinct calls is allowable
When refactoring the query engine, I thought calling
`count(distinct(value))` multiple times was disallowed and so the
refactor made it so that wasn't possible.

It turns out that this pattern is allowed because since the distinct is
nested, it is aggregated anyway and can be combined with other
aggregates.

This removes the erroneously placed restriction.
2017-11-28 11:09:32 -06:00
Edd Robinson fbcb299b8a Support WHERE time clause in SHOW TAG VALUES
This commit adds time support to SHOW TAG VALUES. Time can be used as
both a lower and upper boundary. However, there are some caveats.

For the `inmem` index, filtering by time will still return all results
because the index data is shared across shards.

For the `tsi1` index, filtering by time will only work down to the shard
lever. Specifically, when querying by time all shards within that time
range will be used to generate the results.
2017-11-06 19:15:01 +00:00
Edd Robinson 98d584b63f Use index for SHOW X meta queries
When a meta query does not include a time component then it can be
answered exclusively by the index. This should result in a much faster
query execution that if the TSM engine was engaged.

This commit rewrites the following queries such that they make use
of the index where no time component is present:

  - SHOW MEASUREMENTS
  - SHOW SERIES
  - SHOW TAG KEYS
  - SHOW FIELD KEYS
2017-11-06 19:15:00 +00:00
Stuart Carnie f3d45ba301 influxdata/influxdb/influxql -> influxdata/influxql 2017-10-30 14:40:26 -07:00
Jonathan A. Sternberg f20cab6e99 Implicitly decide on the lower limit for fill queries when none is present
This allows the query:

    SELECT mean(value) FROM cpu GROUP BY time(1d)

To function in some way that makes sense. The upper limit is implicitly
the `now()` starting time and the lower limit will be whichever interval
the lowest point falls into.

When no lower bound is specified and `max-select-buckets` is specified,
the query will only consider points that would satisfy
`max-select-buckets`. So if you have one point written in 1970, have
another point within the last minute, and then do the above query with
`max-select-buckets` being equal to 10, the older point from 1970 will
not be considered.
2017-10-05 15:56:44 -05:00
Jonathan A. Sternberg 9cbd604603 Fix time constraints in subqueries from the refactor 2017-09-08 11:55:53 -05:00
Jonathan A. Sternberg 1c7bafcd3e Force subqueries to match the parent queries ordering
Previously, subqueries would honor their own ordering. We never really
supported that and I have no idea if it would work since most parts in
the query engine assume that points are being delivered in only one
ordering.

Subqueries have now been modified so if a person tries to do different
ordering, they get an error when running the query. If they specify an
ordering in the top most query, that ordering gets propagated to all
subqueries.

Fixes #8699.
2017-08-28 15:57:40 -05:00
Jonathan A. Sternberg 5dbcd1b06f Merge pull request #8745 from influxdata/js-refactor-validation
Refactor validation code and move it to the compiler
2017-08-28 11:35:24 -05:00
Jonathan A. Sternberg d2fcb893e1 Close the query shard group after the iterators are created
Now, the prepared statement keeps the open resource and closing the open
resource created from `Prepare` is the responsibility of the prepared
statement.

This also nils out the local shard mapping after it is closed to prevent
it from being used after it is closed.
2017-08-28 09:46:11 -05:00
Jonathan A. Sternberg 905e7fe05e Refactor validation code and move it to the compiler
This refactors the validation code so it is more flexible and performs a
small bit of work to make preparing and executing the query easier.

The general idea is that compilation will eventually do more heavy
lifting in creating the initial plan and prepare will construct an
actual plan rather than just doing some basic field rewriting.

This change at least sets us up for that change in the future and moves
the validation code to the query execution instead of in the parser.

This also frees up the parser to parse the complete AST without worrying
if the query itself is valid. That could be useful for client code that
wants to compile a partial query to an AST and then perform
modifications on the AST for some reason.
2017-08-26 17:36:32 -05:00
Jonathan A. Sternberg 8738e72cf1 Refactor the select call into three separate phases
The first call is to compile the query. This performs some initial
processing that can be done before having any access to the shards. At
the moment, it does very little, but it's intended to be changed to
eventually perform initial validations of the query and create an
internal graph structure for the execution of the query.

The second call is to prepare the query. This step has access to the
shard mapper. Right now, it just maps the shards and rewrites the fields
of the query for any wildcards. In the future, it is intended to do the
above, but also to prepare the final directed acyclical graph that will
execute the query.

The third call is to select the query. This step is intended to create
all of the iterators for processing the query. At the moment, much of
the work intended for the second step is performed in the third step.
2017-08-25 07:50:13 -05:00