Commit Graph

52 Commits (2259ada8c39fa594c7f26a92e4a4a430f0687c61)

Author SHA1 Message Date
Jonathan A. Sternberg 9edf236cc8 Maintain the tags of points selected by top() or bottom() when writing the results
When a `SELECT ... INTO ...` is used with `top()` or `bottom()` used
with tags, the points will be written with the tags still intact instead
of converted to fields.
2017-05-23 15:00:21 -05:00
Jonathan A. Sternberg 7b9b55bfc0 Optimize top() and bottom() using an incremental aggregator
The previous version of `top()` and `bottom()` would gather all of the
points to use in a slice, filter them (if necessary), then use a
slightly modified heap sort to retrieve the top or bottom values.

This performed horrendously from the standpoint of memory. Since it
consumed so much memory and spent so much time in allocations (along
with sorting a potentially very large slice), this affected speed too.

These calls have now been modified so they keep the top or bottom points
in a min or max heap. For `top()`, a new point will read the minimum
value from the heap. If the new point is greater than the minimum point,
it will replace the minimum point and fix the heap with the new value.
If the new point is smaller, it discards that point. For `bottom()`, the
process is the opposite.

It will then sort the final result to ensure the correct ordering of the
selected points.

When `top()` or `bottom()` contain a tag to select, they have now been
modified so this query:

    SELECT top(value, host, 2) FROM cpu

Essentially becomes this query:

    SELECT top(value, 2), host FROM (
        SELECT max(value) FROM cpu GROUP BY host
    )

This should drastically increase the performance of all `top()` and
`bottom()` queries.
2017-05-19 11:56:46 -05:00
Jonathan A. Sternberg be3bce5212 top() and bottom() now returns the time for every point
`top()` and `bottom()` will now organize the points by time and also
keep the points original time even when a time grouping is used. At the
same time, `top()` and `bottom()` will no longer honor any fill options
that are present since they don't really make sense for these specific
functions.

This also fixes the aggregate and selectors to honor the ordered
iterator option so iterator remain ordered and to also respect the
buckets that are created by the final dimensions of the query so that
two buckets don't overlap each other within the same reducer. A test has
been added for this situation. This should clarify and encourage the use
of the ordered attribute within the query engine.
2017-04-26 15:07:10 -05:00
zhexuany 232fdae6dd introduce a new function non_negative_difference 2017-03-31 23:08:36 +08:00
Jonathan A. Sternberg 2ea805c928 Interpolate between different intervals to find the whole area under the curve 2017-03-30 12:51:52 -05:00
Tom Young cac94a1fc7 Add "integral" function to InfluxQL 2017-03-30 12:07:26 -05:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
2017-01-07 13:00:48 -06:00
Mark Rushakoff 88b8bd2465 Update godoc for package influxql
I did not look at any of the .gen.go files.
2016-12-30 18:02:52 -08:00
Jonathan A. Sternberg bffc759cf9 Return the time from a percentile call on an integer
`percentile()` is supposed to be a selector and return the time of the
point, but that only got changed when the input was a float. Updating
the integer processor to also return the time of the point rather than
the beginning of the interval.
2016-12-01 12:34:48 -06:00
Jonathan A. Sternberg e885fe5117 Expand string and boolean fields when using a wildcard with sample() 2016-11-15 15:56:47 -06:00
Jonathan A. Sternberg 95859b8ab4 Remove accidentally added string support for the stddev call
Strings would always return an empty string and stddev is meaningless
when it comes to strings. This removes that functionality so strings
don't automatically get picked up when using a wildcard.
2016-10-10 14:58:28 -05:00
Jonathan A. Sternberg 6afc2a77a5 Implement cumulative_sum() function
The `cumulative_sum()` function can be used to sum each new point and
output the current total. For the following points:

    cpu value=2 0
    cpu value=4 10
    cpu value=6 20

This would output the following points:

    > SELECT cumulative_sum(value) FROM cpu
    time    value
    ----    -----
    0       2
    10      6
    20      12

As can be seen, each new point adds to the sum of the previous point and
outputs the value with the same timestamp.

The function can also be used with an aggregate like `derivative()`.

    > SELECT cumulative_sum(mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)
2016-10-07 10:11:53 -05:00
Michael Desa f9b8129770 Add sample function to query language
First Pass at implementing sample

Add sample iterators for all types

Remove size from sample struct

Fix off by one error when generating random number

Add benchmarks for sample iterator

Add test and associated fixes for off by one error

Add test for sample function

Remove NumericLiteral from sample function call

Make clear that the counter is incr w/ each call

Rename IsRandom to AllSamplesSeen

Add a rng for each reducer that is created

The default rng that comes with math/rand has a global lock. To avoid
having to worry about any contention on the lock, each reducer now has
its own time seeded rng.

Add sample function to changelog
2016-10-06 09:41:42 -07:00
Jonathan A. Sternberg 993ac1ca2e Remove confusing comment and unnecessary continue 2016-08-23 19:43:18 -05:00
Ashish Gaurav 4e17f9bb13 add mode() function & tests 2016-08-23 19:31:41 -05:00
Ashish Gaurav 70c8c021ac added benchmark tests for median aggrergator (Package: influxql,influxql_test) 2016-08-04 08:02:19 +05:30
Nathaniel Cook ce74fe0b06 count and sum return 0 for empty intervals 2016-06-01 15:53:23 -06:00
Nathaniel Cook 6ed0d94343 Add Holt-Winters forecasting method. 2016-05-19 09:24:56 -06:00
Jonathan A. Sternberg a05e2b164e Support booleans for min() and max()
Fixes #6494.
2016-04-29 14:56:22 -04:00
Nathaniel Cook 465f5a375f add elapsed function 2016-04-19 12:54:54 -06:00
Jonathan A. Sternberg 86046bb2d0 Implement derivatives across intervals for aggregate queries
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.

This does not apply to raw queries yet.

Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.

Fixes #3247. Contributes to #5943.
2016-04-15 18:16:08 -04:00
Jonathan A. Sternberg 66a599825b Allow percentile to be used as a selector
Fixes #6292.
2016-04-13 13:29:14 -04:00
Jonathan A. Sternberg 50bd78433c Merge pull request #6291 from influxdata/js-6261-optimize-distinct
Optimize the distinct call
2016-04-12 17:09:10 -04:00
Nathaniel Cook 6ae62e9644 update Percentile to preserve Aux fields since its a selector 2016-04-12 13:34:50 -06:00
Jonathan A. Sternberg 6708d0c439 Optimize the distinct call
Change distinct so it uses a custom reducer that keeps internal state
instead of requiring all of the points to be kept as a slice in memory.

Fixes #6261.
2016-04-11 18:29:50 -04:00
Jonathan A. Sternberg 6453dbc249 Implement simple moving average
The simple moving average will gradually emit points instead of waiting
until the end. This should apply to derivative and difference in the
future too.

Fixes #6112.
2016-03-29 14:36:43 -04:00
Jonathan A. Sternberg a9720f926e Implement the difference function
The difference function is implemented very similar to how derivative is
implemented. It is an aggregate function that acts over the entire
aggregate. This function will also have the same problems that
derivative has with getting values from the previous interval or point.
This will be fixed separately as part of #5943.

Fixes #1825.
2016-03-29 09:27:12 -04:00
Jonathan A. Sternberg 43a5e84aaf Merge pull request #6047 from influxdata/js-6040-boolean-distinct
Support the distinct() call for booleans
2016-03-17 17:17:21 -04:00
Jonathan A. Sternberg e47426ff6e Support integer literals in the query language
Numbers in the query without any decimal will now be emitted as integers
instead and be parsed as an IntegerLiteral. This ensures we keep the
original context that a query was issued with and allows us to act more
similar to how programming languages are typically structured when it
comes to floats and ints.

This adds functionality for dealing with integers promoting to floats in
the various different places where math are used.

Fixes #5744 and #5629.
2016-03-17 10:37:34 -04:00
Jonathan A. Sternberg 2e7816ebd9 Support the distinct() call for booleans
Normalize the time for the distinct() call to either be at the beginning
of the group by interval or the start time similar to every other call.
The timestamp previously just showed the first time found and didn't
make a lot of sense in the context of what the function was supposed to
do.

Fixes #6040.
2016-03-17 09:32:54 -04:00
Nathaniel Cook 4961a4435b Fix nil comparison for top/bottom 2016-03-07 15:21:22 -07:00
Nathaniel Cook 46fc6e5516 Expose Reduce Functions for Kapacitor 2016-03-07 14:03:14 -07:00
Jonathan A. Sternberg 9c5bc8ab2b Refactor reduce slice func to use the aggregator and emitter 2016-03-07 13:25:45 -05:00
Jonathan A. Sternberg 8d89a203a2 Fix sorting for distinct by sorting by value when the point time is the same 2016-03-03 19:09:38 -05:00
Jonathan A. Sternberg e3660fae93 Support all iterator types for count(), first(), and last()
All three of these iterators are supposed to support all four types of
iterators, but the implementation was never done for string or boolean.

Fixes #5886.
2016-03-02 23:49:55 -05:00
Jonathan A. Sternberg 1c543b28a9 Refactored call iterators to make them public and more usable as a library
This refactor is primarily to support Kapacitor. Kapacitor doesn't care
about the iterators and mostly keeps the points it handles in memory.
The iterator interface is more than Kapacitor cares about.

This commit refactors and opens up the internals of aggregating and
reducing incoming points so it can be used by an outside library with
the same code. It also makes the iterators used by the call iterators
publically usable with new functionality.

Reducers are split into two methods which are separate interfaces that
can be combined for dealing with casting between different types. The
Aggregator interfaces accept points into the aggregator and retain any
internal state they need. The Emitter interface will then create a point
from that aggregated state which can be fed to the iterator. The
Emitters do not fill in the name or tag of the point as that is expected
to be done by the person aggregating the point. While the Emitters do
sometimes fill in the time, that value will also be overwritten by the
iterator. Filling in the time is to allow a future version that will
allow returning the point time instead of just the interval time.
2016-03-02 16:10:49 -05:00
Jonathan A. Sternberg d11bc6182c Improve mean accuracy while retaining the speedup with a custom iterator
Fixes #5852.
2016-03-02 14:48:11 -05:00
Jonathan A. Sternberg 7a03df2af1 Remove the non-unreachable panics in the new query engine
The only panics left are ones that should be unreachable unless there is
a bug.

Fixes #5777.
2016-02-22 12:52:43 -05:00
Jonathan A. Sternberg 18c7c554ba Optimize the mean() call by moving the calculation into the shard iterator
A new attribute has been added to points to track how many points were
used to calculate that point. This is particularly useful for finding
the mean as we can then split mean calculation into two phases: one at
the shard level and a second at the shards level.

This optimization is now used so we don't have to hold so many points in
memory while calculating the mean.
2016-02-16 10:32:34 -05:00
Jonathan A. Sternberg 42b9166000 Support derivative() call for integer fields in the new query engine
Fixes #5640.
2016-02-12 11:36:59 -05:00
Sergei Egorov eef0e41a7e Optimize ReducePercentile method: do not call len() twice + move sorting after index check 2016-02-11 20:05:34 +02:00
Ben Johnson 607750ab1b add SHOW MEASUREMENTS iterator 2016-02-10 09:40:28 -07:00
Jonathan A. Sternberg dbb9b36d84 Support integers with top() and bottom() and fix point ordering
top() and bottom() point ordering was incorrect and using an inefficient
method of sorting. It has now been updated to use a heap and ordering is
being done by value first and time second (with earlier times always
taking priority).

Removed unit tests that test using `time` inside of the query to get the
real time instead of the interval time and only allowing the default
behavior. We will have another mechanism to get the real time during an
interval, but the current method is deprecated.

The top() and bottom() methods now have integer support.
2016-02-10 09:40:27 -07:00
Ben Johnson a0fe0ca437 fix new query engine test regressions 2016-02-10 09:40:27 -07:00
Jonathan A. Sternberg 76b49b3ab3 Fixed a bug in first() and last() where the time was lost
last() would always return the last output of the iterator (which isn't
necessarily the last time value due to how the merge iterator works) and
first() would always return the first output of the iterator (wrong for
the same reason).

Now the time is kept by the reduce function and the times are wiped as
part of the reduce iterator after the value has been found.
2016-02-10 09:40:26 -07:00
Jonathan A. Sternberg 03ad7a4e40 Move the integerReduceSliceFloatIterator to call_iterator.go
It matches more in functionality to the functions in call_iterator.go
than iterator.go. iterator.go mostly has base iterators and
call_iterator.go has iterators related to functional calls, which is
the only time integerReduceSliceFloatIterator is used.
2016-02-10 09:40:26 -07:00
Ben Johnson b8918a780c integer support 2016-02-10 09:40:25 -07:00
Jonathan A. Sternberg 97752df03d Repair regressions in derivatives with group by tests
Also fixes the `first()` and `last()` calls to do the same thing as
`min()` and `max()` by returning the time corresponding to the start of
the interval rather than the point's real time.
2016-02-10 09:40:25 -07:00
Jonathan A. Sternberg e0ac29dd2d Implement most of top() and bottom()
This does not implement the time selector, but everything else is
implemented. Unfortunately, there are no tests for bottom() in the old
query engine, so only top() is properly tested.
2016-02-10 09:40:25 -07:00
Ben Johnson 00806de9b8 refactor query engine 2016-02-10 09:40:25 -07:00