The `cumulative_sum()` function can be used to sum each new point and
output the current total. For the following points:
cpu value=2 0
cpu value=4 10
cpu value=6 20
This would output the following points:
> SELECT cumulative_sum(value) FROM cpu
time value
---- -----
0 2
10 6
20 12
As can be seen, each new point adds to the sum of the previous point and
outputs the value with the same timestamp.
The function can also be used with an aggregate like `derivative()`.
> SELECT cumulative_sum(mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)
The derivative() call would panic if it received two points at the same
time because it tried to divide by zero. The derivative call now skips
past these points. To avoid skipping past these points, use `GROUP BY *`
so that each series is kept separated into their own series.
The difference() call has also been modified to skip past these points.
Even though difference doesn't divide by the time, difference is
supposed to perform the same as derivative, but without dividing by the
time.
The derivative function had an arbitrary limitation that would cause it
to set the value to zero if the previous value was after the next value.
This caused all `ORDER BY desc` queries with `derivative()` to always
return zero values.
Fixes#4675.
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.
This does not apply to raw queries yet.
Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.
Fixes#3247. Contributes to #5943.
The simple moving average will gradually emit points instead of waiting
until the end. This should apply to derivative and difference in the
future too.
Fixes#6112.
Previously the call iterator would normalize the time to the interval
for all calls. This meant that when `first()` or `last()` was called
with no group by interval the value would be found for each shard, the
time was normalized, then it tried to find the value between the shards
(but no longer with any time data as that had already been eliminated).
This removes part of the time logic from the call iterators and makes a
new iterator `IntervalIterator` to normalize the times as they come out
of the underlying iterator.
Fixes#5890.
This refactor is primarily to support Kapacitor. Kapacitor doesn't care
about the iterators and mostly keeps the points it handles in memory.
The iterator interface is more than Kapacitor cares about.
This commit refactors and opens up the internals of aggregating and
reducing incoming points so it can be used by an outside library with
the same code. It also makes the iterators used by the call iterators
publically usable with new functionality.
Reducers are split into two methods which are separate interfaces that
can be combined for dealing with casting between different types. The
Aggregator interfaces accept points into the aggregator and retain any
internal state they need. The Emitter interface will then create a point
from that aggregated state which can be fed to the iterator. The
Emitters do not fill in the name or tag of the point as that is expected
to be done by the person aggregating the point. While the Emitters do
sometimes fill in the time, that value will also be overwritten by the
iterator. Filling in the time is to allow a future version that will
allow returning the point time instead of just the interval time.
Since we set up the aggregate queries so that iterator is ordered,
we only need to look at the first value. This cuts about 10 seconds
off a large single series query running first.
If 2 or more points during this map-and-reduce share the same timestamp,
the tie is broken by looking at the value. This ensures that these
functions operate in a deterministic manner.
This solution due to @jwilder
This change moves tracking of next timestamp and values to simple
slices, as performance measurement showed that Peek() on TagSet cursors
was a huge performance drain. There is much more that can be done here,
but with this in place query performance has been restored to 0.9.1
levels.
This change also uses -1 to indicate that no value is available for a
given timestamp.