The previous version of `top()` and `bottom()` would gather all of the
points to use in a slice, filter them (if necessary), then use a
slightly modified heap sort to retrieve the top or bottom values.
This performed horrendously from the standpoint of memory. Since it
consumed so much memory and spent so much time in allocations (along
with sorting a potentially very large slice), this affected speed too.
These calls have now been modified so they keep the top or bottom points
in a min or max heap. For `top()`, a new point will read the minimum
value from the heap. If the new point is greater than the minimum point,
it will replace the minimum point and fix the heap with the new value.
If the new point is smaller, it discards that point. For `bottom()`, the
process is the opposite.
It will then sort the final result to ensure the correct ordering of the
selected points.
When `top()` or `bottom()` contain a tag to select, they have now been
modified so this query:
SELECT top(value, host, 2) FROM cpu
Essentially becomes this query:
SELECT top(value, 2), host FROM (
SELECT max(value) FROM cpu GROUP BY host
)
This should drastically increase the performance of all `top()` and
`bottom()` queries.
`top()` and `bottom()` will now organize the points by time and also
keep the points original time even when a time grouping is used. At the
same time, `top()` and `bottom()` will no longer honor any fill options
that are present since they don't really make sense for these specific
functions.
This also fixes the aggregate and selectors to honor the ordered
iterator option so iterator remain ordered and to also respect the
buckets that are created by the final dimensions of the query so that
two buckets don't overlap each other within the same reducer. A test has
been added for this situation. This should clarify and encourage the use
of the ordered attribute within the query engine.
First Pass at implementing sample
Add sample iterators for all types
Remove size from sample struct
Fix off by one error when generating random number
Add benchmarks for sample iterator
Add test and associated fixes for off by one error
Add test for sample function
Remove NumericLiteral from sample function call
Make clear that the counter is incr w/ each call
Rename IsRandom to AllSamplesSeen
Add a rng for each reducer that is created
The default rng that comes with math/rand has a global lock. To avoid
having to worry about any contention on the lock, each reducer now has
its own time seeded rng.
Add sample function to changelog
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.
Updated the Emitter to return errors when it cannot read properly from
the iterators.
This commit makes a number of performance improvements to
reduce allocations during query execution. Several objects
and buffers are now reused across the components to avoid
allocations.
Previously a simple `count(value)` query across 1M points
would require 26,000+ allocations. After the changes in
this commit that number has been reduced to 88.
Use of the iterator is spread out into both `IteratorCreators` and
inside of the iterators themselves. Part of the interrupt must be
handled inside of the engine so it stops trying to emit points when an
interrupt is found and another part of the interrupt has to happen when
combining the iterators so it doesn't just start reading the next shard.
Previously the call iterator would normalize the time to the interval
for all calls. This meant that when `first()` or `last()` was called
with no group by interval the value would be found for each shard, the
time was normalized, then it tried to find the value between the shards
(but no longer with any time data as that had already been eliminated).
This removes part of the time logic from the call iterators and makes a
new iterator `IntervalIterator` to normalize the times as they come out
of the underlying iterator.
Fixes#5890.
All three of these iterators are supposed to support all four types of
iterators, but the implementation was never done for string or boolean.
Fixes#5886.
A new attribute has been added to points to track how many points were
used to calculate that point. This is particularly useful for finding
the mean as we can then split mean calculation into two phases: one at
the shard level and a second at the shards level.
This optimization is now used so we don't have to hold so many points in
memory while calculating the mean.
Previously reduce iterators just separated points by tags. If you had
identical tags but different names, it would group those together so you
could have these two points:
cpu value=1
mem value=2
When you performed a `mean(value)` call and included both cpu and mem as
sources, it would return one mem series with a value of 1.5 instead of
two serieses.
last() would always return the last output of the iterator (which isn't
necessarily the last time value due to how the merge iterator works) and
first() would always return the first output of the iterator (wrong for
the same reason).
Now the time is kept by the reduce function and the times are wiped as
part of the reduce iterator after the value has been found.