* feat(query): hyper log log counting in query engine
In addition to helping with normal queries, this can improve the 'SHOW CARDINALITY'
meta-queries:
time influx -database mydb -execute 'select count_hll(sum_hll(_seriesKey)) from big'
name: big
time count_hll
---- ---------
0 200767781
influx -database mydb -execute 0.06s user 0.12s system 0% cpu 8:49.99 total
* fix(query): Group By queries with offset that crosses a DST boundary can fail
Customer reported that a GROUP BY query with an offset that caused an interval
to cross a daylight savings change inserted an extra output row off by one hour.
This fix ensured that the start time for the interval of a GROUP BY operator is
correctly set before calculating the time zone offset for that date and time.
Add TestGroupByIterator_DST() in query/iterator_test.go
for regression testing of this bug.
Fixes https://github.com/influxdata/influxdb/issues/20238
When `top()` or `bottom()` were used and selected auxiliary values, they
would return the wrong values that would be equal to the last point
selected. This is because the aggregators saved the memory address of
the auxiliary fields instead of copying them over. Since the same
auxiliary fields memory location is used for every value returned by the
storage engine, this resulted in the values being incorrect because they
were overwritten with incorrect values.
This fixes that so the auxiliary fields are copied out when they are
saved rather than only the memory location.
This change makes it so that we simplify the math engine so it doesn't
use a complicated set of nested iterators. That way, we have to change
math in one fewer place.
It also greatly simplifies the query engine as now we can create the
necessary iterators, join them by time, name, and tags, and then use the
cursor interface to read them and use eval to compute the result. It
makes it so the auxiliary iterators and all of their complexity can be
removed.
This also makes use of the new eval functionality that was recently
added to the influxql package.
No math functions have been added, but the scaffolding has been included
so things like trigonometry functions are just a single commit away.
This also introduces a small breaking change. Because of the call
optimization, it is now possible to use the same selector multiple times
as a selector. So if you do this:
SELECT max(value) * 2, max(value) / 2 FROM cpu
This will now return the timestamp of the max value rather than zero
since this query is considered to have only a single selector rather
than multiple separate selectors. If any aspect of the selector is
different, such as different selector functions or different arguments,
it will consider the selectors to be aggregates like the old behavior.
Along with modifying ExecutionContext to be a context and have the
TaskManager return the context itself, this also creates a Monitor
interface and exposes the Monitor through the Context. This way, we can
access the monitor from within the query.Select method and keep all of
the limits inside of the query package instead of leaking them into the
statement executor.
An eventual goal is to remove the InterruptCh from the IteratorOptions
and use the Context instead, but for now, we'll just assign the done
channel from the Context to the IteratorOptions so at least they refer
to the same channel.
* Introduces EXPLAIN ANALYZE command, which
produces a detailed tree of operations used to
execute the query.
introduce context.Context to APIs
metrics package
* create groups of named measurements
* safe for concurrent access
tracing package
EXPLAIN ANALYZE implementation for OSS
Serialize EXPLAIN ANALYZE traces from remote nodes
use context.Background for tests
group with other stdlib packages
additional documentation and remove unused API
use influxdb/pkg/testing/assert
remove testify reference
This allows the query:
SELECT mean(value) FROM cpu GROUP BY time(1d)
To function in some way that makes sense. The upper limit is implicitly
the `now()` starting time and the lower limit will be whichever interval
the lowest point falls into.
When no lower bound is specified and `max-select-buckets` is specified,
the query will only consider points that would satisfy
`max-select-buckets`. So if you have one point written in 1970, have
another point within the last minute, and then do the above query with
`max-select-buckets` being equal to 10, the older point from 1970 will
not be considered.
Originally, casting was performed inside of the query engine especially
for call iterators. Currently, the engine takes care of all casting so
we just need to normalize the iterators types for type safety reasons
rather than actual functional reasons.
Removing this code. Cover coverage showed that it was not hit when run
against the actual server. I ran the tests package and got code coverage
of the query package while running the tests in that package.
This change provides a clear separation between the query engine
mechanics and the query language so that the language can be parsed and
dealt with separate from the query engine itself.