The following query was fixed previously:
    SELECT 'value' FROM cpu
This ended up hitting the `buildExprIterator()` code path and was
handled properly. But this query:
    SELECT 'value', value FROM cpu
took a different code path that triggered a panic instead of returning
an error. This code path has now been modified to return an error
instead of panicking.
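A minimal sketch of the idea, not the actual InfluxDB code: a validation step that returns an error for an unsupported expression instead of panicking. The `Expr`, `VarRef`, and `StringLiteral` types and the `buildAuxField` function are simplified, hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// Expr is a simplified, hypothetical stand-in for an expression node.
type Expr interface{}

// VarRef references a field or tag, e.g. value.
type VarRef struct{ Name string }

// StringLiteral is a quoted string, e.g. 'value'.
type StringLiteral struct{ Val string }

// buildAuxField validates one entry of a SELECT field list. Unsupported
// expressions produce an error for the caller to report rather than a panic.
func buildAuxField(expr Expr) (string, error) {
	switch e := expr.(type) {
	case *VarRef:
		return e.Name, nil
	case *StringLiteral:
		return "", fmt.Errorf("invalid expression type: string literal %q", e.Val)
	default:
		return "", errors.New("invalid expression type")
	}
}

func main() {
	// Field list of: SELECT 'value', value FROM cpu
	fields := []Expr{&StringLiteral{Val: "value"}, &VarRef{Name: "value"}}
	for _, f := range fields {
		name, err := buildAuxField(f)
		fmt.Println(name, err)
	}
}
```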
Fixes #6248.
Fixes #6211.
In Go-land, packages with the same name, e.g., internal, do not clash
with each other when they're in different parts of the project. However,
protobuf definitions will clash if they share the same package
name.
This commit renames the influxql protobuf package to `influxql` to
avoid a clash with a message definition in another protobuf package
called internal. Go package aliases allow us to continue to refer to the
internal package as `internal` rather than `influxql`.
The QueryExecutor had a lot of dead code made obsolete by the query
engine refactor; that code has now been removed. The TSDBStore interface
has also been cleaned up so we can have multiple implementations of it
(such as a local and a remote version).
A StatementExecutor interface has been created for adding custom
functionality to the QueryExecutor that may not be available in the open
source version. The QueryExecutor delegates all statement execution to
the StatementExecutor and only keeps track of housekeeping itself.
Implementing additional queries is as simple as wrapping
the cluster.StatementExecutor struct or replacing it with something
completely different.
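The delegation and wrapping described above can be sketched roughly as follows. All signatures here are hypothetical and simplified; the real interfaces take more arguments, and `baseExecutor` merely plays the role of `cluster.StatementExecutor`.

```go
package main

import (
	"errors"
	"fmt"
)

// Statement is a simplified stand-in for a parsed statement.
type Statement struct{ Text string }

// StatementExecutor executes a single statement (hypothetical signature).
type StatementExecutor interface {
	ExecuteStatement(stmt Statement) (string, error)
}

// QueryExecutor only does housekeeping and delegates execution.
type QueryExecutor struct {
	StatementExecutor StatementExecutor
	executed          int // housekeeping: number of statements started
}

// ExecuteQuery runs each statement through the configured StatementExecutor.
func (e *QueryExecutor) ExecuteQuery(stmts []Statement) ([]string, error) {
	var results []string
	for _, stmt := range stmts {
		e.executed++
		res, err := e.StatementExecutor.ExecuteStatement(stmt)
		if err != nil {
			return results, err
		}
		results = append(results, res)
	}
	return results, nil
}

// baseExecutor plays the role of the default executor.
type baseExecutor struct{}

func (baseExecutor) ExecuteStatement(stmt Statement) (string, error) {
	if stmt.Text == "SHOW DATABASES" {
		return "databases", nil
	}
	return "", errors.New("unsupported statement: " + stmt.Text)
}

// customExecutor wraps another executor and adds one extra statement.
type customExecutor struct{ StatementExecutor }

func (c customExecutor) ExecuteStatement(stmt Statement) (string, error) {
	if stmt.Text == "SHOW CUSTOM" {
		return "custom", nil
	}
	// Everything else falls through to the wrapped executor.
	return c.StatementExecutor.ExecuteStatement(stmt)
}

func main() {
	qe := &QueryExecutor{StatementExecutor: customExecutor{StatementExecutor: baseExecutor{}}}
	res, err := qe.ExecuteQuery([]Statement{{Text: "SHOW CUSTOM"}, {Text: "SHOW DATABASES"}})
	fmt.Println(res, err, qe.executed)
}
```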
The PointsWriter in the QueryExecutor has been changed to a simple
interface that declares the one method needed by the query executor.
This is to allow different PointsWriter implementations to be used by
the QueryExecutor. It has also been moved into the StatementExecutor.
The TSDBStore interface has now been modified to contain the code for
creating an IteratorCreator. This is so the underlying TSDBStore can
implement different ways of accessing the underlying shards rather than
always having to access each shard individually (such as batch
requests).
Remove the show servers handling. This isn't a valid command in the open
source version of InfluxDB anymore.
The QueryManager interface is now built into QueryExecutor and is no
longer necessary. The StatementExecutor and QueryExecutor split allows
task management to much more easily be built into QueryExecutor rather
than as a separate struct.
A bigger refactor of these functions is needed to support #3290, but
this will work for the more common case that someone uses double quotes
instead of single quotes when surrounding a time literal.
Fixes #3932.
This commit sets the `MergeIterator.init` flag after initialization.
Previously this would generate a new heap on every call to `Next()`
which caused some aggregate queries to slow by ~10,000%.
The MergeIterator creation function would call `peek()` on the iterator
to initialize the heap. Since this function can sometimes take a long
time (such as a huge aggregate query on a shard), the
`influxql.Select()` wouldn't return until the query had already been
completed.
The `influxql.Select()` call should be just the creation of the
iterators and shouldn't calculate anything. This is important for future
features like the point limiter that have to be initialized after the
`influxql.Select()` call.
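A rough sketch of both points, under simplifying assumptions: the heap is built lazily on the first `Next()` call rather than at creation time, and an `init` flag ensures it is built only once. The `mergeIterator` below merges sorted `int` slices and is a hypothetical stand-in; the `builds` counter exists only to make the behavior observable.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor walks one sorted input.
type cursor struct {
	vals []int
	pos  int
}

// cursorHeap orders cursors by their current value.
type cursorHeap []*cursor

func (h cursorHeap) Len() int            { return len(h) }
func (h cursorHeap) Less(i, j int) bool  { return h[i].vals[h[i].pos] < h[j].vals[h[j].pos] }
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeIterator merges sorted inputs in ascending order.
type mergeIterator struct {
	inputs [][]int
	heap   cursorHeap
	init   bool // set once the heap has been built
	builds int  // number of times the heap was built (for illustration)
}

// newMergeIterator does no work beyond storing the inputs.
func newMergeIterator(inputs ...[]int) *mergeIterator {
	return &mergeIterator{inputs: inputs}
}

// Next returns the next smallest value, building the heap on first use.
func (itr *mergeIterator) Next() (int, bool) {
	if !itr.init {
		for _, in := range itr.inputs {
			if len(in) > 0 {
				itr.heap = append(itr.heap, &cursor{vals: in})
			}
		}
		heap.Init(&itr.heap)
		itr.builds++
		itr.init = true // without this, the heap would be rebuilt on every call
	}
	if len(itr.heap) == 0 {
		return 0, false
	}
	c := itr.heap[0]
	v := c.vals[c.pos]
	c.pos++
	if c.pos < len(c.vals) {
		heap.Fix(&itr.heap, 0) // cursor has more values: restore heap order
	} else {
		heap.Pop(&itr.heap) // cursor is exhausted: drop it
	}
	return v, true
}

func main() {
	itr := newMergeIterator([]int{1, 4, 7}, []int{2, 5}, []int{3, 6})
	for v, ok := itr.Next(); ok; v, ok = itr.Next() {
		fmt.Print(v, " ")
	}
	fmt.Println("heap builds:", itr.builds)
}
```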
The simple moving average will gradually emit points instead of waiting
until the end. This should apply to derivative and difference in the
future too.
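As an illustration of emitting points gradually, here is a small streaming moving average that produces a value as soon as the window is full. The `movingAverage` type is a hypothetical sketch, not the reducer used in the query engine.

```go
package main

import "fmt"

// movingAverage keeps the last n values and emits their mean as soon as
// n values are available, instead of buffering the whole input.
type movingAverage struct {
	n      int
	window []float64
	sum    float64
}

// Push adds a value and reports the current average once the window is full.
func (m *movingAverage) Push(v float64) (float64, bool) {
	m.window = append(m.window, v)
	m.sum += v
	if len(m.window) > m.n {
		// Slide the window: drop the oldest value.
		m.sum -= m.window[0]
		m.window = m.window[1:]
	}
	if len(m.window) < m.n {
		return 0, false
	}
	return m.sum / float64(m.n), true
}

func main() {
	ma := &movingAverage{n: 2}
	for _, v := range []float64{1, 2, 3, 4} {
		if avg, ok := ma.Push(v); ok {
			fmt.Println(avg)
		}
	}
}
```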
Fixes #6112.
Related to #6140, but won't actually fix that problem. It will correctly
stop new queries from being started during shutdown and will send the
interrupt signal to queries during shutdown.
Since the interrupt signal is asynchronous, there isn't currently a way
to wait for the queries to complete themselves before shutting down the
engine.
The difference function is implemented very similarly to how derivative is
implemented. It is an aggregate function that acts over the entire
aggregate. This function will also have the same problems that
derivative has with getting values from the previous interval or point.
This will be fixed separately as part of #5943.
Fixes #1825.
Allows configuration of shard group duration at database creation and at
retention policy create/alter time.
Query examples:
```
CREATE DATABASE testdb WITH DURATION 90d SHARD DURATION 30m NAME rp_testdb
CREATE RETENTION POLICY rp_testdb2 ON testdb DURATION INF REPLICATION 1 SHARD DURATION 30m
ALTER RETENTION POLICY rp_testdb2 ON testdb SHARD DURATION 1h
```
This can be useful with long duration retention policies with lots of data, where
you can split the data into smaller shards to relieve memory pressure.
This allows multiple semicolons in a row now and also requires that a
semicolon separate commands. The query specification says this is
required, but a boolean error in `ParseQuery` makes one semicolon
optional and multiple semicolons an error.
Fixes #5728.
This commit adds an `IteratorStats` that holds aggregate
iterator processing information. A method is also added to
`Iterator` to return the stats:
    Stats() influxql.IteratorStats
The remote iterators will also emit their stats in the point
stream upon first connection, on a given interval, and then
finally once the last point has been sent.
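A minimal sketch of how such stats might be shaped and combined. The field names `SeriesN` and `PointN`, the `Add` method, and the `shardIterator` type are assumptions made for illustration and are not confirmed by this commit message.

```go
package main

import "fmt"

// IteratorStats holds aggregate processing counts.
// The field names are assumptions for this sketch.
type IteratorStats struct {
	SeriesN int // series processed
	PointN  int // points processed
}

// Add merges another set of stats into s.
func (s *IteratorStats) Add(other IteratorStats) {
	s.SeriesN += other.SeriesN
	s.PointN += other.PointN
}

// Iterator is reduced to the one method relevant here.
type Iterator interface {
	Stats() IteratorStats
}

// shardIterator is a hypothetical iterator that only tracks stats.
type shardIterator struct{ stats IteratorStats }

func (itr *shardIterator) Stats() IteratorStats { return itr.stats }

// totalStats sums the stats of several iterators, for example one per shard.
func totalStats(itrs ...Iterator) IteratorStats {
	var total IteratorStats
	for _, itr := range itrs {
		total.Add(itr.Stats())
	}
	return total
}

func main() {
	a := &shardIterator{stats: IteratorStats{SeriesN: 2, PointN: 100}}
	b := &shardIterator{stats: IteratorStats{SeriesN: 1, PointN: 50}}
	fmt.Printf("%+v\n", totalStats(a, b))
}
```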
Use of the iterator is spread out into both `IteratorCreators` and
inside of the iterators themselves. Part of the interrupt must be
handled inside of the engine so it stops trying to emit points when an
interrupt is found and another part of the interrupt has to happen when
combining the iterators so it doesn't just start reading the next shard.
While this allows a query to be killed, it doesn't really do anything
yet since the interrupt happens only after the first row gets emitted
(the entire first series).
This section of code will likely have to be refactored to make this work
since we need a way to interrupt a currently running iterator.
The currently running queries can be listed with the command
`SHOW QUERIES` and it will display the current commands that have been
run, the database they were run against, and how long they have been
running.
Numbers in the query without a decimal point will now be parsed as an
IntegerLiteral and emitted as integers. This ensures we keep the
original context that a query was issued with and allows us to act more
similarly to how programming languages typically treat floats and
ints.
This adds functionality for dealing with integers promoting to floats in
the various places where math is used.
Fixes #5744 and #5629.
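The parsing rule and the integer-to-float promotion can be sketched as follows. `IntegerLiteral` and `NumberLiteral` are simplified stand-ins here, and `parseNumber` and `add` are hypothetical helpers; the sketch ignores forms such as exponents.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IntegerLiteral and NumberLiteral are simplified literal nodes.
type IntegerLiteral struct{ Val int64 }
type NumberLiteral struct{ Val float64 }

// parseNumber returns an *IntegerLiteral when the text has no decimal
// point and a *NumberLiteral otherwise.
func parseNumber(s string) (interface{}, error) {
	if !strings.Contains(s, ".") {
		v, err := strconv.ParseInt(s, 10, 64)
		if err != nil {
			return nil, err
		}
		return &IntegerLiteral{Val: v}, nil
	}
	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return nil, err
	}
	return &NumberLiteral{Val: v}, nil
}

// add keeps integer math for two integers and promotes to float as soon
// as either operand is a float. It returns nil for unsupported operands.
func add(lhs, rhs interface{}) interface{} {
	switch l := lhs.(type) {
	case *IntegerLiteral:
		switch r := rhs.(type) {
		case *IntegerLiteral:
			return &IntegerLiteral{Val: l.Val + r.Val}
		case *NumberLiteral:
			return &NumberLiteral{Val: float64(l.Val) + r.Val}
		}
	case *NumberLiteral:
		switch r := rhs.(type) {
		case *IntegerLiteral:
			return &NumberLiteral{Val: l.Val + float64(r.Val)}
		case *NumberLiteral:
			return &NumberLiteral{Val: l.Val + r.Val}
		}
	}
	return nil
}

func main() {
	a, _ := parseNumber("10")
	b, _ := parseNumber("10.5")
	fmt.Printf("%T %T %T\n", a, b, add(a, b))
}
```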
Normalize the time for the distinct() call to either be at the beginning
of the group by interval or the start time similar to every other call.
The timestamp previously just showed the first time found and didn't
make a lot of sense in the context of what the function was supposed to
do.
Fixes #6040.
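One way to picture the normalization, as a sketch with hypothetical helper names: with a group by interval, the timestamp is truncated to the start of its interval; without one, the query start time is used. Times are plain `int64` values here, and intervals are assumed to be aligned to the epoch with no offset.

```go
package main

import "fmt"

// intervalStart returns the start of the interval that contains t,
// with intervals aligned to zero. It also handles negative timestamps.
func intervalStart(t, interval int64) int64 {
	if interval <= 0 {
		return t
	}
	rem := t % interval
	if rem < 0 {
		rem += interval
	}
	return t - rem
}

// normalizeTime picks the timestamp reported for a call: the start of the
// group by interval when there is one, otherwise the query start time.
func normalizeTime(t, interval, startTime int64) int64 {
	if interval > 0 {
		return intervalStart(t, interval)
	}
	return startTime
}

func main() {
	fmt.Println(normalizeTime(125, 60, 0))  // with a group by interval of 60
	fmt.Println(normalizeTime(125, 0, 100)) // no interval: use the start time
}
```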
Internal system series start with an underscore prefix, but
restricting this prefix breaks users who already use an underscore
prefix in their series names.
Fixes #5870.
This commit moves the `tsdb.Store.ExpandSources()` function onto
the `influxql.IteratorCreator` and provides support for issuing
source expansion across a cluster.
Now the AuxIterator will know when it is backgrounded so that it can
stop reading from the primary iterator when all of the child iterators
have been closed.
The primary input iterator for an aux iterator would continue trying to
send points to a closed channel even after an aux iterator had already
been closed.
This changes the aux iterators to use sync.Cond and lower level syncing
primitives instead of channels for handling buffered input/output.
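A minimal sketch of a buffer built on `sync.Cond`, assuming a simple FIFO of float values. The `pointBuffer` type is hypothetical. Unlike sending on a closed channel, which panics, `Push` on a closed buffer simply reports `false`.

```go
package main

import (
	"fmt"
	"sync"
)

// pointBuffer is a FIFO guarded by a mutex and a condition variable.
type pointBuffer struct {
	mu     sync.Mutex
	cond   *sync.Cond
	buf    []float64
	closed bool
}

func newPointBuffer() *pointBuffer {
	b := &pointBuffer{}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// Push appends a value and wakes one waiting reader. It reports false
// if the buffer has already been closed.
func (b *pointBuffer) Push(v float64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.closed {
		return false
	}
	b.buf = append(b.buf, v)
	b.cond.Signal()
	return true
}

// Pop blocks until a value is available or the buffer is closed.
// Values pushed before Close can still be read afterwards.
func (b *pointBuffer) Pop() (float64, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for len(b.buf) == 0 && !b.closed {
		b.cond.Wait()
	}
	if len(b.buf) == 0 {
		return 0, false
	}
	v := b.buf[0]
	b.buf = b.buf[1:]
	return v, true
}

// Close marks the buffer closed and wakes all waiting readers.
func (b *pointBuffer) Close() {
	b.mu.Lock()
	b.closed = true
	b.mu.Unlock()
	b.cond.Broadcast()
}

func main() {
	b := newPointBuffer()
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			v, ok := b.Pop()
			if !ok {
				return
			}
			fmt.Println("read", v)
		}
	}()
	b.Push(1)
	b.Push(2)
	b.Close()
	wg.Wait()
	fmt.Println("push after close:", b.Push(3))
}
```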
Fixes #5974.
Also fixes derivative calls with an aggregate function to require a
group by interval. The call without a group by interval doesn't make
sense as it will never return anything since it will always have one
point.
Fixes #5968.
`SHOW TAG VALUES` output has been modified to print the measurement name
for every measurement and to return the output in two columns: key and
value. An example output might be:
    > SHOW TAG VALUES WITH KEY IN (host, region)
    name: cpu
    ---------
    key     value
    host    server01
    region  useast

    name: mem
    ---------
    key     value
    host    server02
    region  useast
`measurementsByExpr` has been taught how to handle reserved keys (ones
with an underscore at the beginning) to allow reusing that function and
skipping over expressions that don't matter to the call.
Fixes #5593.
Previously the call iterator would normalize the time to the interval
for all calls. This meant that when `first()` or `last()` was called
with no group by interval the value would be found for each shard, the
time was normalized, then it tried to find the value between the shards
(but no longer with any time data as that had already been eliminated).
This removes part of the time logic from the call iterators and makes a
new iterator `IntervalIterator` to normalize the times as they come out
of the underlying iterator.
Fixes #5890.
All three of these iterators are supposed to support all four types of
iterators, but the implementation was never done for string or boolean.
Fixes #5886.
This refactor is primarily to support Kapacitor. Kapacitor doesn't care
about the iterators and mostly keeps the points it handles in memory.
The iterator interface is more than Kapacitor cares about.
This commit refactors and opens up the internals of aggregating and
reducing incoming points so it can be used by an outside library with
the same code. It also makes the iterators used by the call iterators
publicly usable with new functionality.
Reducers are split into two methods which are separate interfaces that
can be combined for dealing with casting between different types. The
Aggregator interfaces accept points into the aggregator and retain any
internal state they need. The Emitter interface will then create a point
from that aggregated state which can be fed to the iterator. The
Emitters do not fill in the name or tag of the point as that is expected
to be done by the person aggregating the point. While the Emitters do
sometimes fill in the time, that value will also be overwritten by the
iterator. Filling in the time is to allow a future version that will
allow returning the point time instead of just the interval time.
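The split can be sketched like this, with an integer aggregator paired with a float emitter to show the casting case. The interface and method names are assumptions modeled on the description above, and `integerMeanReducer` and `reduce` are hypothetical.

```go
package main

import "fmt"

// Simplified point types for this sketch.
type IntegerPoint struct {
	Name  string
	Time  int64
	Value int64
}

type FloatPoint struct {
	Name  string
	Time  int64
	Value float64
}

// IntegerPointAggregator accepts points and retains internal state.
type IntegerPointAggregator interface {
	AggregateInteger(p *IntegerPoint)
}

// FloatPointEmitter creates points from the aggregated state.
type FloatPointEmitter interface {
	Emit() []FloatPoint
}

// integerMeanReducer aggregates integers and emits a float mean.
type integerMeanReducer struct {
	sum   int64
	count int64
}

func (r *integerMeanReducer) AggregateInteger(p *IntegerPoint) {
	r.sum += p.Value
	r.count++
}

// Emit leaves Name and Time unset; the caller is expected to fill them in.
func (r *integerMeanReducer) Emit() []FloatPoint {
	if r.count == 0 {
		return nil
	}
	return []FloatPoint{{Value: float64(r.sum) / float64(r.count)}}
}

// reduce feeds points to the aggregator and then asks the emitter for output.
func reduce(points []IntegerPoint, agg IntegerPointAggregator, em FloatPointEmitter) []FloatPoint {
	for i := range points {
		agg.AggregateInteger(&points[i])
	}
	return em.Emit()
}

func main() {
	r := &integerMeanReducer{}
	out := reduce([]IntegerPoint{{Value: 1}, {Value: 2}}, r, r)
	for i := range out {
		out[i].Name = "cpu" // filled in by the caller, not the emitter
	}
	fmt.Printf("%+v\n", out)
}
```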
The limit iterator would short circuit if there were no dimensions and
all points had been read. It also needs to consider that multiple
sources will require reading the entire iterator too, so the short
circuit requires only a single source.
Fixes #5871.
The RPC handler for remote queries would attempt to reuse a closed
connection for certain commands that didn't use pooling. The RPC
commands that close the connection have been fixed to not try reusing
the connection.
When creating an iterator, if there are no points to return, the points
decoder would hit an EOF that it didn't catch and would return that
error back to the client who made the request. It now properly returns
no points by using a `nilFloatIterator` if there are no points to
return.
This fixes remote execution when a cluster has nothing to return.
The dimensions array in `RewriteWildcards` gets emptied by an earlier
section of the code, and the function then tries to iterate over that
empty slice to append it to the list of dimensions.
That makes the loop dead code that can't ever be hit.
Also improve the efficiency of this method by not creating a new slice
when there are no wildcards. We already check at the beginning of the
function if there is a wildcard out of necessity. There's no point in
making a new slice and copying the contents if we know that there will
be no wildcards to expand.
It also improves memory efficiency by assuming that if a wildcard
exists, there is only one and the pre-allocated slice can take advantage
of that. If there are multiple wildcards, then a new slice will have to
be created in the middle of the loop to raise the capacity.
When a wildcard is specified for the field but not the dimensions, the
dimensions get added to the list of fields as part of
`RewriteWildcards()`.
But when a dimension was given with no wildcard, the dimension didn't
get removed from the wildcard expansion in the fields section. This
teaches the rewriter to exclude explicitly included dimensions from being
expanded as fields. Now, when a measurement has one tag named host and a
field named value, this statement:
    SELECT * FROM cpu GROUP BY host
expands to this:
    SELECT value FROM cpu GROUP BY host
instead of this:
    SELECT host, value FROM cpu GROUP BY host
If you want the latter behavior, you can include the tag explicitly like this:
    SELECT host, * FROM cpu GROUP BY host
Fixes #5770.
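The expansion rule for the fields section can be sketched as follows. `expandFields` is a hypothetical helper that works on plain strings and covers only the case of explicit GROUP BY dimensions; the real rewriter operates on the parsed statement.

```go
package main

import "fmt"

// expandFields replaces "*" in the field list with the measurement's tags
// and fields. Tags that are explicitly named in GROUP BY are not expanded
// as fields.
func expandFields(fields, groupBy, measurementFields, measurementTags []string) []string {
	grouped := make(map[string]bool, len(groupBy))
	for _, d := range groupBy {
		grouped[d] = true
	}
	var out []string
	for _, f := range fields {
		if f != "*" {
			out = append(out, f) // explicit fields are kept as written
			continue
		}
		for _, t := range measurementTags {
			if !grouped[t] {
				out = append(out, t)
			}
		}
		out = append(out, measurementFields...)
	}
	return out
}

func main() {
	tags, flds := []string{"host"}, []string{"value"}
	// SELECT * FROM cpu GROUP BY host
	fmt.Println(expandFields([]string{"*"}, []string{"host"}, flds, tags))
	// SELECT host, * FROM cpu GROUP BY host
	fmt.Println(expandFields([]string{"host", "*"}, []string{"host"}, flds, tags))
	// SELECT * FROM cpu
	fmt.Println(expandFields([]string{"*"}, nil, flds, tags))
}
```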
The name of the column will be every measurement located inside of the
math expression, in the order they are encountered within the
expression.
Also handle `*influxql.ParenExpr` in the function
`(*influxql.Field).Name()`
Fixes #5730.
A new attribute has been added to points to track how many points were
used to calculate that point. This is particularly useful for finding
the mean as we can then split mean calculation into two phases: one at
the level of each shard and a second across all shards.
This optimization is now used so we don't have to hold so many points in
memory while calculating the mean.
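A small sketch of the two-phase mean, assuming the new attribute is a count carried on each point. The field name `Aggregated` and the helper functions are assumptions for illustration.

```go
package main

import "fmt"

// FloatPoint carries a value and the number of raw points that were used
// to calculate it. The field name Aggregated is an assumption.
type FloatPoint struct {
	Value      float64
	Aggregated uint32
}

// shardMean is phase one: each shard reduces its raw values to one point.
func shardMean(values []float64) FloatPoint {
	if len(values) == 0 {
		return FloatPoint{}
	}
	var sum float64
	for _, v := range values {
		sum += v
	}
	return FloatPoint{Value: sum / float64(len(values)), Aggregated: uint32(len(values))}
}

// combineMeans is phase two: per-shard means are combined, weighted by
// the number of points behind each one.
func combineMeans(points []FloatPoint) FloatPoint {
	var sum float64
	var n uint32
	for _, p := range points {
		sum += p.Value * float64(p.Aggregated)
		n += p.Aggregated
	}
	if n == 0 {
		return FloatPoint{}
	}
	return FloatPoint{Value: sum / float64(n), Aggregated: n}
}

func main() {
	a := shardMean([]float64{1, 2, 3})
	b := shardMean([]float64{10})
	fmt.Printf("%+v\n", combineMeans([]FloatPoint{a, b}))
}
```

Only one point per shard has to be held in memory for the second phase, and weighting by the count gives the same result as averaging all raw values together.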
The select call and the query executor would both calculate the time
range, but in separate ways. The query executor needed some way to pass
in the implicit end time that is placed there by the query executor.
Fixes #5636.