Along with modifying ExecutionContext to be a context and have the
TaskManager return the context itself, this also creates a Monitor
interface and exposes the Monitor through the Context. This way, we can
access the monitor from within the query.Select method and keep all of
the limits inside of the query package instead of leaking them into the
statement executor.
An eventual goal is to remove the InterruptCh from the IteratorOptions
and use the Context instead, but for now, we'll just assign the done
channel from the Context to the IteratorOptions so at least they refer
to the same channel.
Remove the `Query` prefix from some structs and interfaces. They were
there so when the query engine was in the same package as influxql,
these would be differentiated. Now that the package name is query, the
extra prefix seems redundant.
The implicit time range for an interval is supposed to be now when no
end is specified. In a subquery though, the interval doesn't exist and
so it doesn't set the end time to now, but to the max time. Since the
subquery qualifies as something that should have the implicit end time
apply, this results in a query that runs slowly because it is filling in
a bunch of unasked for intervals if a fill is specified.
This hack adds the implicit end time if it sees the parent query's end
time is set to the maximum available time.
This is a temporary fix for this problem. The query compilation should
perform these time range calculations in the compilation stage and the
subqueries should use the compilation stage during execution instead of
ignoring it. That work takes a lot more effort though and is more prone
to running into unforeseen bugs.
This fix introduces a subtle, but likely rare to run into bug. If the
top level query specifies the maximum time as the end time and the
subquery has an interval, the subquery should use the end time rather
than now as the time range. With this hack, it will interpret it as an
implicit time rather than an explicit one. This is unlikely to matter
though.
There is a strange race condition where a query can be killed and finish
at approximately the same time. If this happens, the query gets
retrieved by the killing task, the query gets closed by the normal
processing thread, and then the killing task attempts to kill the query
afterwards. Since the close doesn't mark the query as already killed
(since it's not killed, just merely unused), the killing thread attempts
to close the channel again.
Mark the query as killed whenever it is closed to prevent a double close
from happening. This should never cause the status to be erroneously
reported since the query status is removed from the query table within
the same lock scope.
When refactoring the query engine, I thought calling
`count(distinct(value))` multiple times was disallowed and so the
refactor made it so that wasn't possible.
It turns out that this pattern is allowed because since the distinct is
nested, it is aggregated anyway and can be combined with other
aggregates.
This removes the erroneously placed restriction.
If the close happens when next is being called, it can result in a race
condition where the current iterator gets set to nil after the initial
check.
This also fixes the finalizer so it runs the close method in a goroutine
instead of running it by itself. This is because all finalizers run on
the same goroutine so a close that takes a long time can cause a backup
for all finalizers. This also removes the redundant call to
`runtime.SetFinalizer` from the finalizer itself because a finalizer,
when called, has already cleared itself.
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.
This updates the usage of the logger and adds a simple package for
constructing the base logger.
The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
This commit adds time support to SHOW TAG VALUES. Time can be used as
both a lower and upper boundary. However, there are some caveats.
For the `inmem` index, filtering by time will still return all results
because the index data is shared across shards.
For the `tsi1` index, filtering by time will only work down to the shard
lever. Specifically, when querying by time all shards within that time
range will be used to generate the results.
When a meta query does not include a time component then it can be
answered exclusively by the index. This should result in a much faster
query execution that if the TSM engine was engaged.
This commit rewrites the following queries such that they make use
of the index where no time component is present:
- SHOW MEASUREMENTS
- SHOW SERIES
- SHOW TAG KEYS
- SHOW FIELD KEYS
This prevents a channel from being closed twice when the task manager is
closed. That same check existed to make sure a panic didn't happen when
detaching a killed query, but the check was forgotten when closing the
task manager.
This also adds a new error when attempting to kill an already killed
query.
* Introduces EXPLAIN ANALYZE command, which
produces a detailed tree of operations used to
execute the query.
introduce context.Context to APIs
metrics package
* create groups of named measurements
* safe for concurrent access
tracing package
EXPLAIN ANALYZE implementation for OSS
Serialize EXPLAIN ANALYZE traces from remote nodes
use context.Background for tests
group with other stdlib packages
additional documentation and remove unused API
use influxdb/pkg/testing/assert
remove testify reference
This allows the query:
SELECT mean(value) FROM cpu GROUP BY time(1d)
To function in some way that makes sense. The upper limit is implicitly
the `now()` starting time and the lower limit will be whichever interval
the lowest point falls into.
When no lower bound is specified and `max-select-buckets` is specified,
the query will only consider points that would satisfy
`max-select-buckets`. So if you have one point written in 1970, have
another point within the last minute, and then do the above query with
`max-select-buckets` being equal to 10, the older point from 1970 will
not be considered.
Field math works similar to condition evaluation, but not the exact same
because we have more information to work with in field expressions than
we do in conditional math because fields retain the information about
their source while conditions do not.
The main difference is that you cannot add an unsigned literal to the
output of an integer iterator while you can inside of a condition. You
can perform math on a positive integer literal to an unsigned iterator.
Inside of the condition, we aren't sure if an integer is because of a
literal or because of an iterator so we can't make that distinction.
This more accurately shows whether or not a query has been killed.
Instead of automatically removing it from the query table when it's
killed even though goroutines and iterators may still be open, it now
marks the process as killed. This should allow us to more accurately
determine if a query has been stalled and is still using resources on
the server.
This is related to #8848, but not directly connected.
Originally, casting was performed inside of the query engine especially
for call iterators. Currently, the engine takes care of all casting so
we just need to normalize the iterators types for type safety reasons
rather than actual functional reasons.
Removing this code. Cover coverage showed that it was not hit when run
against the actual server. I ran the tests package and got code coverage
of the query package while running the tests in that package.
All time ranges are combined with AND regardless of context and
regardless of whether it makes any logical sense.
That was the previous behavior and, unfortunately, a lot of people rely
on it.
It prints the statistics of each iterator that will access the storage
engine. For each access of the storage engine, it will print the number
of shards that will potentially be accessed, the number of files that
may be accessed, the number of series that will be created, the number
of blocks, and the size of those blocks.