The stream iterator would ignore an error that happened when reading
points. This may have caused it to potentially return an error that got
ignored and then to try invoking `Next()` on an iterator in an invalid
state and that iterator would then actually return a point which it
wasn't supposed to.
Also added some defensive coding to that same call to prevent a nil map
from being assigned to in the event of an invalid iterator returning
junk data.
The following would, erroneously, not strip the tag from the inner
query:
SELECT value FROM (SELECT value FROM cpu GROUP BY host)
The inner query was supposed to group by the host tag, but the outer
query should strip it away since it is not being grouped by anymore.
This fixes things so that the result will have the tags stripped away
when they are not requested in the grouping.
It has previously been allowed for a subquery to use a tag within a
function (such as `count()`) when the tag is from a subquery and the
subquery itself references a field at some point to perform the join.
This functionality regressed in 1.6 because of a change in how
subqueries were executed that forgot to treat a tag the same as a string
field.
This fixes that regression and adds a test case to avoid hitting that
regression again.
The derivative function and others similar to it would preload
themselves with data so that the first interval would be the start of
the time range. That meant reading data outside of the time range.
One change to the shard mapper back in v1.4.0 caused the shard mapper to
constrict queries to the intervals given to the shard mapper. This was
correct because the shard mapper can only deal with times it has mapped,
but this broke the functionality of looking back into the past for
the derivative and other functions that used that functionality.
The compiler has been updated with an additional attribute that records
how many intervals in the past will need to be read so that the shard
mapper can include extra times that it may not necessarily read from,
but may be queried because of the above described functionality.
The compiler was too strict. The inherited interval from an outer query
should not have caused the inner query to fail because inherited
intervals are only implicitly passed to inner queries that support group
by time functionality. Since the inner query with a derivative doesn't
support grouping by time and the inner query itself doesn't specify a
time, the outer query shouldn't have invalidated the inner query.
The passed in argument wasn't correct so it always returned null instead
of the appropriate value.
Also includes unit tests for all of the math functions and restricts the
`asin()` and `acos()` functions to floats only since those functions
don't give any meaningful results when using integers or unsigned.
When `top()` or `bottom()` were used and selected auxiliary values, they
would return the wrong values that would be equal to the last point
selected. This is because the aggregators saved the memory address of
the auxiliary fields instead of copying them over. Since the same
auxiliary fields memory location is used for every value returned by the
storage engine, this resulted in the values being incorrect because they
were overwritten with incorrect values.
This fixes that so the auxiliary fields are copied out when they are
saved rather than only the memory location.
Previously, the task manager was modified to keep the query status so it
could track which queries were running and which ones were killed.
In those previous versions, we removed a task from the process table
as soon as it was killed and did not remove it after it had finished
executing. This meant there could be zombie goroutines running in the
background that were impossible to see.
When the task manager was updated to track the task status, we forgot to
expose the status in the public interface so consumers could see the
task status.
This adds numerous technical analysis algorithms:
* exponential_moving_average
* double_exponential_moving_average
* triple_exponential_moving_average
* relative_strength_index
* triple_exponential_average
* kaufmans_efficiency_ratio (commonly referred to as just "Efficiency Ratio")
* kaufmans_adaptive_moving_average
* chande_momentum_oscillator (both the common 'smoothed' version, and the ta-lib version)
This also fixes the cursor system to abandon iterators that will not
produce meaningful results since the variables are all unknown types.
This creates a weird behavior that existed in previous releases and we
are keeping here for backwards compatibility. If a subquery referenced a
field that didn't exist in the subquery, it will return nothing. But, if
there are two subqueries and one of them has the field exist and the
other doesn't, the second will return all null values.
This is not complete, but it is a starting point for more thorough tests
of subqueries.
This also reorders the use of `cmp.Diff` so the `want` is first and
`got` is second. This way, the `want` shows up as a minus sign in the
diff rather than, confusingly, as a plus sign.
This adds support for math functions into the query language. Math
functions are special because they are transformations and do not access
the filesystem in the same way aggregate functions do. A transformation
takes one point and always outputs one point making it more similar to
binary expressions so these math functions follow the same rules as
binary expressions.
This also supports using math literals (so you can do `sin(1)`) and the
math functions can be used anywhere such as in a field or an expression.
Both of the following are supported:
SELECT sin(value) FROM cpu
SELECT value FROM cpu WHERE sin(value) > 0.5
Arguments are in radians. Degrees is not supported.
This change makes it so that we simplify the math engine so it doesn't
use a complicated set of nested iterators. That way, we have to change
math in one fewer place.
It also greatly simplifies the query engine as now we can create the
necessary iterators, join them by time, name, and tags, and then use the
cursor interface to read them and use eval to compute the result. It
makes it so the auxiliary iterators and all of their complexity can be
removed.
This also makes use of the new eval functionality that was recently
added to the influxql package.
No math functions have been added, but the scaffolding has been included
so things like trigonometry functions are just a single commit away.
This also introduces a small breaking change. Because of the call
optimization, it is now possible to use the same selector multiple times
as a selector. So if you do this:
SELECT max(value) * 2, max(value) / 2 FROM cpu
This will now return the timestamp of the max value rather than zero
since this query is considered to have only a single selector rather
than multiple separate selectors. If any aspect of the selector is
different, such as different selector functions or different arguments,
it will consider the selectors to be aggregates like the old behavior.
The Cursor returned will be capable of scanning rows into a structure.
It replaces part of the function for why the Emitter existed. The
Emitter would both join the resulting rows and then transform the values
into a models.Row so it could be returned to the results.
In the future, we will be able to use the Cursor directly to write out
values which should be more memory efficient.
Along with modifying ExecutionContext to be a context and have the
TaskManager return the context itself, this also creates a Monitor
interface and exposes the Monitor through the Context. This way, we can
access the monitor from within the query.Select method and keep all of
the limits inside of the query package instead of leaking them into the
statement executor.
An eventual goal is to remove the InterruptCh from the IteratorOptions
and use the Context instead, but for now, we'll just assign the done
channel from the Context to the IteratorOptions so at least they refer
to the same channel.
Remove the `Query` prefix from some structs and interfaces. They were
there so when the query engine was in the same package as influxql,
these would be differentiated. Now that the package name is query, the
extra prefix seems redundant.
The implicit time range for an interval is supposed to be now when no
end is specified. In a subquery though, the interval doesn't exist and
so it doesn't set the end time to now, but to the max time. Since the
subquery qualifies as something that should have the implicit end time
apply, this results in a query that runs slowly because it is filling in
a bunch of unasked for intervals if a fill is specified.
This hack adds the implicit end time if it sees the parent query's end
time is set to the maximum available time.
This is a temporary fix for this problem. The query compilation should
perform these time range calculations in the compilation stage and the
subqueries should use the compilation stage during execution instead of
ignoring it. That work takes a lot more effort though and is more prone
to running into unforeseen bugs.
This fix introduces a subtle, but likely rare to run into bug. If the
top level query specifies the maximum time as the end time and the
subquery has an interval, the subquery should use the end time rather
than now as the time range. With this hack, it will interpret it as an
implicit time rather than an explicit one. This is unlikely to matter
though.
There is a strange race condition where a query can be killed and finish
at approximately the same time. If this happens, the query gets
retrieved by the killing task, the query gets closed by the normal
processing thread, and then the killing task attempts to kill the query
afterwards. Since the close doesn't mark the query as already killed
(since it's not killed, just merely unused), the killing thread attempts
to close the channel again.
Mark the query as killed whenever it is closed to prevent a double close
from happening. This should never cause the status to be erroneously
reported since the query status is removed from the query table within
the same lock scope.
When refactoring the query engine, I thought calling
`count(distinct(value))` multiple times was disallowed and so the
refactor made it so that wasn't possible.
It turns out that this pattern is allowed because since the distinct is
nested, it is aggregated anyway and can be combined with other
aggregates.
This removes the erroneously placed restriction.