Clean up template for fill average
Change fill(average) to fill(linear)
Update average to linear in infuxql spec
Add Integer Tests and associated fixes
Update CHANGELOG for fill(linear)
Negative timestamps are now supported. We also now refuse two
nanoseconds that are at the edge of the minimum time window. One of the
nanoseconds we do not accept is because we need MinInt64 to be used for
some internal comparisons in the TSM engine and it was causing an
underflow when we subtracted one from the minimum time. The second is so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell the difference between if the
user provided it or if it was implicit without those values being
different.
If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
This commit limits queries to only process one shard at a time.
However, within a shard, multiple series can still be processed in
parallel. Shard iterators are lazily instantiated during query
execution to limit the amount of memory a given query uses.
The tsdb package had a substantial amount of dead code related to the
old query engine still in there. It is no longer used, so it was removed
since it was left unmaintained. There is likely still more code that is
the same, but wasn't found as part of this code cleanup.
influxql has dead code show up because of the code generation so it is
not included in this pruning.
This allows us to add additional options to ExecuteQuery without
creating parameter bloat.
Removing the unused Series structs. Their necessity was removed by a
previous commit, but the structs were not removed yet.
Add another type of interrupt iterator that monitors the interrupt
channel and calls `Close()` on the iterator when the interrupt happens.
It will primarily be used for asynchronously closing the ReaderIterator,
but it will only close the read side of the connection properly. More
work needs to be done to allow closing the write side efficiently.
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.
This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.
The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.
Fixes#6519.
This commit changes the `SeriesIterator` to process one measurement
at a time and uses a `floatFastDedupeIterator` to avoid point
encoding during deduplication.
The interrupt iterator currently introduces a non-trivial amount of
overhead to queries by checking for interrupts every 256 points.
This commit adjusts that check to every 5000 points.
There are also several places where nested field access has been
adjusted to minimize field lookups.
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.
Updated the Emitter to return errors when it cannot read properly from
the iterators.
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.
This does not apply to raw queries yet.
Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.
Fixes#3247. Contributes to #5943.
This commit changes the channel iterators to use a double buffer
to reduce allocations. The caller of `Iterator.Next()` must copy
out the point before calling `Next()` again.
This commit makes a number of performance improvements to
reduce allocations during query execution. Several objects
and buffers are now reused across the components to avoid
allocations.
Previously a simple `count(value)` query across 1M points
would require 26,000+ allocations. After the changes in
this commit that number has been reduced to 88.
Fixes#6211.
In Go-land packages with the same name, e.g., internal, do not clash
with each other when they're in different parts of the project. However
with protobufs definitions will clash if they share the same package
name.
This commit renames the influxql protobuf package to `influxql` to
avoid a clash with a message definition in another protobuf package
called internal. Go package aliases allow us to continue to refer to the
internal package as `internal` rather than `influxql`.
This commit sets the `MergeIterator.init` flag after initialization.
Previously this would generate a new heap on every call to `Next()`
which caused some aggregate queries to slow by ~10,000%.
The MergeIterator creation function would call `peek()` on the iterator
to initialize the heap. Since this function can sometimes take a long
time (such as a huge aggregate query on a shard), the
`influxql.Select()` wouldn't return until the query had already been
completed.
The `influxql.Select()` call should be just the creation of the
iterators and shouldn't calculate anything. This is important for future
features like the point limiter that have to be initialized after the
`influxql.Select()` call.
The simple moving average will gradually emit points instead of waiting
until the end. This should apply to derivative and difference in the
future too.
Fixes#6112.
This commit adds an `IteratorStats` that holds aggregate
iterator processing information. A method is also added to
`Iterator` to return the stats:
Stats() influxql.IteratorStats
The remote iterators will also emit their stats in the point
stream upon first connection, on a given interval, and then
finally once the last point has been sent.
Use of the iterator is spread out into both `IteratorCreators` and
inside of the iterators themselves. Part of the interrupt must be
handled inside of the engine so it stops trying to emit points when an
interrupt is found and another part of the interrupt has to happen when
combining the iterators so it doesn't just start reading the next shard.
This commit moves the `tsdb.Store.ExpandSources()` function onto
the `influxql.IteratorCreator` and provides support for issuing
source expansion across a cluster.
Now the AuxIterator will know when it is backgrounded so that it can
stop reading from the primary iterator when all of the child iterators
have been closed.
This commit moves the `tsdb.Store.ExpandSources()` function onto
the `influxql.IteratorCreator` and provides support for issuing
source expansion across a cluster.
The primary input iterator for an aux iterator would continue trying to
send points to a closed channel even after an aux iterator had already
been closed.
This changes the aux iterators to use sync.Cond instead of channels and
lower level syncing primitives for handling buffered input/output.
Fixes#5974.
Previously the call iterator would normalize the time to the interval
for all calls. This meant that when `first()` or `last()` was called
with no group by interval the value would be found for each shard, the
time was normalized, then it tried to find the value between the shards
(but no longer with any time data as that had already been eliminated).
This removes part of the time logic from the call iterators and makes a
new iterator `IntervalIterator` to normalize the times as they come out
of the underlying iterator.
Fixes#5890.
All three of these iterators are supposed to support all four types of
iterators, but the implementation was never done for string or boolean.
Fixes#5886.
This refactor is primarily to support Kapacitor. Kapacitor doesn't care
about the iterators and mostly keeps the points it handles in memory.
The iterator interface is more than Kapacitor cares about.
This commit refactors and opens up the internals of aggregating and
reducing incoming points so it can be used by an outside library with
the same code. It also makes the iterators used by the call iterators
publically usable with new functionality.
Reducers are split into two methods which are separate interfaces that
can be combined for dealing with casting between different types. The
Aggregator interfaces accept points into the aggregator and retain any
internal state they need. The Emitter interface will then create a point
from that aggregated state which can be fed to the iterator. The
Emitters do not fill in the name or tag of the point as that is expected
to be done by the person aggregating the point. While the Emitters do
sometimes fill in the time, that value will also be overwritten by the
iterator. Filling in the time is to allow a future version that will
allow returning the point time instead of just the interval time.
The limit iterator would short circuit if there were no dimensions and
all points had been read. It also needs to consider that multiple
sources will require reading the entire iterator too, so the short
circuit requires only a single source.
Fixes#5871.
A new attribute has been added to points to track how many points were
used to calculate that point. This is particularly useful for finding
the mean as we can then split mean calculation into two phases: one at
the shard level and a second at the shards level.
This optimization is now used so we don't have to hold so many points in
memory while calculating the mean.
Querying an integer field with a fill value will cause a cast error
because the underlying type is a float64 rather than an int64. Add a
function that will coerce the value to the correct type.
It may be more appropriate in the future to have the fill iterator read
the underlying iterator and cast to the appropriate type rather than
coerce the fill value to the correct type, but this solution works for
our current scenario well.
The additional locks shouldn't be necessary due to how the code is used,
but should prevent any potential data races in case we accidentally do
something bad.
The AuxIterator streams points to the underlying iterators. When it
started automatically, race conditions occurred between the stream
closing the iterators and creating iterators from the AuxIterator.