Commit Graph

69 Commits (74c6a0c1c590a102705442fcb49f372c85f1beb1)

Author SHA1 Message Date
Jonathan A. Sternberg 19a61dbb44 Align binary math expression streams by time
Also fills in missing values using the fill expression for any binary
aggregation.
2016-10-18 13:31:13 -05:00
Michael Desa 966e5503bf Add fill(linear) to query language
Clean up template for fill average

Change fill(average) to fill(linear)

Update average to linear in infuxql spec

Add Integer Tests and associated fixes

Update CHANGELOG for fill(linear)
2016-10-04 14:27:04 -07:00
Jonathan A. Sternberg c05c7f6360 Revert "limit shard concurrency"
This reverts commit 6c7d56d4bc.
2016-08-29 12:39:52 -05:00
Jonathan A. Sternberg 10029caf2f Support negative timestamps in the query engine
Negative timestamps are now supported. We also now refuse two
nanoseconds that are at the edge of the minimum time window. One of the
nanoseconds we do not accept is because we need MinInt64 to be used for
some internal comparisons in the TSM engine and it was causing an
underflow when we subtracted one from the minimum time. The second is so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell the difference between if the
user provided it or if it was implicit without those values being
different.

If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
2016-08-25 12:52:41 -05:00
Ben Johnson 6c7d56d4bc
limit shard concurrency
This commit limits queries to only process one shard at a time.
However, within a shard, multiple series can still be processed in
parallel. Shard iterators are lazily instantiated during query
execution to limit the amount of memory a given query uses.
2016-08-05 09:45:57 -06:00
Jason Wilder 19546faab3 Release cursor/iterator resources aggressively 2016-08-03 00:21:39 -06:00
Jonathan A. Sternberg 3bd51d3537 Fix fill(previous) when used with math operators 2016-06-29 09:54:12 -05:00
Jonathan A. Sternberg 497db2a6d3 Removing dead code from every package except influxql
The tsdb package had a substantial amount of dead code related to the
old query engine still in there. It is no longer used, so it was removed
since it was left unmaintained. There is likely still more code that is
the same, but wasn't found as part of this code cleanup.

influxql has dead code show up because of the code generation so it is
not included in this pruning.
2016-06-20 22:41:07 -05:00
Jonathan A. Sternberg 252cde1e81 Fix golint errors for the influxql package 2016-06-20 08:51:02 -05:00
Jonathan A. Sternberg 71c8e9e567 Refactor ExecuteQuery to take options as a struct
This allows us to add additional options to ExecuteQuery without
creating parameter bloat.

Removing the unused Series structs. Their necessity was removed by a
previous commit, but the structs were not removed yet.

Add another type of interrupt iterator that monitors the interrupt
channel and calls `Close()` on the iterator when the interrupt happens.
It will primarily be used for asynchronously closing the ReaderIterator,
but it will only close the read side of the connection properly. More
work needs to be done to allow closing the write side efficiently.
2016-06-01 12:30:52 -05:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Ben Johnson 078e561820
parallelize iterators 2016-05-09 10:25:30 -06:00
Ben Johnson 49eb3b8d04
optimize show series iterator
This commit changes the `SeriesIterator` to process one measurement
at a time and uses a `floatFastDedupeIterator` to avoid point
encoding during deduplication.
2016-05-03 08:52:44 -06:00
Ben Johnson 1b6524a7bf
reduce interrupt iterator checks
The interrupt iterator currently introduces a non-trivial amount of
overhead to queries by checking for interrupts every 256 points.
This commit adjusts that check to every 5000 points.

There are also several places where nested field access has been
adjusted to minimize field lookups.
2016-04-26 12:16:07 -06:00
Jonathan A. Sternberg 7ec2a991d5 Modify all of the iterators to allow returning an error on Next()
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.

Updated the Emitter to return errors when it cannot read properly from
the iterators.
2016-04-18 11:17:55 -04:00
Jonathan A. Sternberg 86046bb2d0 Implement derivatives across intervals for aggregate queries
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.

This does not apply to raw queries yet.

Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.

Fixes #3247. Contributes to #5943.
2016-04-15 18:16:08 -04:00
Ben Johnson 4f381d03d7
add double buffer on chan iterator
This commit changes the channel iterators to use a double buffer
to reduce allocations. The caller of `Iterator.Next()` must copy
out the point before calling `Next()` again.
2016-04-14 13:52:13 -06:00
Ben Johnson 525e22c92b
tsm1 query engine alloc reduction
This commit makes a number of performance improvements to
reduce allocations during query execution. Several objects
and buffers are now reused across the components to avoid
allocations.

Previously a simple `count(value)` query across 1M points
would require 26,000+ allocations. After the changes in
this commit that number has been reduced to 88.
2016-04-11 14:50:59 -06:00
Jonathan A. Sternberg fa5a38dcd4 Fixing aggregate queries with no GROUP BY to include the end time
Queries with a time constraint but no group by would not include the
final point from the underlying iterator.

Fixes #6229.
2016-04-07 14:11:28 -04:00
Edd Robinson dfee15bd19 Scopes influxql Protobuf package to prevent clashes
Fixes #6211.

In Go-land packages with the same name, e.g., internal, do not clash
with each other when they're in different parts of the project. However
with protobufs definitions will clash if they share the same package
name.

This commit renames the influxql protobuf package to `influxql` to
avoid a clash with a message definition in another protobuf package
called internal. Go package aliases allow us to continue to refer to the
internal package as `internal` rather than `influxql`.
2016-04-05 13:36:47 +01:00
Jonathan A. Sternberg 43e3330480 Fix the reader iterator so it doesn't read the first point when creating the iterator 2016-04-01 17:31:28 -04:00
Ben Johnson b28c4db3d0 mark merge iterator as initialized
This commit sets the `MergeIterator.init` flag after initialization.
Previously this would generate a new heap on every call to `Next()`
which caused some aggregate queries to slow by ~10,000%.
2016-03-31 09:56:23 -06:00
Jonathan A. Sternberg 178a6e2f0a Merge pull request #6113 from influxdata/js-6112-simple-moving-average
Implement simple moving average
2016-03-30 20:57:55 -04:00
Jonathan A. Sternberg 278b0950a7 Perform lazy initialization of the heap for the MergeIterator
The MergeIterator creation function would call `peek()` on the iterator
to initialize the heap. Since this function can sometimes take a long
time (such as a huge aggregate query on a shard), the
`influxql.Select()` wouldn't return until the query had already been
completed.

The `influxql.Select()` call should be just the creation of the
iterators and shouldn't calculate anything. This is important for future
features like the point limiter that have to be initialized after the
`influxql.Select()` call.
2016-03-30 16:08:55 -04:00
Jonathan A. Sternberg 6453dbc249 Implement simple moving average
The simple moving average will gradually emit points instead of waiting
until the end. This should apply to derivative and difference in the
future too.

Fixes #6112.
2016-03-29 14:36:43 -04:00
Jonathan A. Sternberg 114e734ee5 Fix a bad merge that removed ExpandSources from AuxIterators
Regenerated the protobuf file for influxql to use a newer protobuf.
2016-03-22 16:36:22 -04:00
Ben Johnson 6e1c1da25b reduce allocations in query execution
This commit removes some heap objects by converting them from
pointer references to non-pointers or by reusing buffers.
2016-03-22 09:51:39 -06:00
Ben Johnson d58c6608fe add InterruptIterator.Stats() 2016-03-21 16:38:18 -06:00
Ben Johnson 7156c1f9bd add IteratorStats
This commit adds an `IteratorStats` that holds aggregate
iterator processing information. A method is also added to
`Iterator` to return the stats:

	Stats() influxql.IteratorStats

The remote iterators will also emit their stats in the point
stream upon first connection, on a given interval, and then
finally once the last point has been sent.
2016-03-21 16:25:19 -06:00
Jonathan A. Sternberg 6655ca7769 Create a new interrupt iterator that will stop emitting points after an interrupt
Use of the iterator is spread out into both `IteratorCreators` and
inside of the iterators themselves. Part of the interrupt must be
handled inside of the engine so it stops trying to emit points when an
interrupt is found and another part of the interrupt has to happen when
combining the iterators so it doesn't just start reading the next shard.
2016-03-21 12:07:07 -04:00
Cory LaNou ba6a95e9bc Merge pull request #5994 from influxdata/single-server-lite
Single Server
2016-03-14 16:11:37 -05:00
Jonathan A. Sternberg 94916082c9 Make binary expressions with either point being nil return a nil point
This also fixes integer to float and float/integer to boolean binary
expressions to correctly work with nil points at all.

Related to #5973.
2016-03-14 13:27:59 -04:00
Ben Johnson e96185f993 add support for remote expansion of regex
This commit moves the `tsdb.Store.ExpandSources()` function onto
the `influxql.IteratorCreator` and provides support for issuing
source expansion across a cluster.
2016-03-14 16:55:53 +00:00
Jonathan A. Sternberg 3f68bd12ee Merge pull request #5979 from influxdata/js-5974-aux-iterator-close-panic
Fix aux iterators to respect early closing
2016-03-14 12:03:50 -04:00
Jonathan A. Sternberg 0042866002 Teach the AuxIterator how to background
Now the AuxIterator will know when it is backgrounded so that it can
stop reading from the primary iterator when all of the child iterators
have been closed.
2016-03-14 11:12:02 -04:00
Jonathan A. Sternberg 74d51e3842 Support nil values in binary math expressions with two iterators
Related to #5959 and #5973.
2016-03-11 15:57:35 -05:00
Ben Johnson beda072426 add support for remote expansion of regex
This commit moves the `tsdb.Store.ExpandSources()` function onto
the `influxql.IteratorCreator` and provides support for issuing
source expansion across a cluster.
2016-03-11 12:40:07 -07:00
Jonathan A. Sternberg 09a9b3c53e Fix aux iterators to respect early closing
The primary input iterator for an aux iterator would continue trying to
send points to a closed channel even after an aux iterator had already
been closed.

This changes the aux iterators to use sync.Cond instead of channels and
lower level syncing primitives for handling buffered input/output.

Fixes #5974.
2016-03-11 12:07:32 -05:00
Jonathan A. Sternberg 9c5bc8ab2b Refactor reduce slice func to use the aggregator and emitter 2016-03-07 13:25:45 -05:00
Jonathan A. Sternberg 9113839e4c Fix sorting of `first()` and `last()` calls across shards
Previously the call iterator would normalize the time to the interval
for all calls. This meant that when `first()` or `last()` was called
with no group by interval the value would be found for each shard, the
time was normalized, then it tried to find the value between the shards
(but no longer with any time data as that had already been eliminated).

This removes part of the time logic from the call iterators and makes a
new iterator `IntervalIterator` to normalize the times as they come out
of the underlying iterator.

Fixes #5890.
2016-03-03 21:15:43 -05:00
Jonathan A. Sternberg e3660fae93 Support all iterator types for count(), first(), and last()
All three of these iterators are supposed to support all four types of
iterators, but the implementation was never done for string or boolean.

Fixes #5886.
2016-03-02 23:49:55 -05:00
Jonathan A. Sternberg 2440568b27 Merge pull request #5875 from influxdata/js-5852-mean-function-accuracy
Improve mean accuracy while retaining the speedup with a custom iterator
2016-03-02 17:09:58 -05:00
Jonathan A. Sternberg 1c543b28a9 Refactored call iterators to make them public and more usable as a library
This refactor is primarily to support Kapacitor. Kapacitor doesn't care
about the iterators and mostly keeps the points it handles in memory.
The iterator interface is more than Kapacitor cares about.

This commit refactors and opens up the internals of aggregating and
reducing incoming points so it can be used by an outside library with
the same code. It also makes the iterators used by the call iterators
publically usable with new functionality.

Reducers are split into two methods which are separate interfaces that
can be combined for dealing with casting between different types. The
Aggregator interfaces accept points into the aggregator and retain any
internal state they need. The Emitter interface will then create a point
from that aggregated state which can be fed to the iterator. The
Emitters do not fill in the name or tag of the point as that is expected
to be done by the person aggregating the point. While the Emitters do
sometimes fill in the time, that value will also be overwritten by the
iterator. Filling in the time is to allow a future version that will
allow returning the point time instead of just the interval time.
2016-03-02 16:10:49 -05:00
Jonathan A. Sternberg d11bc6182c Improve mean accuracy while retaining the speedup with a custom iterator
Fixes #5852.
2016-03-02 14:48:11 -05:00
Jonathan A. Sternberg 87fc143732 Fix limit iterator with multiple sources
The limit iterator would short circuit if there were no dimensions and
all points had been read. It also needs to consider that multiple
sources will require reading the entire iterator too, so the short
circuit requires only a single source.

Fixes #5871.
2016-03-01 21:44:45 -05:00
Ben Johnson 0dda9f6608 add remote execution
This commit adds remote execution to the query engine.
2016-02-25 08:41:20 -07:00
Jonathan A. Sternberg 18c7c554ba Optimize the mean() call by moving the calculation into the shard iterator
A new attribute has been added to points to track how many points were
used to calculate that point. This is particularly useful for finding
the mean as we can then split mean calculation into two phases: one at
the shard level and a second at the shards level.

This optimization is now used so we don't have to hold so many points in
memory while calculating the mean.
2016-02-16 10:32:34 -05:00
Jonathan A. Sternberg 73ee204fbc Cast number fill values for the fill iterator
Querying an integer field with a fill value will cause a cast error
because the underlying type is a float64 rather than an int64. Add a
function that will coerce the value to the correct type.

It may be more appropriate in the future to have the fill iterator read
the underlying iterator and cast to the appropriate type rather than
coerce the fill value to the correct type, but this solution works for
our current scenario well.
2016-02-11 15:21:50 -05:00
Jonathan A. Sternberg 98810a363a Correct the AuxIterator test and adding some additional locks
The additional locks shouldn't be necessary due to how the code is used,
but should prevent any potential data races in case we accidentally do
something bad.
2016-02-10 09:40:31 -07:00
Jonathan A. Sternberg ed151598ff Modify the AuxIterator to include a Start method
The AuxIterator streams points to the underlying iterators. When it
started automatically, race conditions occurred between the stream
closing the iterators and creating iterators from the AuxIterator.
2016-02-10 09:40:30 -07:00