Commit Graph

64 Commits (74c6a0c1c590a102705442fcb49f372c85f1beb1)

Author SHA1 Message Date
Jonathan A. Sternberg c05c7f6360 Revert "limit shard concurrency"
This reverts commit 6c7d56d4bc.
2016-08-29 12:39:52 -05:00
Jonathan A. Sternberg 10029caf2f Support negative timestamps in the query engine
Negative timestamps are now supported. We also now refuse two
nanoseconds that are at the edge of the minimum time window. One of the
nanoseconds we do not accept is because we need MinInt64 to be used for
some internal comparisons in the TSM engine and it was causing an
underflow when we subtracted one from the minimum time. The second is so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell the difference between if the
user provided it or if it was implicit without those values being
different.

If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
2016-08-25 12:52:41 -05:00
Jonathan A. Sternberg 4cdfc3280d Move the CQ interval by the group by offset
This will make the period selected by the CQ system work correctly for a
query with an offset.
2016-08-05 14:39:52 -05:00
Ben Johnson 6c7d56d4bc
limit shard concurrency
This commit limits queries to only process one shard at a time.
However, within a shard, multiple series can still be processed in
parallel. Shard iterators are lazily instantiated during query
execution to limit the amount of memory a given query uses.
2016-08-05 09:45:57 -06:00
Jason Wilder 0b60862248 Close drained iterators
Aux and condition iterators where not closed which could
cause TSM files to leak if they were queried against while
a compaction was running.
2016-07-28 20:25:37 -06:00
Ben Johnson 5df6f75545
check for nil iterator creation
This commit checks if an iterator is `nil` before adding to an
iterator list during creation.
2016-07-27 13:54:56 -06:00
Jonathan A. Sternberg 8e1b036b0a Modify the max nanosecond time to be one nanosecond less
The highest time represented by a nanosecond needs to be used for an
exclusive range, so the maximum time needs to be one less than the
possible maximum number of nanoseconds representable by an int64 so that
we don't lose a point at that one time.

Previously worked in the open source version because the timestamp used
for finding a shard would be truncated by the retention policy so the
lookup time didn't run into this edge case because it didn't rest on the
truncation boundary. Since that point didn't really belong in that shard
group and was placed there by mistake, it's best to fix this bug since
the timestamp used to create the shard group should be capable of
retrieving it.
2016-06-16 12:15:41 -05:00
Jonathan A. Sternberg b972c220aa Merge pull request #6757 from influxdata/js-refactor-execute-query
Refactor ExecuteQuery to take options as a struct
2016-06-07 10:35:52 -05:00
Ben Johnson 3fa5cefa32
add Iterators.Merge() 2016-06-03 10:27:17 -06:00
Jonathan A. Sternberg 1e84b22407 Update SHOW TAG VALUES to use a fast dedupe iterator
Include a benchmark test for the fast dedupe iterator.
2016-06-02 22:03:59 -05:00
Jonathan A. Sternberg 71c8e9e567 Refactor ExecuteQuery to take options as a struct
This allows us to add additional options to ExecuteQuery without
creating parameter bloat.

Removing the unused Series structs. Their necessity was removed by a
previous commit, but the structs were not removed yet.

Add another type of interrupt iterator that monitors the interrupt
channel and calls `Close()` on the iterator when the interrupt happens.
It will primarily be used for asynchronously closing the ReaderIterator,
but it will only close the read side of the connection properly. More
work needs to be done to allow closing the write side efficiently.
2016-06-01 12:30:52 -05:00
Nathaniel Cook 2927fee2d1 update comment on MaxTime 2016-05-27 11:07:50 +01:00
Nathaniel Cook 9314ae8e80 fix overflow in window iterator and holt winters roundTime 2016-05-27 11:07:50 +01:00
Edd Robinson f4fc905fa9 Reject timestamps too far in future 2016-05-27 11:07:48 +01:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Ben Johnson 078e561820
parallelize iterators 2016-05-09 10:25:30 -06:00
Ben Johnson fdf34d4356
move call iterator to series level
This commit moves the `CallIterator` to wrap the individual series
instead of wrapping a shard. This allows individual points to be
aggregated before being merged.

This will cause a small increase in memory usuage per series but
it shows a 20% decrease in query time when there are a moderate
number of points per series.
2016-05-05 09:59:03 -06:00
Ben Johnson 417df18396 Merge pull request #6533 from benbjohnson/optimize-show-series
Optimize SHOW SERIES
2016-05-03 09:15:21 -06:00
Jonathan A. Sternberg a2a5c32770 Merge pull request #6539 from influxdata/js-6495-fix-aggregates-with-empty-shards
Fix aggregate returns when data is missing from some shards
2016-05-03 10:56:21 -04:00
Ben Johnson 49eb3b8d04
optimize show series iterator
This commit changes the `SeriesIterator` to process one measurement
at a time and uses a `floatFastDedupeIterator` to avoid point
encoding during deduplication.
2016-05-03 08:52:44 -06:00
Jonathan A. Sternberg d6d0addcec Fix aggregate returns when data is missing from some shards
If a shard is empty for a specific field and the field type is something
other than a float, a nil iterator would get returned from one of the
empty shards and cause the combined iterators to be cast to the float
type and all other iterator types to be discarded (or for integers, to
be cast).

This is rare since most aggregates don't accept strings or booleans, but
for queries like:

    SELECT distinct(string) FROM mydata

It would result in nothing getting returned if one of the shards didn't
have a value for `string`.

This change modifies the query engine to return nil for the shards
instead of a fake iterator and then to only use the fake iterator if the
final aggregate iterator is nil (meaning that no iterators could be
constructed for the field from any shard).

Fixes #6495.
2016-05-03 10:41:22 -04:00
Jonathan A. Sternberg 64556e4f8e Support offset argument in the GROUP BY time(...) call
An offset of `time(1m, now())` will anchor the offset to the current
time of the query. The default offset is `0s` which is the current
default anyway.

This fixes #2074 by making time zone offset support unnecessary. Time
comparisons can use timezones inside of the time clause and the offset
needed for non-hour timezone differences can be used as part of the
offset argument.
2016-05-02 14:02:35 -04:00
Jonathan A. Sternberg c8c38e15cd Merge pull request #6386 from influxdata/js-iterator-next-error
Modify all of the iterators to allow returning an error on Next()
2016-04-20 10:39:53 -04:00
Nathaniel Cook 465f5a375f add elapsed function 2016-04-19 12:54:54 -06:00
Jonathan A. Sternberg 7ec2a991d5 Modify all of the iterators to allow returning an error on Next()
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.

Updated the Emitter to return errors when it cannot read properly from
the iterators.
2016-04-18 11:17:55 -04:00
Ben Johnson 4f381d03d7
add double buffer on chan iterator
This commit changes the channel iterators to use a double buffer
to reduce allocations. The caller of `Iterator.Next()` must copy
out the point before calling `Next()` again.
2016-04-14 13:52:13 -06:00
Ben Johnson 525e22c92b
tsm1 query engine alloc reduction
This commit makes a number of performance improvements to
reduce allocations during query execution. Several objects
and buffers are now reused across the components to avoid
allocations.

Previously a simple `count(value)` query across 1M points
would require 26,000+ allocations. After the changes in
this commit that number has been reduced to 88.
2016-04-11 14:50:59 -06:00
Jonathan A. Sternberg fa5a38dcd4 Fixing aggregate queries with no GROUP BY to include the end time
Queries with a time constraint but no group by would not include the
final point from the underlying iterator.

Fixes #6229.
2016-04-07 14:11:28 -04:00
Edd Robinson dfee15bd19 Scopes influxql Protobuf package to prevent clashes
Fixes #6211.

In Go-land packages with the same name, e.g., internal, do not clash
with each other when they're in different parts of the project. However
with protobufs definitions will clash if they share the same package
name.

This commit renames the influxql protobuf package to `influxql` to
avoid a clash with a message definition in another protobuf package
called internal. Go package aliases allow us to continue to refer to the
internal package as `internal` rather than `influxql`.
2016-04-05 13:36:47 +01:00
Jonathan A. Sternberg 43e3330480 Fix the reader iterator so it doesn't read the first point when creating the iterator 2016-04-01 17:31:28 -04:00
Jonathan A. Sternberg c193bde61c Throw an error when time is compared to an invalid literal
A bigger refactor of these functions is needed to support #3290, but
this will work for the more common case that someone uses double quotes
instead of single quotes when surrounding a time literal.

Fixes #3932.
2016-03-31 11:29:20 -06:00
Jonathan A. Sternberg 3a7d537ee6 Merge pull request #6028 from influxdata/js-5116-default-no-fill-for-select-into
Modify fill(null) to fill(none) in SELECT INTO queries
2016-03-22 12:13:17 -04:00
Ben Johnson 7156c1f9bd add IteratorStats
This commit adds an `IteratorStats` that holds aggregate
iterator processing information. A method is also added to
`Iterator` to return the stats:

	Stats() influxql.IteratorStats

The remote iterators will also emit their stats in the point
stream upon first connection, on a given interval, and then
finally once the last point has been sent.
2016-03-21 16:25:19 -06:00
Jonathan A. Sternberg 6655ca7769 Create a new interrupt iterator that will stop emitting points after an interrupt
Use of the iterator is spread out into both `IteratorCreators` and
inside of the iterators themselves. Part of the interrupt must be
handled inside of the engine so it stops trying to emit points when an
interrupt is found and another part of the interrupt has to happen when
combining the iterators so it doesn't just start reading the next shard.
2016-03-21 12:07:07 -04:00
Jonathan A. Sternberg eab6ac3871 Modify fill(null) to fill(none) in SELECT INTO queries
Fixes #5116.
2016-03-16 11:14:41 -04:00
Cory LaNou ba6a95e9bc Merge pull request #5994 from influxdata/single-server-lite
Single Server
2016-03-14 16:11:37 -05:00
Jonathan A. Sternberg 0042866002 Teach the AuxIterator how to background
Now the AuxIterator will know when it is backgrounded so that it can
stop reading from the primary iterator when all of the child iterators
have been closed.
2016-03-14 11:12:02 -04:00
Ben Johnson beda072426 add support for remote expansion of regex
This commit moves the `tsdb.Store.ExpandSources()` function onto
the `influxql.IteratorCreator` and provides support for issuing
source expansion across a cluster.
2016-03-11 12:40:07 -07:00
Jonathan A. Sternberg 09a9b3c53e Fix aux iterators to respect early closing
The primary input iterator for an aux iterator would continue trying to
send points to a closed channel even after an aux iterator had already
been closed.

This changes the aux iterators to use sync.Cond instead of channels and
lower level syncing primitives for handling buffered input/output.

Fixes #5974.
2016-03-11 12:07:32 -05:00
Jonathan A. Sternberg 9c5bc8ab2b Refactor reduce slice func to use the aggregator and emitter 2016-03-07 13:25:45 -05:00
Jonathan A. Sternberg 9113839e4c Fix sorting of `first()` and `last()` calls across shards
Previously the call iterator would normalize the time to the interval
for all calls. This meant that when `first()` or `last()` was called
with no group by interval the value would be found for each shard, the
time was normalized, then it tried to find the value between the shards
(but no longer with any time data as that had already been eliminated).

This removes part of the time logic from the call iterators and makes a
new iterator `IntervalIterator` to normalize the times as they come out
of the underlying iterator.

Fixes #5890.
2016-03-03 21:15:43 -05:00
Jonathan A. Sternberg 8d89a203a2 Fix sorting for distinct by sorting by value when the point time is the same 2016-03-03 19:09:38 -05:00
Jonathan A. Sternberg cddc1b2241 Fix remote execution for partially replicated clusters
The RPC handler for remote queries would attempt to reuse a closed
connection for certain commands that didn't use pooling. The RPC
commands that close the connection have been fixed to not try reusing
the connection.

When creating an iterator, if there are no points to return, the points
decoder would hit an EOF that it didn't catch and would return that
error back to the client who made the request. It now properly returns
no points by using a `nilFloatIterator` if there are no points to
return.

This fixes remote execution when a cluster has nothing to return.
2016-02-25 17:46:51 -05:00
Ben Johnson 16eea8eecc add SeriesList marshaling 2016-02-25 15:38:16 -07:00
Ben Johnson 0dda9f6608 add remote execution
This commit adds remote execution to the query engine.
2016-02-25 08:41:20 -07:00
Jonathan A. Sternberg 559a11d0ab Pass the implicit end time from the query executor to the select call
The select call and the query executor would both calculate the time
range, but in separate ways. The query executor needed some way to pass
in the implicit end time that is placed there by the query executor.

Fixes #5636.
2016-02-12 16:03:24 -05:00
Jonathan A. Sternberg d79eb29a23 Add documentation comments to influxql.Series and influxql.SeriesList 2016-02-10 09:40:31 -07:00
Jonathan A. Sternberg 98810a363a Correct the AuxIterator test and adding some additional locks
The additional locks shouldn't be necessary due to how the code is used,
but should prevent any potential data races in case we accidentally do
something bad.
2016-02-10 09:40:31 -07:00
Jonathan A. Sternberg ed151598ff Modify the AuxIterator to include a Start method
The AuxIterator streams points to the underlying iterators. When it
started automatically, race conditions occurred between the stream
closing the iterators and creating iterators from the AuxIterator.
2016-02-10 09:40:30 -07:00
Jonathan A. Sternberg 5c9cdd05c2 Modify the fill iterator to not produce empty name/tags for a time range
If a name/tag combination does not have any points in the time range at
all, fill will not generate points for it.
2016-02-10 09:40:30 -07:00