`top()` and `bottom()` will now organize the points by time and also
keep the points' original time even when a time grouping is used. At the
same time, `top()` and `bottom()` will no longer honor any fill options
that are present, since fill doesn't make sense for these specific
functions.
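For example (the query itself is illustrative), each point returned by a
query like this keeps its original timestamp rather than the start of its
10 minute interval:
SELECT top(value, 3) FROM cpu WHERE time >= now() - 1h GROUP BY time(10m)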
This also fixes the aggregates and selectors to honor the ordered
iterator option so iterators remain ordered, and to respect the
buckets that are created by the final dimensions of the query so that
two buckets don't overlap each other within the same reducer. A test has
been added for this situation. This should clarify and encourage the use
of the ordered attribute within the query engine.
The Window function will now check, before it adjusts the offset, whether
the adjustment would overflow or underflow. If it would do either, it
sets the start or end time to MinTime or MaxTime respectively.
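A minimal sketch of that kind of clamped adjustment (the names here are
illustrative, not the actual implementation):

    // Clamp instead of wrapping around when applying the offset.
    if offset > 0 && start > MaxTime-offset {
        start = MaxTime // adding the offset would overflow
    } else if offset < 0 && start < MinTime-offset {
        start = MinTime // adding the offset would underflow
    } else {
        start += offset
    }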
The timezone for a query can now be added to the end with something like
`TZ("America/Los_Angeles")`, and it will localize the results of the
query to that timezone. The offset will automatically be set to the
offset for that timezone, and offsets will automatically adjust for
daylight saving time, so grouping by a day will result in a 25 hour day
once a year and a 23 hour day on another day of the year.
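For example, a query along these lines (illustrative) groups its results
into days aligned to Pacific time:
SELECT mean(value) FROM cpu WHERE time >= '2016-11-01' AND time < '2016-12-01' GROUP BY time(1d) TZ("America/Los_Angeles")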
The automatic adjustment of intervals for a changing timezone offset will
only happen if the group by interval is greater than the change in the
timezone offset. That means grouping by an hour or less will not be
affected by daylight saving time, but a 2 hour or 1 day interval will be.
The default timezone is UTC and existing queries are unaffected by this
change.
When times are returned as strings (when `epoch=1` is not used), the
results will be returned in RFC3339 format using the requested timezone.
The following types of queries will panic:
SELECT mean, host FROM (SELECT mean(value) FROM cpu GROUP BY host)
SELECT top(sum, host, 3) FROM (SELECT sum(value) FROM cpu GROUP BY host)
These queries _should_ work, but due to a current limitation with
aggregate functions, the aggregate functions won't return any auxiliary
fields. So even though a tag is not an auxiliary field, it is treated as
one by the query engine, and these queries will fail.
Fixing this properly will take longer. This fix just prevents the panic
from killing the server while we fix this for real.
Previously, only time expressions got propagated inwards. The reason for
this was simple: if the outer query was going to filter to a specific
time range, then it would be unnecessary for the inner query to output
points outside that time range. It started as an optimization, but became
a feature because there was no reason to make the user repeat the same
time clause in the inner query as in the outer query. So we allowed an
aggregate query with an interval to pass validation in the subquery if
the outer query had a time range. But `GROUP BY` clauses were not
propagated because that same logic didn't apply to them; it's not an
optimization there. So while grouping by a tag in the outer query
without grouping by it in the inner query was useless, there wasn't any
particular reason to care.
Then a bug was found where wildcards would propagate the dimensions
correctly, but when the outer query contained a group by that the inner
query omitted, the outer group by wasn't correctly filtered out. We
could fix that filtering, but on further review, I had been seeing
people make that same mistake a lot. People seem to just believe that
the grouping should be propagated inwards. Instead of trying to fight
what the user wanted and explicitly erase groupings that weren't
propagated manually, we might as well just propagate them for the user
to make their lives easier. There is no useful situation where you would
want to group into buckets that can't physically exist, so we might as
well do _something_ useful.
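For example, an outer grouping like the following (the query itself is
illustrative):
SELECT mean(max) FROM (SELECT max(value) FROM cpu) GROUP BY host
is now treated as if the inner query had also been grouped by host:
SELECT mean(max) FROM (SELECT max(value) FROM cpu GROUP BY host) GROUP BY host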
This will also now propagate time intervals to inner queries since the
same applies there. But, while the interval propagates, the following
query will not pass validation since it is still not possible to use a
grouping interval with a raw query (even if the inner query is an
aggregate):
SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m)
This also means wildcards will behave a bit differently. They will
retrieve dimensions from the sources in the inner query rather than just
using the dimensions in the group by.
Fixing `top()` and `bottom()` to return the correct auxiliary fields.
Unfortunately, we were not copying the buffer with the auxiliary fields,
so those values would be overwritten by a later point.
Also, fix the `Iterators.Merge(IteratorOptions)` function so it consults
the `Ordered` attribute to determine which iterator it should use to
merge the input iterators.
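A minimal sketch of that selection, assuming merge constructors along
these lines exist:

    // Pick the merge strategy based on whether ordered output was requested.
    func merge(inputs []Iterator, opt IteratorOptions) Iterator {
        if opt.Ordered {
            return NewSortedMergeIterator(inputs, opt)
        }
        return NewMergeIterator(inputs, opt)
    }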
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.
Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).
The syntax is like this:
SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)
This will execute derivative and then sum the result of those derivatives.
Another example:
SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)
This would let you find the maximum minimum value of each host.
There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries have different performance
characteristics:
SELECT mean(value) FROM cpu
SELECT mean(value) FROM (SELECT value FROM cpu)
The first will calculate `mean(value)` at the shard level and will be faster,
especially when it comes to clustered setups. The second will process the mean
at the top level and will not include that optimization.
Negative timestamps are now supported. We also now refuse two nanosecond
values that sit at the edge of the minimum time window. The first is
rejected because we need MinInt64 to be available for some internal
comparisons in the TSM engine, and it was causing an underflow when we
subtracted one from the minimum time. The second is reserved so we can
have one minimum time that signifies the default minimum that nobody can
write to (so we can implicitly rewrite the timestamp on aggregate
queries) while still using an explicit timestamp if the user gives us
one. We aren't able to tell whether the user provided the timestamp or
it was implicit unless those values are different.
If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
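A minimal sketch of how those reserved values line up (the constant name
is an assumption, not the actual identifier):

    // math.MinInt64 stays reserved for internal comparisons in the TSM engine,
    // and math.MinInt64 + 1 marks the implicit default minimum that nobody can
    // write to, so the smallest timestamp a user may write is two above it.
    const MinWritableTime = int64(math.MinInt64) + 2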
This commit limits queries to only process one shard at a time.
However, within a shard, multiple series can still be processed in
parallel. Shard iterators are lazily instantiated during query
execution to limit the amount of memory a given query uses.
The highest representable time needs to be usable as the exclusive upper
bound of a range, so the maximum time needs to be one less than the
maximum number of nanoseconds representable by an int64 so that we don't
lose a point at that one time.
This previously worked in the open source version because the timestamp
used for finding a shard would be truncated by the retention policy, so
the lookup time didn't run into this edge case because it didn't rest on
the truncation boundary. Since that point didn't really belong in that
shard group and was placed there by mistake, it's best to fix this bug
so that the timestamp used to create the shard group is capable of
retrieving it.
This allows us to add additional options to ExecuteQuery without
creating parameter bloat.
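A minimal sketch of that pattern, with an illustrative options struct
(the field set here is an assumption):

    // ExecutionOptions groups the per-query settings so new options can be
    // added without changing the ExecuteQuery signature.
    type ExecutionOptions struct {
        Database  string
        ChunkSize int
        ReadOnly  bool
    }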
Removing the unused Series structs. They were made unnecessary by a
previous commit, but had not been removed yet.
Add another type of interrupt iterator that monitors the interrupt
channel and calls `Close()` on the iterator when the interrupt happens.
It will primarily be used for asynchronously closing the ReaderIterator,
but it will only close the read side of the connection properly. More
work needs to be done to allow closing the write side efficiently.
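A minimal sketch of the idea, with illustrative names (not the actual
types in the query engine):

    // Iterator is the minimal surface needed for this sketch.
    type Iterator interface {
        Close() error
    }

    // closeOnInterrupt closes the wrapped iterator as soon as the interrupt
    // channel fires, or stops watching once done is closed.
    func closeOnInterrupt(input Iterator, interrupt, done <-chan struct{}) {
        go func() {
            select {
            case <-interrupt:
                input.Close()
            case <-done:
            }
        }()
    }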
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.
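For example (the queries themselves are illustrative):
SELECT value::float FROM cpu
SELECT host::tag, value::field FROM cpu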
This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also makes it possible to
choose a field with a specific type if multiple shards disagree about
the type. If no types are given, the existing ordering for how a type is
chosen is used to determine which type to return.
The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has been removed
since it is no longer needed. SeriesKeys was originally used for the
fill iterator, but was then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore, and the auxiliary types are better served by
FieldDimensions providing that functionality, so SeriesKeys could be
removed.
Fixes #6519.
This commit moves the `CallIterator` to wrap the individual series
instead of wrapping a shard. This allows individual points to be
aggregated before being merged.
This will cause a small increase in memory usage per series, but it
shows a 20% decrease in query time when there are a moderate number of
points per series.
This commit changes the `SeriesIterator` to process one measurement
at a time and uses a `floatFastDedupeIterator` to avoid point
encoding during deduplication.
If a shard is empty for a specific field and the field type is something
other than a float, a nil iterator would get returned from one of the
empty shards and cause the combined iterators to be cast to the float
type, with all other iterator types being discarded (or, for integers,
cast).
This is rare since most aggregates don't accept strings or booleans, but
a query like:
SELECT distinct(string) FROM mydata
would return nothing if one of the shards didn't have a value for
`string`.
This change modifies the query engine to return nil for the shards
instead of a fake iterator and then to only use the fake iterator if the
final aggregate iterator is nil (meaning that no iterators could be
constructed for the field from any shard).
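A minimal sketch of that fallback, with illustrative names (this is not
the actual engine code):

    // createFieldIterators skips shards that have no data for the field and only
    // falls back to a placeholder iterator when no shard produced anything.
    func createFieldIterators(shards []Shard, opt IteratorOptions) (Iterator, error) {
        itrs := make([]Iterator, 0, len(shards))
        for _, sh := range shards {
            itr, err := sh.CreateIterator(opt)
            if err != nil {
                return nil, err
            }
            if itr != nil {
                itrs = append(itrs, itr) // nil means the shard had nothing for this field
            }
        }
        if len(itrs) == 0 {
            return newNilFloatIterator(), nil // placeholder only when everything was empty
        }
        return Iterators(itrs).Merge(opt)
    }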
Fixes #6495.
An offset of `time(1m, now())` will anchor the offset to the current
time of the query. The default offset is `0s`, which matches the
existing behavior.
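For example (the query itself is illustrative), the 5 minute buckets
below are anchored to the time the query runs rather than to the epoch:
SELECT mean(value) FROM cpu WHERE time >= now() - 1h GROUP BY time(5m, now())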
This fixes #2074 by making time zone offset support unnecessary. Time
comparisons can use timezones inside of the time clause, and the offset
needed for non-hour timezone differences can be supplied through the
offset argument.
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.
Updated the Emitter to return errors when it cannot read properly from
the iterators.
This commit changes the channel iterators to use a double buffer
to reduce allocations. The caller of `Iterator.Next()` must copy
out the point before calling `Next()` again.
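A minimal sketch of the copy-out contract this imposes on callers (the
iterator and point types here are illustrative):

    // drain copies every point out of the iterator. Because the iterator reuses
    // its internal buffers, each point must be copied before Next() is called again.
    func drain(itr FloatIterator) ([]FloatPoint, error) {
        var points []FloatPoint
        for {
            p, err := itr.Next()
            if err != nil {
                return nil, err
            }
            if p == nil {
                return points, nil
            }
            points = append(points, *p) // copy the value, not the pointer
        }
    }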
This commit makes a number of performance improvements to
reduce allocations during query execution. Several objects
and buffers are now reused across the components to avoid
allocations.
Previously a simple `count(value)` query across 1M points
would require 26,000+ allocations. After the changes in
this commit that number has been reduced to 88.
Fixes #6211.
In Go-land, packages with the same name, e.g., internal, do not clash
with each other when they're in different parts of the project. With
protobufs, however, definitions will clash if they share the same
package name.
This commit renames the influxql protobuf package to `influxql` to
avoid a clash with a message definition in another protobuf package
called internal. Go package aliases allow us to continue to refer to the
internal package as `internal` rather than `influxql`.
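For example, an import alias along these lines (the import path is
illustrative) keeps the old name in Go code:

    import internal "github.com/influxdata/influxdb/influxql/internal"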
A bigger refactor of these functions is needed to support #3290, but
this will work for the more common case that someone uses double quotes
instead of single quotes when surrounding a time literal.
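For example, a time comparison written with double quotes (the query
itself is illustrative) is now handled:
SELECT value FROM cpu WHERE time > "2016-01-01 00:00:00"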
Fixes #3932.
This commit adds an `IteratorStats` that holds aggregate
iterator processing information. A method is also added to
`Iterator` to return the stats:
Stats() influxql.IteratorStats
The remote iterators will also emit their stats in the point
stream upon first connection, on a given interval, and then
finally once the last point has been sent.
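A minimal sketch of the kind of counters such a stats struct carries
(the field names are assumptions):

    // IteratorStats aggregates processing counters for an iterator.
    type IteratorStats struct {
        SeriesN int // number of series processed
        PointN  int // number of points processed
    }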
Use of the interrupt iterator is spread out into both `IteratorCreators`
and the iterators themselves. Part of the interrupt must be handled
inside of the engine so it stops trying to emit points when an interrupt
is found, and another part of the interrupt has to happen when combining
the iterators so it doesn't just start reading the next shard.
Now the AuxIterator will know when it is backgrounded so that it can
stop reading from the primary iterator when all of the child iterators
have been closed.