influxdb

Commit Graph

Author	SHA1	Message	Date
Jonathan A. Sternberg	f628b4a198	Update subqueries so groupings are propagated to inner queries Previously, only time expressions got propagated inwards. The reason for this was simple. If the outer query was going to filter to a specific time range, then it would be unnecessary for the inner query to output points within that time frame. It started as an optimization, but became a feature because there was no reason to have the user repeat the same time clause for the inner query as the outer query. So we allowed an aggregate query with an interval to pass validation in the subquery if the outer query had a time range. But `GROUP BY` clauses were not propagated because that same logic didn't apply to them. It's not an optimization there. So while grouping by a tag in the outer query without grouping by it in the inner query was useless, there wasn't any particular reason to care. Then a bug was found where wildcards would propagate the dimensions correctly, but the outer query containing a group by with the inner query omitting it wouldn't correctly filter out the outer group by. We could fix that filtering, but on further review, I had been seeing people make that same mistake a lot. People seem to just believe that the grouping should be propagated inwards. Instead of trying to fight what the user wanted and explicitly erase groupings that weren't propagated manually, we might as well just propagate them for the user to make their lives easier. There is no useful situation where you would want to group into buckets that can't physically exist so we might as well do _something_ useful. This will also now propagate time intervals to inner queries since the same applies there. But, while the interval propagates, the following query will not pass validation since it is still not possible to use a grouping interval with a raw query (even if the inner query is an aggregate): SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m) This also means wildcards will behave a bit differently. They will retrieve dimensions from the sources in the inner query rather than just using the dimensions in the group by. Fixing top() and bottom() to return the correct auxiliary fields. Unfortunately, we were not copying the buffer with the auxiliary fields so those values would be overwritten by a later point.	2017-01-23 12:38:10 -06:00
Jonathan A. Sternberg	d7c8c7ca4f	Support subquery execution in the query language This adds query syntax support for subqueries and adds support to the query engine to execute queries on subqueries. Subqueries act as a source for another query. It is the equivalent of writing the results of a query to a temporary database, executing a query on that temporary database, and then deleting the database (except this is all performed in-memory). The syntax is like this: SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *) This will execute derivative and then sum the result of those derivatives. Another example: SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host) This would let you find the maximum minimum value of each host. There is complete freedom to mix subqueries with auxiliary fields. The only caveat is that the following two queries: SELECT mean(value) FROM cpu SELECT mean(value) FROM (SELECT value FROM cpu) Have different performance characteristics. The first will calculate `mean(value)` at the shard level and will be faster, especially when it comes to clustered setups. The second will process the mean at the top level and will not include that optimization.	2017-01-07 13:00:48 -06:00
Mark Rushakoff	a29781286b	Use local RNG in SampleReducer The reducers already had a local RNG but mistakenly did not use it when sampling points. Because the local RNG is not protected by a mutex, there is a slight speedup as a result of this change: benchmark old ns/op new ns/op delta BenchmarkSampleIterator_1k-4 418 418 +0.00% BenchmarkSampleIterator_100k-4 434 422 -2.76% BenchmarkSampleIterator_1M-4 449 439 -2.23% benchmark old allocs new allocs delta BenchmarkSampleIterator_1k-4 3 3 +0.00% BenchmarkSampleIterator_100k-4 3 3 +0.00% BenchmarkSampleIterator_1M-4 3 3 +0.00% benchmark old bytes new bytes delta BenchmarkSampleIterator_1k-4 304 304 +0.00% BenchmarkSampleIterator_100k-4 304 304 +0.00% BenchmarkSampleIterator_1M-4 304 304 +0.00% The speedup would presumably increase when multiple sample iterators are used concurrently.	2016-12-15 12:33:19 -08:00
Michael Desa	f9b8129770	Add sample function to query language First Pass at implementing sample Add sample iterators for all types Remove size from sample struct Fix off by one error when generating random number Add benchmarks for sample iterator Add test and associated fixes for off by one error Add test for sample function Remove NumericLiteral from sample function call Make clear that the counter is incr w/ each call Rename IsRandom to AllSamplesSeen Add a rng for each reducer that is created The default rng that comes with math/rand has a global lock. To avoid having to worry about any contention on the lock, each reducer now has its own time seeded rng. Add sample function to changelog	2016-10-06 09:41:42 -07:00
Jonathan A. Sternberg	252cde1e81	Fix golint errors for the influxql package	2016-06-20 08:51:02 -05:00
Nathaniel Cook	ce74fe0b06	count and sum return 0 for empty intervals	2016-06-01 15:53:23 -06:00
Nathaniel Cook	465f5a375f	add elapsed function	2016-04-19 12:54:54 -06:00
Jonathan A. Sternberg	6708d0c439	Optimize the distinct call Change distinct so it uses a custom reducer that keeps internal state instead of requiring all of the points to be kept as a slice in memory. Fixes #6261.	2016-04-11 18:29:50 -04:00
Jonathan A. Sternberg	9c5bc8ab2b	Refactor reduce slice func to use the aggregator and emitter	2016-03-07 13:25:45 -05:00
Jonathan A. Sternberg	e3660fae93	Support all iterator types for count(), first(), and last() All three of these iterators are supposed to support all four types of iterators, but the implementation was never done for string or boolean. Fixes #5886.	2016-03-02 23:49:55 -05:00
Jonathan A. Sternberg	1c543b28a9	Refactored call iterators to make them public and more usable as a library This refactor is primarily to support Kapacitor. Kapacitor doesn't care about the iterators and mostly keeps the points it handles in memory. The iterator interface is more than Kapacitor cares about. This commit refactors and opens up the internals of aggregating and reducing incoming points so it can be used by an outside library with the same code. It also makes the iterators used by the call iterators publicly usable with new functionality. Reducers are split into two methods which are separate interfaces that can be combined for dealing with casting between different types. The Aggregator interfaces accept points into the aggregator and retain any internal state they need. The Emitter interface will then create a point from that aggregated state which can be fed to the iterator. The Emitters do not fill in the name or tag of the point as that is expected to be done by the person aggregating the point. While the Emitters do sometimes fill in the time, that value will also be overwritten by the iterator. Filling in the time is to allow a future version that will allow returning the point time instead of just the interval time.	2016-03-02 16:10:49 -05:00
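
The Aggregator/Emitter split described in commit 1c543b28a9 can be illustrated with a rough sketch. The type and method names below (FloatPoint, IntegerPoint, FloatAggregator, IntegerEmitter, floatCountReducer) are hypothetical stand-ins chosen for this example and are not taken from the influxql package. The sketch only shows the idea from the commit message: one interface accepts points and keeps state, the other builds a point from that state, and combining a float aggregator with an integer emitter handles the cast between types.

```go
package main

import "fmt"

// FloatPoint and IntegerPoint are simplified, hypothetical point types.
type FloatPoint struct {
	Name  string
	Time  int64
	Value float64
}

type IntegerPoint struct {
	Name  string
	Time  int64
	Value int64
}

// FloatAggregator accepts float points and retains whatever state it needs.
type FloatAggregator interface {
	AggregateFloat(p *FloatPoint)
}

// IntegerEmitter creates integer points from previously aggregated state.
type IntegerEmitter interface {
	Emit() []IntegerPoint
}

// floatCountReducer counts float points and emits the count as an integer,
// so it implements FloatAggregator on the input side and IntegerEmitter on
// the output side.
type floatCountReducer struct {
	n int64
}

func (r *floatCountReducer) AggregateFloat(p *FloatPoint) { r.n++ }

// Emit leaves Name and Time unset; per the commit message, the caller (or
// the iterator) is expected to fill those in.
func (r *floatCountReducer) Emit() []IntegerPoint {
	return []IntegerPoint{{Value: r.n}}
}

func main() {
	r := &floatCountReducer{}
	for _, v := range []float64{1.5, 2.5, 4.0} {
		r.AggregateFloat(&FloatPoint{Name: "cpu", Value: v})
	}
	out := r.Emit()
	out[0].Name = "cpu" // the caller fills in the name after emitting
	fmt.Println(out[0].Name, out[0].Value)
}
```

A caller that keeps points in memory, as the commit says Kapacitor does, could drive such a reducer directly without going through an iterator.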

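The subquery example from commit d7c8c7ca4f, SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host), can be read as two in-memory stages. The sketch below is only an approximation of that reading and is not the query engine's implementation: it ignores time, series keys, and auxiliary fields, and the names point, minByHost, and maxOf are hypothetical.

```go
package main

import "fmt"

// point is a hypothetical, stripped-down row of the cpu measurement.
type point struct {
	host  string
	value float64
}

// minByHost plays the role of the inner query:
// SELECT min(value) FROM cpu GROUP BY host
func minByHost(points []point) map[string]float64 {
	mins := make(map[string]float64)
	for _, p := range points {
		if cur, ok := mins[p.host]; !ok || p.value < cur {
			mins[p.host] = p.value
		}
	}
	return mins
}

// maxOf plays the role of the outer query: SELECT max(min) FROM (...)
// It reports false when there are no inner results to reduce.
func maxOf(values map[string]float64) (float64, bool) {
	var best float64
	found := false
	for _, v := range values {
		if !found || v > best {
			best, found = v, true
		}
	}
	return best, found
}

func main() {
	cpu := []point{
		{"serverA", 3}, {"serverA", 1}, {"serverA", 4},
		{"serverB", 5}, {"serverB", 2},
	}
	// The inner result acts as the source for the outer reduction.
	result, ok := maxOf(minByHost(cpu))
	fmt.Println(result, ok)
}
```

Here the inner stage produces one minimum per host and the outer stage reduces over those minimums, which mirrors the "temporary database" description in the commit message.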
11 Commits (810c1839638b2239f2e15d52bdc69bd586b1ab0f)
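
Commits f9b8129770 and a29781286b describe giving each sample reducer its own RNG so that sampling does not contend on the global math/rand lock. A minimal sketch of that idea follows, assuming a reservoir-sampling reducer over float values. The names sampleReducer, newSampleReducer, and Aggregate are hypothetical and not the actual SampleReducer code; the seed is passed in explicitly here so the behaviour can be reproduced, whereas the commit describes a time-seeded RNG.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// sampleReducer keeps a uniform random sample of up to cap(points) values
// using reservoir sampling with a reducer-local RNG.
type sampleReducer struct {
	count  int        // incremented with each call to Aggregate
	rng    *rand.Rand // local RNG: no shared lock, but not safe for concurrent use
	points []float64
}

func newSampleReducer(size int, seed int64) *sampleReducer {
	return &sampleReducer{
		rng:    rand.New(rand.NewSource(seed)),
		points: make([]float64, 0, size),
	}
}

func (r *sampleReducer) Aggregate(v float64) {
	r.count++
	// Fill the reservoir first.
	if len(r.points) < cap(r.points) {
		r.points = append(r.points, v)
		return
	}
	// Draw an index in [0, count) from the local RNG, not the global one.
	// The value replaces a slot only when the index falls inside the reservoir.
	if i := r.rng.Intn(r.count); i < len(r.points) {
		r.points[i] = v
	}
}

func main() {
	// Time-seeded, as the commit message describes.
	r := newSampleReducer(3, time.Now().UnixNano())
	for i := 0; i < 1000; i++ {
		r.Aggregate(float64(i))
	}
	fmt.Println(r.points)
}
```

Calling the package-level rand.Intn instead of r.rng.Intn would still give a valid sample but would go through the shared source, which is the mistake a29781286b reports fixing.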