influxdb

Commit Graph

Author	SHA1	Message	Date
Jonathan A. Sternberg	7b9b55bfc0	Optimize top() and bottom() using an incremental aggregator The previous version of `top()` and `bottom()` would gather all of the points to use in a slice, filter them (if necessary), then use a slightly modified heap sort to retrieve the top or bottom values. This performed horrendously from the standpoint of memory. Since it consumed so much memory and spent so much time in allocations (along with sorting a potentially very large slice), this affected speed too. These calls have now been modified so they keep the top or bottom points in a min or max heap. For `top()`, a new point will read the minimum value from the heap. If the new point is greater than the minimum point, it will replace the minimum point and fix the heap with the new value. If the new point is smaller, it discards that point. For `bottom()`, the process is the opposite. It will then sort the final result to ensure the correct ordering of the selected points. When `top()` or `bottom()` contain a tag to select, they have now been modified so this query: SELECT top(value, host, 2) FROM cpu Essentially becomes this query: SELECT top(value, 2), host FROM ( SELECT max(value) FROM cpu GROUP BY host ) This should drastically increase the performance of all `top()` and `bottom()` queries.	2017-05-19 11:56:46 -05:00
Jonathan A. Sternberg	be3bce5212	top() and bottom() now return the time for every point `top()` and `bottom()` will now organize the points by time and also keep the points' original time even when a time grouping is used. At the same time, `top()` and `bottom()` will no longer honor any fill options that are present since they don't really make sense for these specific functions. This also fixes the aggregate and selectors to honor the ordered iterator option so iterators remain ordered and to also respect the buckets that are created by the final dimensions of the query so that two buckets don't overlap each other within the same reducer. A test has been added for this situation. This should clarify and encourage the use of the ordered attribute within the query engine.	2017-04-26 15:07:10 -05:00
Michael Desa	f9b8129770	Add sample function to query language First Pass at implementing sample Add sample iterators for all types Remove size from sample struct Fix off by one error when generating random number Add benchmarks for sample iterator Add test and associated fixes for off by one error Add test for sample function Remove NumericLiteral from sample function call Make clear that the counter is incr w/ each call Rename IsRandom to AllSamplesSeen Add a rng for each reducer that is created The default rng that comes with math/rand has a global lock. To avoid having to worry about any contention on the lock, each reducer now has its own time seeded rng. Add sample function to changelog	2016-10-06 09:41:42 -07:00
Ashish Gaurav	4e17f9bb13	add mode() function & tests	2016-08-23 19:31:41 -05:00
Ashish Gaurav	70c8c021ac	added benchmark tests for median aggregator (Package: influxql,influxql_test)	2016-08-04 08:02:19 +05:30
Jonathan A. Sternberg	1e84b22407	Update SHOW TAG VALUES to use a fast dedupe iterator Include a benchmark test for the fast dedupe iterator.	2016-06-02 22:03:59 -05:00
Jonathan A. Sternberg	a05e2b164e	Support booleans for min() and max() Fixes #6494.	2016-04-29 14:56:22 -04:00
Jonathan A. Sternberg	7ec2a991d5	Modify all of the iterators to allow returning an error on Next() This also switches the remaining iterators to be lazy so they can return errors properly. They needed to be converted to lazy initialization anyway, which has the side effect of making it much easier for us to propagate the underlying error during initialization. Updated the Emitter to return errors when it cannot read properly from the iterators.	2016-04-18 11:17:55 -04:00
Ben Johnson	f7f35affd2	add distinct iterator benchmark	2016-04-12 13:22:03 -06:00
Ben Johnson	525e22c92b	tsm1 query engine alloc reduction This commit makes a number of performance improvements to reduce allocations during query execution. Several objects and buffers are now reused across the components to avoid allocations. Previously a simple `count(value)` query across 1M points would require 26,000+ allocations. After the changes in this commit that number has been reduced to 88.	2016-04-11 14:50:59 -06:00
Jonathan A. Sternberg	6655ca7769	Create a new interrupt iterator that will stop emitting points after an interrupt Use of the iterator is spread out into both `IteratorCreators` and inside of the iterators themselves. Part of the interrupt must be handled inside of the engine so it stops trying to emit points when an interrupt is found and another part of the interrupt has to happen when combining the iterators so it doesn't just start reading the next shard.	2016-03-21 12:07:07 -04:00
Jonathan A. Sternberg	9113839e4c	Fix sorting of `first()` and `last()` calls across shards Previously the call iterator would normalize the time to the interval for all calls. This meant that when `first()` or `last()` was called with no group by interval the value would be found for each shard, the time was normalized, then it tried to find the value between the shards (but no longer with any time data as that had already been eliminated). This removes part of the time logic from the call iterators and makes a new iterator `IntervalIterator` to normalize the times as they come out of the underlying iterator. Fixes #5890.	2016-03-03 21:15:43 -05:00
Jonathan A. Sternberg	e3660fae93	Support all iterator types for count(), first(), and last() All three of these iterators are supposed to support all four types of iterators, but the implementation was never done for string or boolean. Fixes #5886.	2016-03-02 23:49:55 -05:00
Jonathan A. Sternberg	7a03df2af1	Remove the non-unreachable panics in the new query engine The only panics left are ones that should be unreachable unless there is a bug. Fixes #5777.	2016-02-22 12:52:43 -05:00
Jonathan A. Sternberg	18c7c554ba	Optimize the mean() call by moving the calculation into the shard iterator A new attribute has been added to points to track how many points were used to calculate that point. This is particularly useful for finding the mean as we can then split mean calculation into two phases: one at the shard level and a second at the shards level. This optimization is now used so we don't have to hold so many points in memory while calculating the mean.	2016-02-16 10:32:34 -05:00
Ben Johnson	5a0d1ab7c1	rename influxdb/influxdb to influxdata/influxdb This commit changes all the import and URL references from: github.com/influxdb/influxdb to: github.com/influxdata/influxdb	2016-02-10 10:26:18 -07:00
Jonathan A. Sternberg	c602503c7c	Fix reduce iterators to separate by name Previously reduce iterators just separated points by tags. If you had identical tags but different names, it would group those together so you could have these two points: cpu value=1 mem value=2 When you performed a `mean(value)` call and included both cpu and mem as sources, it would return one mem series with a value of 1.5 instead of two series.	2016-02-10 09:40:28 -07:00
Jonathan A. Sternberg	76b49b3ab3	Fixed a bug in first() and last() where the time was lost last() would always return the last output of the iterator (which isn't necessarily the last time value due to how the merge iterator works) and first() would always return the first output of the iterator (wrong for the same reason). Now the time is kept by the reduce function and the times are wiped as part of the reduce iterator after the value has been found.	2016-02-10 09:40:26 -07:00
Ben Johnson	b8918a780c	integer support	2016-02-10 09:40:25 -07:00
Ben Johnson	00806de9b8	refactor query engine	2016-02-10 09:40:25 -07:00
Ben Johnson	cde973f409	refactor query engine	2016-02-10 09:40:24 -07:00
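
The incremental approach described in commit 7b9b55bfc0 (keep only the current top points in a min heap, replace the minimum when a larger point arrives, sort at the end) can be sketched roughly as follows. This is an illustrative sketch, not the code from that commit: the names `minHeap` and `topN` are hypothetical, points are reduced to bare `float64` values, and the final descending sort by value is an assumption about the desired output order.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// minHeap is a hypothetical min-heap of values; h[0] is the smallest kept value.
type minHeap []float64

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(float64)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	v := old[n-1]
	*h = old[:n-1]
	return v
}

// topN keeps at most n values in memory while scanning the input once.
// It returns the n largest values, sorted in descending order (an assumed order).
func topN(values []float64, n int) []float64 {
	if n <= 0 {
		return nil
	}
	h := make(minHeap, 0, n)
	for _, v := range values {
		if len(h) < n {
			// The heap is not full yet, so every value is kept.
			heap.Push(&h, v)
			continue
		}
		if v > h[0] {
			// The new value beats the current minimum: replace it and fix the heap.
			h[0] = v
			heap.Fix(&h, 0)
		}
		// Otherwise the value is discarded.
	}
	out := []float64(h)
	sort.Sort(sort.Reverse(sort.Float64Slice(out)))
	return out
}

func main() {
	fmt.Println(topN([]float64{3, 9, 1, 7, 5}, 2)) // prints [9 7]
}
```

A `bottom()` equivalent would flip the comparison in `Less` and in the replacement check so the heap holds the n smallest values instead.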

21 Commits (643b2eb30cd4726e58a9052ee6f15ec9bea759cf)
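
The two-phase `mean()` calculation described in commit 18c7c554ba (one pass per shard, then a second pass that combines the shard results using a per-point count) might look roughly like the sketch below. The `partial` type and the `shardPartial` and `combineMean` functions are hypothetical names chosen for illustration; the commit itself only says that points gained an attribute tracking how many points were used to calculate them.

```go
package main

import "fmt"

// partial is a hypothetical per-shard result: a running sum plus the number
// of points that contributed to it.
type partial struct {
	Sum   float64
	Count int
}

// shardPartial is phase one: reduce one shard's values to a single partial,
// so the raw points do not need to be held in memory afterwards.
func shardPartial(values []float64) partial {
	var p partial
	for _, v := range values {
		p.Sum += v
		p.Count++
	}
	return p
}

// combineMean is phase two: merge the partials from all shards, weighting
// each shard by its point count. It reports false when there are no points.
func combineMean(parts []partial) (float64, bool) {
	var total partial
	for _, p := range parts {
		total.Sum += p.Sum
		total.Count += p.Count
	}
	if total.Count == 0 {
		return 0, false
	}
	return total.Sum / float64(total.Count), true
}

func main() {
	parts := []partial{
		shardPartial([]float64{1, 2, 3}),
		shardPartial([]float64{4, 5}),
	}
	fmt.Println(combineMean(parts)) // prints 3 true
}
```

Carrying the count alongside each partial is what keeps the result correct when shards hold different numbers of points; averaging the per-shard means directly would give 3.25 for the input above instead of 3.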