influxdb

Commit Graph

Author	SHA1	Message	Date
Jason Wilder	8fda621d8b	Fix memory spike when compacting overwritten points If a large series contains a point that is overwritten, the compactor would load the whole series into RAM during a full compaction. If the series was large, it could cause very large RAM spikes and OOMs. The change reworks the compactor to merge blocks more incrementally similar to the fix done in #6556. Fixes #6557	2016-05-18 15:25:55 -06:00
Jonathan A. Sternberg	946968ba23	Fixing panic in SHOW FIELD KEYS caused by `733a17d` The list of field keys in the index may have differed from the field keys in the actual shard. Fixing `SHOW FIELD KEYS` so it relies only on the shard rather than the index. Fixes #6659.	2016-05-18 14:43:50 -04:00
Edd Robinson	f78e67d09c	Fix concurrent map access panic	2016-05-18 17:56:50 +01:00
Joe LeGasse	af432e7d12	Fix loop variable reuse in database close Fixes #6650	2016-05-17 11:25:39 -04:00
Jonathan A. Sternberg	42cdaf0365	Merge pull request #6529 from influxdata/js-6519-select-tag-key-specifier Support cast syntax for selecting a specific type	2016-05-16 12:30:14 -04:00
Jonathan A. Sternberg	23f6a706bb	Support cast syntax for selecting a specific type Casting syntax is done with the PostgreSQL syntax `field1::float` to specify which type should be used when selecting a field. You can also do `field1::field` or `tag1::tag` to specify that a field or tag should be selected. This makes it possible to select a tag when a field key and a tag key conflict with each other in a measurement. It also means it's possible to choose a field with a specific type if multiple shards disagree. If no types are given, the same ordering for how a type is chosen is used to determine which type to return. The FieldDimensions method has been updated to return the data type for the fields that get returned. The SeriesKeys function has also been removed since it is no longer needed. SeriesKeys was originally used for the fill iterator, but then expanded to be used by auxiliary iterators for determining the channel iterator types. The fill iterator doesn't need it anymore and the auxiliary types are better served by FieldDimensions implementing that functionality, so SeriesKeys is no longer needed. Fixes #6519.	2016-05-16 12:08:29 -04:00
Jason Wilder	ce141eae37	Merge pull request #6637 from influxdata/jw-revert-compact Revert "Fix memory spike when compacting overwritten points"	2016-05-16 09:46:24 -06:00
Jason Wilder	23fc9ff748	Revert "Fix memory spike when compacting overwritten points" This reverts commit `d99c5e26f6`.	2016-05-16 09:30:34 -06:00
Jonathan A. Sternberg	a17f3d960a	SHOW TAG VALUES accepts != and !~ in WHERE clause Fixes #6607.	2016-05-16 08:51:09 -04:00
Jason Wilder	5b6f3afefa	Limit concurrent shards loading to number of cores available	2016-05-13 15:41:32 -06:00
Jason Wilder	11871958c6	Merge pull request #6618 from influxdata/jw-shard-load Optimize shard index loading	2016-05-13 14:16:17 -06:00
Jason Wilder	9e54adc719	Speed up drop database Drop database was closing and deleting each shard dir individually and serially. It would then delete the empty database dirs. This changes drop database to close all shards in parallel and run one os.RemoveAll to remove everything under the db dir which is more efficient. This also reworked the locking to avoid locking the tsdb.Store for long periods of time. That can cause queries and writes for other databases to block as well.	2016-05-13 10:26:28 -06:00
Jason Wilder	0dbd4893da	Optimize shard index loading On data sets with many series and potentially large series keys, the cost of parsing the key and re-indexing can be high. Loading the TSM keys into the index was being done repeatedly for series that were already indexed by an earlier TSM file. This was wasted work and slowed down shard loading. Parsing the key was also inefficient and allocated a new string slice. This was simplified to remove that allocation.	2016-05-12 14:02:42 -06:00
Ben Johnson	7afb73aa99	Merge pull request #6598 from benbjohnson/parallelize-planning Parallelize query planning	2016-05-12 09:00:58 -06:00
Jonathan A. Sternberg	89346bb618	Merge pull request #6600 from influxdata/0.13 Merge 0.13 release candidate back to master	2016-05-11 13:04:26 -04:00
Ben Johnson	668bae57df	parallelize query planning This commit changes the `tsm1.Engine` to create individual series iterators in batches so that it can be parallelized. Iterators are combined at the end so they can be redistributed to the parallelized merge iterator.	2016-05-11 10:38:11 -06:00
Cory LaNou	c32906a366	Merge pull request #6593 from influxdata/cjl-copyshard create shard snapshot	2016-05-10 20:01:59 -05:00
Jonathan A. Sternberg	8353b0c20f	Merge pull request #6592 from influxdata/js-3451-show-field-keys-with-field-type Update SHOW FIELD KEYS to return the field type with the field key	2016-05-10 14:13:17 -04:00
Jason Wilder	d8490f1170	Merge pull request #6587 from influxdata/jw-validate-fields Fix for merge values	2016-05-10 11:56:07 -06:00
Jonathan A. Sternberg	733a17d9e9	Update SHOW FIELD KEYS to return the field type with the field key Fixes #3451.	2016-05-10 13:16:57 -04:00
Cory LaNou	f415cf89ad	wip	2016-05-10 11:01:03 -05:00
Jason Wilder	9b86bfea2a	Merge pull request #6582 from eleme/fix_engine_cache_size fix cache size of engine	2016-05-10 09:01:03 -06:00
Jason Wilder	8839cabd41	Add benchmark for Merge	2016-05-10 08:39:55 -06:00
Cory LaNou	4d30ea1eb3	minor PR feedback refactor	2016-05-10 08:14:51 -05:00
Cory LaNou	a3bf3e2ef1	added baseline backup/restore plumbing	2016-05-10 08:14:51 -05:00
Jason Wilder	4f39cb2f97	Fix case where Merge returns unsorted values	2016-05-09 15:40:34 -06:00
Ben Johnson	078e561820	parallelize iterators	2016-05-09 10:25:30 -06:00
Jonathan A. Sternberg	3f4072be7a	Fix SHOW TAG VALUES condition to not filter "name" erroneously Before #6038 was merged, we needed to filter "name" so that it didn't accidentally hit the code path that used "name" to check the name of a measurement. This was changed to "_name" to avoid a conflict with a legitimate tag that used "name" as the key. SHOW TAG VALUES was never modified to remove the code that filtered out "name". This removes that line of code so a condition with "name" doesn't get removed erroneously. Example: SHOW TAG VALUES WITH KEY = host WHERE "name" = 'jsternberg' Fixes #6581.	2016-05-09 10:27:53 -04:00
thbourlove	22c2e7e1c5	fix cache memory size of engine	2016-05-09 21:29:34 +08:00
Jason Wilder	d99c5e26f6	Fix memory spike when compacting overwritten points If a large series contains a point that is overwritten, the compactor would load the whole series into RAM during a full compaction. If the series was large, it could cause very large RAM spikes and OOMs. The change reworks the compactor to merge blocks more incrementally similar to the fix done in #6556.	2016-05-05 22:31:30 -06:00
Ben Johnson	4c45f8ec32	Merge pull request #6560 from benbjohnson/optimize-tsm1-call-iterator Move call iterator to series level	2016-05-05 11:13:53 -06:00
Ben Johnson	fdf34d4356	move call iterator to series level This commit moves the `CallIterator` to wrap the individual series instead of wrapping a shard. This allows individual points to be aggregated before being merged. This will cause a small increase in memory usage per series but it shows a 20% decrease in query time when there are a moderate number of points per series.	2016-05-05 09:59:03 -06:00
Jason Wilder	a0ac754802	Fix loading huge series into RAM when points are overwritten In some query scenarios, if there are a lot of points on disk spread across many blocks in TSM files and a point is overwritten near the beginning of the shard's timerange, the full series could be loaded into RAM triggering OOMs and huge allocations. The issue was that the KeyCursor code that handles overwriting points had a simple implementation that just deduped the whole series in this case. This falls over when the series is quite large. Instead, the KeyCursor has been changed to only decode blocks with updated points. It then keeps track of what section of the blocks have been read so they are not re-read when the later points are decoded. Since the points in a block are always sorted, the code was also changed to remove the Deduplicate calls since they end up reallocating the slice. Instead, we do a sorted merge and re-use the slice as much as we can.	2016-05-05 09:34:44 -06:00
Jason Wilder	57cb3fdbc0	Merge pull request #6522 from influxdata/tp-tsm-dump Dump TSM files to line protocol	2016-05-03 10:44:33 -06:00
Jason Wilder	4196554f51	Fix overwriting points returning wrong value The cursors were returning the wrong value in the case when points existed in both the cache and tsm files with the same timestamp. The cache value should have been returned, but the tsm value was returned incorrectly. Fixes #6439	2016-05-03 09:21:31 -06:00
Ben Johnson	417df18396	Merge pull request #6533 from benbjohnson/optimize-show-series Optimize SHOW SERIES	2016-05-03 09:15:21 -06:00
Edd Robinson	fd77dbe648	Merge pull request #6546 from influxdata/er-build-tag Fix invalid build tag	2016-05-03 16:00:39 +01:00
Jonathan A. Sternberg	a2a5c32770	Merge pull request #6539 from influxdata/js-6495-fix-aggregates-with-empty-shards Fix aggregate returns when data is missing from some shards	2016-05-03 10:56:21 -04:00
Ben Johnson	49eb3b8d04	optimize show series iterator This commit changes the `SeriesIterator` to process one measurement at a time and uses a `floatFastDedupeIterator` to avoid point encoding during deduplication.	2016-05-03 08:52:44 -06:00
Jonathan A. Sternberg	d6d0addcec	Fix aggregate returns when data is missing from some shards If a shard is empty for a specific field and the field type is something other than a float, a nil iterator would get returned from one of the empty shards and cause the combined iterators to be cast to the float type and all other iterator types to be discarded (or for integers, to be cast). This is rare since most aggregates don't accept strings or booleans, but for queries like: SELECT distinct(string) FROM mydata It would result in nothing getting returned if one of the shards didn't have a value for `string`. This change modifies the query engine to return nil for the shards instead of a fake iterator and then to only use the fake iterator if the final aggregate iterator is nil (meaning that no iterators could be constructed for the field from any shard). Fixes #6495.	2016-05-03 10:41:22 -04:00
Edd Robinson	d35fa1ec97	Remove redundant windows build tags	2016-05-03 14:22:02 +01:00
Jason Wilder	d82aa98951	Reduce indentation in filter func	2016-05-02 11:38:25 -06:00
Jason Wilder	e0304ae3d5	Fix shards not getting assigned to series on restart Also, simplifies the LoadMetaDataIndex func to not require a *Shard	2016-05-02 11:36:05 -06:00
Jason Wilder	2d09937fd2	Fix removing fully deleted index blocks If multiple tombstone entries happen to exist for the same key in a tombstone file, it was possible to panic. The first application would remove all index entries and the second time around the code still assumed entries would exist and would index into the nil slice. Also fixes a case where the range of time would fully delete all index entries, but it did not align with math.MinInt64 and math.MaxInt64. This would cause the index locations to still exist in the offset slice. This is inefficient because the BlockIterator would still scan and decode the block only to discover that all the values are deleted. We now just remove it from the offsets slice in this case since the range of values are deleted.	2016-05-02 11:36:05 -06:00
Jason Wilder	58aa65d5a8	Optimize applyTombstones When a large tombstone file existed on disk, this code was slow since it would apply each tombstone to the index one at a time causing the index to be scanned for each key. Instead, we group all the tombstones together by timestamp and apply in bulk so that the index is scanned once for each set of tombstones. If we change to immutable tombstone files, it might be better to just write a file where all the keys have the same tombstone so we can re-apply them efficiently.	2016-05-02 11:36:05 -06:00
Jason Wilder	c73c7cea25	Revert filtering index entries in BlockIterator This was the wrong fix. The real issue was the tombstones were being read incorrectly and also applied incorrectly at times. This code is slower and not necessary so reverting it.	2016-05-02 11:36:04 -06:00
Jason Wilder	3a7429886e	Optimize Measurement.DropSeries	2016-05-02 11:36:04 -06:00
Jason Wilder	f9ace932c0	Fix V2 tombstone reading file position Each iteration of the loop was incrementing the position by 4 incorrectly. The position should start at four since the header is 4 bytes. This caused tombstones at the end of the file to not be read because the counter was out of sync with the actual file position, which caused the loop to exit early. Probably better to refactor this to check for io.EOF instead of using the counter.	2016-05-02 11:36:04 -06:00
Jason Wilder	61e0d8ff93	Fix log prefix formatting	2016-05-02 11:36:04 -06:00
Jason Wilder	bd1009080e	Prevent writing empty tombstone files If you delete from a measurement with a tag that does not match any series, we would write an empty tombstone file and fail to load it back.	2016-05-02 11:36:04 -06:00
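
Several of the commits above (a0ac754802, 4f39cb2f97, 4196554f51) concern merging already-sorted runs of points so that an overwritten timestamp yields the newer value without deduplicating a whole series at once. The following is only a minimal sketch of that general technique, not the tsm1 code: the `Value` type and the `mergeSorted` function are hypothetical names introduced for illustration, and it assumes each input is sorted by time with no duplicate timestamps within itself.

```go
package main

import "fmt"

// Value is a hypothetical timestamped point used for illustration only;
// it is not the type used by the tsm1 engine.
type Value struct {
	Time int64
	V    float64
}

// mergeSorted merges two slices that are each sorted by Time.
// When both inputs contain the same timestamp, the value from b wins,
// modelling a newer write overwriting an older one. The result stays
// sorted, so no separate sort or dedupe pass is needed afterwards.
func mergeSorted(a, b []Value) []Value {
	out := make([]Value, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].Time < b[j].Time:
			out = append(out, a[i])
			i++
		case a[i].Time > b[j].Time:
			out = append(out, b[j])
			j++
		default:
			// Same timestamp: keep the newer value and skip the older one.
			out = append(out, b[j])
			i++
			j++
		}
	}
	// At most one of the inputs still has elements left.
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	older := []Value{{1, 1.0}, {2, 2.0}, {4, 4.0}}
	newer := []Value{{2, 20.0}, {3, 3.0}}
	fmt.Println(mergeSorted(older, newer))
}
```

In this sketch the point at time 2 takes the value from `newer`, and the output remains ordered by time. Merging block by block in this way touches only the runs being combined, which is the property the commit messages describe as avoiding large allocations.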


1104 Commits (8fda621d8b41b383e0c9895fdab77809285fdf34)