influxdb

Commit Graph

Author	SHA1	Message	Date
Jason Wilder	46fdcba6e3	Remove compaction enabled logging Too verbose	2016-07-17 23:53:12 -06:00
Jason Wilder	2fa28ba1d3	Don't log error when compactions are aborted	2016-07-17 23:53:12 -06:00
Jason Wilder	b48d88ce9e	Abort running compactions when series are deleted If a delete is issued while a compaction is running, a newly deleted series could re-appear after the compaction completed. This could occur if the compaction had already written the blocks for series that were just deleted. When the compaction completes, the newly written tombstone files would be deleted, essentially undeleting the series.	2016-07-17 23:53:12 -06:00
Jason Wilder	cc4a668be5	Don't return statistic if engine is closed	2016-07-17 23:53:12 -06:00
Jason Wilder	6710c69aa5	Merge pull request #7015 from influxdata/jw-drop Speed up delete/drop statements	2016-07-15 12:41:08 -06:00
Jason Wilder	21dbe7e854	Simplify throttle type	2016-07-15 12:14:25 -06:00
Jason Wilder	d1556e3964	Fix missing read locks before filtering	2016-07-15 10:08:26 -06:00
Jason Wilder	ff5d61d024	Speed up delete series Reduce lock contention and process shards concurrently.	2016-07-14 17:31:34 -06:00
Jason Wilder	8f3ec3be43	Inline deleteShard Only used by one caller now	2016-07-14 17:31:34 -06:00
Jason Wilder	78201e19d0	Refactor DeleteDatabase to use filter/walk funcs	2016-07-14 17:31:34 -06:00
Jason Wilder	e0122efcf8	Speed up drop retention policy Reduce the lock contention on tsdb.Store by taking a short lived read-lock instead of a long write lock. Also close shards in parallel and drop the whole RP dir in bulk instead of each shard dir.	2016-07-14 17:31:34 -06:00
Jason Wilder	6d3d2f6fe9	Speed up drop measurement Reduces the lock contention on the tsdb.Store by taking a short read lock instead of a long write lock. Also processes shards in parallel instead of serially.	2016-07-14 17:31:29 -06:00
Jason Wilder	4254ad304c	Merge pull request #6851 from influxdata/md-add-benchmarks Add additional benchmarks for various schemas	2016-07-14 15:04:29 -06:00
Jason Wilder	0f5e994383	Fix panic in full compactions due to duplicate data in blocks Due to a bug in compactions, it's possible some blocks may have duplicate points stored. If those blocks are decoded and re-compacted, an assertion panic could trigger. We now dedup those blocks if necessary to remove the duplicate points and avoid the panic.	2016-07-14 11:32:36 -06:00
Jason Wilder	0264966f5c	Add index optimize planning step For larger datasets, it's possible for shards to get into a state where many large, dense TSM files exist. While the shard is still hot for writes, full compactions will skip these files since they are already fairly optimized and full compactions are expensive. If the write volume is large enough, the shard can accumulate lots of these files. When a file is in this state, its index can contain every series, which causes startup times to increase since each file must parse the full set of series keys for every file. If the number of series is high, the index can be quite large, causing a large amount of disk IO at startup. To fix this, an optimize compaction is run when a full compaction planning step decides there is nothing to do. The optimize compaction combines and spreads the data and series keys across all files, resulting in each file containing the full series data for that shard and a subset of the total set of keys in the shard. This allows a shard to only store a series key once in the shard, reducing storage size, and also allows a shard to only load each key once at startup.	2016-07-14 11:32:36 -06:00
Jason Wilder	5ee20e04a8	Fix compaction level planner Large files created early in the leveled compactions could cause a shard to get into a bad state. This reworks the level planner to handle those cases as well as splits large compactions up into multiple groups to leverage more CPUs when possible.	2016-07-14 11:14:09 -06:00
Jonathan A. Sternberg	12a33fe0d3	Add stats and diagnostics to the TSM engine Track the number of TSM files in the file store and keep engine statistics related to the number of TSM compactions.	2016-07-07 19:35:55 -05:00
Jonathan A. Sternberg	837a9804cf	Refactoring the monitor service to avoid expvar Truncate the time interval output of the monitor service to be on even time intervals rather than on every minute based on the start time. This normalizes the output from the monitor service.	2016-07-07 11:13:58 -05:00
Jason Wilder	2f82d9a525	Truncate the slice when merging the caches	2016-07-05 12:12:21 -05:00
Jason Wilder	5aae28e14f	Merge pull request #6922 from influxdata/jw-6829 Fix panic: runtime error: index out of range	2016-06-28 09:38:19 -06:00
Jason Wilder	fdf0bac717	Fix panic: runtime error: index out of range Fixes #6829	2016-06-27 18:50:48 -06:00
kun	77ed719bc1	delete redundant code in NewStore function	2016-06-24 17:14:00 +08:00
Michael Desa	517d8d5881	Move benchmarks beneath other NewSeries	2016-06-23 10:15:37 -07:00
Jason Wilder	ca6bfac01a	Fix out of order blocks returned during query If there were blocks in later TSM files that were for overwritten points or writes into the past, they could be returned more than once or out of order causing the cursor values to be unsorted. One effect of this is that graphs in Grafana would render with the line going all over the place in spots. This might also cause duplicate data to be returned. Fixes #6738	2016-06-22 17:34:44 -06:00
Jonathan A. Sternberg	7bdcd669a8	Merge pull request #6879 from influxdata/js-prune-deadcode Removing dead code from every package except influxql	2016-06-22 08:12:19 -05:00
Jonathan A. Sternberg	497db2a6d3	Removing dead code from every package except influxql The tsdb package had a substantial amount of dead code related to the old query engine still in there. It is no longer used, so it was removed since it was left unmaintained. There is likely still more code that is the same, but wasn't found as part of this code cleanup. influxql has dead code show up because of the code generation so it is not included in this pruning.	2016-06-20 22:41:07 -05:00
Jonathan A. Sternberg	8812bc8a93	Remove a double lock in the tsm1 index writer	2016-06-20 17:32:34 -05:00
Jonathan A. Sternberg	1d03151631	Remove FieldCodec from tsdb package Updated `influx_inspect` to use the `FieldDimensions` method instead (more reliable anyway). The `influx_tsm` program used its own vendored copy of `FieldCodec` so it is not affected by this change. `FieldCodec` was only used for the `b1` and `bz1` engines which were removed in 0.12, but the code that created the field codec was never removed. This limited the maximum number of fields to 255 even though that restriction was removed with the `tsm1` engine. Fixes #6869.	2016-06-19 21:38:43 -05:00
Jonathan A. Sternberg	6e205ce135	Set the condition cursor instead of aux iterator when creating a nil condition cursor A copy/paste error had nil cursors destined for a condition cursor get set to the auxiliary cursor instead. When the number of conditions exceeded the number of auxiliary fields, this would result in a stack trace in some situations. When the number of conditions was less than or equal to the number of auxiliary fields, it means that an auxiliary cursor may have been overwritten with a nil cursor accidentally and a leak might have happened since it was never closed. Fixes #6859.	2016-06-17 14:54:48 -05:00
Michael Desa	0c867e4b2c	Fix benchmark test names Previously the test names included an `s` for the name of a singular component.	2016-06-16 08:45:36 -07:00
Michael Desa	9dfaa182a7	Add additional benchmarks for various schemas Anecdotally, the relationship between memory consumption and series cardinality was thought to be exponential. I suspect that this is false. The intent of the added benchmarks is to verify my suspicion. Eventually these benchmarks will run nightly to serve as a basis to evaluate the memory performance in a controlled environment. https://github.com/influxdata/docs.influxdata.com/issues/392	2016-06-15 14:54:14 -07:00
Ben Johnson	7d4bea7153	add node id to execution options This commit changes the `ExecutionOptions` and `SelectOptions` to allow a `NodeID` for specifying an exact node to query against.	2016-06-10 09:20:44 -06:00
Jason Wilder	ac6addd0b5	Ensure restore doesn't write broken files Restore would try to open the shard if there was an error. If there was an error, the files written are very likely to be partially written and they can cause the server to panic. To prevent a shard from trying to open broken files, we now write to a temp file and rename it to the actual name only after fully writing and fsyncing the file.	2016-06-07 14:36:46 -06:00
Jonathan A. Sternberg	fe3f0d0e3d	Remove the DatabaseIndex method from TSDBStore interface The TSDBStore interface needs to also allow for remote TSDBStore but the DatabaseIndex is only for a local TSDB instance. Moved the optimized SHOW TAG VALUES path to do a typecast to the LocalTSDBStore struct instead of always attempting to use the optimized version. If the TSDBStore is not local and does not have the DatabaseIndex, it will default to using the distributed query instead.	2016-06-07 11:34:34 -05:00
Ben Johnson	bf3c22689b	Merge pull request #6792 from benbjohnson/show-tag-values Optimize SHOW TAG VALUES	2016-06-06 16:00:12 -06:00
Ben Johnson	1b94cd2686	optimize SHOW TAG VALUES This commit optimizes `SHOW TAG VALUES` so that it avoids the `SELECT` query engine execution and iterator creation. There are also optimizations to reduce individual memory allocations and to reduce in-memory heap size by only operating on one measurement at a time. Execution time has been reduced to approximately 900ms for 500,000 rows. This is about 2µs per row. Of this time, approximately 1µs is spent retrieving and sorting the row and 1µs is spent encoding into JSON and writing to the response body.	2016-06-06 15:50:53 -06:00
Jason Wilder	838a29cca8	Fix race in cache If cache.Deduplicate is called while writes are in-flight on the cache, a data race could occur. WARNING: DATA RACE Write by goroutine 15: runtime.mapassign1() /usr/local/go/src/runtime/hashmap.go:429 +0x0 github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).entry() /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:482 +0x27e github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).WriteMulti() /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:207 +0x3b2 github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func1() /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:421 +0x73 Previous read by goroutine 16: runtime.mapiterinit() /usr/local/go/src/runtime/hashmap.go:607 +0x0 github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Deduplicate() /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:272 +0x7c github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func2() /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:429 +0x69 Goroutine 15 (running) created at: github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent() /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:423 +0x3f2 testing.tRunner() /usr/local/go/src/testing/testing.go:473 +0xdc Goroutine 16 (finished) created at: github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent() /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:431 +0x43b testing.tRunner() /usr/local/go/src/testing/testing.go:473 +0xdc	2016-06-06 15:45:01 -06:00
Jason Wilder	bc76048371	Fix panic in cache.DeleteRange Deleting keys that did not exist in the cache could cause a panic because the entry returned would be nil and was not checked.	2016-06-06 14:48:53 -06:00
Jason Wilder	cd336095ca	Merge pull request #6768 from influxdata/jw-disable-open Allow creating shards in a disabled state	2016-06-02 08:34:51 -06:00
Jason Wilder	579923d95f	Fix sporadic write failures with influx_stress This Unlock was moved which seems to create a deadlock situation sometimes under high write load. This deadlock causes writes to fail with timeouts.	2016-06-01 17:25:47 -06:00
Jason Wilder	a74ea4cbf4	Allow creating shards in a disabled state For restoring a shard, we need to be able to have the shard open, but disabled. It was racy to open it and then disable it separately since writes/queries could occur in between that time.	2016-06-01 16:17:18 -06:00
Jason Wilder	d0023dee5d	Convert inline errors to constants	2016-05-31 10:51:54 -06:00
Jason Wilder	1ff8ecf4fb	Add ability to disable shards Disabling a shard causes all writes and queries to a shard to return an error. This also disables compactions for the shard.	2016-05-31 10:51:54 -06:00
Edd Robinson	baf5d505e6	Merge pull request #6754 from influxdata/er-fs Prevent ReadFloatBlock from panicking when no values	2016-05-31 16:41:29 +01:00
Edd Robinson	003c30989a	Check for no values	2016-05-31 16:28:17 +01:00
rw	dcec206f2e	Dedup `.RUnlock` between two conditionals.	2016-05-29 10:20:58 -07:00
rw	1b160d1af0	Low-contention path for pre-existing cache entries. This change appears to increase bulk ingestion throughput by 2x-3x in multiprocessor environments.	2016-05-28 23:50:11 -07:00
Jason Wilder	dd58101061	Merge pull request #6743 from influxdata/jw-parse-key Optimize series key parsing on startup	2016-05-27 15:00:42 -06:00
Jason Wilder	ff1447202c	Reduce lock contention in Measurement.AddSeries	2016-05-27 10:30:08 -06:00
Jason Wilder	11959005f4	Switch backup to use shard.Snapshot This switches the backup shard call to use the shard Snapshot that internally creates a snapshot by hardlinking all of the TSM and tombstone files instead. This reduces the time that the FileStore is locked and will allow for larger shards to be backed up more easily.	2016-05-27 09:30:25 -06:00
David Norton	381059a55c	Merge pull request #6736 from influxdata/benchmark-write-points-allocs Benchmarks to count allocs in WritePoints.	2016-05-27 10:13:17 -04:00
Alex Russell-Saw	7edb14bffd	assign engine to shard after engine is initialized	2016-05-27 13:45:16 +01:00
Edd Robinson	6a7f9527e3	Revert `d2672a3` and `1e0a4e9`	2016-05-27 10:34:14 +01:00
rw	92e7fec5cf	Benchmarks to count allocs in WritePoints.	2016-05-26 17:13:14 -07:00
Edd Robinson	d2672a3280	Update Go version	2016-05-26 15:26:09 +01:00
Edd Robinson	1e0a4e9119	Move fields under mutex	2016-05-26 12:00:46 +01:00
Jason Wilder	d6661060a3	Merge pull request #6719 from shurcooL/fix-tombstone-open-error-check tsdb/engine/tsm1: Check os.Open error before using file.	2016-05-25 12:11:26 -06:00
Jason Wilder	a77dd4fe4c	Merge pull request #6725 from influxdata/jw-tsm-query Fix pathological TSM query case	2016-05-25 11:23:38 -06:00
Jason Wilder	7d50970631	Fix continuous compaction edge case The level planner would keep including the same TSM files to be recompacted even if they were already quite compacted and split across several TSM files. Fixes #6683	2016-05-25 10:36:24 -06:00
Jason Wilder	0b481ff627	Fix pathological TSM query case This fixes a pathological query condition caused by a problematic structuring of TSM files based on how points were written. The condition can occur when there are multiple TSM files and a large number of points are written into the past. The earlier existing TSM files must also have points in the past and close to the present causing their time range to eclipse the later files. When this condition occurs, some queries can spend an excessive amount of time merging all the overlapping blocks. The fix was to constrain the window of overlapping blocks based on the first one we ran into. There was also a simple case in the Merge where we could skip the binary search path and just append the two inputs.	2016-05-25 09:14:17 -06:00
Dmitri Shuralyov	c03ebf896b	tsdb/engine/tsm1: Check os.Open error before using file. os.Open is documented as: > Open opens the named file for reading. If successful, methods on > the returned file can be used for reading; That suggests the file's methods should only be called if opening was successful. The original code would defer f.Close() right after os.Open, before ensuring that err is nil, so f.Close() would run even if os.Open did not return successfully. Apply https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow suggestion to keep the normal path at minimal indentation, and indent the error handling code instead. This improves code readability.	2016-05-24 21:08:35 -07:00
Jonathan A. Sternberg	32e42b93ae	Merge pull request #6705 from influxdata/js-6701-duplicate-points-with-select Filter out sources that do not match the shard database/retention policy	2016-05-24 09:48:31 -04:00
Jonathan A. Sternberg	5e7e0bd19b	Filter out sources that do not match the shard database/retention policy If you use a statement like this: SELECT value FROM one..cpu, two..cpu It will access both the `one` and `two` databases as if you had selected the `cpu` measurement twice for both of them. Updated the `tsdb.Shard` create iterator function to filter out any sources that do not apply to that shard so this duplication doesn't happen. Fixes #6701.	2016-05-23 17:05:33 -04:00
Jason Wilder	f48a106860	Optimized timestamp run-length decoding Removes the up-front allocation of decoded values and return them as needed.	2016-05-23 14:05:25 -06:00
Edd Robinson	0b2a806789	Merge pull request #6690 from influxdata/jw-shard-size Fix panic in shard.DiskSize()	2016-05-20 15:29:53 +01:00
Edd Robinson	40732a35d0	Merge pull request #6660 from influxdata/er-vet Fix vet issues	2016-05-20 11:12:25 +01:00
Jason Wilder	d324777bfc	Fix panic in shard.DiskSize() If the wal or data dir is not accessible (possibly deleted), the DiskSize walk funcs could panic because they did not check the error passed in.	2016-05-19 23:19:44 -06:00
Jonathan A. Sternberg	5621ccc2ce	Remove limit optimization when using an aggregate The limit optimization was put into the wrong place and caused only part of the shard to be read when a limit was used. The optimization is possible, but requires a bit of refactoring to the code here so the call iterator is created per series before handed to the limit iterator. Fixes #6661.	2016-05-19 10:29:38 -04:00
Jason Wilder	4c089a56f4	Fix read tombstones: EOF Due to a bug in TSM tombstone files, it was possible to create empty tombstone files. At startup, the TSM file would error out and not load the TSM file. Instead, treat it as an empty v1 file so the TSM file can load correctly. Fixes #6641	2016-05-18 23:29:25 -06:00
Jason Wilder	7fb7faaaca	Fix points already read from being returned more than once If there were duplicate points in multiple blocks, we would correctly dedup the points and mark the regions of the blocks we've read. Unfortunately, we were not excluding the already read points as the cursor moved to points in the later blocks, which could cause points to be returned twice incorrectly. Fixes #6611	2016-05-18 17:21:10 -06:00
Jason Wilder	9f89420b4c	Merge pull request #6653 from influxdata/jw-compact-fix Compaction fixes	2016-05-18 16:10:10 -06:00
Jason Wilder	121195a865	Merge pull request #6665 from influxdata/jw-series-stats Reload series count stat at startup	2016-05-18 15:58:15 -06:00
Edd Robinson	09dc48b847	Merge pull request #6664 from influxdata/jw-shard-size Store shard size on disk statistic	2016-05-18 22:39:12 +01:00
Jason Wilder	209dd005c5	Merge pull request #6627 from influxdata/jw-deadlock Fix possible deadlock when queries and delete series run concurrently	2016-05-18 15:30:37 -06:00
Jason Wilder	f2bcf9d9ab	Code review fixes	2016-05-18 15:25:56 -06:00
Jason Wilder	d32ad26d27	Fix data not getting reloaded The optimization to speed up shard loading had the side effect of skipping adding series to the index that already exist. The skipping was in the wrong location and also skipped the shard's measurementFields index which is required in order to query that series in the shard.	2016-05-18 15:25:56 -06:00
Jason Wilder	e859141b75	Speed up tests Switched the max keys test to write int64 of the same value so RLE would kick in and the file size will be smaller (84MB vs 3.8MB). Removed the chunking test which was skipped because the code will not downsize a block into smaller chunks now. Skip MaxKeys tests in various environments because it needs to write too much data to run reliably.	2016-05-18 15:25:56 -06:00
Jason Wilder	eff71cbe23	Rollover to new TSM file when max blocks exceeded Fixes #6406	2016-05-18 15:25:55 -06:00
Jason Wilder	8fda621d8b	Fix memory spike when compacting overwritten points If a large series contains a point that is overwritten, the compactor would load the whole series into RAM during a full compaction. If the series was large, it could cause very large RAM spikes and OOMs. The change reworks the compactor to merge blocks more incrementally similar to the fix done in #6556. Fixes #6557	2016-05-18 15:25:55 -06:00
Jason Wilder	f1ab89561a	Reload series count stat at startup	2016-05-18 15:21:57 -06:00
Edd Robinson	28ad7c687b	Add const for interval	2016-05-18 22:14:59 +01:00
Jason Wilder	cbc551f9dc	Collect shard size stats	2016-05-18 22:14:59 +01:00
Jonathan A. Sternberg	946968ba23	Fixing panic in SHOW FIELD KEYS caused by `733a17d` The list of field keys in the index may have differed from the field keys in the actual shard. Fixing `SHOW FIELD KEYS` so it relies only on the shard rather than the index. Fixes #6659.	2016-05-18 14:43:50 -04:00
Edd Robinson	f78e67d09c	Fix concurrent map access panic	2016-05-18 17:56:50 +01:00
Edd Robinson	f680ab0f0d	Fix vet issues	2016-05-18 13:34:11 +01:00
Joe LeGasse	af432e7d12	Fix loop variable reuse in database close Fixes #6650	2016-05-17 11:25:39 -04:00
Jonathan A. Sternberg	42cdaf0365	Merge pull request #6529 from influxdata/js-6519-select-tag-key-specifier Support cast syntax for selecting a specific type	2016-05-16 12:30:14 -04:00
Jonathan A. Sternberg	23f6a706bb	Support cast syntax for selecting a specific type Casting syntax is done with the PostgreSQL syntax `field1::float` to specify which type should be used when selecting a field. You can also do `field1::field` or `tag1::tag` to specify that a field or tag should be selected. This makes it possible to select a tag when a field key and a tag key conflict with each other in a measurement. It also means it's possible to choose a field with a specific type if multiple shards disagree. If no types are given, the same ordering for how a type is chosen is used to determine which type to return. The FieldDimensions method has been updated to return the data type for the fields that get returned. The SeriesKeys function has also been removed since it is no longer needed. SeriesKeys was originally used for the fill iterator, but then expanded to be used by auxiliary iterators for determining the channel iterator types. The fill iterator doesn't need it anymore and the auxiliary types are better served by FieldDimensions implementing that functionality, so SeriesKeys is no longer needed. Fixes #6519.	2016-05-16 12:08:29 -04:00
Jason Wilder	ce141eae37	Merge pull request #6637 from influxdata/jw-revert-compact Revert "Fix memory spike when compacting overwritten points"	2016-05-16 09:46:24 -06:00
Jason Wilder	23fc9ff748	Revert "Fix memory spike when compacting overwritten points" This reverts commit `d99c5e26f6`.	2016-05-16 09:30:34 -06:00
Jonathan A. Sternberg	a17f3d960a	SHOW TAG VALUES accepts != and !~ in WHERE clause Fixes #6607.	2016-05-16 08:51:09 -04:00
Jason Wilder	57d4becaec	Fix possible deadlock when queries and delete series run concurrently This lock showed up in a deadlock on systems running queries and delete series across a large dataset. Queries should not need to lock the tsdb.Store for writes	2016-05-13 17:04:12 -06:00
Jason Wilder	5b6f3afefa	Limit concurrent shards loading to number of cores available	2016-05-13 15:41:32 -06:00
Jason Wilder	11871958c6	Merge pull request #6618 from influxdata/jw-shard-load Optimize shard index loading	2016-05-13 14:16:17 -06:00
Jason Wilder	9e54adc719	Speed up drop database Drop database was closing and deleting each shard dir individually and serially. It would then delete the empty database dirs. This changes drop database to close all shards in parallel and run one os.RemoveAll to remove everything under the db dir which is more efficient. This also reworked the locking to avoid locking the tsdb.Store for long periods of time. That can cause queries and writes for other databases to block as well.	2016-05-13 10:26:28 -06:00
Jason Wilder	0dbd4893da	Optimize shard index loading On data sets with many series and potentially large series keys, the cost of parsing the key and re-indexing can be high. Loading the TSM keys into the index was being done repeatedly for series that were already indexed by an earlier TSM file. This was wasted work and slows down shard loading. Parsing the key was also inefficient and allocated a new string slice. This was simplified to remove that allocation.	2016-05-12 14:02:42 -06:00
Ben Johnson	7afb73aa99	Merge pull request #6598 from benbjohnson/parallelize-planning Parallelize query planning	2016-05-12 09:00:58 -06:00
Jonathan A. Sternberg	89346bb618	Merge pull request #6600 from influxdata/0.13 Merge 0.13 release candidate back to master	2016-05-11 13:04:26 -04:00
Ben Johnson	668bae57df	parallelize query planning This commit changes the `tsm1.Engine` to create individual series iterators in batches so that it can be parallelized. Iterators are combined at the end so they can be redistributed to the parallelized merge iterator.	2016-05-11 10:38:11 -06:00
Cory LaNou	c32906a366	Merge pull request #6593 from influxdata/cjl-copyshard create shard snapshot	2016-05-10 20:01:59 -05:00
Jonathan A. Sternberg	8353b0c20f	Merge pull request #6592 from influxdata/js-3451-show-field-keys-with-field-type Update SHOW FIELD KEYS to return the field type with the field key	2016-05-10 14:13:17 -04:00
Jason Wilder	d8490f1170	Merge pull request #6587 from influxdata/jw-validate-fields Fix for merge values	2016-05-10 11:56:07 -06:00
Jonathan A. Sternberg	733a17d9e9	Update SHOW FIELD KEYS to return the field type with the field key Fixes #3451.	2016-05-10 13:16:57 -04:00
Cory LaNou	f415cf89ad	wip	2016-05-10 11:01:03 -05:00
Jason Wilder	9b86bfea2a	Merge pull request #6582 from eleme/fix_engine_cache_size fix cache size of engine	2016-05-10 09:01:03 -06:00
Jason Wilder	8839cabd41	Add benchmark for Merge	2016-05-10 08:39:55 -06:00
Cory LaNou	4d30ea1eb3	minor PR feedback refactor	2016-05-10 08:14:51 -05:00
Cory LaNou	a3bf3e2ef1	added baseline backup/restore plumbing	2016-05-10 08:14:51 -05:00
Jason Wilder	4f39cb2f97	Fix case where Merge returns unsorted values	2016-05-09 15:40:34 -06:00
Ben Johnson	078e561820	parallelize iterators	2016-05-09 10:25:30 -06:00
Jonathan A. Sternberg	3f4072be7a	Fix SHOW TAG VALUES condition to not filter "name" erroneously Before #6038 was merged, we needed to filter "name" so that it didn't accidentally hit the code path that used "name" to check the name of a measurement. This was changed to "_name" to avoid a conflict with a legitimate tag that used "name" as the key. SHOW TAG VALUES was never modified to remove the code that filtered out "name". This removes that line of code so a condition with "name" doesn't get removed erroneously. Example: SHOW TAG VALUES WITH KEY = host WHERE "name" = 'jsternberg' Fixes #6581.	2016-05-09 10:27:53 -04:00
thbourlove	22c2e7e1c5	fix cache memory size of engine	2016-05-09 21:29:34 +08:00
Jason Wilder	d99c5e26f6	Fix memory spike when compacting overwritten points If a large series contains a point that is overwritten, the compactor would load the whole series into RAM during a full compaction. If the series was large, it could cause very large RAM spikes and OOMs. The change reworks the compactor to merge blocks more incrementally similar to the fix done in #6556.	2016-05-05 22:31:30 -06:00
Ben Johnson	4c45f8ec32	Merge pull request #6560 from benbjohnson/optimize-tsm1-call-iterator Move call iterator to series level	2016-05-05 11:13:53 -06:00
Ben Johnson	fdf34d4356	move call iterator to series level This commit moves the `CallIterator` to wrap the individual series instead of wrapping a shard. This allows individual points to be aggregated before being merged. This will cause a small increase in memory usage per series but it shows a 20% decrease in query time when there are a moderate number of points per series.	2016-05-05 09:59:03 -06:00
Jason Wilder	a0ac754802	Fix loading huge series into RAM when points are overwritten In some query scenarios, if there are a lot of points on disk spread across many blocks in TSM files and a point is overwritten near the beginning of the shard's timerange, the full series could be loaded into RAM triggering OOMs and huge allocations. The issue was that the KeyCursor code that handles overwriting points had a simple implementation that just deduped the whole series in this case. This falls over when the series is quite large. Instead, the KeyCursor has been changed to only decode blocks with updated points. It then keeps track of what section of the blocks have been read so they are not re-read when the later points are decoded. Since the points in a block are always sorted, the code was also changed to remove the Deduplicate calls since they end up reallocating the slice. Instead, we do a sorted merge and re-use the slice as much as we can.	2016-05-05 09:34:44 -06:00
Jason Wilder	57cb3fdbc0	Merge pull request #6522 from influxdata/tp-tsm-dump Dump TSM files to line protocol	2016-05-03 10:44:33 -06:00
Jason Wilder	4196554f51	Fix overwriting points returning wrong value The cursors were returning the wrong value in the case when points existed in both the cache and tsm files with the same timestamp. The cache value should have been returned, but the tsm value was returned incorrectly. Fixes #6439	2016-05-03 09:21:31 -06:00
Ben Johnson	417df18396	Merge pull request #6533 from benbjohnson/optimize-show-series Optimize SHOW SERIES	2016-05-03 09:15:21 -06:00
Edd Robinson	fd77dbe648	Merge pull request #6546 from influxdata/er-build-tag Fix invalid build tag	2016-05-03 16:00:39 +01:00
Jonathan A. Sternberg	a2a5c32770	Merge pull request #6539 from influxdata/js-6495-fix-aggregates-with-empty-shards Fix aggregate returns when data is missing from some shards	2016-05-03 10:56:21 -04:00
Ben Johnson	49eb3b8d04	optimize show series iterator This commit changes the `SeriesIterator` to process one measurement at a time and uses a `floatFastDedupeIterator` to avoid point encoding during deduplication.	2016-05-03 08:52:44 -06:00
Jonathan A. Sternberg	d6d0addcec	Fix aggregate returns when data is missing from some shards If a shard is empty for a specific field and the field type is something other than a float, a nil iterator would get returned from one of the empty shards and cause the combined iterators to be cast to the float type and all other iterator types to be discarded (or for integers, to be cast). This is rare since most aggregates don't accept strings or booleans, but for queries like: SELECT distinct(string) FROM mydata It would result in nothing getting returned if one of the shards didn't have a value for `string`. This change modifies the query engine to return nil for the shards instead of a fake iterator and then to only use the fake iterator if the final aggregate iterator is nil (meaning that no iterators could be constructed for the field from any shard). Fixes #6495.	2016-05-03 10:41:22 -04:00
Edd Robinson	d35fa1ec97	Remove redundant windows build tags	2016-05-03 14:22:02 +01:00
Jason Wilder	d82aa98951	Reduce indentation in filter func	2016-05-02 11:38:25 -06:00
Jason Wilder	e0304ae3d5	Fix shards not getting assigned to series on restart Also, simplifies the LoadMetaDataIndex func to not require a *Shard	2016-05-02 11:36:05 -06:00
Jason Wilder	2d09937fd2	Fix removing fully deleted index blocks If multiple tombstone entries happen to exist for the same key in a tombstone file, it was possible to panic. The first application would remove all index entries and the second time around the code still assumed entries would exist and would index into the nil slice. Also fixes a case where the range of time would fully delete all index entries, but it did not align with math.MinInt64 and math.MaxInt64. This would cause the index locations to still exist in the offset slice. This is inefficient because the BlockIterator would still scan and decode the block only to discover that all the values are deleted. We now just remove it from the offsets slice in this case since the range of values are deleted.	2016-05-02 11:36:05 -06:00
Jason Wilder	58aa65d5a8	Optimize applyTombstones When a large tombstone file existed on disk, this code was slow since it would apply each tombstone to the index one at a time causing the index to be scanned for each key. Instead, we group all the tombstones together by timestamp and apply in bulk so that the index is scanned once for each set of tombstones. If we change to immutable tombstone files, it might be better to just write a file where all the keys have the same tombstone so we can re-apply them efficiently.	2016-05-02 11:36:05 -06:00
Jason Wilder	c73c7cea25	Revert filtering index entries in BlockIterator This was the wrong fix. The real issue was the tombstones were being read incorrectly and also applied incorrectly at times. This code is slower and not necessary so reverting it.	2016-05-02 11:36:04 -06:00
Jason Wilder	3a7429886e	Optimize Measurement.DropSeries	2016-05-02 11:36:04 -06:00
Jason Wilder	f9ace932c0	Fix V2 tombstone reading file position Each iteration of the loop was incrementing the position by 4 incorrectly. The position should start at four since the header is 4 bytes. This caused tombstones at the end of the file to not be read because the counter was out of sync with the actual file position which caused the loop to exit early. Probably better to refactor this to check for io.EOF instead of using the counter.	2016-05-02 11:36:04 -06:00
Jason Wilder	61e0d8ff93	Fix log prefix formatting	2016-05-02 11:36:04 -06:00
Jason Wilder	bd1009080e	Prevent writing empty tombstone files If you delete from a measurement with a tag that does not match any series, we would write an empty tombstone file and fail to load it back.	2016-05-02 11:36:04 -06:00
Jason Wilder	8082fc61ba	Fix parsing keys when loading database index The code for parsing a key out of the WAL or TSM files in the engine was naive and didn't account for measurements with escape chars. This uses the correct parsing code to parse and load them correctly. Fixes #6496	2016-04-30 14:47:19 -06:00
Todd Persen	9eb4c1ec57	Fix typo in comment.	2016-04-29 16:26:27 -07:00
Jason Wilder	abcb559b09	Remove index meta data when series and measurements are gone This removes the dropMeta param from the tsdb.Store.DeleteSeries and lets the shard determine when to remove the meta data from the index based on what series still have data in the shard. This uncovered a nasty bug in compactions where a fully deleted series would prematurely end the compactions and not carry forward the rest of the data in the TSM file. This is now fixed as well.	2016-04-29 16:31:57 -06:00
Edd Robinson	4d1cfa887c	Ensure measurement dropped when no more series	2016-04-29 00:05:42 +01:00
Jason Wilder	2bd5880d7a	Remove series from index when shard is closed When a shard is closed and removed due to retention policy enforcement, the series contained in the shard would still exist in the index causing a memory leak. Restarting the server would cause them not to be loaded. Fixes #6457	2016-04-28 12:34:46 -06:00
Jason Wilder	4e353867d5	Fix first block not getting purged when deleting series	2016-04-27 17:08:00 -06:00
Ben Johnson	f7af787aef	add DELETE query support This commit adds query language support for deleting series with a `DELETE` query.	2016-04-27 15:16:23 -06:00
Jason Wilder	aefd2ad08b	Add DeleteSeries and DeleteSeriesRange	2016-04-27 13:09:53 -06:00
Jason Wilder	c306090361	Fix tombstone rename on windows	2016-04-27 13:09:53 -06:00
Jason Wilder	86d37614e4	Remove debugging from test output	2016-04-27 13:09:53 -06:00
Jason Wilder	bf3aa5857d	Don't add tombstone for timerange not contained by file	2016-04-27 13:09:53 -06:00
Jason Wilder	6042e114a1	Remove tombstoned values during compaction This will skip blocks that are fully tombstoned as well as remove points that have been removed within a block.	2016-04-27 13:09:53 -06:00
Jason Wilder	23bbfb2192	Prevent truncated WAL entries from panicking	2016-04-27 13:09:53 -06:00
Jason Wilder	0de21ade40	Add delete range of values support to WAL and cache loader	2016-04-27 13:09:53 -06:00
Jason Wilder	d13d01b516	Allow deleting series by time on a shard	2016-04-27 13:09:53 -06:00
Jason Wilder	4d71d2b01f	Add support for deleting cache values using time range	2016-04-27 13:09:52 -06:00
Jason Wilder	c154cd4b4a	Remove TSMReaderOptions Not used	2016-04-27 13:09:52 -06:00
Jason Wilder	c8bd41c2d8	Remove TSM reader Keys func It's very inefficient and should never be used.	2016-04-27 13:09:52 -06:00
Jason Wilder	7e06d558d5	Update ContainsValue to handle tombstones	2016-04-27 13:09:52 -06:00
Jason Wilder	97504a552c	Support time range tombstones in FileStore/KeyCursor	2016-04-27 13:09:52 -06:00
Jason Wilder	27c2bc3f15	Separate IndexWriter from TSMIndex Allows for future versioning of the TSMIndex as well as removing a lot of unnecessary code.	2016-04-27 13:09:52 -06:00
Jason Wilder	bb82331db7	Move TSMIndex defn to reader.go	2016-04-27 13:09:52 -06:00
Jason Wilder	1ac0b01c5a	Remove fileAccessor No longer used	2016-04-27 13:09:52 -06:00
Jason Wilder	a789e819a3	Remove NewTSMReaderWithOptions There are two TSMIndex implementations, the directIndex and the indirectIndex. Originally, we only had the directIndex and later added the indirectIndex and NewTSMReaderWithOptions in order to allow both indexes to be used in tests and code. This has created a problem since we really only use the directIndex for writing and always use the indirectIndex for reading. This change removes the NewTSMReaderWithOptions func so that it is no longer possible to create a TSMReader with a directIndex. This will allow a lot of the block reading code used by the directIndex to be removed and simplify maintenance. It also gives better test coverage of the code that is actually used by the TSM engine now.	2016-04-27 13:09:52 -06:00
Jason Wilder	bc6328d196	Add time range support to tombstone files This adds support for a time range to tombstone files to allow a subset of points to be deleted instead of the whole series. It changes the tombstone file format to a binary format and maintains backwards compatibility with the old text format tombstone files.	2016-04-27 13:09:52 -06:00
Tait Clarridge	df0e16a92f	Add safer unlock to CreateFieldIfNotExists A deadlock can occur if the field was created while we were waiting for the lock.	2016-04-25 12:44:58 -04:00
Ben Johnson	9c1fa76f3c	Merge pull request #6452 from benbjohnson/simple8b update dep: simple8b @ b421ab40	2016-04-22 11:05:42 -06:00
Ben Johnson	286072f65a	update dep: simple8b @ b421ab40	2016-04-22 09:46:05 -06:00
Jonathan A. Sternberg	d26e4e3650	Pass binary expressions to the underlying query Binary math inside of a where condition was previously disallowed. Now, these types of queries are just passed verbatim down to the underlying query engine which can handle it. We may want to revisit this when it comes to tags at some point as it prevents the more efficient filtering of tags that a simple expression allows, but it allows a query like this to be done: SELECT * FROM cpu WHERE value + 2 < 5 So while it can be better, this is a good initial implementation to provide this functionality. There are very rare situations where a tag may be used appropriately in one of these circumstances. Fixes #3558.	2016-04-22 11:30:36 -04:00
Ben Johnson	d204a8b683	optimize tsm1.FloatDecoder This commit changes the `FloatDecoder.val` from a `float64` type to a `uint64` to avoid an additional type conversion during read. Now the type gets converted to a `float64` only on call to `Values()`.	2016-04-21 08:49:12 -06:00
Jason Wilder	87ceb7426a	Don't lock the cache while adding entries Entries have their own locking so the cache doesn't need to be locked when adding to them.	2016-04-20 16:08:58 -06:00
Jason Wilder	89aeaafd50	Re-use the string point key	2016-04-20 16:08:58 -06:00
Jason Wilder	fbaa7db54f	Don't lock entry when scanning new values to add	2016-04-20 16:00:26 -06:00
Jason Wilder	bfa225f149	Merge pull request #6430 from influxdata/jw-cache-load-size Disable cache max memory size when reloading the cache	2016-04-20 14:35:23 -06:00
Stephen Gutekanst	9dc09c5257	Make logging output location more programmatically configurable (#6213 ) This has various benefits: - Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily. - More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.` fields, etc). - This is also more efficient, because it means `executeQuery` no longer allocates a single `log.Logger` each time it is called.	2016-04-20 21:07:08 +01:00
Jason Wilder	f679787080	Disable cache max memory size when reloading the cache The cache max memory size is an approximate size and can prevent a shard from loading at startup. This change disables the max size at startup to prevent this problem and sets the limit back after reloading. Fixes #6109	2016-04-20 10:41:30 -06:00
Jonathan A. Sternberg	c8c38e15cd	Merge pull request #6386 from influxdata/js-iterator-next-error Modify all of the iterators to allow returning an error on Next()	2016-04-20 10:39:53 -04:00
Ben Johnson	54454e1e5b	Merge pull request #6424 from benbjohnson/optimize-bit-reader Optimize tsm1.BitReader	2016-04-20 08:28:24 -06:00
Seif Lotfy	c6e3c87e00	Add Block checksum validation and "influx_inspect verify" tool Fixes #5502	2016-04-19 22:33:03 +02:00
Jonathan A. Sternberg	493ef0e1ce	Merge pull request #6416 from influxdata/js-3166-deterministic-limit Sort the series keys inside of a tag set so the output is deterministic	2016-04-19 14:49:49 -04:00
Ben Johnson	1d2238c642	optimize tsm1.BitReader This commit rewrites the `tsm1.BitReader` to use an 8-byte buffer instead of a 1-byte buffer and provide an inlineable fast bit read.	2016-04-19 11:34:17 -06:00
Jason Wilder	f841a90d35	Use int64 instead of time.Time in timestamp encoder/decoder	2016-04-19 10:25:27 -06:00
Jason Wilder	61beeca426	Update timestamp benchmarks	2016-04-19 10:17:32 -06:00
Jonathan A. Sternberg	09c46a451a	Sort the series keys inside of a tag set so the output is deterministic The series keys within a tag set were previously not sorted which would cause the output to be non-deterministic. This sorts the output series by their keys so it has a consistent output especially when using limits. Fixes #3166.	2016-04-18 17:45:31 -04:00
Jonathan A. Sternberg	7ec2a991d5	Modify all of the iterators to allow returning an error on Next() This also switches the remaining iterators to be lazy so they can return errors properly. They needed to be converted to lazy initialization anyway, which has the side effect of making it much easier for us to propagate the underlying error during initialization. Updated the Emitter to return errors when it cannot read properly from the iterators.	2016-04-18 11:17:55 -04:00
Jonathan A. Sternberg	93745d9693	Merge pull request #6391 from influxdata/js-5553-limit-queries-slow-with-group-by Propagate the limit option to the low level iterators	2016-04-16 09:39:25 -04:00
Jonathan A. Sternberg	bd5fdd797d	Propagate the limit option to the low level iterators When a GROUP BY or multiple sources are used, the top level limit iterator requires reading the entire iterator stream so it can find all of the tag groups it needs to return. For large data series, this ends up with the limit iterator discarding a lot of output. This change adds a new lower level limit iterator on each series itself so that there are fewer data points that have to be thrown away by the top level iterator. Fixes #5553.	2016-04-15 18:23:54 -04:00
Jonathan A. Sternberg	835d08591e	Do not filter out empty tags from series keys	2016-04-13 09:15:57 -04:00
Jonathan A. Sternberg	60282cf52d	Merge pull request #6284 from influxdata/js-3371-where-clause-compare-tags-and-fields Enhance comparing tags and fields in the where clause	2016-04-12 11:45:54 -04:00
Pierre Fersing	29b19a2293	Fix deadlock in tsm1/file_store	2016-04-12 09:39:21 +02:00
Jonathan A. Sternberg	ea6262b712	Enhance comparing tags and fields in the where clause Now it is possible to compare tags and fields and it is also now possible to compare tags and tags. Previously, it was only possible to compare fields with fields and tags with a string or a regex. Fixes #3371.	2016-04-11 18:10:08 -04:00
Ben Johnson	525e22c92b	tsm1 query engine alloc reduction This commit makes a number of performance improvements to reduce allocations during query execution. Several objects and buffers are now reused across the components to avoid allocations. Previously a simple `count(value)` query across 1M points would require 26,000+ allocations. After the changes in this commit that number has been reduced to 88.	2016-04-11 14:50:59 -06:00
Jonathan A. Sternberg	5bdd61bde7	Support empty tags for all WHERE equality operations A missing tag on a point was sometimes treated as `""` and sometimes treated as a separate `null` entity. This change modifies the equality operations to always treat a missing tag as an empty string. Empty tags are not indexed and do not have the same performance as a tag that exists. Fixes #3773.	2016-04-11 12:01:35 -04:00
Edd Robinson	5327a75a6f	Merge pull request #6216 from influxdata/er-scope-proto Change protobuf package names to avoid clashes	2016-04-07 16:38:21 +01:00
Jonathan A. Sternberg	a58430bb60	Merge pull request #6217 from influxdata/js-tsdb-unused-code Remove unused code and increase some test coverage for the tsdb package	2016-04-06 10:07:43 -04:00
Jonathan A. Sternberg	028fdaff81	Merge pull request #6222 from influxdata/js-6206-descending-tsm1-iterators Handle nil values from the tsm1 cursor correctly	2016-04-06 10:05:20 -04:00
Jonathan A. Sternberg	94ec92d669	Handle nil values from the tsm1 cursor correctly Send nil values from the tsm1 cursor at the end of the cursor. After the cursor reached tsm1, the `nextAt()` call would always return the default value rather than a nil value. Descending also didn't work correctly because the seeking functionality for tsm1 iterators would always act like they were ascending instead of descending when choosing which value to select. This resulted in very strange output from the emitter since it couldn't figure out if it was ascending or descending. Fixes #6206.	2016-04-06 09:27:02 -04:00
Jonathan A. Sternberg	7a229c7e4e	Remove unused code and increase some test coverage for the tsdb package	2016-04-06 09:24:56 -04:00
joelegasse	84f8dd7c85	Merge pull request #6190 from influxdata/jw-race Fix race on measurementFields	2016-04-06 08:13:58 -04:00
Edd Robinson	184257a10d	Scope all internal protobuf packages	2016-04-05 13:54:21 +01:00
Jonathan A. Sternberg	37b63cedec	Cleanup QueryExecutor and split statement execution code The QueryExecutor had a lot of dead code made obsolete by the query engine refactor that has now been removed. The TSDBStore interface has also been cleaned up so we can have multiple implementations of this (such as a local and remote version). A StatementExecutor interface has been created for adding custom functionality to the QueryExecutor that may not be available in the open source version. The QueryExecutor delegate all statement execution to the StatementExecutor and the QueryExecutor will only keep track of housekeeping. Implementing additional queries is as simple as wrapping the cluster.StatementExecutor struct or replacing it with something completely different. The PointsWriter in the QueryExecutor has been changed to a simple interface that implements the one method needed by the query executor. This is to allow different PointsWriter implementations to be used by the QueryExecutor. It has also been moved into the StatementExecutor instead. The TSDBStore interface has now been modified to contain the code for creating an IteratorCreator. This is so the underlying TSDBStore can implement different ways of accessing the underlying shards rather than always having to access each shard individually (such as batch requests). Remove the show servers handling. This isn't a valid command in the open source version of InfluxDB anymore. The QueryManager interface is now built into QueryExecutor and is no longer necessary. The StatementExecutor and QueryExecutor split allows task management to much more easily be built into QueryExecutor rather than as a separate struct.	2016-04-04 13:27:17 -04:00
Jason Wilder	ca8b0ca143	Optimize locking in CreateFieldIfNotExists Also remove some dead code that is no longer relevant with tsm.	2016-04-01 20:44:40 -06:00
Jason Wilder	3f4c5a5585	Fix race on measurementFields Both Shard and Engine had the same reference to the measurementField map, but they each protected it with their own locks. This causes a race when writes and queries are occurring because writes can add new fields to the map while queries are reading from it. The fix moves the ownership to the Engine and provides protected accessors that Shard now uses. For the most part, the accesses on shard were old dead code. Fixing the measurementFields map race created a new race on the internal fields map. This is now unexported and protected via MeasurementFields exported funcs. Fixes #6188	2016-04-01 18:57:01 -06:00
Jason Wilder	07e3215d11	Remove unused Series.match func	2016-03-31 10:19:46 -06:00
Jason Wilder	40c4973423	Remove per measurement stats collection The stats setup ends up creating a lot of lock contention which significantly impacts write throughput when a large number of measurements are used. Fixes #6131	2016-03-31 10:19:27 -06:00
Jason Wilder	f1bb87d4f8	Convert index write lock to series lock	2016-03-31 10:19:27 -06:00
Edd Robinson	8e2d1e48c7	Check if engine closed. Fixes #6140	2016-03-31 15:59:04 +01:00

1337 Commits (27d157763a4a6ecfd7ec487cb4d1e472ba17634b)