influxdb

Commit Graph

Author	SHA1	Message	Date
Jason Wilder	dd1c030815	Remove limit count param on fields It's not used anymore.	2017-11-22 11:17:34 -07:00
Jason Wilder	c14b0e81b7	Save field types to speed up startup This persists the field types in a shard to avoid having to scan all the TSM files at startup.	2017-11-22 11:17:34 -07:00
Jason Wilder	c8b24b7939	Remove MANIFEST	2017-11-22 11:17:34 -07:00
Edd Robinson	68dd5e27c8	Improve performance of TagKeys	2017-11-21 17:16:47 +00:00
Jason Wilder	50b6ace75f	Fix wait reused while disabling compactions	2017-11-20 14:55:47 -07:00
Edd Robinson	6851db3fc9	Add FGA support to SHOW MEASUREMENTS	2017-11-17 11:06:43 +00:00
Jason Wilder	aa99a56bf1	Merge pull request #9129 from influxdata/jw-cursor-deletes Fix KeyCursor not returning remaining blocks	2017-11-16 16:58:30 -07:00
Jason Wilder	02dbe6dbd3	Fix KeyCursor not returning remaining blocks If the first block that needs to be read was partially deleted such that the trailing end has no values, it was possible for the query cursor to end early. This was caused by KeyCursor.ReadFloatBlock returning no values instead of checking the remaining blocks.	2017-11-16 15:23:34 -07:00
Stuart Carnie	2c2244b79c	remove empty file	2017-11-16 09:02:31 -08:00
Ben Johnson	ede3fcf98e	intermediate	2017-11-15 16:09:25 -07:00
Jason Wilder	e2cb1d0ff4	Merge pull request #9114 from influxdata/jw-force-full-plan Add capability to force a full compaction	2017-11-15 10:45:00 -07:00
Jason Wilder	ef06773d5b	Fix panic: runtime error: slice bounds out of range A panic could occur if an invalid time range was passed to Exclude/Include, etc.	2017-11-15 08:18:53 -07:00
Jason Wilder	97e0d496a6	Add capability to force a full compaction This adds the capability to the engine to force a full compaction to be scheduled. When called, it snapshots any data in the cache, aborts running compactions and prevents level plans from returning level plans.	2017-11-15 07:14:27 -07:00
Ben Johnson	ba4c9e0317	Merge remote-tracking branch 'upstream/master' into er-tsi-index-part	2017-11-14 16:14:13 -07:00
Stuart Carnie	2e04e871c9	fix descending queries * did not handle cached values correctly * sort shards by time in either ascending or descending order depending on the RPC request ordering to ensure they are traversed in the correct order.	2017-11-13 17:14:36 -08:00
Jason Wilder	8b18cc4456	Optimize deletes in tsi The DropSeries code path ended up creating a MeasurementSeriesIterator for each dropped series, which was too expensive just to see if a series exists. This adds a HasSeries func and fixes an issue where TSI files were compacted while an iterator was still in use, causing a panic.	2017-11-13 12:35:38 -07:00
Jason Wilder	c0631c2b95	Fix temp tombstone files leaking	2017-11-13 09:02:10 -07:00
Jason Wilder	13692639cb	Fix create/delete series race This fixes a race where writes and deletes to the same series and measurements could sometimes leave the index in an inconsistent state.	2017-11-13 09:02:10 -07:00
Jason Wilder	80cd5e63af	Optimize DeleteSeriesRange This removes more allocations and speeds up some critical sections.	2017-11-13 09:02:10 -07:00
Jason Wilder	aee395d3bd	Make DeleteSeriesRange take SeriesIterator	2017-11-13 09:02:10 -07:00
Jason Wilder	f893beb6d8	Use MeasurementSeriesKeysByExprIterator for deletes	2017-11-13 09:02:10 -07:00
Jason Wilder	000768371f	Optimized deletes in TSM index This optimizes how deletes are processed to reduce memory usage and improve efficiency.	2017-11-13 09:02:08 -07:00
Jason Wilder	eebd88f825	Don't write tombstones for keys that do not exist This filters out keys that do not exist in a TSM file to avoid writing entries that would end up being ignored when applied.	2017-11-13 08:50:07 -07:00
Jason Wilder	88c48ec78b	Rework Engine.DeleteSeriesRange to avoid allocations This removes the containsSeries func which ends up creating a map sized to the slice of keys passed in. This doesn't scale well to high cardinalities and creates a lot of garbage.	2017-11-13 08:50:07 -07:00
Jason Wilder	cb658774bb	Reduce allocations when reading tombstone v4	2017-11-13 08:50:07 -07:00
Jason Wilder	1c65bb3bb1	Fix leaked goroutine in FileStore.WalkKeys If fn returned an error, the goroutines sending keys from TSM files would get blocked indefinitely and leak.	2017-11-13 08:50:07 -07:00
Jason Wilder	b0c7a44eaa	Adjust min/max time to work in the engine The query language min and max times are slightly different from the values used in the engine. This allows faster code paths to be used when the whole time range is deleted.	2017-11-13 08:50:07 -07:00
Jason Wilder	2959b8d2eb	Make BatchDeleters concurrent	2017-11-13 08:50:07 -07:00
Jason Wilder	5a775c50d9	Add DeleteRangeWith This is a version of DeleteRange that takes a func predicate to determine whether a series key should be deleted or not. This avoids the large slice allocations with higher cardinalities.	2017-11-13 08:50:07 -07:00
Jason Wilder	6b19d2b673	Add BatchDeleters type	2017-11-13 08:48:03 -07:00
Jason Wilder	9ac83601cf	Use BatchDeleter in FileStore	2017-11-13 08:48:03 -07:00
Jason Wilder	4ed19348fd	Add a BatchDelete capability to TSMReader	2017-11-13 08:48:03 -07:00
Jason Wilder	44e782f173	Store temporary tombstones on disk This removes the in-memory tombstone buffer when writing tombstones which eliminates one source of large memory spikes during deletes.	2017-11-13 08:48:03 -07:00
Jason Wilder	bd15d37c70	Extract commit func	2017-11-13 08:48:03 -07:00
Jason Wilder	1e56894097	Extract writeTombstone func	2017-11-13 08:48:03 -07:00
Jason Wilder	b958c68ce5	Avoid re-reading tombstones when writing new ones This adds a new v4 tombstone format that extends the v3 format by allowing multiple batches of tombstones to be written without having to re-read all the existing tombstones. This uses gzip multi stream to append multiple v3 files together to create a v4 format.	2017-11-13 08:48:03 -07:00
Jason Wilder	17bae05370	Allow buffering tombstones before writing to disk	2017-11-13 08:48:03 -07:00
Jonathan A. Sternberg	0b7c56bcd8	Update the zap logger dependency The previous sha was taken from a revision on a devel branch that I thought would continue staying in the tree after it was merged. That revision was rebased away and the API was changed for the logger. This updates the usage of the logger and adds a simple package for constructing the base logger. The 1.0 version of zap changed the format of the default console logger so this change moves over to this new logger instead of attempting to retain backwards compatibility with the old format.	2017-11-10 16:27:16 -06:00
Ben Johnson	9ad2b53881	intermediate	2017-11-09 09:18:33 -07:00
Stuart Carnie	7cb25ecbff	optimized slice when outside timerange find position then update both slices once	2017-11-03 16:31:01 -07:00
Stuart Carnie	295acd6920	also slice values	2017-11-03 15:50:16 -07:00
Stuart Carnie	c1da95442c	Merge pull request #9054 from influxdata/js-update-influxql-path-in-templates Update the influxql path inside of the template files	2017-11-03 09:44:02 -07:00
Jonathan A. Sternberg	748fc4ae79	Update the influxql path inside of the template files	2017-11-03 10:57:17 -05:00
Andrew Hare	ecb3952fa9	Allow human-readable byte sizes in config Update support in the `toml` package for parsing human-readable byte sizes. Supported size suffixes are "k" or "K" for kibibytes, "m" or "M" for mebibytes, and "g" or "G" for gibibytes. If a size suffix isn't specified then bytes are assumed. In the config, `cache-max-memory-size` and `cache-snapshot-memory-size` are now typed as `toml.Size` and support the new syntax.	2017-11-01 11:09:09 -05:00
Stuart Carnie	9a43c14653	Merge pull request #9041 from influxdata/sgc-influxql influxdata/influxdb/influxql -> influxdata/influxql	2017-10-31 07:31:31 -07:00
Stuart Carnie	f3d45ba301	influxdata/influxdb/influxql -> influxdata/influxql	2017-10-30 14:40:26 -07:00
Jason Wilder	48ebc53154	Revert "Fix race in disableLevelCompactions" This reverts commit `4f8580fbaa`.	2017-10-30 14:14:50 -06:00
Stuart Carnie	dc04eaa8f3	Amendments based on feedback * Fprint* functions * No nakedness * clarify panic messages * spacing between case statements * remove break in favor of return * remove goto in favor of for { continue }	2017-10-25 13:38:07 -07:00
Stuart Carnie	c39f1ad748	Add batch cursor support to tsdb and tsm1 * batch cursors return slices of timestamps and values to reduce call overhead. Significantly improved iteration. * added CreateCursor API to Shard, Engine * moved build*Cursor to code gen	2017-10-25 13:38:07 -07:00
Stuart Carnie	3e28323a10	Simplified DecodeBlock functions array has already been sized correctly * eliminates bounds checking for each element access * reduces decoding of 30,000,000 points via storage API from 584ms to 540ms on average	2017-10-25 13:38:07 -07:00
Stuart Carnie	b7579340fe	return query.ErrQueryInterrupted for read on InterruptCh	2017-10-24 14:10:28 -07:00
Jason Wilder	955829e7c3	Merge pull request #9003 from influxdata/jw-delete-regression Delete series in batches	2017-10-24 13:54:33 -06:00
Jason Wilder	cbbbe8bedb	Delete series in batches This fixes a regression where deleting series keys would happen one at a time instead of in bulk.	2017-10-24 11:06:21 -06:00
Stuart Carnie	02a05e86ee	Add missing template changes for EXPLAIN ANALYZE	2017-10-23 14:46:36 -07:00
Stuart Carnie	e9313876ab	EXPLAIN ANALYZE * Introduces EXPLAIN ANALYZE command, which produces a detailed tree of operations used to execute the query. introduce context.Context to APIs metrics package * create groups of named measurements * safe for concurrent access tracing package EXPLAIN ANALYZE implementation for OSS Serialize EXPLAIN ANALYZE traces from remote nodes use context.Background for tests group with other stdlib packages additional documentation and remove unused API use influxdb/pkg/testing/assert remove testify reference	2017-10-20 08:01:37 -07:00
Jason Wilder	05131f4453	Fix indirectIndex not removing fully deleted series If multiple tombstones exist for a series that ended up causing the full data to be deleted, the blocks were not removed from the offsets in the index. This causes the TSMReader to report that a key exists but does not have any data. During a compaction, every key should have at least one value. Since this invariant was broken, the compaction aborted early and ended up dropping all series keys that are lexicographically greater than where the breakage occurred. This would cause data to be dropped during the compaction.	2017-10-18 18:16:41 -06:00
Jason Wilder	9f102adabe	Abort BlockIterator iteration if deletes detected This fixes a potential bug where the BlockIterator would skip blocks if the underlying TSMReader had deletes on it concurrently. This could possibly occur due to changes in `91eb9de3` that now use the existing TSMReaders from the FileStore instead of creating new ones during compaction.	2017-10-18 18:16:37 -06:00
Jason Wilder	4d171f3f40	Fix data deleted outside of time range	2017-10-18 13:39:47 -06:00
Jason Wilder	4f8580fbaa	Fix race in disableLevelCompactions There was a race on the WaitGroup where we could end up calling Add while another goroutine was still waiting. The functions were confusing so they have been simplified a bit since the compactions goroutines have been reworked a lot already.	2017-10-16 10:50:16 -06:00
Jason Wilder	e683502dd6	Merge pull request #8961 from lrita/master remove duplicated code in cacheKeyIterator.encode()	2017-10-16 10:17:32 -06:00
Jason Wilder	bc360ccfd5	Merge pull request #8970 from influxdata/jw-wal-panic Fix corrupted wal segment panic on 32 bit systems	2017-10-16 10:00:02 -06:00
Jason Wilder	fb7135ddc8	Fix corrupted wal segment panic on 32 bit systems	2017-10-16 09:41:20 -06:00
lrita	2f0aa4a420	remove duplicated code in cacheKeyIterator.encode()	2017-10-13 20:39:15 +08:00
Stuart Carnie	a0848eac8c	remove unnecessary err value readKey never sets error, so it is always nil	2017-10-12 08:28:53 -07:00
Jason Wilder	1401950b10	Only schedule one compaction per shard at a time The scheduling logic ended up favoring more backlogged shards too much and would starve active, less backed up shards. This occurred because the scheduling kicks in once a second. When it runs, it schedules as many compactions as it can. A backed up shard would end up having more compactions to run during the loop and would generally get to schedule them more frequently. This now allows each shard to try and schedule one compaction at a time, which provides a more balanced approach. At some point, we'll probably want to more directly balance each shard's backlog vs letting it happen somewhat randomly.	2017-10-09 11:40:32 -06:00
Jason Wilder	00a403f60e	Reduce allocation in tsmKeyIterator.Next This reuses some intermediate buffers and structs while compacting files.	2017-10-04 17:35:56 -06:00
Jason Wilder	6b6ccf1a40	Wait for compaction goroutines to finish	2017-10-04 10:01:44 -06:00
Jason Wilder	06226d6fd3	Handle orphan lower level TSM files during full planning Some files seem to get orphaned behind higher levels. This causes the compactions to get blocked as the lower level files will not get picked up by their lower level planners. This allows the full plan to identify them and pull them into their plans.	2017-10-04 08:13:14 -06:00
Jason Wilder	a1d0b52897	Allow lower priority compactions to use excess capacity If there is a backlog of level 3 and 4 compactions, but few level 1 and 2 compactions, allow them to use some excess capacity.	2017-10-04 08:11:44 -06:00
Jason Wilder	f2a681c4cf	Unconditionally remove file when calling Remove	2017-10-03 10:49:17 -06:00
Jason Wilder	0c0505881f	Remove multiple file skipping for full compaction planning This check doesn't make sense for high cardinality data as the files typically get big and sparse very quickly. This causes a lot of extra disk space to be used which is taken up by large indexes and sparse data.	2017-10-03 10:48:14 -06:00
Jason Wilder	90df803802	Prevent infinite scheduling loop One shard might be able to run a compaction, but could fail due to limits being hit. This loop would continue indefinitely as the same task would continue to be rescheduled.	2017-10-03 10:48:14 -06:00
Jason Wilder	4ff4ba0841	Use first file in generation for level With higher cardinality or larger series keys, the files can roll over early which causes them to take longer to be compacted by higher levels. This causes larger disk usage and higher numbers of tsm files at times.	2017-10-03 10:48:14 -06:00
Jason Wilder	71071ed67a	Add compaction backlog stat This gives an indication as to whether compactions are backed up or not.	2017-10-03 10:48:14 -06:00
Jason Wilder	16ece490ef	Reduce allocation in tsmKeyIterator.Next The chunked slice is unnecessary and we can re-use k.blocks throughout the compaction.	2017-10-03 10:48:14 -06:00
Jason Wilder	2c5006fccc	Rework snapshotting concurrency This switches the thresholds that are used for writing snapshots concurrently. This scales better than the prior model.	2017-10-03 10:48:14 -06:00
Jason Wilder	3af9c7df37	Remove a defer allocation Shows up under high cardinality compactions.	2017-10-03 10:48:14 -06:00
Jason Wilder	70817350b7	Ensure temp index files are cleaned up on error	2017-10-03 10:48:14 -06:00
Jason Wilder	a5afaf7499	Fix cache mem size not including key size	2017-10-03 10:48:14 -06:00
Jason Wilder	ae821f4e2d	Rework compaction scheduling This changes the compaction scheduling to better utilize the available cores that are free. Previously, a level was planned in its own goroutine and would kick off a number of compaction groups. The problem with this model was that if there were 4 groups, and 3 completed quickly, the planning would be blocked for that level until the last group finished. If the compactions at the prior level are running more quickly, a large backlog could accumulate. This now moves the planning to a single goroutine that plans each level in succession and starts as many groups as it can. When one group finishes, the planning will start the next group for the level.	2017-10-03 10:48:13 -06:00
Jason Wilder	f668b0cc3f	Only use O_SYNC for tsm file writing Doing this for the WAL reduces throughput quite a bit.	2017-10-03 10:48:13 -06:00
Jason Wilder	1610ae5727	Don't return tsm files part of a compaction plan	2017-10-03 10:48:13 -06:00
Joe LeGasse	1443b22379	auth: add series auth to 'show tag values'	2017-09-27 20:01:18 -04:00
Edd Robinson	e0cba4477c	Merge pull request #8885 from influxdata/er-entry-race Fix race on Cache entry	2017-09-27 18:42:45 +01:00
Edd Robinson	d0b81c1e6c	Fix race on Cache entry	2017-09-27 18:10:23 +01:00
Edd Robinson	a1b67160f6	Use math/bits in encoder	2017-09-26 12:51:08 +01:00
Jason Wilder	122a74c692	Use synchronous IO for wal and tsm writing The fsyncs due to large writes when writing to TSM files and the WAL can eventually cause large pauses. Since we already buffer writes, using synchronous IO reduces fsync latency by ensuring the individual writes hit disk. This spreads out the latency across multiple writes better.	2017-09-25 12:44:57 -06:00
Jason Wilder	5774b44a4c	Remove MADV_RANDOM This was inadvertently added when merging the solaris and unix mmap files. This causes large delays due to major page faults.	2017-09-25 10:25:06 -06:00
Jason Wilder	94aba64b88	Re-use index entries slice when writing TSM index	2017-09-21 12:48:16 -06:00
Jason Wilder	db204f3eb7	Default concurrent compactions to 50% of available cores	2017-09-21 12:48:11 -06:00
Jason Wilder	deef0c5649	Fix 32bit alignment	2017-09-20 10:00:20 -06:00
Jason Wilder	61ca1243c7	Increase index disk writer buffer	2017-09-20 09:05:30 -06:00
Jason Wilder	796de3dcea	Reduce encoder pool checkout contention With higher cardinalities, the encoder pools became a bottleneck. This changes the snapshot compactions to check out one encoder of each type and re-use it while writing the snapshots as opposed to repeatedly checking it out and in.	2017-09-19 15:27:26 -06:00
Jason Wilder	391a6288c6	Write parallel snapshot for higher cardinalities	2017-09-19 15:27:26 -06:00
Jason Wilder	0d52b060df	Skip onFileStoreReplace with tsi	2017-09-19 15:27:25 -06:00
Jason Wilder	4fe81aeee6	Remove manual Gosched from compactions At higher cardinalities, this dramatically slows down compaction throughput.	2017-09-19 15:27:25 -06:00
Jason Wilder	31e785d676	Don't deduplicate a single value	2017-09-19 15:27:25 -06:00
Jason Wilder	2ca9ccee1f	Reset snapshot cache outside of write lock	2017-09-19 15:27:25 -06:00
Jason Wilder	ddeba2c86b	Split large snapshots and write concurrently	2017-09-19 15:27:25 -06:00
Jason Wilder	9ee305f6f5	Periodically re-allocate cache store This periodically re-allocates the cache store to avoid memory fragmentation and gradual slow down of the store after repeated deletes and inserts into the map.	2017-09-19 15:27:25 -06:00
Jason Wilder	2885b9b310	Remove entrySizeHints map There is a lot of overhead for calculating the hints for larger cardinalities. This slows down resetting the partitions in the ring.	2017-09-19 15:27:25 -06:00
Jason Wilder	4124a8ed97	Simplify cache ring The continuum slice is not needed since the number of partitions doesn't change. This removes the slice to make the mapping simpler.	2017-09-19 15:27:25 -06:00
Stuart Carnie	ed7bc9d825	fix FindValues panic for empty array	2017-09-19 14:23:32 -07:00
Stuart Carnie	92756ec0ad	Reduce allocations, improve readEntries performance by simplifying loop * callers of ReadEntries and Key API can cache allocated slice	2017-09-19 11:57:10 -07:00
Stuart Carnie	baa05de3f8	add benchmarks	2017-09-19 11:47:48 -07:00
Stuart Carnie	cfc6a1cd9f	implement optimization for Include function ``` benchmark old ns/op new ns/op delta BenchmarkIntegerValues_IncludeNone_1000-8 651 6.69 -98.97% BenchmarkIntegerValues_IncludeMiddleHalf_1000-8 1131 114 -89.92% BenchmarkIntegerValues_IncludeFirst_1000-8 638 33.9 -94.69% BenchmarkIntegerValues_IncludeLast_1000-8 1269 32.2 -97.46% BenchmarkIntegerValues_IncludeNone_10000-8 7751 6.76 -99.91% BenchmarkIntegerValues_IncludeMiddleHalf_10000-8 11582 1378 -88.10% BenchmarkIntegerValues_IncludeFirst_10000-8 7911 43.8 -99.45% BenchmarkIntegerValues_IncludeLast_10000-8 12442 38.4 -99.69% ``` (cherry picked from commit fb93ad5)	2017-09-19 09:53:28 -07:00
Stuart Carnie	ca40c1ad3c	<type>Values.Exclude function uses binary search and copy builtin ``` ± benchcmp old.txt new.txt benchmark old ns/op new ns/op delta BenchmarkIntegerValues_ExcludeNone_1000-8 1285 7.34 -99.43% BenchmarkIntegerValues_ExcludeMiddleHalf_1000-8 1258 148 -88.24% BenchmarkIntegerValues_ExcludeFirst_1000-8 1268 7.51 -99.41% BenchmarkIntegerValues_ExcludeLast_1000-8 1125 27.7 -97.54% BenchmarkIntegerValues_ExcludeNone_10000-8 12665 7.31 -99.94% BenchmarkIntegerValues_ExcludeMiddleHalf_10000-8 12039 976 -91.89% BenchmarkIntegerValues_ExcludeFirst_10000-8 12663 7.29 -99.94% BenchmarkIntegerValues_ExcludeLast_10000-8 10990 34.9 -99.68% ``` (cherry picked from commit d7a3c23)	2017-09-19 09:53:26 -07:00
Jason Wilder	31646aae3a	Release mmap pages when shard is cold This instructs the kernel that it can release memory used by mmap'd TSM files when they are not actively being used. If the mappings are used, the kernel will fault the pages back in. On linux, this causes RES memory to drop immediately when run.	2017-09-18 11:51:51 -06:00
Jason Wilder	7d467c2047	Fix windows unmapping of anonymous index slice	2017-09-12 10:30:10 -06:00
Jason Wilder	b4b3c159cc	Fixup rebase	2017-09-11 17:04:10 -06:00
Jason Wilder	26f92ce6ac	Remove commented out code	2017-09-11 15:30:05 -06:00
Jason Wilder	820856347c	Don't use disk temp file for snapshots	2017-09-11 15:29:26 -06:00
Jason Wilder	4ed9c75896	Fix unmapping anonymous memory slice	2017-09-11 15:29:26 -06:00
Jason Wilder	97f7857715	Remove mutex on TSMWriter This isn't used by more than one goroutine so locks are unnecessary.	2017-09-11 15:29:26 -06:00
Jason Wilder	a93a5e9bdf	Include the size of the key in the cache size	2017-09-11 15:29:26 -06:00
Jason Wilder	7388eb9499	Use disk when writing TSM index	2017-09-11 15:29:25 -06:00
Jason Wilder	d3e832b462	Use offheap memory for indirect index offsets slice	2017-09-11 15:29:25 -06:00
Jason Wilder	91eb9de341	Use existing TSMReader from file store during compactions Compactions would create their own TSMReaders for simplicity. With very high cardinality compactions, creating the reader and indirectIndex can start to use a significant amount of memory. This changes the compactions to use a reader that is already allocated and managed by the FileStore.	2017-09-11 15:29:25 -06:00
Jason Wilder	739ecd2ebd	Fix a compaction planning bug There was a race where the plan returned was for files that were just compacted so the compaction would immediately abort.	2017-09-11 15:26:25 -06:00
Jason Wilder	bc4fb0ea10	Sort index entries if necessary These are already sorted during compaction, so switch to sorting lazily to avoid the CPU and allocations. This would only occur when using the writer directly.	2017-09-11 15:26:25 -06:00
Jason Wilder	f18dec6a4a	Use sorted slice for writing TSM index The directIndex used by the TSMWriter maintained a map of series keys to index entries. When the index is written to the TSM file, the keys are sorted and then written out in order. The reason for this is because directIndex used to be the only index and it was optimized more for reading. The reading has been replaced by the indirectIndex so the map of keys ends up wasting space. During compactions, the series keys (and index entries) are already sorted so this change uses the sorting to avoid the map and sort when writing the index. This reduces allocations and CPU usage quite a bit for larger cardinality TSM files.	2017-09-11 15:26:24 -06:00
Jason Wilder	2a0d7935d7	Switch level 3 compactions to use fast compaction strategy This leaves the slower compactions that create full blocks to only the full compaction. This helps reduce CPU usage and memory while shards are hot, but increases disk usage (reduced compression) slightly.	2017-09-11 15:26:24 -06:00
Jason Wilder	94e229ff59	Merge branch 'master' into jw-drop-series	2017-09-08 15:34:32 -06:00
Jason Wilder	78922f9821	Set rc to nil when closing WALSegmentReader	2017-09-08 14:55:02 -06:00
Jason Wilder	b9b648e2a0	Dynamically allocate cache store The cache store can be memory intensive with many shards. This lazily allocates it when needed and frees it when the cache is empty and cold.	2017-09-07 16:35:08 -06:00
Jason Wilder	5581f8b4ae	Re-use WALSegmentReaders at startup	2017-09-07 12:56:17 -06:00
Jason Wilder	e39276b96f	Skip reading 0 byte wal segments	2017-09-07 12:24:54 -06:00
Jason Wilder	a8d9eeef36	Reduce lock contention when deleting high cardinality series Deleting high cardinality series could take a very long time, cause write timeouts, and deadlock the process. This fixes these issues by changing the approach for cleaning up the indexes and reducing lock contention. The prior approach deleted each series and updated every index (inmem) during the delete. This was very slow and caused the index to be locked while items in a slice were removed one by one. This has been changed to mark series as deleted and then rebuild the index asynchronously, which speeds up the process. There was also a deadlock that could occur when deleting the field set. Deleting the field set held a write lock and the function it invoked under the lock could try to take a read lock on the field set. This would then deadlock. This approach was also very slow and caused timeouts for writes. It now uses a faster approach that checks for the existence of the measurement in the cache and filestore, which does not take write locks.	2017-09-07 11:36:02 -06:00
Jonathan A. Sternberg	590be193e5	Include the number of scanned cached values in the iterator cost	2017-09-06 15:41:07 -05:00
Jonathan A. Sternberg	50d404e690	Initial implementation of explain plan It prints the statistics of each iterator that will access the storage engine. For each access of the storage engine, it will print the number of shards that will potentially be accessed, the number of files that may be accessed, the number of series that will be created, the number of blocks, and the size of those blocks.	2017-09-01 09:01:10 -05:00
Jonathan A. Sternberg	466fc9026e	Reduce how long it takes to walk the varrefs in an expression This is used quite a bit to determine which fields are needed in a condition. When the condition gets large, the memory usage begins to slow it down considerably and it doesn't take care of duplicates.	2017-08-31 09:33:45 -05:00
Joe LeGasse	732a0c2eaa	Merge pull request #8769 from influxdata/jl-map-cleanup cleanup: remove poor usage of ',ok' with maps	2017-08-31 09:18:42 -04:00
Ben Johnson	1dbe0662d8	Use system cursors for measurement, series, and tag key meta queries.	2017-08-30 08:35:20 -06:00
Joe LeGasse	a95647b720	cleanup: remove poor usage of ',ok' with maps There are several places in the code where comma-ok map retrieval was being used poorly. Some were benign, like checking existence before issuing an unconditional delete with no cleanup. Others were potentially far more serious: assuming that if 'ok' was true, then the resulting pointer retrieved from the map would be non-nil. `nil` is a perfectly valid value to store in a map of pointers, and the comma-ok syntax is meant for when membership is distinct from having a non-zero value. There were only one or two cases that I saw where it was being used correctly for maps of pointers.	2017-08-30 09:49:31 -04:00
Edd Robinson	9be7c5aaa6	Run relevant engine tests on both indexes	2017-08-23 10:47:01 +01:00
Jason Wilder	d305b89f74	Merge pull request #8726 from influxdata/jw-tsm-file-leak Fix leaking tmp file when large compaction aborted	2017-08-22 09:59:23 -05:00
Stuart Carnie	2ef9b489f0	Merge pull request #8727 from influxdata/sgc-finalizer log message when iterator is closed by finalizer	2017-08-22 07:29:38 -07:00
Stuart Carnie	d189621d07	log message when iterator closed by finalizer	2017-08-21 16:46:24 -07:00
Jason Wilder	e265d150be	Fix leaking tmp file when large compaction aborted If a large compaction was running and was aborted, it could leave some tmp files around for files that it had fully written. The current active file was cleaned up, but already completed ones were not. This would occur when a TSM file needed to rollover due to size.	2017-08-21 17:04:57 -06:00
Jonathan A. Sternberg	5ce6007347	Merge pull request #8724 from influxdata/js-remove-unused-cursor This cursor implementation appears to be completely unused	2017-08-21 17:44:51 -05:00
Jonathan A. Sternberg	c0f7a8af5b	This cursor implementation appears to be completely unused Remove it so that its existence doesn't confuse someone that this is actually the cursor. The real cursors appear to be in file_store.gen.go.	2017-08-21 16:27:23 -05:00
Stuart Carnie	25edd7bfdf	naming	2017-08-17 15:47:47 -07:00
Stuart Carnie	c86dc0d103	redundant allocation is overwritten by line 1769	2017-08-17 11:12:41 -07:00
Stuart Carnie	823f903cc6	inputs are closed if Merge returns error and use <type>FinalizerIterator * <type>FinalizerIterator sets a runtime finalizer and calls Close when garbage collected. This will ensure any associated cursors are closed and the associated TSM files released * `query.Iterators#Merge` call could return an error and the inputs would not be closed, causing a cursor leak	2017-08-17 11:12:18 -07:00
Jason Wilder	85842503be	Fix deadlock in engine/measurement fields The OnReplace func ends up trying to acquire locks on MeasurementFields. When it's called via snapshotting, this can deadlock because the snapshotting goroutine also holds an RLock on the engine. If a delete measurement call is run at the right time, it will lock the MeasurementFields and try to acquire a lock on the engine to disable compactions. This creates a deadlock. To fix this, the OnReplace callback is moved to a function param to allow only Replace calls as part of a compaction to invoke it as opposed to both snapshotting and compactions. Fixes #8713	2017-08-16 16:43:40 -06:00
Jonathan A. Sternberg	697759613c	Remove time comparisons from the inner sections of the storage engine	2017-08-16 16:51:13 -05:00
Jonathan A. Sternberg	9a2357c2c0	Separate the query engine into a separate package This change provides a clear separation between the query engine mechanics and the query language so that the language can be parsed and dealt with separate from the query engine itself.	2017-08-16 13:38:43 -05:00
Stuart Carnie	3caeee8a24	fix: cursor leak when cur == nil and aux or conds is not empty	2017-08-16 09:17:20 -07:00
Ben Johnson	e0d8cb0ef3	Cardinality AST, parser, & rewriter fixes.	2017-08-16 09:27:29 -06:00
Ben Johnson	60ab1282ea	Refactor system iterators. Previously pseudo iterators could be created for meta data such as series, measurement, and tag data. These iterators were created at a higher level and lacked a lot of the power of the query engine. This commit moves system iterators down to the series level and supports the following: - _name - _seriesKey - _tagKey - _tagValue - _fieldKey These can be used as normal fields such as: SELECT _seriesKey FROM cpu This will return all the series keys for `cpu`.	2017-08-16 09:27:29 -06:00
David Norton	1d8d739418	fix #8677 : check for snapshot size == 0	2017-08-16 09:43:56 -04:00
Jason Wilder	90e2cadeb6	Fix drop measurement not dropping all data If there were multiple shards, drop measurement could update the index and remove the measurement before the other shards ran their deletes. This causes the later shards to not see any series to delete. The fix is to allow deleteSeries to handle the index delete, which already accounts for removing the measurement when it is fully removed from the index.	2017-08-15 11:19:45 -06:00
Jason Wilder	61b13eb12b	Fix partiallyRead logic The partiallyRead func didn't account for the initial values and would return true for blocks that had not been read at all. This causes a slower path during compactions that forces a block to be decoded when it could just be merged as is without being decoded. This causes compactions to consume more CPU and run slower at times.	2017-08-14 16:44:32 -06:00
Edd Robinson	aa7095be5a	Use a merge-based approach for TagValues	2017-08-02 14:10:52 +01:00
Jason Wilder	94a48774b7	Pull in new index filter	2017-08-02 14:10:52 +01:00
Stuart Carnie	5449285c4c	Merge pull request #8652 from influxdata/sgc-literal-cursor Reduce allocations using nil cursors and literal value cursors	2017-08-01 10:20:24 -07:00
Jason Wilder	173276a409	Remove unused filestore reference Reduces cursor struct size from 119 bytes to 111.	2017-08-01 09:41:16 -06:00
Stuart Carnie	ff65f0f24d	Reduce allocations using nil cursors and literal value cursors ``` benchmark old ns/op new ns/op delta BenchmarkIntegerIterator_Next-8 82.8 22.7 -72.58% benchmark old allocs new allocs delta BenchmarkIntegerIterator_Next-8 3 0 -100.00% benchmark old bytes new bytes delta BenchmarkIntegerIterator_Next-8 32 0 -100.00% ```	2017-07-30 09:15:34 -07:00
Jason Wilder	3d12c62121	Avoid repeatedly growing decoded values slices	2017-07-28 11:00:56 -06:00
Jason Wilder	778000435a	Convert all keys from string to []byte in TSM engine This switches all the interfaces that take string series key to take a []byte. This eliminates many small allocations where we convert between the two repeatedly. Eventually, this change should propagate further up the stack.	2017-07-28 11:00:50 -06:00
Jason Wilder	8009da0187	Remove some extra cursor buffers that are not needed	2017-07-28 10:53:07 -06:00
Jason Wilder	6582caa78b	Reduce allocations when creating KeyCursors The refs map was to increment the file references one time each. It doesn't hurt to increment them multiple times though. We also do not need to copy the files slice as we are accessing it under a read lock so it can't be changed.	2017-07-28 10:53:07 -06:00
Jason Wilder	6e6cc991ee	Merge pull request #8629 from influxdata/jw-compaction-abort Interrupt in progress TSM compactions	2017-07-27 16:12:40 -06:00
Jason Wilder	18a02d50d7	Interrupt in progress TSM compactions When snapshots and compactions are disabled, the check to see if the compaction should be aborted occurs in between writing to the next TSM file. If a large compaction is running, it might take a while for the file to be finished writing causing long delays. This now interrupts compactions while iterating over the blocks to write which allows them to abort immediately.	2017-07-27 15:58:56 -06:00
Stuart Carnie	0c79ec6f17	update xxhash and use Sum64String to avoid allocs ``` ± benchcmp ring_before.txt ring_after.txt benchmark old ns/op new ns/op delta BenchmarkRing_getPartition_100-8 108 48.1 -55.46% BenchmarkRing_getPartition_1000-8 113 48.9 -56.73% benchmark old allocs new allocs delta BenchmarkRing_getPartition_100-8 1 0 -100.00% BenchmarkRing_getPartition_1000-8 1 0 -100.00% benchmark old bytes new bytes delta BenchmarkRing_getPartition_100-8 192 0 -100.00% BenchmarkRing_getPartition_1000-8 192 0 -100.00% ```	2017-07-26 10:16:54 -07:00
Stuart Carnie	d243df5ca3	simplify loop	2017-07-24 09:03:22 -07:00
Stuart Carnie	eec80692c4	Taught tsm1 storage engine how to read and write uint64 values * introduced UnsignedValue type * leveraged existing int64 compression algorithms (RLE, Simple 8B) * tsm and WAL can read and write UnsignedValue * compaction is aware of UnsignedValue * unsigned support to model, cursors and write points NOTE: there is no support to create unsigned points, as the line protocol has not been modified.	2017-07-24 09:03:22 -07:00
Jason Wilder	4244d0e053	Merge pull request #8568 from influxdata/jw-tombstone-compress Compress tombstone files	2017-07-10 11:28:09 -06:00
Jason Wilder	dba3ce1a42	Merge pull request #8576 from influxdata/jw-delete-index Fix index inconsistency after deletes	2017-07-07 14:36:33 -06:00
Jason Wilder	e9370e0b86	Fix indefinite hang in WAL.writeToLog There was a race in the WAL writeToLog and scheduleSync which could lead to a writing goroutine blocking indefinitely on its syncErr channel. The issue was that the clearing of the syncCount happened after the wal was unlocked. If a goroutine was able to lock, write and call scheduleSync before the existing scheduleSync goroutine returned and ran the defer to clear the syncCount, then a new scheduleSync goroutine would not get started. This left the writing goroutine blocked with nothing to signal it. While in this state, a RLock on the engine was held. If a Lock was requested on the engine during this time, all future writes and queries would block waiting on the blocked wal writer. The fix is to move the atomic clearing of syncCount before the Lock is released.	2017-07-07 13:31:52 -06:00
Jason Wilder	5e11cdcdd7	Fix incorrect condition in OverlapsKeyRange The min key was not used in OverlapsKeyRange which caused it to return false when it should be true. This causes a bug where deletes would not write tombstones for files that actually contained the data it was supposed to delete.	2017-07-07 12:19:33 -06:00
Jason Wilder	839cddf6d5	Refresh index after compactions The in-memory index can get out of sync when deletes and writes to the same measurement are running concurrently. The index is updated independently from data on disk and it's possible for the index to unassign a shard when data still exists on disk. What happens is that there are TSM files on disk, but the index does not know that the series that exist in those files still are in the shard. Restarting the server reloads the index and the data is visible again. From an end user perspective, this can look like more data is deleted than should have been or that deleted data re-appears after a restart or writes to the shard occur again. There isn't an easy way to resolve this since the index and storage are not transactional resources and we cannot atomically commit or rollback changes to both at once. As a workaround, after new TSM files are installed, we refresh the index with series keys that exist in the new tsm files as well as any lingering data still in the cache. There is a small window of time when the index may be missing series, but it will re-appear after the refresh completes.	2017-07-07 12:19:30 -06:00
Jason Wilder	3e7dfad7c4	Compress tombstone files This adds a v3 format that is a gzip compressed version of the v2 format. It reduces the size of tombstone files substantially without having to support a more feature rich file format for tombstones.	2017-07-06 10:10:31 -06:00
Jason Wilder	9ac042b5cd	Reduce lock contention when disabling compactions The monitor goroutine calls enable compactions every 10s to spin down (or start up) goroutines for cold shards. This frequent Lock may be causing lock contention for writes and queries which get blocked trying to acquire an RLock. The go RWMutex says that new RLock calls will block if there is a pending Lock call that is blocked. Switching the common path to use an RLock should avoid the Lock and reduce lock contention for writes and queries.	2017-07-05 15:42:21 -06:00
Edd Robinson	101af89987	Update CHANGELOG	2017-07-05 16:35:41 +01:00
Edd Robinson	0748d28986	Ensure tmp files cleaned up when compaction disabled	2017-07-04 20:04:23 +01:00
Stuart Carnie	46796d932f	add database to index, engine and shard; call AuthorizeSeriesRead	2017-05-26 13:21:50 -07:00
Joe LeGasse	815f740f4c	initial fga work wip wip fix tests / build	2017-05-26 13:16:27 -07:00
Jason Wilder	208ef09f87	Prevent writing series keys that exceed max key size WriteBlock was missing the check for the max series keys which allowed series keys to be written that were larger than the 2 bytes allocated to store their length. When this occurred, the TSM can fail to load.	2017-05-24 13:41:09 -06:00
Jason Wilder	29e4287fd2	Prevent masking root errors when compactions are in progress The root error when creating a tmp file when writing a snapshot was hidden making it difficult to determine why snapshots were failing.	2017-05-23 12:09:36 -06:00
Jason Wilder	bd6d0681e9	Ensure planned files are released The defer was never executed because the planning happens in a long running goroutine that loops. The plans need to be released immediately after applying them.	2017-05-23 12:08:25 -06:00
Jason Wilder	4e582f297a	Fix race in findGenerations It was possible that the findGenerations could get stuck returning no files even when generations existed on disk.	2017-05-23 12:05:47 -06:00
Jason Wilder	1833475c09	Fix TSM tmp files leaking TMP files could leak when compactions failed for various reasons. They were also being deleted inadvertently when compactions were disabled causing other errors to be reported in the logs.	2017-05-22 14:51:18 -06:00
Stuart Carnie	c863923e68	cache MarshalSize	2017-05-12 14:05:25 -06:00
Stuart Carnie	0151afe31c	check size and allocate once	2017-05-12 14:05:25 -06:00
Stuart Carnie	096d6f65b4	explicit sizes	2017-05-12 14:05:24 -06:00
Jason Wilder	4d002bb370	Limit concurrent compactions within a shard This changes full compactions within a shard to run sequentially instead of running all the compaction groups in parallel. Normally, there is only 1 full compaction group to run. At times, there could be several which causes instability if they are all running concurrently as they tie up a cpu for long periods of time. Level compactions are also capped to a max of 4 concurrently running for each level in a shard. This prevents sudden spikes in CPU and disk usage due to a large backlog of tsm files at a given level.	2017-05-12 14:05:24 -06:00
Jason Wilder	2cac46ebbc	Convert usage of strings to []byte Measurement name and field were converted between []byte and string repetitively causing lots of garbage. This switches the code to use []byte in the write path.	2017-05-12 14:05:19 -06:00
Jason Wilder	503d41a08f	Add LimitedBytePool for wal buffers This pool was previously a pool.Bytes to avoid repetitive allocations. It was recently switched to a sync.Pool because pool.Bytes held onto very large buffers at times which were never released. sync.Pool is showing up in allocation profiles quite frequently. This switches the pool to a new pool that limits how many buffers are in the pool as well as the max size of each buffer in the pool. This provides better bounds on allocations.	2017-05-11 11:27:00 -06:00
Jason Wilder	e17be9f4ba	Merge pull request #8377 from influxdata/jw-encoders Speed up time encoding/decoding	2017-05-11 10:38:27 -06:00
Joe LeGasse	087d9f4670	tsm: fixed test to not require sorted backup tarball	2017-05-11 12:00:19 -04:00
Jason Wilder	b150a6293c	Merge pull request #8380 from influxdata/jw-wal-buffer Use buffer writer for wal segments	2017-05-11 08:34:44 -06:00
Jason Wilder	b81ac21bcb	Merge pull request #8378 from influxdata/jw-snapshot-disable Don't disable snapshots when snapshot compactions are disabled	2017-05-10 12:00:27 -06:00
Jason Wilder	e102fcca9c	Use buffer writer for wal segments	2017-05-10 11:42:32 -06:00
Jason Wilder	39a829c1ae	Speed up time encoding/decoding This speeds up time encoding and decoding by skipping the divisor scaling if scaling by 1. Since division and multiplication are expensive CPU operations and scaling by 1 has no effect, this just slows encoding and decoding down.	2017-05-10 11:12:35 -06:00
Jason Wilder	4e3e707abc	Fix packed time encoded benchmark	2017-05-10 10:35:44 -06:00
Jason Wilder	29c2b1958e	Fix deletes triggering unnecessary compactions Tombstone files would be written to all TSM files even if the deleted keys or timerange did not exist in the TSM file. This had the side effect of causing shards to get recompacted back to the same state. If any shards or large numbers of TSM files existed, disk usage and CPU utilization would spike causing issues. This prevents tombstones being written for TSM files that could not possibly contain the series keys being deleted or if the deleted time range is outside the range of the file.	2017-05-08 14:52:28 -06:00
Jason Wilder	c0c6ad6880	Don't disable snapshots when snapshot compactions are disabled Snapshot compactions can be disabled independently of snapshotting capability. This prevents taking backups of shards that have compactions disabled.	2017-05-05 14:15:45 -06:00
Jason Wilder	bc639c5982	Make disableLevelCompactions lighter weight Since this is called more frequently now, the cleanup func was invoked quite a bit which makes several syscalls per shard. This should only be called the first time compactions are disabled.	2017-05-04 09:56:15 -06:00
Jason Wilder	b4ea523910	Include snapshot size in the total cache size This was causing a shard to appear idle when in fact a snapshot compaction was running. If the timing was right, the compactions would be disabled and the snapshot compaction would be aborted.	2017-05-03 16:31:58 -06:00
Jason Wilder	88848a9426	Remove per shard monitor goroutine The monitor goroutine ran for each shard and updated disk stats as well as logged cardinality warnings. This goroutine has been removed by making the disk stats more lightweight and callable directly from Statistics and moving the logging to the tsdb.Store. The latter allows one goroutine to handle all shards.	2017-05-03 16:31:57 -06:00
Jason Wilder	f87fd7c7ed	Stop background compaction goroutines when shard is cold Each shard has a number of goroutines for compacting different levels of TSM files. When a shard goes cold and is fully compacted, these goroutines are still running. This change will stop background shard goroutines when the shard goes cold and start them back up if new writes arrive.	2017-05-03 16:31:57 -06:00
Jason Wilder	3d1c0cd981	Don't return compaction plans for files already part of a plan The compactor prevents the same file from being compacted by different compaction runs, but it can result in warning errors in the logs that are confusing. This adds compaction plan tracking to the planner so that files are only part of one plan at a given time.	2017-05-03 16:31:57 -06:00
Jason Wilder	8fc9853ed8	Add max-concurrent-compactions limit This limit allows the number of concurrent level and full compactions to be throttled. Snapshot compactions are not affected by this limit as they need to run continuously. This limit can be used to control how much CPU is consumed by compactions. The default is to limit to the number of CPUs available.	2017-05-03 16:31:57 -06:00
Jason Wilder	3c130cd39c	Expose TSMWriter.Flush Allows flushing the writer so we don't always need to close and re-open the file handle.	2017-04-28 14:00:50 -06:00
Jason Wilder	141f0d71cd	Update index when import files	2017-04-28 14:00:45 -06:00
Jason Wilder	a76146e34a	Add Store.Import capability This allows the contents of a backup to be imported into a shard without requiring the whole shard to be replaced.	2017-04-28 13:30:46 -06:00
Jason Wilder	3839fe34ea	Remove FileStore.Add/Remove Can use Replace which handles files in-use and stats correctly.	2017-04-28 13:20:55 -06:00
Jason Wilder	137d0c0d09	Rename WAL.WritePoints to WAL.WriteMulti To match Cache.WriteMulti	2017-04-28 13:20:55 -06:00
Jason Wilder	28422f2fec	Use consistent receiver var name for Value types	2017-04-28 13:20:55 -06:00
Jason Wilder	1bc4936336	Export Reader.ReadBytes	2017-04-28 13:20:55 -06:00
Jason Wilder	d88604f6f2	Move repetitive loop checks outside of values loop	2017-04-20 13:45:04 -06:00
Jason Wilder	888689f5d3	Move values loop under type switch All the values read must be of the same type so repeatedly using the type switch is confusing and less efficient.	2017-04-20 13:39:49 -06:00
Jason Wilder	b0988511bf	Use fixed size array instead of slice	2017-04-20 13:38:33 -06:00
Jason Wilder	da6bdfdda8	Use bufio.Reader when reading wal segments Reduces disk IO due to small reads.	2017-04-20 13:33:42 -06:00
Jason Wilder	8e9cbd7ffc	Simplify WALSegmentReader.UnmarshalBinary There were two loops over nvals which created some extra allocation which could be replaced with a simple slice capacity and append.	2017-04-20 13:33:42 -06:00
Jason Wilder	ef65ee77f4	Switch WAL byte pools to sync/pool The current bytes.Pool will hold onto byte slices indefinitely. Large writes can cause the pool to hold onto very large buffers over time. Testing w/ sync/pool seems to perform similarly now so using a sync/pool will allow these buffers to be GC'd when necessary.	2017-04-20 12:28:42 -06:00
Jason Wilder	d155d37ca8	Reduce TSM write buffer When many TSM files are being compacted, the buffers can add up fairly quickly.	2017-04-20 12:28:42 -06:00
Jason Wilder	d7c5dd0a3e	Reduce wal sync goroutine churn Under high write load, the sync goroutine would start up and end very frequently. Starting a new goroutine so frequently adds a small amount of latency which causes writes to take longer and sometimes time out. This changes the goroutine to loop until there are no more waiters which reduces the churn and latency.	2017-04-20 12:28:34 -06:00
Jason Wilder	aa9925621b	Fix deadlock in wal If the sync waiters channel was full, it would block sending to the channel while holding the wal write lock. The sync goroutine would then be stuck acquiring the write lock and could not drain the channel. This increases the buffer to 1024 which would require a very high write load to fill as well as returns an error if the channel is full to prevent the blocking.	2017-04-19 11:33:13 -06:00
Jason Wilder	5c51ae7319	Merge branch '1.2' into jw-merge-123	2017-04-14 14:36:54 -06:00
Jason Wilder	ff1270dfeb	Fix dropping fields created data corruption The Point is intended to be immutable after being parsed since it is shared by several goroutines. When dropping a field (e.g. time), corrupted data can result if one goroutine is deleting the field while another is marshaling the underlying byte slices. To avoid this, the shard will just skip invalid fields and series instead of trying to mutate them by deleting them.	2017-04-07 12:58:42 -06:00
Ben Johnson	9c97cd8601	Merge remote-tracking branch 'upstream/master' into tsi	2017-04-04 12:46:09 -06:00
Jason Wilder	5fa8073fc2	Merge branch '1.2' into jw-merge-123	2017-04-04 11:12:06 -06:00
Jason Wilder	84cbee227a	Fix file store not closing all TSM files Regression added via #8192	2017-04-04 10:58:51 -06:00
Jason Wilder	4f850b5cff	Skip TestCache_Deduplicate_Concurrent on windows	2017-04-04 08:48:55 -06:00
Jason Wilder	8da84e6144	Merge branch 'master' into tsi	2017-04-03 11:21:02 -06:00
Jason Wilder	32c4d43952	Speed up drop measurement This reworks drop measurement to use a sorted list of series keys instead of creating an intermediate map. It removes allocations and some extra garbage that is created during drop measurement.	2017-04-03 08:57:53 -06:00
Jason Wilder	a78da51b7c	Use buffered writer when writing tombstones When deleting many series, the many small writes flood the disks and consume a lot of CPU time.	2017-04-03 08:57:52 -06:00
Jason Wilder	6232d5e56d	Remove defer allocations in TSMReader	2017-04-03 08:57:52 -06:00
Jason Wilder	920c8396c6	Use sorted merge in FileStore.WalkKeys WalkKeys serially walked each TSM file and invoked fn for each key. Caller needed to handle duplicate calls to fn with the same key because the same key could exist in multiple TSM files. The serial execution was also slower. Since the series keys are already sorted, we can iterate over all files in parallel and skip duplicates using a sorted merge. This fixes the duplicate invocation issue as well as speeds up walking all keys. This can significantly improve startup performance when many TSM files exist that may not have been fully compacted. This also has benefits for deletes (measurements/series) since duplicates are removed saving extra allocations and work. This may also allow for the optimize compaction to be removed provided startup times are fast enough.	2017-04-03 08:57:52 -06:00
Edd Robinson	fddaff2cc8	Merge master in	2017-03-29 18:00:28 +01:00
Ben Johnson	2edfb1c92d	Ignore series limit on database load.	2017-03-24 16:27:16 -06:00
Ben Johnson	9fb8f1ec1d	Fix database and tag limits.	2017-03-24 09:48:10 -06:00
Jason Wilder	631681796d	Remove tsl file committed by mistake	2017-03-23 16:18:27 -06:00
Jason Wilder	7119ef8f29	Merge pull request #8193 from influxdata/jw-123-backports 1.2.3 backports	2017-03-23 13:31:35 -06:00
Jason Wilder	ca1919e5de	Use standard merge algorithm for merging values The previous version was very inefficient due to the benchmarks used to optimize it having a bug. This version always allocates a new slice, but is O(n).	2017-03-23 12:53:59 -06:00
Jason Wilder	ba2571903d	Fix broken Values.Merge benchmark Merge had the side effect of modifying the original values so the results are wrong because they always hit the fast path after the first run.	2017-03-23 12:53:50 -06:00
Jason Wilder	890ffb4ce8	Generate encode*Values funcs	2017-03-23 12:53:29 -06:00
Jason Wilder	ced953ae89	Use typed values to avoid allocations This switches compactions to use typed values (FloatValues) from the generic Values type. It avoids a bunch of allocations where each value must be converted from a specific type to an interface{}.	2017-03-23 12:53:17 -06:00
Jason Wilder	a1c84ae6f3	Add block type for BlockIterator	2017-03-23 12:49:17 -06:00
Jason Wilder	2972a3f223	Remove MMAP dereferencing code This code was added to address some slow startup issues. It is believed to be the cause of some segfault panics that occur at query time when the underlying MMAP array has been unmapped. The current structure of code makes this change unnecessary now.	2017-03-23 12:46:23 -06:00
Jason Wilder	61f80db1b9	Skip cardinality dups on circle race test	2017-03-22 15:20:38 -06:00
Jason Wilder	c443e639b0	Fix 32bit alignment issue in wal.sync	2017-03-22 11:21:29 -06:00
Ben Johnson	afe41f1c80	Fix tsm1/tsi1 broken tests.	2017-03-21 12:21:48 -06:00
Jason Wilder	8f7b251afd	Merge branch 'master' into jw-tsi	2017-03-20 17:17:26 -06:00
Jason Wilder	8177df2dab	Simplify Measurement.TagSets signature	2017-03-17 16:19:10 -06:00
Jason Wilder	2d5d899ac2	Allow queries to be interrupted during planning If a bad query is run, kill query and limits would not kick in until after it started executing. Some bad queries that involve high cardinality can cause the server to OOM just from planning which defeats the purpose of the max-select-series limit. This change primarily fixes max-select-series limit so that the query is killed earlier and has the side effect that kill query now can kill a query while it's being planned.	2017-03-17 16:00:54 -06:00
Jason Wilder	bc4aeefbed	Check max-series-limit in shard iterator creation The limit waited until all the iterators had been created which still allows problem queries to be planned. This allows the queries to be aborted much earlier in some cases.	2017-03-17 16:00:25 -06:00
Jason Wilder	e9eb925170	Coalesce multiple WAL fsyncs Fsyncs to the WAL can cause higher IO with lots of small writes or slower disks. This reworks the previous wal fsyncing to remove the extra goroutine and remove the hard-coded 100ms delay. Writes to the wal still maintain the invariant that they do not return to the caller until the write is fsync'd. This also adds a new config option wal-fsync-delay (default 0s) which can be increased if a delay is desired. This is somewhat useful for systems with slower disks, but the current default works well as is.	2017-03-15 16:31:03 -06:00
Ben Johnson	1807772388	Fix tsi tests.	2017-03-15 11:23:58 -06:00
Ben Johnson	cf7ba96377	Merge branch 'tsi-log-compact' into tsi	2017-03-15 10:18:40 -06:00
Ben Johnson	358b1e0b05	Merge remote-tracking branch 'upstream/master' into tsi	2017-03-15 10:13:32 -06:00
Jason Wilder	65464ea0d1	Merge pull request #8131 from influxdata/jw-values-merge Use standard merge algorithm when merging Values	2017-03-15 09:51:21 -06:00
Jason Wilder	a4cfeacedb	Use standard merge algorithm for merging values The previous version was very inefficient due to the benchmarks used to optimize it having a bug. This version always allocates a new slice, but is O(n).	2017-03-15 08:59:41 -06:00
Jason Wilder	4d37c9dc9e	Fix broken Values.Merge benchmark Merge had the side effect of modifying the original values so the results are wrong because they always hit the fast path after the first run.	2017-03-14 14:20:24 -06:00
Jason Wilder	ca9c67a877	Generate encode*Values funcs	2017-03-14 11:54:53 -06:00
Jason Wilder	2f7d4995b4	Use typed values to avoid allocations This switches compactions to use typed values (FloatValues) from the generic Values type. It avoids a bunch of allocations where each value must be converted from a specific type to an interface{}.	2017-03-09 16:27:07 -07:00
Jason Wilder	78b7815c49	Add block type for BlockIterator	2017-03-09 09:16:59 -07:00
Jason Wilder	b9e5375043	Merge branch '1.2' into jw-merge-12	2017-03-08 13:16:50 -07:00
Jason Wilder	37187cbe6d	Delete series under fields lock Still seeing the panic that switching this logic around was supposed to fix. We now delete the bulk of data outside of the fields lock and then again, under the write lock, to ensure that the field mapping is accurate. We don't do the full delete under the lock because it can block writes and queries that require a read lock.	2017-03-06 14:19:55 -07:00
Jason Wilder	675d7c9d65	Merge branch '1.2' into jw-merge12	2017-03-06 11:09:05 -07:00
Jason Wilder	eab012ef61	Fix points missing after compaction If blocks containing overlapping ranges of time were partially recombined, it was possible for some points to get dropped during compactions. This occurred because the window of time of the points we need to merge did not account for the partial blocks created from a prior merge. Fixes #8084	2017-03-06 10:17:11 -07:00
Jason Wilder	3c70abf061	Delete series before remove from field index There is a race where the field type can be deleted while a new type is written and during a query. When this happens, an iterator for the new type is created but old data may still exist in the cache for TSM files causing a panic.	2017-03-06 09:38:27 -07:00
Jason Wilder	29f8d8de76	Fix race in WALEntry.Encode and Value.Deduplicate Under high query load, a race exists in the cache and the WAL. Since writes currently hit the cache first, they are available for query before they hit the WAL. If the WAL is writing and accessing the Value slice at the same time that a query is run that needs to dedup the same slice, a race occurs. To fix this, the cache now just copies the values instead of storing the slice passed in. Another way to fix this might be to have the writes go to the wal before the cache. I think the latter would be better, but it introduces some larger write path issues that we'd need to also address. e.g. if the cache was full, writes to the WAL would need to be rejected to avoid filling the disk. Copying the slice in the cache is simpler for now and does not appear to dramatically affect performance.	2017-03-06 09:38:22 -07:00
Jason Wilder	a024003f2c	Merge branch '1.2' into jw-merge-12	2017-02-22 12:13:29 -07:00
Ben Johnson	78a9bb2527	Remove Tags.shouldCopy, replace with forceCopy on series creation. Previously, tags had a `shouldCopy` flag to indicate if those tags referenced an underlying buffer and should be copied to allow GC. Unfortunately, this prevented tags from being copied that were created and referenced the mmap which caused segfaults. This change removes the `shouldCopy` flag and replaces it with a `forceCopy` argument in `CreateSeriesIfNotExists()`. This allows the write path to indicate that tags must be cloned on insert.	2017-02-21 11:13:35 -07:00
Mark Rushakoff	601cbcd084	Merge branch '1.2' into mr-merge-12	2017-02-17 16:14:22 -08:00
Jonathan A. Sternberg	2fe48d6781	Rename zap import back to github.com/uber-go/zap They rebased a revision we were previously relying upon that allowed us to use the vanity name so we are reverting back to an older version with the old import path.	2017-02-17 17:17:22 -06:00
Ben Johnson	673143a0ad	Remove .tsl file.	2017-02-15 08:44:01 -07:00
Jason Wilder	4b6289ce58	Merge pull request #7942 from influxdata/jw-cache-partitions Reduce write timeouts	2017-02-10 10:07:08 -07:00
Edd Robinson	38eb6d5994	Don't load meta data for tsi	2017-02-09 18:04:23 +00:00
Edd Robinson	a6a2f9d5f0	Don't load meta data for tsi	2017-02-09 17:59:14 +00:00
Jason Wilder	2f74e3f3d5	Use simple8b.CountBytes to avoid allocations	2017-02-09 10:47:03 -07:00
Jason Wilder	1bc0f68490	Merge branch '1.2' into jw-merge-12	2017-02-07 12:48:36 -07:00
Jonathan A. Sternberg	e1fa48d0dd	Fix ORDER BY time DESC with ordering series keys The order of series keys is in ascending alphabetical order, not descending alphabetical order, when it is ordered by descending time. This fixes the ordering so points are returned in descending order. The emitter also had the conditions for choosing which iterator to use in the wrong direction (which only affects aggregates with `FILL(none)`).	2017-02-06 15:49:12 -06:00
Jonathan A. Sternberg	95831b3307	Fix LIMIT and OFFSET when they are used in a subquery This fixes LIMIT and OFFSET when they are used in a subquery where the grouping of the inner query is different than the grouping of the outer query. When organizing tag sets, the grouping of the outer query is used so the final result is in the correct order. But, unfortunately, the optimization incorrectly limited the number of points based on the grouping in the outer query rather than the grouping in the inner query. The ideal solution would be to use the outer grouping to further organize it by the grouping for the inner subquery, but that's more difficult to do at the moment. As an easier fix, the query engine now limits the output of each series. This may result in these types of queries being slower in some situations like this one: SELECT mean(value) FROM (SELECT value FROM cpu GROUP BY host LIMIT 1) This will be slower in a situation where the `cpu` measurement has a high cardinality and many different tags. This also fixes `last()` and `first()` when they are used in a subquery because those functions use `LIMIT 1` as an internal optimization.	2017-02-06 14:04:34 -06:00
Jason Wilder	93a9d01643	Increase default waiting WAL writes	2017-02-06 11:48:51 -07:00
Jason Wilder	38a649fc40	Batch multiple WAL fsyncs Every write to the WAL currently runs an fsync before returning. When there are a lot of concurrent writes, this can cause the WAL to bottleneck write throughput since fsyncs are very expensive. This changes the writeToLog to fsync on an interval to allow multiple fsync calls to be batched up into one. The writeToLog behavior is the same in that it won't return until an fsync has been performed.	2017-02-06 11:48:45 -07:00
Ben Johnson	d91e6eabac	Add max-values-per-tag to inmem index.	2017-02-06 11:14:13 -07:00
Ben Johnson	57f44d5f0c	Include index in snapshot.	2017-02-01 14:19:42 -07:00
Jason Wilder	54ab3a7a0a	Don't write lock file store when opening new files When replacing TSM files, the new files can be opened before the write lock is taken to reduce lock contention in this code path.	2017-02-01 11:11:26 -07:00
Jason Wilder	6eb46d2100	Remove unnecessary read lock on engine	2017-02-01 11:10:41 -07:00
Jason Wilder	784a851742	Release cpu during compactions	2017-01-31 17:04:36 -07:00
Jason Wilder	278c1449d6	Increase number of cache partitions	2017-01-31 16:49:57 -07:00
Ben Johnson	047c21f4d9	Merge remote-tracking branch 'upstream/master' into tsi	2017-01-24 09:28:58 -07:00
Edd Robinson	feb7a2842c	Use unbuffered error channels in tests	2017-01-17 10:53:15 -08:00
Edd Robinson	fb7388cdfc	Remove dead code from various pkgs	2017-01-17 09:47:34 -08:00
Edd Robinson	292b30b82b	Fix subtle bugs and remove dead code from tsdb	2017-01-17 09:47:34 -08:00
Joe LeGasse	bf58d9ffb7	Update backup to use ioutil.ReadDir	2017-01-12 16:28:01 -05:00
Jason Wilder	11f264563a	Fix 32bit alignment	2017-01-12 12:01:49 -07:00
Jason Wilder	06a8fd6ca2	Simplifications and cleanup	2017-01-12 09:55:38 -07:00
Edd Robinson	73ed864e1d	Add cache tests	2017-01-12 16:27:16 +00:00
Jason Wilder	1e56b5416b	Fix compactions sometimes getting stuck I ran into an issue where the cache snapshotting seemed to stop completely causing the cache to fill up and never recover. I believe this is due to the the Timer being reused incorrectly. Instead, use a Ticker that will fire more regularly and not require the resetting logic (which was wrong).	2017-01-11 17:57:40 -07:00
Jason Wilder	40b017f4a4	Fix Cache stats size collection The memory stats as well as the size of the cache were not accurate. There was also a problem where the cache size would be increased optimistically, but if the cache size limit was hit, it would not be decreased. This would cause the cache size to grow without bounds with every failed write.	2017-01-11 17:54:51 -07:00
Jason Wilder	c433ff331f	Encode snapshots concurrently The CacheKeyIterator (used for snapshot compactions) iterated over each key and serially encoded the values for that key as the TSM file is written. With many series, this can be slow and will only use 1 CPU core even if more are available. This changes it so that the key space is split amongst a number of goroutines that start encoding all keys in parallel to improve throughput.	2017-01-11 17:54:27 -07:00
Jason Wilder	ae838ef323	Simplify Cache.Snapshot This simplifies the cache.Snapshot func to swap the hot cache to the snapshot cache instead of copy and appending entries. This reduces the amount of time the cache is write locked which should reduce cache contention for the read only code paths.	2017-01-11 11:12:02 -07:00
Jonathan A. Sternberg	3ba950b029	Fix for subqueries to use the parallel iterator correctly Also, fix the `Iterators.Merge(IteratorOptions)` function so it consults the `Ordered` attribute to determine which iterator it should use to merge the input iterators.	2017-01-11 10:47:18 -06:00
Jonathan A. Sternberg	b58d1778e2	Remove improper newlines from logging statements	2017-01-10 11:20:09 -06:00
Mark Rushakoff	a135906b43	Merge pull request #7747 from influxdata/mr-lint-cleanup Miscellaneous lint cleanup	2017-01-10 08:22:00 -08:00
Mark Rushakoff	3b3604e362	Fix race in (*tsm1.Cache).values Without this read lock, this race would happen during a concurrent snapshot compaction and query.	2017-01-09 14:48:28 -08:00
Jonathan A. Sternberg	4a559c4620	Merge pull request #7646 from influxdata/js-4619-subqueries Support subquery execution in the query language	2017-01-09 14:14:01 -06:00
Jason Wilder	eb4d311c0a	Add retry/backup when backing up a shard fails The backup command can fail if a snapshot is running which silently closes the connection. This causes the backup shard command to continue on as if nothing failed.	2017-01-09 11:28:48 -07:00
Jason Wilder	194c5adfaf	Fix race on t.refs Read at 0x00c42018f620 by goroutine 58: github.com/influxdata/influxdb/tsdb/engine/tsm1.(TSMReader).Close() /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:330 +0x94 github.com/influxdata/influxdb/tsdb/engine/tsm1.(FileStore).Close() /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:464 +0x123 Previous write at 0x00c42018f620 by goroutine 63: sync/atomic.AddInt64() /usr/local/go/src/runtime/race_amd64.s:276 +0xb github.com/influxdata/influxdb/tsdb/engine/tsm1.(TSMReader).Unref() /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:352 +0x43 github.com/influxdata/influxdb/tsdb/engine/tsm1.(KeyCursor).Close()	2017-01-07 12:39:45 -07:00
Jonathan A. Sternberg	d7c8c7ca4f	Support subquery execution in the query language This adds query syntax support for subqueries and adds support to the query engine to execute queries on subqueries. Subqueries act as a source for another query. It is the equivalent of writing the results of a query to a temporary database, executing a query on that temporary database, and then deleting the database (except this is all performed in-memory). The syntax is like this: SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *) This will execute derivative and then sum the result of those derivatives. Another example: SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host) This would let you find the maximum minimum value of each host. There is complete freedom to mix subqueries with auxiliary fields. The only caveat is that the following two queries: SELECT mean(value) FROM cpu SELECT mean(value) FROM (SELECT value FROM cpu) Have different performance characteristics. The first will calculate `mean(value)` at the shard level and will be faster, especially when it comes to clustered setups. The second will process the mean at the top level and will not include that optimization.	2017-01-07 13:00:48 -06:00
Mark Rushakoff	153277c01d	Merge pull request #7786 from influxdata/mr-cache-decrease-size Use one atomic operation in (*Cache).decreaseSize	2017-01-06 10:17:01 -08:00
Ben Johnson	2b3cd415e2	Fixing rebase.	2017-01-06 09:52:16 -07:00
Ben Johnson	d1f1e19591	Fixing rebase.	2017-01-06 09:31:25 -07:00
Ben Johnson	c1c98223ec	Fix and optimize tsi1 FileSet.	2017-01-05 10:17:12 -07:00
Ben Johnson	9b1e8215e0	Remove dictionary encoding, add bulk series insertion.	2017-01-05 10:17:11 -07:00
Ben Johnson	9bd19cdc69	Fix inmem DELETE SERIES.	2017-01-05 10:17:11 -07:00
Ben Johnson	f9efcb3365	Re-add shared in-memory index.	2017-01-05 10:17:09 -07:00
Edd Robinson	0f9b2bfe6a	Fix tests	2017-01-05 10:16:15 -07:00
Edd Robinson	4ccb8dbab1	Move series count check to shard	2017-01-05 10:16:13 -07:00
Ben Johnson	745b1973a8	tsi compaction	2017-01-05 10:15:37 -07:00
Ben Johnson	183418dcbd	Fix tsi TAG KEYS iterator.	2017-01-05 10:15:36 -07:00
Ben Johnson	9f8b206b51	Fix measurement system queries.	2017-01-05 10:15:34 -07:00
Ben Johnson	4aa78383d1	Fix tsi1 series deletion.	2017-01-05 10:14:48 -07:00
Ben Johnson	e7940cc556	Add tsi1 series system iterator.	2017-01-05 10:14:00 -07:00
Ben Johnson	87f4e0ec0a	Add regex support in tsi1.	2017-01-05 10:12:29 -07:00
Jason Wilder	1ba64f3610	Disable max-value-per-tag option temporarily This is too slow currently and causes all writes to timeout.	2017-01-05 10:11:47 -07:00
Ben Johnson	fbe7f464ee	Improve insert performance.	2017-01-05 10:11:12 -07:00
Ben Johnson	cb93f10120	Remove per-shard in-memory index.	2017-01-05 10:11:09 -07:00
Ben Johnson	409b0165f5	shared in-memory index	2017-01-05 10:09:57 -07:00
Ben Johnson	a812502ea3	reintegrating in-memory index	2017-01-05 10:07:35 -07:00
Ben Johnson	1ac067e53b	intermediate	2017-01-05 10:03:09 -07:00
Ben Johnson	fda84955ea	Remove TODO	2017-01-05 10:02:42 -07:00
Ben Johnson	62d2b3ebe9	Series filtering.	2017-01-05 10:02:42 -07:00
Ben Johnson	62269c3cea	intermediate	2017-01-05 10:02:41 -07:00
Ben Johnson	5f5b02e052	intermediate	2017-01-05 10:01:49 -07:00
Ben Johnson	8863e3c0f3	Refactor tsi1 merge iterators, finish multi-file compaction.	2017-01-05 10:01:25 -07:00
Ben Johnson	e3af4b0dad	Refactor iterators.	2017-01-05 10:00:45 -07:00
Ben Johnson	ce9e3181a5	Refactor merge iterators.	2017-01-05 10:00:45 -07:00
Ben Johnson	0294e717a0	Add mm, tag key, tag value, & series iterators.	2017-01-05 10:00:44 -07:00
Ben Johnson	2bfafaed76	tsi1 log compaction	2017-01-05 10:00:44 -07:00
Ben Johnson	afce53e81b	Rebase fixes.	2017-01-05 10:00:44 -07:00
Ben Johnson	992e651588	Add tsi1.Log.	2017-01-05 10:00:44 -07:00
Ben Johnson	2a81351992	Implement tsdb.Index interface on tsi1.Index.	2017-01-05 10:00:43 -07:00
Edd Robinson	ebc92ca04f	Fix overflow issues	2017-01-05 09:59:12 -07:00
Edd Robinson	149b1cef1d	Fix 32bit overflow; limit capacity	2017-01-05 09:59:10 -07:00
Edd Robinson	9ed6040265	Tidy up	2017-01-05 09:58:37 -07:00
Edd Robinson	2a5c865b44	Use xxhash	2017-01-05 09:57:35 -07:00
Edd Robinson	2d9bd09784	Use []byte where possible in Index	2017-01-05 09:57:34 -07:00
Edd Robinson	4b1ef68dc9	Move series and measurement stats to store	2017-01-05 09:54:05 -07:00
Edd Robinson	bd8dd9a291	Sketches working	2017-01-05 09:54:04 -07:00
Edd Robinson	d19fbf5ab4	Wire in HLL estimator	2017-01-05 09:54:03 -07:00
Edd Robinson	2b8efefef4	Initial index interface	2017-01-05 09:51:43 -07:00
Edd Robinson	05bc4dec00	Refactor	2017-01-05 09:50:23 -07:00
Edd Robinson	c535e3899a	Remove in-memory index from Shard and Store	2017-01-05 09:47:09 -07:00
Ben Johnson	57d0556174	Fix 32-bit issues.	2017-01-05 09:34:37 -07:00

1413 Commits (master)