influxdb

Commit Graph

Author	SHA1	Message	Date
Ben Johnson	3900c948a2	Fix requested changes.	2018-01-03 10:04:12 -07:00
Edd Robinson	f9ea54198f	rename series directory	2018-01-03 15:44:58 +00:00
Ben Johnson	52630e69d7	Integrate SeriesFileCompactor	2018-01-02 12:20:03 -07:00
Ben Johnson	56980b0d24	Segment series file	2017-12-29 11:57:45 -07:00
Stuart Carnie	455013a486	updates per PR review comments	2017-12-29 07:58:52 -07:00
Stuart Carnie	5dfe3b2645	inmem startup improvements * only call ParseTags when necessary * remove dependency on inmem.Series in tsdb test package * Measurement and Series are no longer exported. Their use is restricted to the inmem package * improve Measurement and Series types by exporting immutable fields and removing unnecessary APIs and locks Reduced startup time from 28s to 17s. Overall improvement including #9162 reduces startup from 46s to 17s for 1MM series across 14 shards.	2017-12-29 07:58:52 -07:00
Edd Robinson	bde66f19bc	Adjust series file size and partitions	2017-12-18 13:17:42 +00:00
Edd Robinson	38af43d5eb	Fix engine test races	2017-12-15 23:19:18 +00:00
Edd Robinson	42ba4831ba	Update Digest test	2017-12-15 18:45:20 +00:00
Edd Robinson	c476a0b4a1	Merge branch 'master' into er-tsi-index-part	2017-12-15 18:31:24 +00:00
Edd Robinson	3bfe525705	Add 32-bit support to series file This commit ensures that the series file should work appropriately on 32-bit architectures. It does this by reducing the maximum size of a series file to 512MB on 32-bit systems, which should be fully addressable. It further updates tests so that the series file size can be reduced further when running many tests in parallel on 32-bit architectures.	2017-12-15 15:47:26 +00:00
Edd Robinson	59afd8cc90	Return to original DELETE/DROP SERIES semantics Since possibly v0.9 DELETE SERIES has had the unwanted side effect of removing series from the index when the last traces of series data are removed from TSM. This occurred because the inmem index was rebuilt on startup, and if there was no TSM data for a series then there could be no series to add to the index. This commit returns to the original (documented) DROP/DELETE SERIES behaviour. As such, when issuing DROP SERIES all instances of matching series will be removed from both the TSM engine and the index. When issuing DELETE SERIES only TSM data will be removed. It is up to the operator to remove series from the index. NB, this commit does not address how to remove series data from the series file when a shard rolls over.	2017-12-15 00:02:06 +00:00
Edd Robinson	9e3b17fd09	Ensure deleted series are not returned via iterators	2017-12-14 21:29:35 +00:00
David Norton	253ea7cc5e	feat #9212 : fix file in use bug on Windows	2017-12-13 09:29:07 -05:00
David Norton	4e13248d85	feat #9212 : add ability to generate shard digests	2017-12-13 09:28:34 -05:00
Edd Robinson	0844f20dc4	Engine tests	2017-12-12 21:25:35 +00:00
Edd Robinson	7d13bf3262	merge master	2017-12-08 17:21:58 +00:00
Edd Robinson	f6835632e7	Merge master into branch	2017-12-08 17:11:07 +00:00
Edd Robinson	3318c94a2f	Clean up 🛁:	2017-12-08 11:38:53 +00:00
Ben Johnson	0e0e7cfc08	Fix tests.	2017-12-07 09:59:39 -07:00
Adam	a0b2195d6b	Pulled in backup-relevant code for review (#9193 ) for issue #8879	2017-12-07 11:35:20 -05:00
Jason Wilder	909a2fb6cc	Fix deletes removing index for invalid time ranges If a delete for a time that does not exist was run, we would not remove the series key from the slice of series to remove from the index. This could be triggered by running something like "delete from cpu where time = 0" and if there was no data at time 0, the series would still be removed from the index.	2017-11-30 15:01:01 -07:00
Jason Wilder	8633e38549	Fix removing series from index The loop to check if a series still exists in a TSM file was wrong in that it 1) exited early after one iteration and 2) had an off by one error that causes the wrong series to be marked as existing. This fixes both of these cases which can cause the index to become inconsistent with the data store on disk.	2017-11-29 10:45:04 -07:00
Jason Wilder	dd1c030815	Remove limit count param on fields It's not used anymore.	2017-11-22 11:17:34 -07:00
Jason Wilder	50b6ace75f	Fix wait reused while disabling compactions	2017-11-20 14:55:47 -07:00
Jason Wilder	97e0d496a6	Add capability to force a full compaction This adds the capability to the engine to force a full compaction to be scheduled. When called, it snapshots any data in the cache, aborts running compactions and prevents level plans from returning level plans.	2017-11-15 07:14:27 -07:00
Stuart Carnie	2e04e871c9	fix descending queries * did not handle cached values correctly * sort shards by time in either ascending or descending order depending on the RPC request ordering to ensure they are traversed in the correct order.	2017-11-13 17:14:36 -08:00
Jason Wilder	aee395d3bd	Make DeleteSeriesRange take SeriesIterator	2017-11-13 09:02:10 -07:00
Stuart Carnie	f3d45ba301	influxdata/influxdb/influxql -> influxdata/influxql	2017-10-30 14:40:26 -07:00
Stuart Carnie	e9313876ab	EXPLAIN ANALYZE * Introduces EXPLAIN ANALYZE command, which produces a detailed tree of operations used to execute the query. introduce context.Context to APIs metrics package * create groups of named measurements * safe for concurrent access tracing package EXPLAIN ANALYZE implementation for OSS Serialize EXPLAIN ANALYZE traces from remote nodes use context.Background for tests group with other stdlib packages additional documentation and remove unused API use influxdb/pkg/testing/assert remove testify reference	2017-10-20 08:01:37 -07:00
Edd Robinson	9be7c5aaa6	Run relevant engine tests on both indexes	2017-08-23 10:47:01 +01:00
Jonathan A. Sternberg	9a2357c2c0	Separate the query engine into a separate package This change provides a clear separation between the query engine mechanics and the query language so that the language can be parsed and dealt with separate from the query engine itself.	2017-08-16 13:38:43 -05:00
David Norton	1d8d739418	fix #8677 : check for snapshot size == 0	2017-08-16 09:43:56 -04:00
Jason Wilder	778000435a	Convert all keys from string to []byte in TSM engine This switches all the interfaces that take string series key to take a []byte. This eliminates many small allocations where we convert between the two repeatedly. Eventually, this change should propagate further up the stack.	2017-07-28 11:00:50 -06:00
Stuart Carnie	46796d932f	add database to index, engine and shard; call AuthorizeSeriesRead	2017-05-26 13:21:50 -07:00
Jason Wilder	2cac46ebbc	Convert usage of strings to []byte Measurement name and field were converted between []byte and string repetitively causing lots of garbage. This switches the code to use []byte in the write path.	2017-05-12 14:05:19 -06:00
Joe LeGasse	087d9f4670	tsm: fixed test to not require sorted backup tarball	2017-05-11 12:00:19 -04:00
Jason Wilder	f87fd7c7ed	Stop background compaction goroutines when shard is cold Each shard has a number of goroutines for compacting different levels of TSM files. When a shard goes cold and is fully compacted, these goroutines are still running. This change will stop background shard goroutines when the shard goes cold and start them back up if new writes arrive.	2017-05-03 16:31:57 -06:00
Jason Wilder	3d1c0cd981	Don't return compaction plans for files already part of a plan The compactor prevents the same file from being compacted by different compaction runs, but it can result in warning errors in the logs that are confusing. This adds compaction plan tracking to the planner so that files are only part of one plan at a given time.	2017-05-03 16:31:57 -06:00
Ben Johnson	afe41f1c80	Fix tsm1/tsi1 broken tests.	2017-03-21 12:21:48 -06:00
Ben Johnson	358b1e0b05	Merge remote-tracking branch 'upstream/master' into tsi	2017-03-15 10:13:32 -06:00
Jason Wilder	a024003f2c	Merge branch '1.2' into jw-merge-12	2017-02-22 12:13:29 -07:00
Ben Johnson	78a9bb2527	Remove Tags.shouldCopy, replace with forceCopy on series creation. Previously, tags had a `shouldCopy` flag to indicate if those tags referenced an underlying buffer and should be copied to allow GC. Unfortunately, this prevented tags from being copied that were created and referenced the mmap which caused segfaults. This change removes the `shouldCopy` flag and replaces it with a `forceCopy` argument in `CreateSeriesIfNotExists()`. This allows the write path to indicate that tags must be cloned on insert.	2017-02-21 11:13:35 -07:00
Ben Johnson	d91e6eabac	Add max-values-per-tag to inmem index.	2017-02-06 11:14:13 -07:00
Ben Johnson	047c21f4d9	Merge remote-tracking branch 'upstream/master' into tsi	2017-01-24 09:28:58 -07:00
Edd Robinson	feb7a2842c	Use unbuffered error channels in tests	2017-01-17 10:53:15 -08:00
Edd Robinson	292b30b82b	Fix subtle bugs and remove dead code from tsdb	2017-01-17 09:47:34 -08:00
Jonathan A. Sternberg	d7c8c7ca4f	Support subquery execution in the query language This adds query syntax support for subqueries and adds support to the query engine to execute queries on subqueries. Subqueries act as a source for another query. It is the equivalent of writing the results of a query to a temporary database, executing a query on that temporary database, and then deleting the database (except this is all performed in-memory). The syntax is like this: SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *) This will execute derivative and then sum the result of those derivatives. Another example: SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host) This would let you find the maximum minimum value of each host. There is complete freedom to mix subqueries with auxiliary fields. The only caveat is that the following two queries: SELECT mean(value) FROM cpu SELECT mean(value) FROM (SELECT value FROM cpu) Have different performance characteristics. The first will calculate `mean(value)` at the shard level and will be faster, especially when it comes to clustered setups. The second will process the mean at the top level and will not include that optimization.	2017-01-07 13:00:48 -06:00
Ben Johnson	2b3cd415e2	Fixing rebase.	2017-01-06 09:52:16 -07:00
Edd Robinson	0f9b2bfe6a	Fix tests	2017-01-05 10:16:15 -07:00
Ben Johnson	cb93f10120	Remove per-shard in-memory index.	2017-01-05 10:11:09 -07:00
Ben Johnson	409b0165f5	shared in-memory index	2017-01-05 10:09:57 -07:00
Ben Johnson	a812502ea3	reintegrating in-memory index	2017-01-05 10:07:35 -07:00
Ben Johnson	2a81351992	Implement tsdb.Index interface on tsi1.Index.	2017-01-05 10:00:43 -07:00
Edd Robinson	149b1cef1d	Fix 32bit overflow; limit capacity	2017-01-05 09:59:10 -07:00
Edd Robinson	d19fbf5ab4	Wire in HLL estimator	2017-01-05 09:54:03 -07:00
Edd Robinson	2b8efefef4	Initial index interface	2017-01-05 09:51:43 -07:00
Edd Robinson	05bc4dec00	Refactor	2017-01-05 09:50:23 -07:00
Edd Robinson	c535e3899a	Remove in-memory index from Shard and Store	2017-01-05 09:47:09 -07:00
Edd Robinson	66edb32182	Sharded Cache using a hash ring	2016-12-14 18:23:36 +00:00
Edd Robinson	d3e6d4e7ca	Add benchmarks	2016-12-14 18:21:50 +00:00
Jason Wilder	e8a28cfbab	Expose Shard.LastModified This returns the LastModified time of the shard. The LastModified time is the wall time when a change to the shards state occurred. It uses the WAL or FileStore to determine the max mod time.	2016-11-23 10:04:07 -07:00
Jonathan A. Sternberg	a515aeda39	Optimize first/last when no group by interval is present The `first()` and `last()` functions response rate would increase linear to the number of points even though it seems like it shouldn't. This optimization greatly reduces the amount of time to return a response when no `GROUP BY time(...)` clause is present in a query.	2016-10-25 09:57:31 -05:00
Jonathan A. Sternberg	3681bc8a43	Filter out series within shards that do not have data for that series Previously, we would return a full tag set for every shard and the tag set would include all series that existed in the database index including series that didn't physically exist within that shard. This led to the tag sets returned being incredibly huge when we had high cardinality but sparse data. Since the data was sparse, it was unexpected that it would cause such a large strain on the system by most people. Now we filter out the series ids that are not assigned to the current shard when computing a tag set for that shard. This lowers the memory usage for high cardinality sparse data drastically and allows queries on those to complete successfully. This does not resolve issues for high cardinality data in every shard that is also spread out over a long series of time. That situation isn't nearly as common as the above situation though.	2016-10-20 14:15:34 -05:00
Jason Wilder	7f96d78b79	Make encoder re-usable This allows encoders to be re-used and maintained in a pool to avoid allocating new ones on every compactions and write of an encoded block. The pool used is not a sync.Pool to ensure that the encoders will not be garbage collected.	2016-09-26 12:19:15 -06:00
Jason Wilder	190537a557	Fix DeleteSeries when multiple fields exist The logic for determining whether a series key was already in the set of TSM series was too restrictive. It allowed only the first field of a series to be added leaving all the remaining fields.	2016-08-31 20:35:35 -06:00
Ben Johnson	8aa224b22d	reduce memory allocations in index This commit changes the index to point to index data in the shards instead of keeping it in-memory on the heap.	2016-08-16 14:09:00 -06:00
Jason Wilder	0264966f5c	Add index optimize planning step For larger datasets, it's possible for shards to get into a state where many large, dense TSM files exist. While the shard is still hot for writes, full compactions will skip these files since they are already fairly optimized and full compactions are expensive. If the write volume is large enough, the shard can accumulate lots of these files. When a file is in this state, its index can contain every series which causes startup times to increase since each file must parse the full set of series keys for every file. If the number of series is high, the index can be quite large causing large amount of disk IO at startup. To fix this, an optimize compaction is run when a full compaction planning step decides there is nothing to do. The optimize compaction combines and spreads the data and series keys across all files resulting in each file containing the full series data for that shard and a subset of the total set of keys in the shard. This allows a shard to only store a series key once in the shard reducing storage size as well as allowing a shard to only load each key once at startup.	2016-07-14 11:32:36 -06:00
rw	92e7fec5cf	Benchmarks to count allocs in WritePoints.	2016-05-26 17:13:14 -07:00
Jonathan A. Sternberg	23f6a706bb	Support cast syntax for selecting a specific type Casting syntax is done with the PostgreSQL syntax `field1::float` to specify which type should be used when selecting a field. You can also do `field1::field` or `tag1::tag` to specify that a field or tag should be selected. This makes it possible to select a tag when a field key and a tag key conflict with each other in a measurement. It also means it's possible to choose a field with a specific type if multiple shards disagree. If no types are given, the same ordering for how a type is chosen is used to determine which type to return. The FieldDimensions method has been updated to return the data type for the fields that get returned. The SeriesKeys function has also been removed since it is no longer needed. SeriesKeys was originally used for the fill iterator, but then expanded to be used by auxiliary iterators for determining the channel iterator types. The fill iterator doesn't need it anymore and the auxiliary types are better served by FieldDimensions implementing that functionality, so SeriesKeys is no longer needed. Fixes #6519.	2016-05-16 12:08:29 -04:00
Jason Wilder	e0304ae3d5	Fix shards not getting assigned to series on restart Also, simplifies the LoadMetaDataIndex func to not require a *Shard	2016-05-02 11:36:05 -06:00
Jonathan A. Sternberg	7ec2a991d5	Modify all of the iterators to allow returning an error on Next() This also switches the remaining iterators to be lazy so they can return errors properly. They needed to be converted to lazy initialization anyway, which has the side effect of making it much easier for us to propagate the underlying error during initialization. Updated the Emitter to return errors when it cannot read properly from the iterators.	2016-04-18 11:17:55 -04:00
Jonathan A. Sternberg	bd5fdd797d	Propagate the limit option to the low level iterators When a GROUP BY or multiple sources are used, the top level limit iterator requires reading the entire iterator stream so it can find all of the tag groups it needs to return. For large data series, this ends up with the limit iterator discarding a lot of output. This change adds a new lower level limit iterator on each series itself so that there are fewer data points that have to be thrown away by the top level iterator. Fixes #5553.	2016-04-15 18:23:54 -04:00
Ben Johnson	525e22c92b	tsm1 query engine alloc reduction This commit makes a number of performance improvements to reduce allocations during query execution. Several objects and buffers are now reused across the components to avoid allocations. Previously a simple `count(value)` query across 1M points would require 26,000+ allocations. After the changes in this commit that number has been reduced to 88.	2016-04-11 14:50:59 -06:00
Jason Wilder	3f4c5a5585	Fix race on measurementFields Both Shard and Engine had the same reference to the measurementField map, but they each protected it with their own locks. This causes a race when writes and queries are occurring because writes can add new fields to the map while queries are reading from it. The fix moves the ownership to the Engine and provides protected accessors that Shard now uses. For the most part, the accesses on shard were old dead code. Fixing the measurementFields map race created a new race on the internal fields map. This is now unexported and protected via MeasurementFields exported funcs. Fixes #6188	2016-04-01 18:57:01 -06:00
Mark Rushakoff	fb83374389	Track stats for number of series, measurements Per database: track number of series and measurements Per measurement: track number of series	2016-02-24 08:10:16 -08:00
Ben Johnson	5a0d1ab7c1	rename influxdb/influxdb to influxdata/influxdb This commit changes all the import and URL references from: github.com/influxdb/influxdb to: github.com/influxdata/influxdb	2016-02-10 10:26:18 -07:00
Ben Johnson	b4cb770a7f	refactor aux iterators	2016-02-10 09:40:27 -07:00
Ben Johnson	00806de9b8	refactor query engine	2016-02-10 09:40:25 -07:00
Jason Wilder	24f1bcfd20	Remove Dev prefix from tsm engine/tx	2016-01-10 16:43:36 -07:00
Paul Dix	49d480cb0c	Fix races in backup/restore	2015-12-31 08:42:01 -05:00
Paul Dix	5974d37649	Fix backup test to mock out compaction	2015-12-31 08:15:13 -05:00
Paul Dix	59fbd371fc	Implement backup/restore for TSM. This changes backup and restore to work for TSM. It breaks it for b1 and bz1, but since those are getting removed it's ok. The backup runs against any host that is specified and can backup either the metastore, a database, specific retention policy, or a specific shard. It can also take incremental backups with the `since` flag, which will only backup TSM files that have been created since that timestamp. The backup is safe to run online. However, for shards that are still hot for writes, they won't be able to create new TSM files while the backup for that single shard runs. If the backup isn't too large and the write throughput isn't too high this shouldn't be a problem since the writes will just go into the WAL cache.	2015-12-30 18:06:50 -05:00
Jason Wilder	a38c95ec85	Update compactions to run concurrently This has a few changes in it (unfortunately). The main change is to run compactions concurrently. While implementing this, a few query and performance bugs showed up that are also fixed by this commit.	2015-12-23 18:01:11 -07:00
Paul Dix	820b0d31d6	Update TSM to delete from the WAL/cache * Update cache loader to delete entries from cache * Add cache.Delete() * Update delete to look at keys in the Cache in addition to the FileStore * Update cache compaction to never happen if the cache is empty	2015-12-07 14:35:48 -05:00
Paul Dix	1bee7d1512	Update TSM, remove old version, add config * remove rolloverTSMFileSize constant that is no longer used * remove the maxGenerationFileCount since it is no longer a limitation that's necessary with the new compaction scheme. We no longer read WAL segments as part of the compaction so memory is only used as we read in each individual key * remove minFileCount and switch to a user configurable variable * remove the mutex from WALSegmentWriter. There's never more than one open in the WAL at one time and it's not exported through any function so the lock on the WAL should be used. This simplified keeping track of the last write time and removed a bunch of unnecessary locks. * update WALSegmentWriter.Write to take the compressed bytes so that encoding and compression can occur before the call to write (while we don't hold the WAL lock) * remove a bunch of unnecessary locking in WAL.writeToLog * Add check for TSM file magic number and version * Remove old tsm, log, and unused cursor code * Remove references to tsm1dev everywhere except in the inspector * Clean up config options for compaction and snapshotting * Remove old TSM configuration options * Update the config.sample.toml with TSM options * Update WAL compact to force if it has been cold for writes for a configurable period of time (1h by default)	2015-12-06 18:50:39 -05:00
Paul Dix	9637446ba9	Merge pull request #4990 from influxdb/pd-loadmetadata-wal Update TSM engine, WAL and encoding	2015-12-04 18:21:47 -05:00
Paul Dix	b0f3dcc8cc	Update TSM metadata loading and write snapshot * Update WriteSnapshot to always call synchronously * Update LoadMetadataIndex to load WAL metadata from the cache	2015-12-04 16:03:17 -05:00
Jason Wilder	c7e37766e7	Avoid repetitive index searches when iterating over cursors First pass at TSM cursor iteration ended up searching the file indexes too frequently and hurt performance. This changes that to search it once and then have the cursor hold onto the block locations to seek to. Doubles the query performance from the first iteration, but still a lot of room for improvement.	2015-12-04 10:02:59 -07:00
Paul Dix	eafb703afc	Update TSM engine, WAL and encoding * Add InfluxQLType to Values to map the TSM type to InfluxQL * Fix bug in WAL where close wouldn't nil out the currentSegment after closing it * Export writeSnapshot to be used in tests, add argument to run it async or not * Update reloadCache to load temporary metadata information in the engine * Update LoadMetadataIndex to use the temp WAL metadata information	2015-12-04 11:09:39 -05:00
Philip O'Toole	2d79d7e35f	Fix descending cache cursor	2015-12-03 14:34:29 -08:00
Jason Wilder	2ad32af7ea	Add desc query support	2015-12-03 12:39:13 -07:00
Jason Wilder	be59ba3455	Add Prev support to FileStore Allows reading the previous block of values given a timestamp and key.	2015-12-03 12:39:12 -07:00
Jason Wilder	6fba01df89	Implement single field TSM queries	2015-12-03 12:35:36 -07:00
Paul Dix	6722e9ff14	Update TSM engine, engine_test, and wal_test * Address jwilder's comments in #4966	2015-12-03 10:49:47 -05:00
Paul Dix	b0fb8a0a27	Update TSM cache, compact, wal, encoding * Update cache to have a single slice of values for a key (removed checkpoints) * Changed compact.Plan to only worry about TSM files. * Updated Plan to not return an error since there was no case in which it would. * Update WAL to not keep stats since they're no longer needed. * Update engine to flush the Cache/WAL to a new TSM file when the min threshold is hit. * Split compact logic between TSM compacts and WAL/Cache writes. * Remove unnecessary merge iterator, wal segment iterator, and other no longer necessary stuff. * Remove the ascending bool from the Dedupe method. Values should always be in ascending order. It's up to the cursor to iterate through values based on the direction. Giving the cursor responsibility makes it so we don't need to sort, dedupe or reallocate anything for different query orders. * Updated engine to use its locks to ensure writes and cache flushes don't cause a race. * Update all tests with new signatures. Removed a bunch of tests around TSM rewrites and WAL segment iteration that are no longer necessary.	2015-12-03 08:11:50 -05:00
Philip O'Toole	fc83968e2e	Cache values supports sorting order	2015-12-01 13:24:25 -08:00
Philip O'Toole	3a72e40e3f	Implement descending cursor support	2015-12-01 13:24:25 -08:00
Philip O'Toole	ec4daaccff	Test ascending tsm1dev cursor	2015-12-01 13:24:25 -08:00


149 Commits (master)