Commit Graph

380 Commits (main-2.x)

Author SHA1 Message Date
Ben Johnson 58aed93fe6
Add option for unicode validation. 2018-05-02 11:16:55 -06:00
Jason Wilder 2be2418b89 Add series type validation to Engine
This is the start of per-series validation that occurs in the
Engine write path.  It uses an in-memory radix tree to reduce
memory usage and is re-built on demand the first time a series
is written.
2018-04-30 17:26:23 -06:00
Edd Robinson ba16268f41
Merge pull request #9777 from influxdata/er-index-log
Log information about index version during startup
2018-04-26 10:48:54 +01:00
Jeff Wendling e5dbc18d0b remove bool return param from dataTypeFromModelsFieldType 2018-04-25 09:48:24 -06:00
Edd Robinson 32e195860b Log index type when opening shard 2018-04-25 13:02:09 +01:00
Jeff Wendling 29a62e4f74 Add FieldValidator to allow custom validations on measurements
No appreciable changes in benchmark results. It seems like this
function is less than 4% of cpu time in the write workloads in the
benchmarks at least.
2018-04-23 20:21:27 -06:00
Ben Johnson dbbe9d8467
Merge pull request #9615 from influxdata/bj-check-shard-count-on-series-iterator-master
Remove error for series file when no shards exist
2018-04-20 08:14:24 -06:00
Jason Wilder 97ecf62ffb Return time range from delete predicate func
This moves the time range to delete to be returned by the predicate
func in DeleteSeriesRangeWithPredicate.  It allows for a single delete
to delete different ranges of times per series instead of a single
range of time for all series.
2018-04-09 20:01:33 -06:00
Jacob Marble 470ee7f176 Add ability to delete many series with predicate 2018-03-28 08:32:18 -07:00
Stuart Carnie 2cc1f5137e support for tenant+bucket
NOTE: to match storage service, values for database and rp are
hard-coded to `db` and `rp` respectively
2018-03-23 12:26:55 -07:00
Stuart Carnie aa61359cc7 Storage RPC API improvements. See PR for details
* reduce # allocations (115M -> 22M)
* reduce size allocations (53GB -> 1.3GB)
* reduce RPC query time (45s -> 12.9s)
2018-03-21 13:46:09 -07:00
Ben Johnson da8669f3e2
Remove error for series file when no shards exist 2018-03-21 14:41:11 -06:00
Jonathan A. Sternberg f8d60a881d Refactor the math engine to compile the query and use eval
This change makes it so that we simplify the math engine so it doesn't
use a complicated set of nested iterators. That way, we have to change
math in one fewer place.

It also greatly simplifies the query engine as now we can create the
necessary iterators, join them by time, name, and tags, and then use the
cursor interface to read them and use eval to compute the result. It
makes it so the auxiliary iterators and all of their complexity can be
removed.

This also makes use of the new eval functionality that was recently
added to the influxql package.

No math functions have been added, but the scaffolding has been included
so things like trigonometry functions are just a single commit away.

This also introduces a small breaking change. Because of the call
optimization, it is now possible to use the same selector multiple times
as a selector. So if you do this:

    SELECT max(value) * 2, max(value) / 2 FROM cpu

This will now return the timestamp of the max value rather than zero
since this query is considered to have only a single selector rather
than multiple separate selectors. If any aspect of the selector is
different, such as different selector functions or different arguments,
it will consider the selectors to be aggregates like the old behavior.
2018-03-19 15:01:15 -05:00
Ben Johnson f6fdba2590
Allow SHOW SERIES kill. 2018-03-15 11:22:34 -06:00
Stuart Carnie 6cf6ae7af4 Use combined IndexSet when executing meta queries
* removed unused fieldset field
2018-03-15 09:59:11 -07:00
Edd Robinson c1e1412dae Don't panic when checking for field 2018-03-12 15:25:20 +00:00
Edd Robinson 544329380f
Add empty series sketches back to tsi1 index
This commit adds initial empty sketches back to the tsi1 index, as well
as ensuring that ephemeral sketches in the index `LogFile` are updated
accordingly.

The commit also adds a test that verifies that the merged sketches at
the store level produce the correct results under writes, deletions and
re-opening of the store.

This commit does not provide working sketches for post-compaction on the
tsi1 index.
2018-02-07 14:52:13 -07:00
Stuart Carnie a058d204d8 remove redundant closing channel 2018-02-06 12:08:58 -07:00
Edd Robinson 42c3adeffc simplify packages under tsdb 2018-01-21 09:41:27 -08:00
Edd Robinson 4ccb6ada69 Remove unused code/cleanup tsdb package 2018-01-20 14:06:15 +00:00
Jason Wilder 1c8676b4a3 Rebuild corrupted fields index when necessary
If the fields.idx was corrupted in someway, it would cause the shard
to fail to load.  Deleting the file will allow it to be rebuilt.

This change handles this automatically so it's rebuilt if necessary
without user intervention.
2018-01-16 11:31:07 -07:00
Edd Robinson ceb3abd118 Remove series when shard rolls over
Series should only be removed from the series file when they're no
longer present in any shard. This commit ensures that during a shard
rollover, the series local to the shard are checked against all other
series in the database.

Series that are no longer present in any other shards' bitsets, are then
marked as deleted in the series file.
2018-01-16 15:58:20 +00:00
Edd Robinson e902998f4e All closes are now fast 2018-01-16 14:56:54 +00:00
Edd Robinson d890f29fcb Remove redundant index methods
Now that each shard-local index is maintaining a bitset of series ids,
tracking the series present in the local shard's tsm engine, there is no
need to track shards in the `inmem` index.

This commit removes the methods associated with tracking those
series/shard relationships.
2018-01-16 14:56:54 +00:00
Edd Robinson 286c8f4c09 Return to original DELETE/DROP SERIES semantics
This reverts commit 59afd8cc90.
2018-01-15 12:00:30 +00:00
Edd Robinson e610e7c21d Track undeleted series IDs per-shard with inmem
This commit adds a bitset into each shard's in-memory index, to be used to
track undeleted series ids. Currently tsi1 support is not implemented.

When new series are added to the shard, the series id is added
to the bitset. When series are deleted from the shard, the series
ids are removed from the bitset.

Becasue each shard shares the same inmem index reference, the bitset
is stored in the `ShardIndex`, which is local to each shard, and then
different references are passed into the shared `Index` object, depending
on which shard is writing the series.
2018-01-11 01:01:54 +00:00
David Norton 1c452d83cb fix #9286: return digest size 2018-01-08 13:15:14 -05:00
Stuart Carnie c986cac76e improve performance when writes exceed max tag values or series
```
 benchmark                                                                  old ns/op     new ns/op     delta
 BenchmarkShardIndex_CreateSeriesListIfNotExists_MaxValuesExceeded-8        6175374       2714158       -56.05%
 BenchmarkShardIndex_CreateSeriesListIfNotExists_MaxValuesNotExceeded-8     344502        326312        -5.28%
 BenchmarkShardIndex_CreateSeriesListIfNotExists_NoMaxValues-8              346734        329961        -4.84%
 BenchmarkShardIndex_CreateSeriesListIfNotExists_MaxSeriesExceeded-8        2414945       1996223       -17.34%

 benchmark                                                                  old allocs     new allocs     delta
 BenchmarkShardIndex_CreateSeriesListIfNotExists_MaxValuesExceeded-8        45377          128            -99.72%
 BenchmarkShardIndex_CreateSeriesListIfNotExists_MaxValuesNotExceeded-8     33             20             -39.39%
 BenchmarkShardIndex_CreateSeriesListIfNotExists_NoMaxValues-8              33             20             -39.39%
 BenchmarkShardIndex_CreateSeriesListIfNotExists_MaxSeriesExceeded-8        15219          71             -99.53%

 benchmark                                                                  old bytes     new bytes     delta
 BenchmarkShardIndex_CreateSeriesListIfNotExists_MaxValuesExceeded-8        1354539       480114        -64.56%
 BenchmarkShardIndex_CreateSeriesListIfNotExists_MaxValuesNotExceeded-8     2101          1261          -39.98%
 BenchmarkShardIndex_CreateSeriesListIfNotExists_NoMaxValues-8              2100          1261          -39.95%
 BenchmarkShardIndex_CreateSeriesListIfNotExists_MaxSeriesExceeded-8        707247        477737        -32.45%
 ```
2017-12-27 17:27:03 -07:00
Edd Robinson c476a0b4a1 Merge branch 'master' into er-tsi-index-part 2017-12-15 18:31:24 +00:00
Edd Robinson 73fcf894b6 Fix shard races when accessing index 2017-12-15 18:19:55 +00:00
Edd Robinson 59afd8cc90 Return to original DELETE/DROP SERIES semantics
Since possibly v0.9 DELETE SERIES has had the unwanted side effect of
removing series from the index when the last traces of series data are
removed from TSM. This occurred because the inmem index was rebuilt on
startup, and if there was no TSM data for a series then there could be
not series to add to the index.

This commit returns to the original (documented) DROP/DETETE SERIES
behaviour. As such, when issuing DROP SERIES all instances of matching
series will be removed from both the TSM engine and the index. When
issuing DELETE SERIES only TSM data will be removed.

It is up to the operator to remove series from the index.

NB, this commit does not address how to remove series data from the
series file when a shard rolls over.
2017-12-15 00:02:06 +00:00
David Norton 4e13248d85 feat #9212: add ability to generate shard digests 2017-12-13 09:28:34 -05:00
Edd Robinson f1bcc97e89 Fix auth tests 2017-12-12 21:25:35 +00:00
Edd Robinson 7d13bf3262 merge master 2017-12-08 17:21:58 +00:00
Edd Robinson f6835632e7 Merge master into branch 2017-12-08 17:11:07 +00:00
Adam a0b2195d6b
Pulled in backup-relevant code for review (#9193)
for issue #8879
2017-12-07 11:35:20 -05:00
Ben Johnson 493c1ed0d1
inmem tests passing. 2017-12-05 10:49:58 -07:00
Ben Johnson ca09f18e65
intermediate: tsdb compile 2017-11-29 11:20:18 -07:00
Edd Robinson e6b7140d65
Merge pull request #9143 from influxdata/er-show-tag-key-perf
SHOW TAG KEYS with high cardinality and many shards
2017-11-27 15:04:15 +00:00
Jason Wilder 279f82a72e Remove dead code 2017-11-22 11:17:34 -07:00
Jason Wilder cacb55fac4 Fix typos 2017-11-22 11:17:34 -07:00
Jason Wilder b674311830 Add magic number to fields index file 2017-11-22 11:17:34 -07:00
Jason Wilder dd1c030815 Remove limit count param on fields
It's not used anymore.
2017-11-22 11:17:34 -07:00
Jason Wilder c14b0e81b7 Save field types to speed up startup
This persists the field types in a shard to avoid having to scan
all the TSM files at startup.
2017-11-22 11:17:34 -07:00
Edd Robinson 68dd5e27c8 Improve performance of TagKeys 2017-11-21 17:16:47 +00:00
Edd Robinson 6851db3fc9 Add FGA support to SHOW MEASUREMENTS 2017-11-17 11:06:43 +00:00
Ben Johnson ede3fcf98e
intermediate 2017-11-15 16:09:25 -07:00
Jason Wilder 97e0d496a6 Add capability to force a full compaction
This adds the capability to the engine to force a full compaction
to be scheduled.  When called, it snapshots any data in the cache,
aborts running compactions and prevents level plans from returning
level plans.
2017-11-15 07:14:27 -07:00
Ben Johnson ba4c9e0317
Merge remote-tracking branch 'upstream/master' into er-tsi-index-part 2017-11-14 16:14:13 -07:00
Jason Wilder aee395d3bd Make DeleteSeriesRange take SeriesIterator 2017-11-13 09:02:10 -07:00
Jason Wilder f893beb6d8 Use MeasurementSeriesKeysByExprIterator for deletes 2017-11-13 09:02:10 -07:00
Jonathan A. Sternberg 0b7c56bcd8 Update the zap logger dependency
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.

This updates the usage of the logger and adds a simple package for
constructing the base logger.

The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
2017-11-10 16:27:16 -06:00
Ben Johnson 0ffd94a37a
Fix rebase 2017-11-09 09:25:10 -07:00
Ben Johnson 9ad2b53881
intermediate 2017-11-09 09:18:33 -07:00
Edd Robinson 98d584b63f Use index for SHOW X meta queries
When a meta query does not include a time component then it can be
answered exclusively by the index. This should result in a much faster
query execution that if the TSM engine was engaged.

This commit rewrites the following queries such that they make use
of the index where no time component is present:

  - SHOW MEASUREMENTS
  - SHOW SERIES
  - SHOW TAG KEYS
  - SHOW FIELD KEYS
2017-11-06 19:15:00 +00:00
Stuart Carnie f3d45ba301 influxdata/influxdb/influxql -> influxdata/influxql 2017-10-30 14:40:26 -07:00
Stuart Carnie c39f1ad748 Add batch cursor support to tsdb and tsm1
* batch cursors return slices of timestamps and values to reduce call
  overhead. Significantly improved iteration.
* added CreateCursor API to Shard, Engine
* moved build*Cursor to code gen
2017-10-25 13:38:07 -07:00
Stuart Carnie b7579340fe return query.ErrQueryInterrupted for read on InterruptCh 2017-10-24 14:10:28 -07:00
Stuart Carnie e9313876ab EXPLAIN ANALYZE
* Introduces EXPLAIN ANALYZE command, which
  produces a detailed tree of operations used to
  execute the query.

introduce context.Context to APIs

metrics package

* create groups of named measurements
* safe for concurrent access

tracing package

EXPLAIN ANALYZE implementation for OSS

Serialize EXPLAIN ANALYZE traces from remote nodes

use context.Background for tests

group with other stdlib packages

additional documentation and remove unused API

use influxdb/pkg/testing/assert

remove testify reference
2017-10-20 08:01:37 -07:00
Joe LeGasse 1443b22379 auth: add series auth to 'show tag values' 2017-09-27 20:01:18 -04:00
Edd Robinson 2def219f09 Refactor Shard to further protect Engine 2017-09-25 17:43:30 +01:00
Edd Robinson 4a67f92acc Prevent store from directly accessing Shard's engine 2017-09-25 17:43:01 +01:00
Edd Robinson 8e9cabbb9c Fix race in TagValues when reaching into engine 2017-09-25 17:43:01 +01:00
Edd Robinson 7739ff749a Ensure engine protected by shard mutex 2017-09-25 17:42:30 +01:00
Jason Wilder 940da04a34 Merge pull request #8829 from influxdata/jw-mmap
Release mmap pages when shard is cold
2017-09-18 12:08:37 -06:00
Jason Wilder 31646aae3a Release mmap pages when shard is cold
This instructs the kernel that it can release memory used by mmap'd
TSM files when they are not actively being used.  It the mappings are
use, the kernel will fault the pages back in.  On linux, this causes
RES memory to drop immediately when run.
2017-09-18 11:51:51 -06:00
Edd Robinson e39de3e427 Merge pull request #8782 from oiooj/pr-shard-fix
Correctly check if the Shard is ready for queries or writes
2017-09-18 18:17:19 +01:00
Jonathan A. Sternberg 2228b91b0d Unsigned data type parsing and prioritization 2017-09-14 12:28:13 -05:00
Stuart Carnie 4a6114028c exported UnloadIndex checks for ready state 2017-09-05 11:22:13 -07:00
kun 8a283e248c Correctly check if the Shard is ready for queries or writes 2017-09-03 15:14:58 +08:00
Jonathan A. Sternberg 091ea5f9a5 Merge pull request #8776 from influxdata/js-explain-plan
Initial implementation of explain plan
2017-09-01 16:19:37 -05:00
Jonathan A. Sternberg 50d404e690 Initial implementation of explain plan
It prints the statistics of each iterator that will access the storage
engine. For each access of the storage engine, it will print the number
of shards that will potentially be accessed, the number of files that
may be accessed, the number of series that will be created, the number
of blocks, and the size of those blocks.
2017-09-01 09:01:10 -05:00
kun 5d5225e77d Fix panic when engine closed in a shard 2017-08-29 17:22:45 +08:00
Jonathan A. Sternberg 9a2357c2c0 Separate the query engine into a separate package
This change provides a clear separation between the query engine
mechanics and the query language so that the language can be parsed and
dealt with separate from the query engine itself.
2017-08-16 13:38:43 -05:00
Ben Johnson 60ab1282ea
Refactor system iterators.
Previously pseudo iterators could be created for meta data such
as series, measurement, and tag data. These iterators were created
at a higher level and lacked a lot of the power of the query engine.

This commit moves system iterators down to the series level and
supports the following:

	- _name
	- _seriesKey
	- _tagKey
	- _tagValue
	- _fieldKey

These can be used as normal fields such as:

	SELECT _seriesKey FROM cpu

This will return all the series keys for `cpu`.
2017-08-16 09:27:29 -06:00
Ben Johnson c9b5d60753
Parse SHOW CARDINALITY. 2017-08-16 09:27:15 -06:00
Ben Johnson 06bc3b6fbf
TSI Index Migration 2017-08-15 11:40:24 -06:00
Stuart Carnie eec80692c4 Taught tsm1 storage engine how to read and write uint64 values
* introduced UnsignedValue type
  * leveraged existing int64 compression algorithms (RLE, Simple 8B)
* tsm and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* unsigned support to model, cursors and write points

NOTE: there is no support to create unsigned points, as the line
protocol has not been modified.
2017-07-24 09:03:22 -07:00
Jason Wilder 77afe50f7e Fix panic in ForEachMeasurementTagKey
If a shard was closed, ForEachMeasurementTagKey and TagKeyCardinality
would panic because the engine was nil.
2017-06-13 12:04:32 -06:00
Stuart Carnie 46796d932f add database to index, engine and shard; call AuthorizeSeriesRead 2017-05-26 13:21:50 -07:00
Ben Johnson 24446a0297
Implement zap logging in TSI. 2017-05-25 08:57:50 -06:00
Stuart Carnie 5c5bea2baa move Measurement and Series to inmem package 2017-05-19 08:17:09 -07:00
Jason Wilder 9445ccbad3 Expose shard meta info on Shard 2017-05-16 11:18:02 -06:00
Jason Wilder 2cac46ebbc Convert usage of strings to []byte
Measurement name and field were converted between []byte and string
repetively causing lots of garbage.  This switches the code to use
[]byte in the write path.
2017-05-12 14:05:19 -06:00
Jason Wilder 00bdf62b83 Make shard is ready before returning index type
Shard can be created before they are opened and not have an index
setup yet.  This can cause a panic if IndexType is called.
2017-05-08 12:48:35 -06:00
Jason Wilder 041262af0e Fix race in shard
engine was accessed outside of an RLock which can cause a race when
montitoring goroutines access the shard while it's closed/closing.
2017-05-08 12:37:18 -06:00
Jason Wilder fc34d30038 Uses SeriesN instead of copying sketches
Avoids some extra allocations.
2017-05-04 10:12:38 -06:00
Jason Wilder 88848a9426 Remove per shard monitor goroutine
The monitor goroutine ran for each shard and updated disk stats
as well as logged cardinality warnings.  This goroutine has been
removed by making the disks stats more lightweight and callable
direclty from Statisics and move the logging to the tsdb.Store.  The
latter allows one goroutine to handle all shards.
2017-05-03 16:31:57 -06:00
Jason Wilder f87fd7c7ed Stop background compaction goroutines when shard is cold
Each shard has a number of goroutines for compacting different levels
of TSM files.  When a shard goes cold and is fully compacted, these
goroutines are still running.

This change will stop background shard goroutines when the shard goes
cold and start them back up if new writes arrive.
2017-05-03 16:31:57 -06:00
Jason Wilder a76146e34a Add Store.Import capability
This allows the contents of a backup to be imported into a shard without
requiring the whole shard to be replaced.
2017-04-28 13:30:46 -06:00
Stuart Carnie b2d2976466 update reason messages 2017-04-28 11:21:57 -07:00
Stuart Carnie 8097e817f6 prefix partial write errors with `partial write:`
NOTE: parser errors (via http API) are also transformed into
PartialWriteError
2017-04-28 11:00:14 -07:00
Jason Wilder 0e715b5b74 Reduce lock contention on MeasurementFields 2017-04-20 12:28:42 -06:00
Jason Wilder 5c51ae7319 Merge branch '1.2' into jw-merge-123 2017-04-14 14:36:54 -06:00
Jason Wilder ff1270dfeb Fix dropping fields created data corruption
The Point is intended to be immutable after being parsed since it
is shared by several goroutines.  When dropping a field (e.g. time),
corrupted data can result if one goroutine is delete the field
while another is marshaling the underlying byte slices.

To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
2017-04-07 12:58:42 -06:00
Jason Wilder 7ac3c9a26f Remove unused cardinality func 2017-04-03 11:24:55 -06:00
Edd Robinson fddaff2cc8 Merge master in 2017-03-29 18:00:28 +01:00
Ben Johnson 9fb8f1ec1d
Fix database and tag limits. 2017-03-24 09:48:10 -06:00
Edd Robinson 1c4ecb12c1 Don't panic on nil engine 2017-03-22 10:07:29 -06:00
Edd Robinson f89de550ed Significantly speed up DROP DATABASE 2017-03-21 11:35:31 +00:00