Commit Graph

463 Commits (master)

Author SHA1 Message Date
Ben Johnson ba4c9e0317
Merge remote-tracking branch 'upstream/master' into er-tsi-index-part 2017-11-14 16:14:13 -07:00
Jason Wilder 8b18cc4456 Optimize deletes in tsi
The DropSeries code path ended up creating a MeasurementSeriesIterator
for each dropped series, this was too expensive just to see if a
series exists.

This adds a HasSeries func and fixes and issue where TSI files were
compacted while an iterator was still in use causing a panic.
2017-11-13 12:35:38 -07:00
Jason Wilder 13692639cb Fix create/delete series race
This fixes a race where writes and deletes to the same series and
measurements could sometimes leave the index in an inconsistent state.
2017-11-13 09:02:10 -07:00
Jason Wilder 80cd5e63af Optimize DeleteSeriesRange
This removes more allocations and speeds up some critical sections.
2017-11-13 09:02:10 -07:00
Jason Wilder aee395d3bd Make DeleteSeriesRange take SeriesIterator 2017-11-13 09:02:10 -07:00
Jason Wilder f893beb6d8 Use MeasurementSeriesKeysByExprIterator for deletes 2017-11-13 09:02:10 -07:00
Jason Wilder 88c48ec78b Rework Engine.DeleteSeriesRange to avoid allocations
This removes the containsSeries func which ends up creating a map
sized to the slice of keys passed in.  This doesn't scale well to
high cardinalities and creates a lot of garbage.
2017-11-13 08:50:07 -07:00
Jason Wilder b0c7a44eaa Adjust min/max time to work in the engine
The query language min and max times are slighly different than the
values used in the engine.  This allows faster codes to be used when
the whole time range is deleted.
2017-11-13 08:50:07 -07:00
Jonathan A. Sternberg 0b7c56bcd8 Update the zap logger dependency
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.

This updates the usage of the logger and adds a simple package for
constructing the base logger.

The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
2017-11-10 16:27:16 -06:00
Ben Johnson 9ad2b53881
intermediate 2017-11-09 09:18:33 -07:00
Andrew Hare ecb3952fa9 Allow human-readable byte sizes in config
Update support in the `toml` package for parsing human-readble byte sizes.
Supported size suffixes are "k" or "K" for kibibytes, "m" or "M" for
mebibytes, and "g" or "G" for gibibytes. If a size suffix isn't specified
then bytes are assumed.

In the config, `cache-max-memory-size` and `cache-snapshot-memory-size` are
now typed as `toml.Size` and support the new syntax.
2017-11-01 11:09:09 -05:00
Stuart Carnie 9a43c14653
Merge pull request #9041 from influxdata/sgc-influxql
influxdata/influxdb/influxql -> influxdata/influxql
2017-10-31 07:31:31 -07:00
Stuart Carnie f3d45ba301 influxdata/influxdb/influxql -> influxdata/influxql 2017-10-30 14:40:26 -07:00
Jason Wilder 48ebc53154 Revert "Fix race in disableLevelCompactions"
This reverts commit 4f8580fbaa.
2017-10-30 14:14:50 -06:00
Stuart Carnie c39f1ad748 Add batch cursor support to tsdb and tsm1
* batch cursors return slices of timestamps and values to reduce call
  overhead. Significantly improved iteration.
* added CreateCursor API to Shard, Engine
* moved build*Cursor to code gen
2017-10-25 13:38:07 -07:00
Stuart Carnie b7579340fe return query.ErrQueryInterrupted for read on InterruptCh 2017-10-24 14:10:28 -07:00
Stuart Carnie e9313876ab EXPLAIN ANALYZE
* Introduces EXPLAIN ANALYZE command, which
  produces a detailed tree of operations used to
  execute the query.

introduce context.Context to APIs

metrics package

* create groups of named measurements
* safe for concurrent access

tracing package

EXPLAIN ANALYZE implementation for OSS

Serialize EXPLAIN ANALYZE traces from remote nodes

use context.Background for tests

group with other stdlib packages

additional documentation and remove unused API

use influxdb/pkg/testing/assert

remove testify reference
2017-10-20 08:01:37 -07:00
Jason Wilder 4f8580fbaa Fix race in disableLevelCompactions
There was a race on the WaitGroup where we could end up calling Add
while another goroutine was still waiting.  The functions were confusing
so they have been simplified a bit since the compactions goroutines
have been reworked a lot already.
2017-10-16 10:50:16 -06:00
Jason Wilder 1401950b10 Only schedule one compaction per shard at a time
The scheduling logic ended up favoring more backlogged shards
too much and would starved active, less backed up shards.  This
occurred because the scheduling kicks in once a second.  When it
runs, it schedules as many compactions as it can.  A backed up shard
would end up having more compactions to run during the loop an would
generally get to schedule them more frequently.

This now allows each shard to try and schedule one compaction at a time
which provides a more balanced approach.  At some point, we'll probably
want to more directly balanc the each shards backlog vs letting it happen
somewhat randomly.
2017-10-09 11:40:32 -06:00
Jason Wilder 6b6ccf1a40 Wait for compaction gorotuines to finish 2017-10-04 10:01:44 -06:00
Jason Wilder 90df803802 Prevent infinite scheduling loop
One shard might be able to run a compaction, but could fail to
limits being hit.  This loop would continue indefinitely as the
same task would continue to be rescheduled.
2017-10-03 10:48:14 -06:00
Jason Wilder 71071ed67a Add compaction backlog stat
This gives an indication as to whether compactions are backed up
or not.
2017-10-03 10:48:14 -06:00
Jason Wilder ae821f4e2d Rework compaction scheduling
This changes the compaction scheduling to better utilize the available
cores that are free.  Previously, a level was planned in its own goroutine
and would kick off a number of compactions groups.  The problem with this
model was that if there were 4 groups, and 3 completed quickly, the planning
would be blocked for that level until the last group finished.  If the compactions
at the prior level are running more quickly, a large backlog could accumlate.

This now moves the planning to a single goroutine that plans each level in
succession and starts as many groups as it can.  When one group finishes,
the planning will start the next group for the level.
2017-10-03 10:48:13 -06:00
Joe LeGasse 1443b22379 auth: add series auth to 'show tag values' 2017-09-27 20:01:18 -04:00
Edd Robinson d0b81c1e6c Fix race on Cache entry 2017-09-27 18:10:23 +01:00
Jason Wilder db204f3eb7 Default concurrent compactions to 50% of available cores 2017-09-21 12:48:11 -06:00
Jason Wilder 0d52b060df Skip onFileStoreReplace with tsi 2017-09-19 15:27:25 -06:00
Jason Wilder 31646aae3a Release mmap pages when shard is cold
This instructs the kernel that it can release memory used by mmap'd
TSM files when they are not actively being used.  It the mappings are
use, the kernel will fault the pages back in.  On linux, this causes
RES memory to drop immediately when run.
2017-09-18 11:51:51 -06:00
Jason Wilder 2a0d7935d7 Switch level 3 compactions to use fast compaction strategy
This leaves the slower compactions that create full blocks to only
the full compaction.  This helps reduce CPU usage and memory while shards
are hot, but increases disk usage (reduced compression) slightly.
2017-09-11 15:26:24 -06:00
Jason Wilder 94e229ff59 Merge branch 'master' into jw-drop-series 2017-09-08 15:34:32 -06:00
Jason Wilder b9b648e2a0 Dynamically allocate cache store
The cache store can be memory intensive with many shards.  This
lazyily allocates it when needed and frees it when the cache is
empty and cold.
2017-09-07 16:35:08 -06:00
Jason Wilder a8d9eeef36 Reduce lock contention when deleting high cardinality series
Deleting high cardinality series could take a very long time, cause
write timeouts as well as dead lock the process.  This fixes these
issue to by changing the approach for cleaning up the indexes and
reducing lock contention.

The prior approach delete each series and updated every index (inmem)
during the delete.  This was very slow and cause the index to be locked
while it items in a slice were removed one by one.  This has been changed
to mark series as deleted and then rebuild the index asynchronously which
speeds up the process.

There was also a dead lock that could occur when deleing the field set.
Deleting the field set held a write lock and the function it invoked under
the lock could try to take a read lock on the field set.  This would then
deadlock.  This approach was also very slow and caused time out for writes.
It now uses faster approach that checks for the existing of the measurment
in the cache and filestore which does not take write locks.
2017-09-07 11:36:02 -06:00
Jonathan A. Sternberg 590be193e5 Include the number of scanned cached values in the iterator cost 2017-09-06 15:41:07 -05:00
Jonathan A. Sternberg 50d404e690 Initial implementation of explain plan
It prints the statistics of each iterator that will access the storage
engine. For each access of the storage engine, it will print the number
of shards that will potentially be accessed, the number of files that
may be accessed, the number of series that will be created, the number
of blocks, and the size of those blocks.
2017-09-01 09:01:10 -05:00
Jonathan A. Sternberg 466fc9026e Reduce how long it takes to walk the varrefs in an expression
This is used quite a bit to determine which fields are needed in a
condition. When the condition gets large, the memory usage begins to
slow it down considerably and it doesn't take care of duplicates.
2017-08-31 09:33:45 -05:00
Ben Johnson 1dbe0662d8
Use system cursors for measurement, series, and tag key meta queries. 2017-08-30 08:35:20 -06:00
Stuart Carnie d189621d07 log message when iterator closed by finalizer 2017-08-21 16:46:24 -07:00
Stuart Carnie 25edd7bfdf naming 2017-08-17 15:47:47 -07:00
Stuart Carnie c86dc0d103 redundant allocation is overwritten by line 1769 2017-08-17 11:12:41 -07:00
Stuart Carnie 823f903cc6 inputs are closed if Merge returns error and use <type>FinalizerIterator
* <type>FinalizerIterator sets a runtime finalizer and calls Close
  when garbage collected. This will ensure any associated cursors
  are closed and the associated TSM files released
* `query.Iterators#Merge` call could return an error and the inputs
  would not be closed, causing a cursor leak
2017-08-17 11:12:18 -07:00
Jason Wilder 85842503be Fix deadlock in engine/measurement fields
The OnReplace func ends up trying to acquire locks on MeasurementFields.  When
its called via snapshotting, this can deadlock because the snapshotting goroutine
also holds an RLock on the engine.  If a delete measurement calls is run at the
right time, it will lock the MeasurementFields and try to acquire a lock on the engine
to disable compactions.  This creates a deadlock.

To fix this, the OnReplace callback is moved to a function param to allow only Replace
calls as part of a compaction to invoke it as opposed to both snapshotting and compactions.

Fixes #8713
2017-08-16 16:43:40 -06:00
Jonathan A. Sternberg 697759613c Remove time comparisons from the inner sections of the storage engine 2017-08-16 16:51:13 -05:00
Jonathan A. Sternberg 9a2357c2c0 Separate the query engine into a separate package
This change provides a clear separation between the query engine
mechanics and the query language so that the language can be parsed and
dealt with separate from the query engine itself.
2017-08-16 13:38:43 -05:00
Stuart Carnie 3caeee8a24 fix: cursor leak when cur == nil and aux or conds is not empty 2017-08-16 09:17:20 -07:00
Ben Johnson e0d8cb0ef3
Cardinality AST, parser, & rewriter fixes. 2017-08-16 09:27:29 -06:00
Ben Johnson 60ab1282ea
Refactor system iterators.
Previously pseudo iterators could be created for meta data such
as series, measurement, and tag data. These iterators were created
at a higher level and lacked a lot of the power of the query engine.

This commit moves system iterators down to the series level and
supports the following:

	- _name
	- _seriesKey
	- _tagKey
	- _tagValue
	- _fieldKey

These can be used as normal fields such as:

	SELECT _seriesKey FROM cpu

This will return all the series keys for `cpu`.
2017-08-16 09:27:29 -06:00
David Norton 1d8d739418 fix #8677: check for snapshot size == 0 2017-08-16 09:43:56 -04:00
Jason Wilder 90e2cadeb6 Fix drop measurement not dropping all data
If there were multiple shards, drop measurement could update the index
and remove the measurement before the other shards ran their deletes.
This causes the later shards to not see any series to delete.

The fix is to all deleteSeries to handle the index delete which already
accounts for removing the measurement when it is fully removed from the
index.
2017-08-15 11:19:45 -06:00
Edd Robinson aa7095be5a Use a merge-based approach for TagValues 2017-08-02 14:10:52 +01:00
Jason Wilder 94a48774b7 Pull in new index filter 2017-08-02 14:10:52 +01:00
Stuart Carnie ff65f0f24d Reduce allocations using nil cursors and literal value cursors
```
benchmark                           old ns/op     new ns/op     delta
BenchmarkIntegerIterator_Next-8     82.8          22.7          -72.58%

benchmark                           old allocs     new allocs     delta
BenchmarkIntegerIterator_Next-8     3              0              -100.00%

benchmark                           old bytes     new bytes     delta
BenchmarkIntegerIterator_Next-8     32            0             -100.00%
```
2017-07-30 09:15:34 -07:00
Jason Wilder 778000435a Conver all keys from string to []byte in TSM engine
This switches all the interfaces that take string series key to
take a []byte.  This eliminates many small allocations where we
convert between to two repeatedly.  Eventually, this change should
propogate futher up the stack.
2017-07-28 11:00:50 -06:00
Jason Wilder 18a02d50d7 Interrupt in progress TSM compactions
When snapshots and compactions are disabled, the check to see if
the compaction should be aborted occurs in between writing to the
next TSM file.  If a large compaction is running, it might take
a while for the file to be finished writing causing long delays.

This now interrupts compactions while iterating over the blocks to
write which allows them to abort immediately.
2017-07-27 15:58:56 -06:00
Stuart Carnie eec80692c4 Taught tsm1 storage engine how to read and write uint64 values
* introduced UnsignedValue type
  * leveraged existing int64 compression algorithms (RLE, Simple 8B)
* tsm and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* unsigned support to model, cursors and write points

NOTE: there is no support to create unsigned points, as the line
protocol has not been modified.
2017-07-24 09:03:22 -07:00
Jason Wilder 839cddf6d5 Refresh index after compactions
The in-memory index can get out of sync when deletes and writes
to the same measurement are running concurrently.  The index is
updated independently from data on disk and it's possible for the
index to unassign a shard when data still exists on disk.  What happens
is that there are TSM files on disk, but the index does not know that
the series that exist in those files still are in the shard.  Restarting
the server reloads the index and the data is visible again.  From and
end user perspective, this can look like more data is deleted than should
have been or that deleted data re-appears after a restart or writes to the
shard occur again.

There isn't an easy way to resolve this since the index and storage
are not transactional resources and we cannot atomically commit or
rollback changes to both at once.

As a workaround, after new TSM files are installed, we refresh the
index with series keys that exist in the new tsm files as well as
any lingering data still in the cache.  There is a small window of time
when the index may be missing series, but it will re-appear after the refresh
completes.
2017-07-07 12:19:30 -06:00
Jason Wilder 9ac042b5cd Reduce lock contention when disabling compactions
The monitor goroutine calls enable compactions every 10s to spin down
(or start up) goroutines for cold shards.  This frequent Lock may be
causing lock contention for writes and queries which get blocked trying
to acquire an RLock.

The go RWMutex says that new RLock calls will block if there is a
pending Lock call that is blocked.  Switching the common path to use
an RLock should avoid the Lock and reduce lock contention for writes
and queries.
2017-07-05 15:42:21 -06:00
Stuart Carnie 46796d932f add database to index, engine and shard; call AuthorizeSeriesRead 2017-05-26 13:21:50 -07:00
Joe LeGasse 815f740f4c initial fga work
wip

wip

fix tests / build
2017-05-26 13:16:27 -07:00
Jason Wilder 29e4287fd2 Preven masking root errors when compactions are in progress
The root error when creating a tmp file when writing a snapshot
was hidden making it difficult to determine why snapshots were
failing.
2017-05-23 12:09:36 -06:00
Jason Wilder bd6d0681e9 Ensure planned files are released
The defer was never executed because the planning happens in a
long running goroutine that loops.  The plans need to be released
immediately after applying them.
2017-05-23 12:08:25 -06:00
Jason Wilder 1833475c09 Fix TSM tmp files leaking
TMP files could leak when compactions failed for various reasons. They
were also being deleted inadvertently when compactions were disabled causing
other errors to be reported in the logs.
2017-05-22 14:51:18 -06:00
Jason Wilder 4d002bb370 Limit concurrent compactions within a shard
This changes full compactions within a shard to run sequentially
instead of running all the compaction groups in parallel.  Normally,
there is only 1 full compaction group to run.  At times, there could
be several which causes instability if they are all running concurrently
as they tie up a cpu for long periods of time.

Level compactions are also capped to a max of 4 concurrently running for each level
in a shard.  This prevents sudden spikes in CPU and disk usage due to a large backlog
of tsm files at a given level.
2017-05-12 14:05:24 -06:00
Jason Wilder 2cac46ebbc Convert usage of strings to []byte
Measurement name and field were converted between []byte and string
repetively causing lots of garbage.  This switches the code to use
[]byte in the write path.
2017-05-12 14:05:19 -06:00
Jason Wilder c0c6ad6880 Don't disable snapshots when snapshot compactions are disabled
Snapshot compactions can be disabled independently of snapshotting
capability.  This prevents taking backups of shards that have compactions
disabled.
2017-05-05 14:15:45 -06:00
Jason Wilder bc639c5982 Make disableLevelCompactions lighter weight
Since this is called more frequently now, the cleanup func was invoked
quite a bit which makes several syscalls per shard.  This should only
be called the first time compactions are disabled.
2017-05-04 09:56:15 -06:00
Jason Wilder 88848a9426 Remove per shard monitor goroutine
The monitor goroutine ran for each shard and updated disk stats
as well as logged cardinality warnings.  This goroutine has been
removed by making the disks stats more lightweight and callable
direclty from Statisics and move the logging to the tsdb.Store.  The
latter allows one goroutine to handle all shards.
2017-05-03 16:31:57 -06:00
Jason Wilder f87fd7c7ed Stop background compaction goroutines when shard is cold
Each shard has a number of goroutines for compacting different levels
of TSM files.  When a shard goes cold and is fully compacted, these
goroutines are still running.

This change will stop background shard goroutines when the shard goes
cold and start them back up if new writes arrive.
2017-05-03 16:31:57 -06:00
Jason Wilder 3d1c0cd981 Don't return compaction plans for files already part of a plan
The compactor prevents the same file from being compacted by different
compaction runs, but it can result in warning errors in the logs that
are confusing.

This adds compaction plan tracking to the planner so that files are
only part of one plan at a given time.
2017-05-03 16:31:57 -06:00
Jason Wilder 8fc9853ed8 Add max-concurrent-compactions limit
This limit allows the number of concurrent level and full compactions
to be throttled.  Snapshot compactions are not affected by this limit
as then need to run continously.

This limit can be used to control how much CPU is consumed by compactions.
The default is to limit to the number of CPU available.
2017-05-03 16:31:57 -06:00
Jason Wilder 141f0d71cd Update index when import files 2017-04-28 14:00:45 -06:00
Jason Wilder a76146e34a Add Store.Import capability
This allows the contents of a backup to be imported into a shard without
requiring the whole shard to be replaced.
2017-04-28 13:30:46 -06:00
Jason Wilder 137d0c0d09 Rename WAL.WritePoints to WAL.WriteMulti
To match Cache.WriteMulti
2017-04-28 13:20:55 -06:00
Jason Wilder 5c51ae7319 Merge branch '1.2' into jw-merge-123 2017-04-14 14:36:54 -06:00
Jason Wilder ff1270dfeb Fix dropping fields created data corruption
The Point is intended to be immutable after being parsed since it
is shared by several goroutines.  When dropping a field (e.g. time),
corrupted data can result if one goroutine is delete the field
while another is marshaling the underlying byte slices.

To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
2017-04-07 12:58:42 -06:00
Ben Johnson 9c97cd8601
Merge remote-tracking branch 'upstream/master' into tsi 2017-04-04 12:46:09 -06:00
Jason Wilder 5fa8073fc2 Merge branch '1.2' into jw-merge-123 2017-04-04 11:12:06 -06:00
Jason Wilder 8da84e6144 Merge branch 'master' into tsi 2017-04-03 11:21:02 -06:00
Jason Wilder 32c4d43952 Speed up drop measurement
This reworks drop measurement to use a sorted list of series keys
instead of creating an intermediate map.  It remove allocations
and some extra garbage that is created during drop measurement.
2017-04-03 08:57:53 -06:00
Edd Robinson fddaff2cc8 Merge master in 2017-03-29 18:00:28 +01:00
Ben Johnson 2edfb1c92d
Ignore series limit on database load. 2017-03-24 16:27:16 -06:00
Ben Johnson 9fb8f1ec1d
Fix database and tag limits. 2017-03-24 09:48:10 -06:00
Jason Wilder 7119ef8f29 Merge pull request #8193 from influxdata/jw-123-backports
1.2.3 backports
2017-03-23 13:31:35 -06:00
Jason Wilder ced953ae89 Use typed values to avoid allocations
This switches compactions to use type values (FloatValues) from the
generic Values type.  It avoids a bunch of allocations where each value
much be converted from a specific type to an interface{}.
2017-03-23 12:53:17 -06:00
Jason Wilder 2972a3f223 Remove MMAP derefencing code
This code was added to address some slow startup issues.  It is believed
to be the cause of some segfault panic's that occur at query time when
the underlying MMAP array has been unmapped.  The current structure of
code makes this change unnecessary now.
2017-03-23 12:46:23 -06:00
Jason Wilder 8f7b251afd Merge branch 'master' into jw-tsi 2017-03-20 17:17:26 -06:00
Jason Wilder 8177df2dab Simplify Measurement.TagSets signature 2017-03-17 16:19:10 -06:00
Jason Wilder 2d5d899ac2 Allow queries to be interrupted during planning
If a bad query is run, kill query and limits would not kick in until
after it started executing.  Some bad queries that involve high
cardinality can cause the server to OOM just from planning which
defeats the purpose of the max-select-series limit.

This change primarily fixes max-select-series limit so that the query
is killed earlier and has the side effect that kill query now can kill
a query while it's being planned.
2017-03-17 16:00:54 -06:00
Jason Wilder bc4aeefbed Check max-series-limit in shard iterator creation
The limit waited until all the iterators had been created which still
allows problem queries to be planned.  This allows the queries to be
aborted much earlier in some cases.
2017-03-17 16:00:25 -06:00
Jason Wilder e9eb925170 Coalesce multiple WAL fsyncs
Fsyncs to the WAL can cause higher IO with lots of small writes or
slower disks.  This reworks the previous wal fsyncing to remove the
extra goroutine and remove the hard-coded 100ms delay.  Writes to
the wal still maintain the invariant that they do not return to the
caller until the write is fsync'd.

This also adds a new config options wal-fsync-delay (default 0s)
which can be increased if a delay is desired.  This is somewhat useful
for system with slower disks, but the current default works well as
is.
2017-03-15 16:31:03 -06:00
Ben Johnson 1807772388
Fix tsi tests. 2017-03-15 11:23:58 -06:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Jason Wilder 2f7d4995b4 Use typed values to avoid allocations
This switches compactions to use type values (FloatValues) from the
generic Values type.  It avoids a bunch of allocations where each value
much be converted from a specific type to an interface{}.
2017-03-09 16:27:07 -07:00
Jason Wilder b9e5375043 Merge branch '1.2' into jw-merge-12 2017-03-08 13:16:50 -07:00
Jason Wilder 37187cbe6d Delete series under fields lock
Still seeing the panic that switching this logic around was supposed
to fix.  We now delete the bulk of data outside of the fields lock
and then again, under the write lock, to ensure that the field mapping
is accurate.  We don't do the full delete under the lock because it
can block writes and queries that require a read lock.
2017-03-06 14:19:55 -07:00
Jason Wilder 675d7c9d65 Merge branch '1.2' into jw-merge12 2017-03-06 11:09:05 -07:00
Jason Wilder 3c70abf061 Delete series before remove from field index
There is a race where the field type can be deleted while a new type
is written and during a query.  When this happens, an iterator for
the new type is created but old data make still exist in the cache
for TSM files causing a panic.
2017-03-06 09:38:27 -07:00
Jason Wilder a024003f2c Merge branch '1.2' into jw-merge-12 2017-02-22 12:13:29 -07:00
Ben Johnson 78a9bb2527 Remove Tags.shouldCopy, replace with forceCopy on series creation.
Previously, tags had a `shouldCopy` flag to indicate if those tags
referenced an underlying buffer and should be copied to allow GC.
Unfortunately, this prevented tags from being copied that were
created and referenced the mmap which caused segfaults.

This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
2017-02-21 11:13:35 -07:00
Mark Rushakoff 601cbcd084 Merge branch '1.2' into mr-merge-12 2017-02-17 16:14:22 -08:00
Jonathan A. Sternberg 2fe48d6781 Rename zap import back to github.com/uber-go/zap
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
2017-02-17 17:17:22 -06:00
Jason Wilder 4b6289ce58 Merge pull request #7942 from influxdata/jw-cache-partitions
Reduce write timeouts
2017-02-10 10:07:08 -07:00
Edd Robinson 38eb6d5994 Don't load meta data for tsi 2017-02-09 18:04:23 +00:00
Edd Robinson a6a2f9d5f0 Don't load meta data for tsi 2017-02-09 17:59:14 +00:00
Jonathan A. Sternberg e1fa48d0dd Fix ORDER BY time DESC with ordering series keys
The order of series keys is in ascending alphabetical order, not
descending alphabetical order, when it is ordered by descending time.
This fixes the ordering so points are returned in descending order. The
emitter also had the conditions for choosing which iterator to use in
the wrong direction (which only affects aggregates with `FILL(none)`).
2017-02-06 15:49:12 -06:00
Jonathan A. Sternberg 95831b3307 Fix LIMIT and OFFSET when they are used in a subquery
This fixes LIMIT and OFFSET when they are used in a subquery where the
grouping of the inner query is different than the grouping of the outer
query. When organizing tag sets, the grouping of the outer query is
used so the final result is in the correct order. But, unfortunately,
the optimization incorrectly limited the number of points based on the
grouping in the outer query rather than the grouping in the inner query.

The ideal solution would be to use the outer grouping to further
organize it by the grouping for the inner subquery, but that's more
difficult to do at the moment. As an easier fix, the query engine now
limits the output of each series. This may result in these types of
queries being slower in some situations like this one:

    SELECT mean(value) FROM (SELECT value FROM cpu GROUP BY host LIMIT 1)

This will be slower in a situation where the `cpu` measurement has a
high cardinality and many different tags.

This also fixes `last()` and `first()` when they are used in a subquery
because those functions use `LIMIT 1` as an internal optimization.
2017-02-06 14:04:34 -06:00
Ben Johnson 57f44d5f0c
Include index in snapshot. 2017-02-01 14:19:42 -07:00
Jason Wilder 6eb46d2100 Remove unnecessary read lock on engine 2017-02-01 11:10:41 -07:00
Ben Johnson 047c21f4d9
Merge remote-tracking branch 'upstream/master' into tsi 2017-01-24 09:28:58 -07:00
Joe LeGasse bf58d9ffb7 Update backup to use ioutil.ReadDir 2017-01-12 16:28:01 -05:00
Jason Wilder 1e56b5416b Fix compactions sometimes getting stuck
I ran into an issue where the cache snapshotting seemed to stop
completely causing the cache to fill up and never recover.  I believe
this is due to the the Timer being reused incorrectly.  Instead,
use a Ticker that will fire more regularly and not require the resetting
logic (which was wrong).
2017-01-11 17:57:40 -07:00
Jonathan A. Sternberg 3ba950b029 Fix for subqueries to use the parallel iterator correctly
Also, fix the `Iterators.Merge(IteratorOptions)` function so it consults
the `Ordered` attribute to determine which iterator it should use to
merge the input iterators.
2017-01-11 10:47:18 -06:00
Mark Rushakoff a135906b43 Merge pull request #7747 from influxdata/mr-lint-cleanup
Miscellaneous lint cleanup
2017-01-10 08:22:00 -08:00
Jonathan A. Sternberg 4a559c4620 Merge pull request #7646 from influxdata/js-4619-subqueries
Support subquery execution in the query language
2017-01-09 14:14:01 -06:00
Jason Wilder eb4d311c0a Add retry/backup when backing up a shard fails
The backup command can fail if a snapshot is running which silently
closes the connection.  This causes the backup shard command to continue
on as if nothing failed.
2017-01-09 11:28:48 -07:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
2017-01-07 13:00:48 -06:00
Ben Johnson 2b3cd415e2
Fixing rebase. 2017-01-06 09:52:16 -07:00
Ben Johnson d1f1e19591
Fixing rebase. 2017-01-06 09:31:25 -07:00
Ben Johnson c1c98223ec
Fix and optimize tsi1 FileSet. 2017-01-05 10:17:12 -07:00
Ben Johnson 9b1e8215e0
Remove dictionary encoding, add bulk series insertion. 2017-01-05 10:17:11 -07:00
Ben Johnson 9bd19cdc69
Fix inmem DELETE SERIES. 2017-01-05 10:17:11 -07:00
Ben Johnson f9efcb3365
Re-add shared in-memory index. 2017-01-05 10:17:09 -07:00
Edd Robinson 0f9b2bfe6a
Fix tests 2017-01-05 10:16:15 -07:00
Edd Robinson 4ccb8dbab1
Move series count check to shard 2017-01-05 10:16:13 -07:00
Ben Johnson 745b1973a8
tsi compaction 2017-01-05 10:15:37 -07:00
Ben Johnson 183418dcbd
Fix tsi TAG KEYS iterator. 2017-01-05 10:15:36 -07:00
Ben Johnson 9f8b206b51
Fix measurement system queries. 2017-01-05 10:15:34 -07:00
Ben Johnson 4aa78383d1
Fix tsi1 series deletion. 2017-01-05 10:14:48 -07:00
Ben Johnson e7940cc556
Add tsi1 series system iterator. 2017-01-05 10:14:00 -07:00
Ben Johnson 87f4e0ec0a
Add regex support in tsi1. 2017-01-05 10:12:29 -07:00
Jason Wilder 1ba64f3610
Disable max-value-per-tag option temporarily
This is too slow currently and causes all writes to timeout.
2017-01-05 10:11:47 -07:00
Ben Johnson fbe7f464ee
Improve insert performance. 2017-01-05 10:11:12 -07:00
Ben Johnson cb93f10120
Remove per-shard in-memory index. 2017-01-05 10:11:09 -07:00
Ben Johnson 409b0165f5
shared in-memory index 2017-01-05 10:09:57 -07:00
Ben Johnson a812502ea3
reintegrating in-memory index 2017-01-05 10:07:35 -07:00
Ben Johnson 1ac067e53b
intermediate 2017-01-05 10:03:09 -07:00
Ben Johnson 62d2b3ebe9
Series filtering. 2017-01-05 10:02:42 -07:00
Ben Johnson 62269c3cea
intermediate 2017-01-05 10:02:41 -07:00
Ben Johnson 5f5b02e052
intermediate 2017-01-05 10:01:49 -07:00
Ben Johnson afce53e81b
Rebase fixes. 2017-01-05 10:00:44 -07:00
Edd Robinson 9ed6040265
Tidy up 2017-01-05 09:58:37 -07:00
Edd Robinson 2d9bd09784
Use []byte where possible in Index 2017-01-05 09:57:34 -07:00
Edd Robinson 4b1ef68dc9
Move series and measurement stats to store 2017-01-05 09:54:05 -07:00
Edd Robinson bd8dd9a291
Sketches working 2017-01-05 09:54:04 -07:00
Edd Robinson d19fbf5ab4
Wire in HLL estimator 2017-01-05 09:54:03 -07:00
Edd Robinson 2b8efefef4
Initial index interface 2017-01-05 09:51:43 -07:00
Edd Robinson 05bc4dec00
Refactor 2017-01-05 09:50:23 -07:00
Edd Robinson c535e3899a
Remove in-memory index from Shard and Store 2017-01-05 09:47:09 -07:00
Mark Rushakoff 6a94d200c8 Merge remote-tracking branch 'influx/master' into mr-godoc 2017-01-04 13:27:36 -08:00
Cory LaNou 3c518f8927
panicing is bad -> error returns are good 2017-01-03 14:28:29 -06:00
Mark Rushakoff 07b87f2630 Miscellaneous lint cleanup 2017-01-03 09:47:32 -08:00
Mark Rushakoff 41415cf2fb Update godoc for tsm1 package 2017-01-02 07:30:18 -08:00
Jason Wilder 637a67ea35 Reduce lock contention on measurementFields 2016-12-19 14:17:01 -07:00
Jonathan A. Sternberg ec57108520 Use proper uber-go/zap import path
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
2016-12-15 08:54:14 -06:00
Edd Robinson 05ec6ad9ad Add to index safely 2016-12-14 18:23:36 +00:00
Edd Robinson d78ca1a0f3 Fix some races 2016-12-14 18:23:36 +00:00
Edd Robinson 66edb32182 Sharded Cache using a hash ring 2016-12-14 18:23:36 +00:00
Jonathan A. Sternberg 21502a39e8 Switch logging to use structured logging everywhere
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
2016-12-14 10:45:15 -06:00
Allen Petersen 31129ab0e9 Use slash separator for filenames in tar archives
NO-OP on platforms with unix path separator.
On Windows paths get converted to slashes before adding to archive and back to backslashes during restore.
2016-11-29 09:44:08 -08:00
Jason Wilder e8a28cfbab Expose Shard.LastModified
This returns the LastModified time of the shard.  The LastModified
time is the wall time when a change to the shards state occurred.
It uses the WAL or FileStore to determine the max mod time.
2016-11-23 10:04:07 -07:00
Jason Wilder 0ee58c208a Switch time.Sleep to time.Ticker
Avoids an allocation when calling time.Sleep
2016-11-15 16:13:55 -07:00
Jonathan A. Sternberg a515aeda39 Optimize first/last when no group by interval is present
The `first()` and `last()` functions response rate would increase linear
to the number of points even though it seems like it shouldn't. This
optimization greatly reduces the amount of time to return a response
when no `GROUP BY time(...)` clause is present in a query.
2016-10-25 09:57:31 -05:00
Jason Wilder 686d1a7ba4 Remove unused config options 2016-10-24 15:32:38 -06:00
Jonathan A. Sternberg 3681bc8a43 Filter out series within shards that do not have data for that series
Previously, we would return a full tag set for every shard and the tag
set would include all series that existed in the database index
including series that didn't physically exist within that shard. This
led to the tag sets returned being incredibly huge when we had high
cardinality but sparse data. Since the data was sparse, it was
unexpected that it would cause such a large strain on the system by most
people.

Now we filter out the series ids that are not assigned to the current
shard when computing a tag set for that shard. This lowers the memory
usage for high cardinality sparse data drastically and allows queries on
those to complete successfully.

This does not resolve issues for high cardinality data in every shard
that is also spread out over a long series of time. That situation isn't
nearly as common as the above situation though.
2016-10-20 14:15:34 -05:00
Jason Wilder b50d9558cf Merge pull request #7479 from influxdata/jw-clean-err
Skip cleanup if dir does not exist
2016-10-18 15:49:09 -06:00
Jason Wilder f30b00c24f Skip cleanup if dir does not exist 2016-10-18 15:33:39 -06:00
Mark Rushakoff 377c40f122 Add stats for active compactions
Unify logic around compaction execution to a single place.

Also report on the error stats that we track. Previously they were not
emitted in the stats output.
2016-10-18 14:12:21 -07:00
Joe LeGasse de9c743004 TSM: update comments for disabling level compactions 2016-10-18 14:14:59 -06:00
Joe LeGasse eda8f70372 TSM: Handle concurrent deletes for compaction 2016-10-18 14:14:59 -06:00
Jason Wilder ed7975874f Rename Enabled -> Enable 2016-10-18 12:22:00 -06:00
Jason Wilder f254b4f3ae Allow snapshot compactions during deletes
If a delete takes a long time to process while writes to the
shard are occuring, it was possible for the cache to fill up
and writes to be rejected.  This occurred because we disabled
all compactions while writing tombstone file to prevent deleted
data from re-appearing after a compaction completed.

Instead, we only disable the level compactions and allow snapshot
compactions to continue.  Snapshots already handle deleted data
with the cache and wal.

Fixes #7161
2016-10-18 12:14:51 -06:00
Jason Wilder 798fa0a9f8 Return error with unknown field type
This will just panic when trying to snapshot the value because EmptyValue
can't be written to TSM files.
2016-10-03 16:30:21 -06:00
Jason Wilder 125f106956 Pre-size the values map when write points 2016-10-03 16:30:21 -06:00
Joe LeGasse 743946fafb models: Add FieldIterator type
The FieldIterator is used to scan over the fields of a point, providing
information, and delaying parsing/decoding the value until it is needed.
This change uses this new type to avoid the allocation of a map for the
fields which is then thrown away as soon as the points get converted
into columns within the datastore.
2016-10-03 16:30:21 -06:00
Jason Wilder 190537a557 Fix DeleteSeries when multiple fields exists
The logic for determining whether a series key was already in the
the set of TSM series was too restrictive.  It allowed only the first
field of a series to be added leaving all the remaing fields.
2016-08-31 20:35:35 -06:00
Jonathan A. Sternberg dc2527ce86 Merge branch '1.0' 2016-08-31 14:45:57 -05:00
Jason Wilder d878d30d18 Fix shard write stats
* Rename *Fail to *Err for consistency with other metrics
* Use index Series count instead of sepaate counter
2016-08-29 09:46:11 -06:00
Edd Robinson 90ff713f21 Fix base64 encoding issue in stats
Fixes #7177.
2016-08-22 15:21:31 +01:00
Ben Johnson 8aa224b22d
reduce memory allocations in index
This commit changes the index to point to index data in the shards
instead of keeping it in-memory on the heap.
2016-08-16 14:09:00 -06:00
Jason Wilder c1a94e8861 Remove temp TSM files when disabling compactions
If they were left around, re-enabling them again could cause
future compactions to continuously fail.  A restart of the
server would clean them up correctly though.
2016-07-28 20:25:37 -06:00
Jason Wilder 5764a730d5 Prevent tombstoning series keys more than once
If there were multiple TSM files and a delete/drop was run,
we would write the delete series to the tombstone file N
times for each file.  This occurred because FileStore.WalkKeys walks
every key in every TSM file which can return duplicate keys.

This issue caused TSM files to be much larger than they should be
and also cause large memory usage during the delete.
2016-07-28 20:25:36 -06:00
Jason Wilder cab84ae279 Prevent concurrent compactions from stepping on each other
Normally, compactions do not conflict on the files they are compacting.
If the full cold threshold is set very low, it can cause conflicts where
two compactions compact the same files.  The full compaction was the
only place this could happen as it's planning is greedy.

To make this safer for concurrent execution, the compaction tracks which
files are current being compacted and prevents any new compactions from
starting if the file set overlaps.

Fixes #6595
2016-07-26 12:58:25 -06:00
Cory LaNou 968d322d6d finish tsm file exporter 2016-07-21 17:20:51 -05:00
Edd Robinson f37e726869 Add trace logging statements to tsdb 2016-07-21 11:14:29 +01:00
Edd Robinson 44231abcbd Add trace logger controlled via DataLoggingEnabled 2016-07-21 11:14:29 +01:00
Edd Robinson 83cc580ff8 Tidy up logging 2016-07-21 11:14:29 +01:00
Jason Wilder 46fdcba6e3 Remove compaction enabled logging
Too verbose
2016-07-17 23:53:12 -06:00
Jason Wilder 2fa28ba1d3 Don't log error when compactions are aborted 2016-07-17 23:53:12 -06:00
Jason Wilder b48d88ce9e Abort running compactions when series are deleted
If a delete is issued while a compaction is running, the a newly
deleted series could re-appear after the compaction completed. This
could occur the compaction had already written the blocks for series
that were just deleted.  When the compaction completes, the newly
written tombstone files would be deleted, essentially undeleting the
series.
2016-07-17 23:53:12 -06:00
Jason Wilder 0264966f5c Add index optimize planning step
For larger datasets, it's possible for shards to get into a state where
many large, dense TSM files exist.  While the shard is still hot for
writes, full compactions will skip these files since they are already
fairly optimized and full compactions are expensive.  If the write volume
is large enough, the shard can accumulate lots of these files.  When
a file is in this state, it's index can contain every series which
causes startup times to increase since each file must parse the full
set of series keys for every file.  If the number of series is high,
the index can be quite large causing large amount of disk IO at startup.

To fix this, a optmize compaction is run when a full compaction planning
step decides there is nothing to do.  The optimize compaction combines
and spreads the data and series keys across all files resulting in each
file containing the full series data for that shard and a subset of the
total set of keys in the shard.

This allows a shard to only store a series key once in the shard reducing
storage size as well allows a shard to only load each key once at startup.
2016-07-14 11:32:36 -06:00
Jonathan A. Sternberg 12a33fe0d3 Add stats and diagnostics to the TSM engine
Track the number of TSM files in the file store and keep engine
statistics related to the number of TSM compactions.
2016-07-07 19:35:55 -05:00
Jonathan A. Sternberg 837a9804cf Refactoring the monitor service to avoid expvar
Truncate the time interval output of the monitor service to be on even
time intervals rather than on every minute based on the start time. This
normalizes the output from the monitor service.
2016-07-07 11:13:58 -05:00
Jonathan A. Sternberg 497db2a6d3 Removing dead code from every package except influxql
The tsdb package had a substantial amount of dead code related to the
old query engine still in there. It is no longer used, so it was removed
since it was left unmaintained. There is likely still more code that is
the same, but wasn't found as part of this code cleanup.

influxql has dead code show up because of the code generation so it is
not included in this pruning.
2016-06-20 22:41:07 -05:00
Jonathan A. Sternberg 6e205ce135 Set the condition cursor instead of aux iterator when creating a nil condition cursor
A copy/paste error had nil cursors destined for a condition cursor get
set to the auxiliary cursor instead. When the number of conditions
exceeded the number of auxiliary fields, this would result in a stack
trace in some situations. When the number of conditions was less than or
equal to the number of auxiliary fields, it means that an auxiliary
cursor may have been overwritten with a nil cursor accidentally and a
leak might have happened since it was never closed.

Fixes #6859.
2016-06-17 14:54:48 -05:00
Jason Wilder ac6addd0b5 Ensure restore doesn't write broken files
Restore would try to open the shard if there was an error.  If there
was an error, the files written are very likely to be partially written
and they can cause the server to panic.

To prevent a shard from trying to open broken files, we now write to
a temp file and rename it to the actual name only after fully writing
and fsyncing the file.
2016-06-07 14:36:46 -06:00
Jason Wilder a74ea4cbf4 Allow creating shards in a disable state
For restoring a shard, we need to be able to have the shard open,
but disabled.  It was racy to open it and then disable it separately
since writes/queries could occur in between that time.
2016-06-01 16:17:18 -06:00
Jason Wilder 1ff8ecf4fb Add ability to disable shards
Disabling a shard causes all writes and queries to a shard to return
an error.  This also disables compactions for the shard.
2016-05-31 10:51:54 -06:00
Jason Wilder 11959005f4 Switch backup to use shard.Snapshot
This switch the backup shard call to use the shard Snapshot that
internally creates a snapshot by hardlinking all of the TSM and
tombstone files instead.  This reduces the time that the FileStore
is locked and will allow for larger shards to be backup more easily.
2016-05-27 09:30:25 -06:00
Edd Robinson 6a7f9527e3 Revert d2672a3 and 1e0a4e9 2016-05-27 10:34:14 +01:00
Edd Robinson d2672a3280 Update Go version 2016-05-26 15:26:09 +01:00
Edd Robinson 1e0a4e9119 Move fields under mutex 2016-05-26 12:00:46 +01:00