Commit Graph

1464 Commits (db/wait-timeout-utility)

Author SHA1 Message Date
Jason Wilder 9ee305f6f5 Periodically re-allocate cache store
This periodically re-allocates the cache store to avoid memory
fragmentation and a gradual slowdown of the store after repeated
deletes and inserts into the map.
2017-09-19 15:27:25 -06:00
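A minimal sketch of the idea, with hypothetical names (`store` and `entry` are illustrative, not the actual tsm1 types): copying live entries into a fresh map lets the old map's bucket array, grown by past inserts and deletes, be garbage collected.

```
package main

import "fmt"

type entry struct{ values []int64 }

// reallocate copies live entries into a fresh map so the old map's
// internal buckets, grown by past inserts and deletes, can be freed.
func reallocate(store map[string]*entry) map[string]*entry {
	fresh := make(map[string]*entry, len(store))
	for k, v := range store {
		fresh[k] = v
	}
	return fresh
}

func main() {
	store := map[string]*entry{"cpu,host=a": {values: []int64{1, 2, 3}}}
	store = reallocate(store)
	fmt.Println(len(store))
}
```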
Jason Wilder 2885b9b310 Remove entrySizeHints map
There is a lot of overhead for calculating the hints for larger
cardinalities.  This slows down resetting the partitions in the ring.
2017-09-19 15:27:25 -06:00
Jason Wilder 4124a8ed97 Simplify cache ring
The continuum slice is not needed since the number of partitions
doesn't change.  This removes the slice to make the mapping simpler.
2017-09-19 15:27:25 -06:00
Stuart Carnie ed7bc9d825 fix FindValues panic for empty array 2017-09-19 14:23:32 -07:00
Stuart Carnie 92756ec0ad Reduce allocations, improve readEntries performance by simplifying loop
* callers of ReadEntries and Key API can cache allocated slice
2017-09-19 11:57:10 -07:00
Stuart Carnie baa05de3f8 add benchmarks 2017-09-19 11:47:48 -07:00
Stuart Carnie cfc6a1cd9f implement optimization for Include function
```
benchmark                                            old ns/op     new ns/op     delta
BenchmarkIntegerValues_IncludeNone_1000-8            651           6.69          -98.97%
BenchmarkIntegerValues_IncludeMiddleHalf_1000-8      1131          114           -89.92%
BenchmarkIntegerValues_IncludeFirst_1000-8           638           33.9          -94.69%
BenchmarkIntegerValues_IncludeLast_1000-8            1269          32.2          -97.46%
BenchmarkIntegerValues_IncludeNone_10000-8           7751          6.76          -99.91%
BenchmarkIntegerValues_IncludeMiddleHalf_10000-8     11582         1378          -88.10%
BenchmarkIntegerValues_IncludeFirst_10000-8          7911          43.8          -99.45%
BenchmarkIntegerValues_IncludeLast_10000-8           12442         38.4          -99.69%
```

(cherry picked from commit fb93ad5)
2017-09-19 09:53:28 -07:00
Stuart Carnie ca40c1ad3c <type>Values.Exclude function uses binary search and copy builtin
```
± benchcmp old.txt new.txt
benchmark                                            old ns/op     new ns/op     delta
BenchmarkIntegerValues_ExcludeNone_1000-8            1285          7.34          -99.43%
BenchmarkIntegerValues_ExcludeMiddleHalf_1000-8      1258          148           -88.24%
BenchmarkIntegerValues_ExcludeFirst_1000-8           1268          7.51          -99.41%
BenchmarkIntegerValues_ExcludeLast_1000-8            1125          27.7          -97.54%
BenchmarkIntegerValues_ExcludeNone_10000-8           12665         7.31          -99.94%
BenchmarkIntegerValues_ExcludeMiddleHalf_10000-8     12039         976           -91.89%
BenchmarkIntegerValues_ExcludeFirst_10000-8          12663         7.29          -99.94%
BenchmarkIntegerValues_ExcludeLast_10000-8           10990         34.9          -99.68%
```

(cherry picked from commit d7a3c23)
2017-09-19 09:53:26 -07:00
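A minimal sketch of how such an Exclude can combine binary search with the copy builtin, assuming values sorted by timestamp; the `Value` type and `exclude` function here are illustrative, not the generated tsm1 code.

```
package main

import (
	"fmt"
	"sort"
)

type Value struct{ unixNano int64 }

// exclude drops values with timestamps in [min, max] without
// allocating: binary search finds the bounds, copy shifts the tail.
func exclude(a []Value, min, max int64) []Value {
	lo := sort.Search(len(a), func(i int) bool { return a[i].unixNano >= min })
	hi := sort.Search(len(a), func(i int) bool { return a[i].unixNano > max })
	n := copy(a[lo:], a[hi:])
	return a[:lo+n]
}

func main() {
	vals := []Value{{1}, {2}, {3}, {4}, {5}}
	fmt.Println(exclude(vals, 2, 4)) // [{1} {5}]
}
```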
Jason Wilder 31646aae3a Release mmap pages when shard is cold
This instructs the kernel that it can release memory used by mmap'd
TSM files when they are not actively being used.  If the mappings are
used, the kernel will fault the pages back in.  On Linux, this causes
RES memory to drop immediately when run.
2017-09-18 11:51:51 -06:00
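A Linux-oriented sketch of advising the kernel to release mmap'd pages via madvise; the anonymous mapping in main is only there to demonstrate the call, not how the engine maps TSM files.

```
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

// releasePages advises the kernel that the mapped bytes are not needed
// soon; later access faults the pages back in from the backing file.
func releasePages(mmapped []byte) error {
	return unix.Madvise(mmapped, unix.MADV_DONTNEED)
}

func main() {
	// Map an anonymous page just to demonstrate the call.
	b, err := unix.Mmap(-1, 0, 4096, unix.PROT_READ|unix.PROT_WRITE,
		unix.MAP_ANON|unix.MAP_PRIVATE)
	if err != nil {
		log.Fatal(err)
	}
	defer unix.Munmap(b)
	log.Println(releasePages(b))
}
```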
Jason Wilder 7d467c2047 Fix windows unmapping of anonymous index slice 2017-09-12 10:30:10 -06:00
Jason Wilder b4b3c159cc Fixup rebase 2017-09-11 17:04:10 -06:00
Jason Wilder 26f92ce6ac Remove commented out code 2017-09-11 15:30:05 -06:00
Jason Wilder 820856347c Don't use disk temp file for snapshots 2017-09-11 15:29:26 -06:00
Jason Wilder 4ed9c75896 Fix unmapping anonymous memory slice 2017-09-11 15:29:26 -06:00
Jason Wilder 97f7857715 Remove mutex on TSMWriter
This isn't used by more than one goroutine so locks are unnecessary.
2017-09-11 15:29:26 -06:00
Jason Wilder a93a5e9bdf Include the size of the key in the cache size 2017-09-11 15:29:26 -06:00
Jason Wilder 7388eb9499 Use disk when writing TSM index 2017-09-11 15:29:25 -06:00
Jason Wilder d3e832b462 Use offheap memory for indirect index offsets slice 2017-09-11 15:29:25 -06:00
Jason Wilder 91eb9de341 Use existing TSMReader from file store during compactions
Compactions would create their own TSMReaders for simplicity. With
very high cardinality compactions, creating the reader and indirectIndex
can start to use a significant amount of memory.

This changes the compactions to use a reader that is already allocated
and managed by the FileStore.
2017-09-11 15:29:25 -06:00
Jason Wilder 739ecd2ebd Fix a compaction planning bug
There was a race where the plan returned was for files that were just
compacted so the compaction would immediately abort.
2017-09-11 15:26:25 -06:00
Jason Wilder bc4fb0ea10 Sort index entries if necessary
These are already sorted during compaction, so switch to sorting lazily
to avoid the CPU and allocations.  Sorting would only occur when
using the writer directly.
2017-09-11 15:26:25 -06:00
Jason Wilder f18dec6a4a Use sorted slice for writing TSM index
The directIndex used by the TSMWriter maintained a map of series keys
to index entries.  When the index is written to the TSM file, the keys
are sorted and then written out in order.

The reason for this is that directIndex used to be the only index
and it was optimized more for reading.  The reading has been replaced
by the indirectIndex so the map of keys ends up wasting space.

During compactions, the series keys (and index entries) are already sorted
so this change uses the sorting to avoid the map and sort when writing the
index.  This reduces allocations and CPU usage quite a bit for larger cardinality
TSM files.
2017-09-11 15:26:24 -06:00
Jason Wilder 2a0d7935d7 Switch level 3 compactions to use fast compaction strategy
This leaves the slower compactions that create full blocks to only
the full compaction.  This helps reduce CPU usage and memory while shards
are hot, but increases disk usage (reduced compression) slightly.
2017-09-11 15:26:24 -06:00
Jason Wilder 94e229ff59 Merge branch 'master' into jw-drop-series 2017-09-08 15:34:32 -06:00
Jason Wilder 78922f9821 Set rc to nil when closing WALSegmentReader 2017-09-08 14:55:02 -06:00
Jason Wilder b9b648e2a0 Dynamically allocate cache store
The cache store can be memory intensive with many shards.  This
lazily allocates it when needed and frees it when the cache is
empty and cold.
2017-09-07 16:35:08 -06:00
Jason Wilder 5581f8b4ae Re-use WALSegmentReaders at startup 2017-09-07 12:56:17 -06:00
Jason Wilder e39276b96f Skip reading 0 byte wal segments 2017-09-07 12:24:54 -06:00
Jason Wilder a8d9eeef36 Reduce lock contention when deleting high cardinality series
Deleting high cardinality series could take a very long time, cause
write timeouts, and even deadlock the process.  This fixes these
issues by changing the approach for cleaning up the indexes and
reducing lock contention.

The prior approach deleted each series and updated every index (inmem)
during the delete.  This was very slow and caused the index to be locked
while items in a slice were removed one by one.  This has been changed
to mark series as deleted and then rebuild the index asynchronously, which
speeds up the process.

There was also a deadlock that could occur when deleting the field set.
Deleting the field set held a write lock and the function it invoked under
the lock could try to take a read lock on the field set.  This would then
deadlock.  This approach was also very slow and caused write timeouts.
It now uses a faster approach that checks for the existence of the measurement
in the cache and file store, which does not take write locks.
2017-09-07 11:36:02 -06:00
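An illustrative sketch of the "mark now, rebuild later" pattern described above; the `index` type and its fields are hypothetical stand-ins, not the inmem index.

```
package main

import (
	"fmt"
	"sync"
)

// index is a stand-in for the in-memory series index.
type index struct {
	mu     sync.RWMutex
	series map[string]bool // series key -> deleted flag
}

// markDeleted flags series under a short write lock and kicks off an
// asynchronous rebuild instead of removing entries one by one inline.
func (i *index) markDeleted(keys []string, done chan<- struct{}) {
	i.mu.Lock()
	for _, k := range keys {
		i.series[k] = true
	}
	i.mu.Unlock()
	go func() { // rebuild off the hot path
		i.rebuild()
		close(done)
	}()
}

// rebuild compacts out entries previously marked deleted.
func (i *index) rebuild() {
	i.mu.Lock()
	defer i.mu.Unlock()
	for k, deleted := range i.series {
		if deleted {
			delete(i.series, k)
		}
	}
}

func main() {
	idx := &index{series: map[string]bool{"cpu,host=a": false}}
	done := make(chan struct{})
	idx.markDeleted([]string{"cpu,host=a"}, done)
	<-done
	fmt.Println(len(idx.series)) // 0
}
```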
Jonathan A. Sternberg 590be193e5 Include the number of scanned cached values in the iterator cost 2017-09-06 15:41:07 -05:00
Jonathan A. Sternberg 50d404e690 Initial implementation of explain plan
It prints the statistics of each iterator that will access the storage
engine. For each access of the storage engine, it will print the number
of shards that will potentially be accessed, the number of files that
may be accessed, the number of series that will be created, the number
of blocks, and the size of those blocks.
2017-09-01 09:01:10 -05:00
Jonathan A. Sternberg 466fc9026e Reduce how long it takes to walk the varrefs in an expression
This is used quite a bit to determine which fields are needed in a
condition. When the condition gets large, the memory usage begins to
slow it down considerably, and the old implementation didn't remove duplicates.
2017-08-31 09:33:45 -05:00
Joe LeGasse 732a0c2eaa Merge pull request #8769 from influxdata/jl-map-cleanup
cleanup: remove poor usage of ',ok' with maps
2017-08-31 09:18:42 -04:00
Ben Johnson 1dbe0662d8
Use system cursors for measurement, series, and tag key meta queries. 2017-08-30 08:35:20 -06:00
Joe LeGasse a95647b720 cleanup: remove poor usage of ',ok' with maps
There are several places in the code where comma-ok map retrieval was
being used poorly. Some were benign, like checking existence before
issuing an unconditional delete with no cleanup. Others were potentially
far more serious: assuming that if 'ok' was true, then the resulting
pointer retrieved from the map would be non-nil. `nil` is a perfectly
valid value to store in a map of pointers, and the comma-ok syntax is
meant for when membership is distinct from having a non-zero value.
There were only one or two cases that I saw being used correctly for
maps of pointers.
2017-08-30 09:49:31 -04:00
Edd Robinson 9be7c5aaa6 Run relevant engine tests on both indexes 2017-08-23 10:47:01 +01:00
Jason Wilder d305b89f74 Merge pull request #8726 from influxdata/jw-tsm-file-leak
Fix leaking tmp file when large compaction aborted
2017-08-22 09:59:23 -05:00
Stuart Carnie 2ef9b489f0 Merge pull request #8727 from influxdata/sgc-finalizer
log message when iterator is closed by finalizer
2017-08-22 07:29:38 -07:00
Stuart Carnie d189621d07 log message when iterator closed by finalizer 2017-08-21 16:46:24 -07:00
Jason Wilder e265d150be Fix leaking tmp file when large compaction aborted
If a large compaction was running and was aborted, it could leave
some tmp files around for files that it had fully written.  The currently
active file was cleaned up, but already completed ones were not.
would occur when a TSM file needed to rollover due to size.
2017-08-21 17:04:57 -06:00
Jonathan A. Sternberg 5ce6007347 Merge pull request #8724 from influxdata/js-remove-unused-cursor
This cursor implementation appears to be completely unused
2017-08-21 17:44:51 -05:00
Jonathan A. Sternberg c0f7a8af5b This cursor implementation appears to be completely unused
Remove it so that its existence doesn't confuse someone that this is
actually the cursor. The real cursors appear to be in file_store.gen.go.
2017-08-21 16:27:23 -05:00
Stuart Carnie 25edd7bfdf naming 2017-08-17 15:47:47 -07:00
Stuart Carnie c86dc0d103 redundant allocation is overwritten by line 1769 2017-08-17 11:12:41 -07:00
Stuart Carnie 823f903cc6 inputs are closed if Merge returns error and use <type>FinalizerIterator
* <type>FinalizerIterator sets a runtime finalizer and calls Close
  when garbage collected. This will ensure any associated cursors
  are closed and the associated TSM files released
* `query.Iterators#Merge` call could return an error and the inputs
  would not be closed, causing a cursor leak
2017-08-17 11:12:18 -07:00
Jason Wilder 85842503be Fix deadlock in engine/measurement fields
The OnReplace func ends up trying to acquire locks on MeasurementFields.  When
it's called via snapshotting, this can deadlock because the snapshotting goroutine
also holds an RLock on the engine.  If a delete measurement call is run at the
right time, it will lock the MeasurementFields and try to acquire a lock on the engine
to disable compactions.  This creates a deadlock.

To fix this, the OnReplace callback is moved to a function param to allow only Replace
calls as part of a compaction to invoke it as opposed to both snapshotting and compactions.

Fixes #8713
2017-08-16 16:43:40 -06:00
Jonathan A. Sternberg 697759613c Remove time comparisons from the inner sections of the storage engine 2017-08-16 16:51:13 -05:00
Jonathan A. Sternberg 9a2357c2c0 Separate the query engine into a separate package
This change provides a clear separation between the query engine
mechanics and the query language so that the language can be parsed and
dealt with separate from the query engine itself.
2017-08-16 13:38:43 -05:00
Stuart Carnie 3caeee8a24 fix: cursor leak when cur == nil and aux or conds is not empty 2017-08-16 09:17:20 -07:00
Ben Johnson e0d8cb0ef3
Cardinality AST, parser, & rewriter fixes. 2017-08-16 09:27:29 -06:00
Ben Johnson 60ab1282ea
Refactor system iterators.
Previously pseudo iterators could be created for meta data such
as series, measurement, and tag data. These iterators were created
at a higher level and lacked a lot of the power of the query engine.

This commit moves system iterators down to the series level and
supports the following:

	- _name
	- _seriesKey
	- _tagKey
	- _tagValue
	- _fieldKey

These can be used as normal fields such as:

	SELECT _seriesKey FROM cpu

This will return all the series keys for `cpu`.
2017-08-16 09:27:29 -06:00
David Norton 1d8d739418 fix #8677: check for snapshot size == 0 2017-08-16 09:43:56 -04:00
Jason Wilder 90e2cadeb6 Fix drop measurement not dropping all data
If there were multiple shards, drop measurement could update the index
and remove the measurement before the other shards ran their deletes.
This causes the later shards to not see any series to delete.

The fix is to allow deleteSeries to handle the index delete, which already
accounts for removing the measurement when it is fully removed from the
index.
2017-08-15 11:19:45 -06:00
Jason Wilder 61b13eb12b Fix partiallyRead logic
The partiallyRead func didn't account for the initial values and would
return true for blocks that had not been read at all.  This causes a
slower path during compactions that forces a block to be decoded when
it could just be merged as is without decoding.  This causes compactions
to consume more CPU and run slower at times.
2017-08-14 16:44:32 -06:00
Edd Robinson aa7095be5a Use a merge-based approach for TagValues 2017-08-02 14:10:52 +01:00
Jason Wilder 94a48774b7 Pull in new index filter 2017-08-02 14:10:52 +01:00
Stuart Carnie 5449285c4c Merge pull request #8652 from influxdata/sgc-literal-cursor
Reduce allocations using nil cursors and literal value cursors
2017-08-01 10:20:24 -07:00
Jason Wilder 173276a409 Remove unused filestore reference
Reduces cursor struct size from 119 bytes to 111.
2017-08-01 09:41:16 -06:00
Stuart Carnie ff65f0f24d Reduce allocations using nil cursors and literal value cursors
```
benchmark                           old ns/op     new ns/op     delta
BenchmarkIntegerIterator_Next-8     82.8          22.7          -72.58%

benchmark                           old allocs     new allocs     delta
BenchmarkIntegerIterator_Next-8     3              0              -100.00%

benchmark                           old bytes     new bytes     delta
BenchmarkIntegerIterator_Next-8     32            0             -100.00%
```
2017-07-30 09:15:34 -07:00
Jason Wilder 3d12c62121 Avoid repeatedly growing decoded values slices 2017-07-28 11:00:56 -06:00
Jason Wilder 778000435a Convert all keys from string to []byte in TSM engine
This switches all the interfaces that take string series key to
take a []byte.  This eliminates many small allocations where we
convert between the two repeatedly.  Eventually, this change should
propagate further up the stack.
2017-07-28 11:00:50 -06:00
Jason Wilder 8009da0187 Remove some extra cursor buffers that are not needed 2017-07-28 10:53:07 -06:00
Jason Wilder 6582caa78b Reduce allocations when creating KeyCursors
The refs map was used to increment the file references only once each.
It doesn't hurt to increment them multiple times though.

We also do not need to copy the files slice as we are accessing it
under a read lock so it can't be changed.
2017-07-28 10:53:07 -06:00
Jason Wilder 6e6cc991ee Merge pull request #8629 from influxdata/jw-compaction-abort
Interrupt in progress TSM compactions
2017-07-27 16:12:40 -06:00
Jason Wilder 18a02d50d7 Interrupt in progress TSM compactions
When snapshots and compactions are disabled, the check to see if
the compaction should be aborted occurs in between writing to the
next TSM file.  If a large compaction is running, it might take
a while for the file to finish writing, causing long delays.

This now interrupts compactions while iterating over the blocks to
write which allows them to abort immediately.
2017-07-27 15:58:56 -06:00
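A minimal sketch of checking an interrupt channel between blocks rather than between files; names (`writeBlocks`, `errCompactionAborted`) are hypothetical, not the compactor's actual API.

```
package main

import (
	"errors"
	"fmt"
)

var errCompactionAborted = errors.New("compaction aborted")

// writeBlocks checks the interrupt channel before each block so an
// abort takes effect mid-file instead of only between files.
func writeBlocks(blocks [][]byte, interrupt <-chan struct{}) error {
	for _, b := range blocks {
		select {
		case <-interrupt:
			return errCompactionAborted
		default:
		}
		_ = b // write the block to the TSM file here
	}
	return nil
}

func main() {
	interrupt := make(chan struct{})
	close(interrupt)
	fmt.Println(writeBlocks(make([][]byte, 3), interrupt))
}
```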
Stuart Carnie 0c79ec6f17 update xxhash and use Sum64String to avoid allocs
```
± benchcmp ring_before.txt ring_after.txt
benchmark                             old ns/op     new ns/op     delta
BenchmarkRing_getPartition_100-8      108           48.1          -55.46%
BenchmarkRing_getPartition_1000-8     113           48.9          -56.73%

benchmark                             old allocs     new allocs     delta
BenchmarkRing_getPartition_100-8      1              0              -100.00%
BenchmarkRing_getPartition_1000-8     1              0              -100.00%

benchmark                             old bytes     new bytes     delta
BenchmarkRing_getPartition_100-8      192           0             -100.00%
BenchmarkRing_getPartition_1000-8     192           0             -100.00%
```
2017-07-26 10:16:54 -07:00
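A sketch of the alloc-free hashing this commit describes, assuming the github.com/cespare/xxhash package whose Sum64String hashes a string directly; `partitionFor` is an illustrative stand-in for the ring's partition lookup.

```
package main

import (
	"fmt"

	"github.com/cespare/xxhash"
)

// partitionFor picks a ring partition for a series key. Sum64String
// avoids allocating a []byte copy of the key before hashing.
func partitionFor(key string, partitions uint64) uint64 {
	return xxhash.Sum64String(key) % partitions
}

func main() {
	fmt.Println(partitionFor("cpu,host=a", 16))
}
```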
Stuart Carnie d243df5ca3 simplify loop 2017-07-24 09:03:22 -07:00
Stuart Carnie eec80692c4 Taught tsm1 storage engine how to read and write uint64 values
* introduced UnsignedValue type
  * leveraged existing int64 compression algorithms (RLE, Simple 8B)
* tsm and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* unsigned support to model, cursors and write points

NOTE: there is no support to create unsigned points, as the line
protocol has not been modified.
2017-07-24 09:03:22 -07:00
Jason Wilder 4244d0e053 Merge pull request #8568 from influxdata/jw-tombstone-compress
Compress tombstone files
2017-07-10 11:28:09 -06:00
Jason Wilder dba3ce1a42 Merge pull request #8576 from influxdata/jw-delete-index
Fix index inconsistency after deletes
2017-07-07 14:36:33 -06:00
Jason Wilder e9370e0b86 Fix indefinite hang in WAL.writeToLog
There was a race in the WAL writeToLog and scheduleSync which could
lead to a writing goroutine blocking indefinitely on its syncErr channel.

The issue was that the clearing of the syncCount happened after the
wal was unlocked.  If a goroutine was able to lock, write, and call scheduleSync
before the existing scheduleSync goroutine returned and ran the defer to
clear the syncCount, then a new scheduleSync goroutine would not get started.
This left the writing goroutine blocked with nothing to signal it.

While in this state, an RLock on the engine was held.  If a Lock was requested
on the engine during this time, all future writes and queries would block waiting
on the blocked wal writer.

The fix is to move the atomic clearing of syncCount before the Lock is released.
2017-07-07 13:31:52 -06:00
Jason Wilder 5e11cdcdd7 Fix incorrect condition in OverlapsKeyRange
The min key was not used in OverlapsKeyRange which caused it to return
false when it should be true.  This causes a bug where deletes would not
write tombstones for files that actually contained the data it was supposed
to delete.
2017-07-07 12:19:33 -06:00
Jason Wilder 839cddf6d5 Refresh index after compactions
The in-memory index can get out of sync when deletes and writes
to the same measurement are running concurrently.  The index is
updated independently from data on disk and it's possible for the
index to unassign a shard when data still exists on disk.  What happens
is that there are TSM files on disk, but the index does not know that
the series that exist in those files still are in the shard.  Restarting
the server reloads the index and the data is visible again. From an
end user perspective, this can look like more data is deleted than should
have been or that deleted data re-appears after a restart or writes to the
shard occur again.

There isn't an easy way to resolve this since the index and storage
are not transactional resources and we cannot atomically commit or
rollback changes to both at once.

As a workaround, after new TSM files are installed, we refresh the
index with series keys that exist in the new tsm files as well as
any lingering data still in the cache.  There is a small window of time
when the index may be missing series, but they will re-appear after the refresh
completes.
2017-07-07 12:19:30 -06:00
Jason Wilder 3e7dfad7c4 Compress tombstone files
This adds a v3 format that is a gzip compressed version of the v2
format.  It reduces the size of tombstone files substantially without
having to support a more feature rich file format for tombstones.
2017-07-06 10:10:31 -06:00
Jason Wilder 9ac042b5cd Reduce lock contention when disabling compactions
The monitor goroutine calls enable compactions every 10s to spin down
(or start up) goroutines for cold shards.  This frequent Lock may be
causing lock contention for writes and queries which get blocked trying
to acquire an RLock.

The Go RWMutex documentation says that new RLock calls will block if there is a
pending Lock call that is blocked.  Switching the common path to use
an RLock should avoid the Lock and reduce lock contention for writes
and queries.
2017-07-05 15:42:21 -06:00
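A sketch of the RLock-first pattern this commit switches to: take the cheap read lock on the common path and only fall back to the write lock when state must change. The `engine` type here is a hypothetical stand-in.

```
package main

import (
	"fmt"
	"sync"
)

type engine struct {
	mu      sync.RWMutex
	enabled bool
}

func (e *engine) enableCompactions() {
	// Fast path: most calls find compactions already enabled.
	e.mu.RLock()
	if e.enabled {
		e.mu.RUnlock()
		return
	}
	e.mu.RUnlock()

	// Slow path: re-check under the write lock before mutating.
	e.mu.Lock()
	if !e.enabled {
		e.enabled = true
	}
	e.mu.Unlock()
}

func main() {
	e := &engine{}
	e.enableCompactions()
	fmt.Println(e.enabled)
}
```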
Edd Robinson 101af89987 Update CHANGELOG 2017-07-05 16:35:41 +01:00
Edd Robinson 0748d28986 Ensure tmp files cleaned up when compaction disabled 2017-07-04 20:04:23 +01:00
Stuart Carnie 46796d932f add database to index, engine and shard; call AuthorizeSeriesRead 2017-05-26 13:21:50 -07:00
Joe LeGasse 815f740f4c initial fga work
wip

wip

fix tests / build
2017-05-26 13:16:27 -07:00
Jason Wilder 208ef09f87 Prevent writing series keys that exceed max key size
WriteBlock was missing the check for the max series keys which allowed
series keys to be written that were larger than the 2 bytes allocated
to store their length.  When this occurred, the TSM file can fail to load.
2017-05-24 13:41:09 -06:00
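A minimal sketch of the length check: a 2-byte length field caps series keys at 65535 bytes. The `validateKey` helper and error name are illustrative, assuming that layout.

```
package main

import (
	"errors"
	"fmt"
	"math"
)

var errMaxKeyLength = errors.New("series key exceeds maximum length")

// validateKey rejects keys whose length cannot be represented in the
// index's 2-byte length field (assumed layout for this sketch).
func validateKey(key []byte) error {
	if len(key) > math.MaxUint16 {
		return errMaxKeyLength
	}
	return nil
}

func main() {
	fmt.Println(validateKey(make([]byte, 70000)))
}
```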
Jason Wilder 29e4287fd2 Prevent masking root errors when compactions are in progress
The root error from creating a tmp file while writing a snapshot
was hidden, making it difficult to determine why snapshots were
failing.
2017-05-23 12:09:36 -06:00
Jason Wilder bd6d0681e9 Ensure planned files are released
The defer was never executed because the planning happens in a
long running goroutine that loops.  The plans need to be released
immediately after applying them.
2017-05-23 12:08:25 -06:00
Jason Wilder 4e582f297a Fix race in findGenerations
It was possible for findGenerations to get stuck returning
no files even when generations existed on disk.
2017-05-23 12:05:47 -06:00
Jason Wilder 1833475c09 Fix TSM tmp files leaking
TMP files could leak when compactions failed for various reasons. They
were also being deleted inadvertently when compactions were disabled causing
other errors to be reported in the logs.
2017-05-22 14:51:18 -06:00
Stuart Carnie c863923e68 cache MarshalSize 2017-05-12 14:05:25 -06:00
Stuart Carnie 0151afe31c check size and allocate once 2017-05-12 14:05:25 -06:00
Stuart Carnie 096d6f65b4 explicit sizes 2017-05-12 14:05:24 -06:00
Jason Wilder 4d002bb370 Limit concurrent compactions within a shard
This changes full compactions within a shard to run sequentially
instead of running all the compaction groups in parallel.  Normally,
there is only 1 full compaction group to run.  At times, there could
be several which causes instability if they are all running concurrently
as they tie up a cpu for long periods of time.

Level compactions are also capped to a max of 4 concurrently running for each level
in a shard.  This prevents sudden spikes in CPU and disk usage due to a large backlog
of tsm files at a given level.
2017-05-12 14:05:24 -06:00
Jason Wilder 2cac46ebbc Convert usage of strings to []byte
Measurement name and field were converted between []byte and string
repetitively, causing lots of garbage.  This switches the code to use
[]byte in the write path.
2017-05-12 14:05:19 -06:00
Jason Wilder 503d41a08f Add LimitedBytePool for wal buffers
This pool was previously a pool.Bytes to avoid repetitive allocations.
It was recently switched to a sync.Pool because pool.Bytes held onto
very large buffers at times which were never released.  sync.Pool is
showing up in allocation profiles quite frequently.

This switches the pool to a new pool that limits how many buffers are
in the pool as well as the max size of each buffer in the pool.  This
provides better bounds on allocations.
2017-05-11 11:27:00 -06:00
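A sketch of such a bounded pool: a buffered channel caps how many buffers are retained, and oversized buffers are dropped for the GC. This is an illustration of the idea, not the actual pool package.

```
package main

import "fmt"

type LimitedBytes struct {
	pool    chan []byte
	maxSize int
}

func NewLimitedBytes(capacity, maxSize int) *LimitedBytes {
	return &LimitedBytes{pool: make(chan []byte, capacity), maxSize: maxSize}
}

// Get returns a pooled buffer if one is large enough, else allocates.
func (p *LimitedBytes) Get(sz int) []byte {
	select {
	case b := <-p.pool:
		if cap(b) >= sz {
			return b[:sz]
		}
	default:
	}
	return make([]byte, sz)
}

// Put retains the buffer only if it is small enough and there is room.
func (p *LimitedBytes) Put(b []byte) {
	if cap(b) > p.maxSize {
		return // too big to retain; let it be collected
	}
	select {
	case p.pool <- b:
	default: // pool full; drop the buffer
	}
}

func main() {
	p := NewLimitedBytes(4, 1<<20)
	b := p.Get(512)
	p.Put(b)
	fmt.Println(cap(p.pool))
}
```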
Jason Wilder e17be9f4ba Merge pull request #8377 from influxdata/jw-encoders
Speed up time encoding/decoding
2017-05-11 10:38:27 -06:00
Joe LeGasse 087d9f4670 tsm: fixed test to not require sorted backup tarball 2017-05-11 12:00:19 -04:00
Jason Wilder b150a6293c Merge pull request #8380 from influxdata/jw-wal-buffer
Use buffer writer for wal segments
2017-05-11 08:34:44 -06:00
Jason Wilder b81ac21bcb Merge pull request #8378 from influxdata/jw-snapshot-disable
Don't disable snapshots when snapshot compactions are disabled
2017-05-10 12:00:27 -06:00
Jason Wilder e102fcca9c Use buffer writer for wal segments 2017-05-10 11:42:32 -06:00
Jason Wilder 39a829c1ae Speed up time encoding/decoding
This speeds up time encoding and decoding by skipping the divisor
scaling if scaling by 1.  Since division and multiplication are CPU
expensive and scaling by 1 has no effect, this just slows encoding and decoding
down.
2017-05-10 11:12:35 -06:00
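A trivial sketch of the skip: dividing by 1 is a no-op that still costs CPU in a hot loop, so it can be bypassed entirely. The helper name is illustrative.

```
package main

import "fmt"

// scaleTimestamps divides each timestamp by div, skipping the loop
// entirely when the divisor is 1 to avoid per-value division cost.
func scaleTimestamps(ts []uint64, div uint64) {
	if div <= 1 {
		return
	}
	for i := range ts {
		ts[i] /= div
	}
}

func main() {
	ts := []uint64{1000, 2000}
	scaleTimestamps(ts, 1)
	fmt.Println(ts)
}
```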
Jason Wilder 4e3e707abc Fix packed time encoded benchmark 2017-05-10 10:35:44 -06:00
Jason Wilder 29c2b1958e Fix deletes triggering unnecessary compactions
Tombstone files would be written to all TSM files even if the deleted
keys or timerange did not exist in the TSM file.  This had the side
effect of causing shards to get recompacted back to the same state. If
any shards or large numbers of TSM files existed, disk usage and CPU
utilization would spike causing issues.

This prevents tombstones being written for TSM files that could not
possiby contain the series keys being deleted or if the delted time
range is outside the range of the file.
2017-05-08 14:52:28 -06:00
Jason Wilder c0c6ad6880 Don't disable snapshots when snapshot compactions are disabled
Snapshot compactions can be disabled independently of snapshotting
capability.  This prevents taking backups of shards that have compactions
disabled.
2017-05-05 14:15:45 -06:00
Jason Wilder bc639c5982 Make disableLevelCompactions lighter weight
Since this is called more frequently now, the cleanup func was invoked
quite a bit which makes several syscalls per shard.  This should only
be called the first time compactions are disabled.
2017-05-04 09:56:15 -06:00
Jason Wilder b4ea523910 Include snapshot size in the total cache size
This was causing a shard to appear idle when in fact a snapshot compaction
was running.  If the timing was right, the compactions would be disabled and
the snapshot compaction would be aborted.
2017-05-03 16:31:58 -06:00
Jason Wilder 88848a9426 Remove per shard monitor goroutine
The monitor goroutine ran for each shard and updated disk stats
as well as logged cardinality warnings.  This goroutine has been
removed by making the disk stats more lightweight and callable
directly from Statistics and moving the logging to the tsdb.Store.  The
latter allows one goroutine to handle all shards.
2017-05-03 16:31:57 -06:00
Jason Wilder f87fd7c7ed Stop background compaction goroutines when shard is cold
Each shard has a number of goroutines for compacting different levels
of TSM files.  When a shard goes cold and is fully compacted, these
goroutines are still running.

This change will stop background shard goroutines when the shard goes
cold and start them back up if new writes arrive.
2017-05-03 16:31:57 -06:00
Jason Wilder 3d1c0cd981 Don't return compaction plans for files already part of a plan
The compactor prevents the same file from being compacted by different
compaction runs, but it can result in warning errors in the logs that
are confusing.

This adds compaction plan tracking to the planner so that files are
only part of one plan at a given time.
2017-05-03 16:31:57 -06:00
Jason Wilder 8fc9853ed8 Add max-concurrent-compactions limit
This limit allows the number of concurrent level and full compactions
to be throttled.  Snapshot compactions are not affected by this limit
as they need to run continuously.

This limit can be used to control how much CPU is consumed by compactions.
The default is to limit to the number of CPUs available.
2017-05-03 16:31:57 -06:00
Jason Wilder 3c130cd39c Expose TSMWriter.Flush
Allows flushing the writer so we don't always need to close and
re-open the file handle.
2017-04-28 14:00:50 -06:00
Jason Wilder 141f0d71cd Update index when import files 2017-04-28 14:00:45 -06:00
Jason Wilder a76146e34a Add Store.Import capability
This allows the contents of a backup to be imported into a shard without
requiring the whole shard to be replaced.
2017-04-28 13:30:46 -06:00
Jason Wilder 3839fe34ea Remove FileStore.Add/Remove
Can use Replace which handles files in-use and stats correctly.
2017-04-28 13:20:55 -06:00
Jason Wilder 137d0c0d09 Rename WAL.WritePoints to WAL.WriteMulti
To match Cache.WriteMulti
2017-04-28 13:20:55 -06:00
Jason Wilder 28422f2fec Use consistent receiver var name for Value types 2017-04-28 13:20:55 -06:00
Jason Wilder 1bc4936336 Export Reader.ReadBytes 2017-04-28 13:20:55 -06:00
Jason Wilder d88604f6f2 Move repetitive loop checks outside of values loop 2017-04-20 13:45:04 -06:00
Jason Wilder 888689f5d3 Move values loop under type switch
All the values read must be of the same type so repeatedly using
the type switch is confusing and less efficient.
2017-04-20 13:39:49 -06:00
Jason Wilder b0988511bf Use fixed size array instead of slice 2017-04-20 13:38:33 -06:00
Jason Wilder da6bdfdda8 Use bufio.Reader when reading wal segments
Reduces disk IO due to small reads.
2017-04-20 13:33:42 -06:00
Jason Wilder 8e9cbd7ffc Simplify WALSegmentReader.UnmarshalBinary
There were two loops over nvals which created some extra allocation
which could be replaced with a simple slice capacity and append.
2017-04-20 13:33:42 -06:00
Jason Wilder ef65ee77f4 Switch WAL byte pools to sync/pool
The current bytes.Pool will hold onto byte slices indefinitely. Large
writes can cause the pool to hold onto very large buffers over time.
Testing w/ sync/pool seems to perform similarly now so using a sync/pool
will allow these buffers to be GC'd when necessary.
2017-04-20 12:28:42 -06:00
Jason Wilder d155d37ca8 Reduce TSM write buffer
When many TSM files are being compacted, the buffers can add up fairly
quickly.
2017-04-20 12:28:42 -06:00
Jason Wilder d7c5dd0a3e Reduce wal sync goroutine churn
Under high write load, the sync goroutine would start up and end
very frequently.  Starting a new goroutine so frequently adds a small
amount of latency which causes writes to take longer and sometimes time out.

This changes the goroutine to loop until there are no more waiters which
reduces the churn and latency.
2017-04-20 12:28:34 -06:00
Jason Wilder aa9925621b Fix deadlock in wal
If the sync waiters channel was full, it would block sending to the
channel while holding the wal write lock.  The sync goroutine would
then be stuck acquiring the write lock and could not drain the channel.

This increases the buffer to 1024 which would require a very high write
load to fill, and returns an error if the channel is full to prevent
the blocking.
2017-04-19 11:33:13 -06:00
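A sketch of the non-blocking send that prevents this kind of deadlock: if the (now larger) waiters channel is full, return an error instead of blocking while the write lock is held. Names are illustrative, not the WAL's actual API.

```
package main

import (
	"errors"
	"fmt"
)

var errWALBusy = errors.New("wal sync waiters full")

// queueWaiter registers a sync waiter without ever blocking, so the
// caller can safely hold the wal write lock while calling it.
func queueWaiter(waiters chan chan error) (chan error, error) {
	done := make(chan error, 1)
	select {
	case waiters <- done:
		return done, nil
	default:
		return nil, errWALBusy
	}
}

func main() {
	waiters := make(chan chan error, 1024)
	_, err := queueWaiter(waiters)
	fmt.Println(err)
}
```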
Jason Wilder 5c51ae7319 Merge branch '1.2' into jw-merge-123 2017-04-14 14:36:54 -06:00
Jason Wilder ff1270dfeb Fix dropping fields created data corruption
The Point is intended to be immutable after being parsed since it
is shared by several goroutines.  When dropping a field (e.g. time),
corrupted data can result if one goroutine is deleting the field
while another is marshaling the underlying byte slices.

To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
2017-04-07 12:58:42 -06:00
Ben Johnson 9c97cd8601
Merge remote-tracking branch 'upstream/master' into tsi 2017-04-04 12:46:09 -06:00
Jason Wilder 5fa8073fc2 Merge branch '1.2' into jw-merge-123 2017-04-04 11:12:06 -06:00
Jason Wilder 84cbee227a Fix file store not closing all TSM files
Regression added via #8192
2017-04-04 10:58:51 -06:00
Jason Wilder 4f850b5cff Skip TestCache_Deduplicate_Concurrent on windows 2017-04-04 08:48:55 -06:00
Jason Wilder 8da84e6144 Merge branch 'master' into tsi 2017-04-03 11:21:02 -06:00
Jason Wilder 32c4d43952 Speed up drop measurement
This reworks drop measurement to use a sorted list of series keys
instead of creating an intermediate map.  It removes allocations
and some extra garbage that is created during drop measurement.
2017-04-03 08:57:53 -06:00
Jason Wilder a78da51b7c Use buffered writer when writing tombstones
When deleting many series, the many small writes flood the disks
and consume a lot of CPU time.
2017-04-03 08:57:52 -06:00
Jason Wilder 6232d5e56d Remove defer allocations in TSMReader 2017-04-03 08:57:52 -06:00
Jason Wilder 920c8396c6 Use sorted merge in FileStore.WalkKeys
WalkKeys serially walked each TSM file and invoked fn for each key.
Callers needed to handle duplicate calls to fn with the same key
because the same key could exist in multiple TSM files.  The serial
execution was also slower.

Since the series keys are already sorted, we can iterate over all
files in parallel and skip duplicates using a sorted merge.  This
fixes the duplicate invocation issue as well as speeds up walking
all keys.

This can significantly improve startup performance when many TSM files
exist that may not have been fully compacted.  This also has benefits
for deletes (measurements/series) since duplicates are removed saving
extra allocations and work.  This may also allow for the optimize
compaction to be removed provided startup times are fast enough.
2017-04-03 08:57:52 -06:00
Edd Robinson fddaff2cc8 Merge master in 2017-03-29 18:00:28 +01:00
Ben Johnson 2edfb1c92d
Ignore series limit on database load. 2017-03-24 16:27:16 -06:00
Ben Johnson 9fb8f1ec1d
Fix database and tag limits. 2017-03-24 09:48:10 -06:00
Jason Wilder 631681796d Remove tsl file committed by mistake 2017-03-23 16:18:27 -06:00
Jason Wilder 7119ef8f29 Merge pull request #8193 from influxdata/jw-123-backports
1.2.3 backports
2017-03-23 13:31:35 -06:00
Jason Wilder ca1919e5de Use standard merge algorithm for merging values
The previous version was very inefficient due to the benchmarks used
to optimize it having a bug.  This version always allocates a new
slice, but is O(n).
2017-03-23 12:53:59 -06:00
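A sketch of the O(n) standard merge of two timestamp-sorted slices; the tie-break of preferring the newer slice on equal timestamps is an assumption of this illustration.

```
package main

import "fmt"

type Value struct {
	t int64
	v float64
}

// merge walks both sorted inputs once, always allocating a new slice.
func merge(a, b []Value) []Value {
	out := make([]Value, 0, len(a)+len(b))
	for len(a) > 0 && len(b) > 0 {
		switch {
		case a[0].t < b[0].t:
			out, a = append(out, a[0]), a[1:]
		case a[0].t > b[0].t:
			out, b = append(out, b[0]), b[1:]
		default: // equal timestamps: keep b (newer), drop a
			out, a, b = append(out, b[0]), a[1:], b[1:]
		}
	}
	out = append(out, a...)
	return append(out, b...)
}

func main() {
	fmt.Println(merge([]Value{{1, 1}, {3, 3}}, []Value{{2, 2}, {3, 9}}))
}
```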
Jason Wilder ba2571903d Fix broken Values.Merge benchmark
Merge had the side effect of modifying the original values so
the results are wrong because they always hit the fast path
after the first run.
2017-03-23 12:53:50 -06:00
Jason Wilder 890ffb4ce8 Generate encode*Values funcs 2017-03-23 12:53:29 -06:00
Jason Wilder ced953ae89 Use typed values to avoid allocations
This switches compactions to use typed values (FloatValues) instead of the
generic Values type.  It avoids a bunch of allocations where each value
must be converted from a specific type to an interface{}.
2017-03-23 12:53:17 -06:00
Jason Wilder a1c84ae6f3 Add block type for BlockIterator 2017-03-23 12:49:17 -06:00
Jason Wilder 2972a3f223 Remove MMAP derefencing code
This code was added to address some slow startup issues.  It is believed
to be the cause of some segfault panics that occur at query time when
the underlying MMAP array has been unmapped.  The current structure of
code makes this change unnecessary now.
2017-03-23 12:46:23 -06:00
Jason Wilder 61f80db1b9 Skip cardinality dups on circle race test 2017-03-22 15:20:38 -06:00
Jason Wilder c443e639b0 Fix 32bit alignment issue in wal.sync 2017-03-22 11:21:29 -06:00
Ben Johnson afe41f1c80
Fix tsm1/tsi1 broken tests. 2017-03-21 12:21:48 -06:00
Jason Wilder 8f7b251afd Merge branch 'master' into jw-tsi 2017-03-20 17:17:26 -06:00
Jason Wilder 8177df2dab Simplify Measurement.TagSets signature 2017-03-17 16:19:10 -06:00
Jason Wilder 2d5d899ac2 Allow queries to be interrupted during planning
If a bad query is run, kill query and limits would not kick in until
after it started executing.  Some bad queries that involve high
cardinality can cause the server to OOM just from planning which
defeats the purpose of the max-select-series limit.

This change primarily fixes max-select-series limit so that the query
is killed earlier and has the side effect that kill query now can kill
a query while it's being planned.
2017-03-17 16:00:54 -06:00
Jason Wilder bc4aeefbed Check max-series-limit in shard iterator creation
The limit waited until all the iterators had been created which still
allows problem queries to be planned.  This allows the queries to be
aborted much earlier in some cases.
2017-03-17 16:00:25 -06:00
Jason Wilder e9eb925170 Coalesce multiple WAL fsyncs
Fsyncs to the WAL can cause higher IO with lots of small writes or
slower disks.  This reworks the previous wal fsyncing to remove the
extra goroutine and remove the hard-coded 100ms delay.  Writes to
the wal still maintain the invariant that they do not return to the
caller until the write is fsync'd.

This also adds a new config option wal-fsync-delay (default 0s)
which can be increased if a delay is desired.  This is somewhat useful
for systems with slower disks, but the current default works well as
is.
2017-03-15 16:31:03 -06:00
Ben Johnson 1807772388
Fix tsi tests. 2017-03-15 11:23:58 -06:00
Ben Johnson cf7ba96377
Merge branch 'tsi-log-compact' into tsi 2017-03-15 10:18:40 -06:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Jason Wilder 65464ea0d1 Merge pull request #8131 from influxdata/jw-values-merge
Use standard merge algorithm when merging Values
2017-03-15 09:51:21 -06:00
Jason Wilder a4cfeacedb Use standard merge algorithm for merging values
The previous version was very inefficient due to the benchmarks used
to optimize it having a bug.  This version always allocates a new
slice, but is O(n).
2017-03-15 08:59:41 -06:00
Jason Wilder 4d37c9dc9e Fix broken Values.Merge benchmark
Merge had the side effect of modifying the original values so
the results are wrong because they always hit the fast path
after the first run.
2017-03-14 14:20:24 -06:00
Jason Wilder ca9c67a877 Generate encode*Values funcs 2017-03-14 11:54:53 -06:00
Jason Wilder 2f7d4995b4 Use typed values to avoid allocations
This switches compactions to use typed values (FloatValues) instead of the
generic Values type.  It avoids a bunch of allocations where each value
must be converted from a specific type to an interface{}.
2017-03-09 16:27:07 -07:00
Jason Wilder 78b7815c49 Add block type for BlockIterator 2017-03-09 09:16:59 -07:00
Jason Wilder b9e5375043 Merge branch '1.2' into jw-merge-12 2017-03-08 13:16:50 -07:00
Jason Wilder 37187cbe6d Delete series under fields lock
Still seeing the panic that switching this logic around was supposed
to fix.  We now delete the bulk of data outside of the fields lock
and then again, under the write lock, to ensure that the field mapping
is accurate.  We don't do the full delete under the lock because it
can block writes and queries that require a read lock.
2017-03-06 14:19:55 -07:00
Jason Wilder 675d7c9d65 Merge branch '1.2' into jw-merge12 2017-03-06 11:09:05 -07:00
Jason Wilder eab012ef61 Fix points missing after compaction
If blocks containing overlapping ranges of time were partially
recombined, it was possible for some points to get dropped
during compactions.  This occurred because the window of time of
the points we need to merge did not account for the partial blocks
created from a prior merge.

Fixes #8084
2017-03-06 10:17:11 -07:00
Jason Wilder 3c70abf061 Delete series before remove from field index
There is a race where the field type can be deleted while a new type
is written and during a query.  When this happens, an iterator for
the new type is created but old data may still exist in the cache
or TSM files, causing a panic.
2017-03-06 09:38:27 -07:00
Jason Wilder 29f8d8de76 Fix race in WALEntry.Encode and Value.Deduplicate
Under high query load, a race exists in the cache and the WAL.  Since
writes currently hit the cache first, they are available for query before
they hit the WAL.  If the WAL is writing and accessing the Value slice
at the same time that a query is run that needs to dedup the same slice,
a race occurs.

To fix this, the cache now just copies the values instead of storing the
slice passed in.  Another way to fix this might be to have the writes go
to the wal before the cache.  I think the latter would be better, but it
introduces some larger write path issues that we'd need to also address.
e.g. if the cache was full, writes to the WAL would need to be rejected
to avoid filling the disk.

Copying the slice in the cache is simpler for now and does not appear to
dramatically affect performance.
2017-03-06 09:38:22 -07:00
Jason Wilder a024003f2c Merge branch '1.2' into jw-merge-12 2017-02-22 12:13:29 -07:00
Ben Johnson 78a9bb2527 Remove Tags.shouldCopy, replace with forceCopy on series creation.
Previously, tags had a `shouldCopy` flag to indicate if those tags
referenced an underlying buffer and should be copied to allow GC.
Unfortunately, this prevented tags from being copied that were
created and referenced the mmap which caused segfaults.

This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
2017-02-21 11:13:35 -07:00
Mark Rushakoff 601cbcd084 Merge branch '1.2' into mr-merge-12 2017-02-17 16:14:22 -08:00
Jonathan A. Sternberg 2fe48d6781 Rename zap import back to github.com/uber-go/zap
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
2017-02-17 17:17:22 -06:00
Ben Johnson 673143a0ad
Remove .tsl file. 2017-02-15 08:44:01 -07:00
Jason Wilder 4b6289ce58 Merge pull request #7942 from influxdata/jw-cache-partitions
Reduce write timeouts
2017-02-10 10:07:08 -07:00
Edd Robinson 38eb6d5994 Don't load meta data for tsi 2017-02-09 18:04:23 +00:00
Edd Robinson a6a2f9d5f0 Don't load meta data for tsi 2017-02-09 17:59:14 +00:00
Jason Wilder 2f74e3f3d5 Use simple8b.CountBytes to avoid allocations 2017-02-09 10:47:03 -07:00
Jason Wilder 1bc0f68490 Merge branch '1.2' into jw-merge-12 2017-02-07 12:48:36 -07:00
Jonathan A. Sternberg e1fa48d0dd Fix ORDER BY time DESC with ordering series keys
The order of series keys is in ascending alphabetical order, not
descending alphabetical order, when it is ordered by descending time.
This fixes the ordering so points are returned in descending order. The
emitter also had the conditions for choosing which iterator to use in
the wrong direction (which only affects aggregates with `FILL(none)`).
2017-02-06 15:49:12 -06:00
Jonathan A. Sternberg 95831b3307 Fix LIMIT and OFFSET when they are used in a subquery
This fixes LIMIT and OFFSET when they are used in a subquery where the
grouping of the inner query is different than the grouping of the outer
query. When organizing tag sets, the grouping of the outer query is
used so the final result is in the correct order. But, unfortunately,
the optimization incorrectly limited the number of points based on the
grouping in the outer query rather than the grouping in the inner query.

The ideal solution would be to use the outer grouping to further
organize it by the grouping for the inner subquery, but that's more
difficult to do at the moment. As an easier fix, the query engine now
limits the output of each series. This may result in these types of
queries being slower in some situations like this one:

    SELECT mean(value) FROM (SELECT value FROM cpu GROUP BY host LIMIT 1)

This will be slower in a situation where the `cpu` measurement has a
high cardinality and many different tags.

This also fixes `last()` and `first()` when they are used in a subquery
because those functions use `LIMIT 1` as an internal optimization.
2017-02-06 14:04:34 -06:00
Jason Wilder 93a9d01643 Increase default waiting WAL writes 2017-02-06 11:48:51 -07:00
Jason Wilder 38a649fc40 Batch multiple WAL fsyncs
Every write to the WAL currently runs an fsync before returning.  When
there are lot of concurrent writes, this can cause the WAL to bottleneck
write throughput since fsyncs are very expensive.

This changes the writeToLog to fsync on an interval to allow multiple fsyncs
calls to be batched up into one.  The writeToLog behavior is the same in that
it won't return until an fsync has been performed.
2017-02-06 11:48:45 -07:00
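A toy sketch of fsync batching under stated assumptions: writers park on a per-write channel while one goroutine fsyncs on an interval and releases every queued waiter at once, preserving the invariant that writes are durable before returning. The `wal` type here is hypothetical, not the engine's WAL.

```
package main

import (
	"fmt"
	"log"
	"os"
	"sync"
	"time"
)

type wal struct {
	mu      sync.Mutex
	f       *os.File
	waiters []chan error
}

// write appends and then blocks until the next batched fsync completes.
func (w *wal) write(p []byte) error {
	done := make(chan error, 1)
	w.mu.Lock()
	if _, err := w.f.Write(p); err != nil {
		w.mu.Unlock()
		return err
	}
	w.waiters = append(w.waiters, done)
	w.mu.Unlock()
	return <-done
}

// syncLoop fsyncs on an interval; one fsync covers every queued write.
func (w *wal) syncLoop(interval time.Duration) {
	for range time.Tick(interval) {
		w.mu.Lock()
		err := w.f.Sync()
		for _, c := range w.waiters {
			c <- err
		}
		w.waiters = w.waiters[:0]
		w.mu.Unlock()
	}
}

func main() {
	f, err := os.CreateTemp("", "wal")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(f.Name())
	w := &wal{f: f}
	go w.syncLoop(10 * time.Millisecond)
	fmt.Println(w.write([]byte("entry")))
}
```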
Ben Johnson d91e6eabac
Add max-values-per-tag to inmem index. 2017-02-06 11:14:13 -07:00
Ben Johnson 57f44d5f0c
Include index in snapshot. 2017-02-01 14:19:42 -07:00
Jason Wilder 54ab3a7a0a Don't write lock file store when opening new files
When replacing TSM files, the new files can be opened before
the write lock is taken to reduce lock contention in this code
path.
2017-02-01 11:11:26 -07:00
Jason Wilder 6eb46d2100 Remove unnecessary read lock on engine 2017-02-01 11:10:41 -07:00
Jason Wilder 784a851742 Release cpu during compactions 2017-01-31 17:04:36 -07:00
Jason Wilder 278c1449d6 Increase number of cache partitions 2017-01-31 16:49:57 -07:00
Ben Johnson 047c21f4d9
Merge remote-tracking branch 'upstream/master' into tsi 2017-01-24 09:28:58 -07:00
Edd Robinson feb7a2842c Use unbuffered error channels in tests 2017-01-17 10:53:15 -08:00
Edd Robinson fb7388cdfc Remove dead code from various pkgs 2017-01-17 09:47:34 -08:00
Edd Robinson 292b30b82b Fix subtle bugs and remove dead code from tsdb 2017-01-17 09:47:34 -08:00
Joe LeGasse bf58d9ffb7 Update backup to use ioutil.ReadDir 2017-01-12 16:28:01 -05:00
Jason Wilder 11f264563a Fix 32bit alignment 2017-01-12 12:01:49 -07:00
Jason Wilder 06a8fd6ca2 Simplifications and cleanup 2017-01-12 09:55:38 -07:00
Edd Robinson 73ed864e1d Add cache tests 2017-01-12 16:27:16 +00:00
Jason Wilder 1e56b5416b Fix compactions sometimes getting stuck
I ran into an issue where the cache snapshotting seemed to stop
completely causing the cache to fill up and never recover.  I believe
this is due to the Timer being reused incorrectly.  Instead,
use a Ticker that will fire more regularly and not require the resetting
logic (which was wrong).
2017-01-11 17:57:40 -07:00
Jason Wilder 40b017f4a4 Fix Cache stats size collection
The memory stats as well as the size of the cache were not accurate.
There was also a problem where the cache size would be increased
optimistically, but if the cache size limit was hit, it would not
be decreased.  This would cause the cache size to grow without
bounds with every failed write.
2017-01-11 17:54:51 -07:00
Jason Wilder c433ff331f Encode snapshots concurrently
The CacheKeyIterator (used for snapshot compactions), iterated over
each key and serially encoded the values for that key as the TSM
file is written.  With many series, this can be slow and will only
use 1 CPU core even if more are available.

This changes it so that the key space is split amongst a number of
goroutines that start encoding all keys in parallel to improve
throughput.
2017-01-11 17:54:27 -07:00
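A sketch of splitting a snapshot's key space across goroutines so encoding uses all cores; `encodeAll` and its callback are illustrative, not the CacheKeyIterator's actual structure.

```
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// encodeAll partitions keys into contiguous chunks, one per CPU, and
// encodes the chunks in parallel.
func encodeAll(keys []string, encode func(string)) {
	n := runtime.GOMAXPROCS(0)
	var wg sync.WaitGroup
	chunk := (len(keys) + n - 1) / n
	for start := 0; start < len(keys); start += chunk {
		end := start + chunk
		if end > len(keys) {
			end = len(keys)
		}
		wg.Add(1)
		go func(part []string) {
			defer wg.Done()
			for _, k := range part {
				encode(k) // encode this key's values into TSM blocks
			}
		}(keys[start:end])
	}
	wg.Wait()
}

func main() {
	encodeAll([]string{"a", "b", "c"}, func(k string) { fmt.Println("encoded", k) })
}
```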
Jason Wilder ae838ef323 Simplify Cache.Snapshot
This simplifies the cache.Snapshot func to swap the hot cache to
the snapshot cache instead of copying and appending entries.  This
reduces the amount of time the cache is write locked which should
reduce cache contention for the read only code paths.
2017-01-11 11:12:02 -07:00
Jonathan A. Sternberg 3ba950b029 Fix for subqueries to use the parallel iterator correctly
Also, fix the `Iterators.Merge(IteratorOptions)` function so it consults
the `Ordered` attribute to determine which iterator it should use to
merge the input iterators.
2017-01-11 10:47:18 -06:00
Jonathan A. Sternberg b58d1778e2 Remove improper newlines from logging statements 2017-01-10 11:20:09 -06:00
Mark Rushakoff a135906b43 Merge pull request #7747 from influxdata/mr-lint-cleanup
Miscellaneous lint cleanup
2017-01-10 08:22:00 -08:00
Mark Rushakoff 3b3604e362 Fix race in (*tsm1.Cache).values
Without this read lock, this race would happen during a concurrent
snapshot compaction and query.
2017-01-09 14:48:28 -08:00
Jonathan A. Sternberg 4a559c4620 Merge pull request #7646 from influxdata/js-4619-subqueries
Support subquery execution in the query language
2017-01-09 14:14:01 -06:00
Jason Wilder eb4d311c0a Add retry/backup when backing up a shard fails
The backup command can fail if a snapshot is running which silently
closes the connection.  This causes the backup shard command to continue
on as if nothing failed.
2017-01-09 11:28:48 -07:00
Jason Wilder 194c5adfaf Fix race on t.refs
Read at 0x00c42018f620 by goroutine 58:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*TSMReader).Close()
      /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:330 +0x94
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Close()
      /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:464 +0x123

Previous write at 0x00c42018f620 by goroutine 63:
  sync/atomic.AddInt64()
      /usr/local/go/src/runtime/race_amd64.s:276 +0xb
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*TSMReader).Unref()
      /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:352 +0x43
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*KeyCursor).Close()
2017-01-07 12:39:45 -07:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
2017-01-07 13:00:48 -06:00
Mark Rushakoff 153277c01d Merge pull request #7786 from influxdata/mr-cache-decrease-size
Use one atomic operation in (*Cache).decreaseSize
2017-01-06 10:17:01 -08:00
Ben Johnson 2b3cd415e2
Fixing rebase. 2017-01-06 09:52:16 -07:00
Ben Johnson d1f1e19591
Fixing rebase. 2017-01-06 09:31:25 -07:00
Ben Johnson c1c98223ec
Fix and optimize tsi1 FileSet. 2017-01-05 10:17:12 -07:00
Ben Johnson 9b1e8215e0
Remove dictionary encoding, add bulk series insertion. 2017-01-05 10:17:11 -07:00
Ben Johnson 9bd19cdc69
Fix inmem DELETE SERIES. 2017-01-05 10:17:11 -07:00
Ben Johnson f9efcb3365
Re-add shared in-memory index. 2017-01-05 10:17:09 -07:00
Edd Robinson 0f9b2bfe6a
Fix tests 2017-01-05 10:16:15 -07:00
Edd Robinson 4ccb8dbab1
Move series count check to shard 2017-01-05 10:16:13 -07:00
Ben Johnson 745b1973a8
tsi compaction 2017-01-05 10:15:37 -07:00
Ben Johnson 183418dcbd
Fix tsi TAG KEYS iterator. 2017-01-05 10:15:36 -07:00
Ben Johnson 9f8b206b51
Fix measurement system queries. 2017-01-05 10:15:34 -07:00
Ben Johnson 4aa78383d1
Fix tsi1 series deletion. 2017-01-05 10:14:48 -07:00
Ben Johnson e7940cc556
Add tsi1 series system iterator. 2017-01-05 10:14:00 -07:00
Ben Johnson 87f4e0ec0a
Add regex support in tsi1. 2017-01-05 10:12:29 -07:00
Jason Wilder 1ba64f3610
Disable max-value-per-tag option temporarily
This is too slow currently and causes all writes to time out.
2017-01-05 10:11:47 -07:00
Ben Johnson fbe7f464ee
Improve insert performance. 2017-01-05 10:11:12 -07:00
Ben Johnson cb93f10120
Remove per-shard in-memory index. 2017-01-05 10:11:09 -07:00
Ben Johnson 409b0165f5
shared in-memory index 2017-01-05 10:09:57 -07:00
Ben Johnson a812502ea3
reintegrating in-memory index 2017-01-05 10:07:35 -07:00
Ben Johnson 1ac067e53b
intermediate 2017-01-05 10:03:09 -07:00
Ben Johnson fda84955ea
Remove TODO 2017-01-05 10:02:42 -07:00
Ben Johnson 62d2b3ebe9
Series filtering. 2017-01-05 10:02:42 -07:00
Ben Johnson 62269c3cea
intermediate 2017-01-05 10:02:41 -07:00
Ben Johnson 5f5b02e052
intermediate 2017-01-05 10:01:49 -07:00
Ben Johnson 8863e3c0f3
Refactor tsi1 merge iterators, finish multi-file compaction. 2017-01-05 10:01:25 -07:00
Ben Johnson e3af4b0dad
Refactor iterators. 2017-01-05 10:00:45 -07:00
Ben Johnson ce9e3181a5
Refactor merge iterators. 2017-01-05 10:00:45 -07:00
Ben Johnson 0294e717a0
Add mm, tag key, tag value, & series iterators. 2017-01-05 10:00:44 -07:00
Ben Johnson 2bfafaed76
tsi1 log compaction 2017-01-05 10:00:44 -07:00
Ben Johnson afce53e81b
Rebase fixes. 2017-01-05 10:00:44 -07:00
Ben Johnson 992e651588
Add tsi1.Log. 2017-01-05 10:00:44 -07:00
Ben Johnson 2a81351992
Implement tsdb.Index interface on tsi1.Index. 2017-01-05 10:00:43 -07:00
Edd Robinson ebc92ca04f
Fix overflow issues 2017-01-05 09:59:12 -07:00
Edd Robinson 149b1cef1d
Fix 32bit overflow; limit capacity 2017-01-05 09:59:10 -07:00
Edd Robinson 9ed6040265
Tidy up 2017-01-05 09:58:37 -07:00
Edd Robinson 2a5c865b44
Use xxhash 2017-01-05 09:57:35 -07:00
Edd Robinson 2d9bd09784
Use []byte where possible in Index 2017-01-05 09:57:34 -07:00
Edd Robinson 4b1ef68dc9
Move series and measurement stats to store 2017-01-05 09:54:05 -07:00
Edd Robinson bd8dd9a291
Sketches working 2017-01-05 09:54:04 -07:00
Edd Robinson d19fbf5ab4
Wire in HLL estimator 2017-01-05 09:54:03 -07:00
Edd Robinson 2b8efefef4
Initial index interface 2017-01-05 09:51:43 -07:00
Edd Robinson 05bc4dec00
Refactor 2017-01-05 09:50:23 -07:00
Edd Robinson c535e3899a
Remove in-memory index from Shard and Store 2017-01-05 09:47:09 -07:00
Ben Johnson 57d0556174
Fix 32-bit issues. 2017-01-05 09:34:37 -07:00
Ben Johnson 41f2babe66
Minor TSI index benchmark refactor 2017-01-05 09:34:37 -07:00
Ben Johnson ac9c6a0207
Add TSI index benchmark. 2017-01-05 09:34:37 -07:00
Ben Johnson 8d40ceb00c
TSI1 Index 2017-01-05 09:34:36 -07:00
Ben Johnson 9b62df23d2
Add MeasurementBlock. 2017-01-05 09:34:36 -07:00
Ben Johnson 3240af07e0
Fix RHH packing. 2017-01-05 09:34:36 -07:00
Ben Johnson e25d61e4bd
TagSet writer & reader. 2017-01-05 09:34:36 -07:00
Ben Johnson 4eeb81ef38
Add SeriesList tombstoning. 2017-01-05 09:34:36 -07:00
Ben Johnson 2c34b24f5c
Implemented SeriesList 2017-01-05 09:34:36 -07:00
Ben Johnson 6523675c20
Implemented RHH hash map. 2017-01-05 09:34:35 -07:00
Mark Rushakoff 6a94d200c8 Merge remote-tracking branch 'influx/master' into mr-godoc 2017-01-04 13:27:36 -08:00
Mark Rushakoff 89a587e865 Use one atomic operation in (*Cache).decreaseSize
The previous implementation was susceptible to a race condition (of
correctness) since c.decreaseSize is called without a lock in
(*Cache).WriteMulti.

There were already tests which asserted the correctness of the result of
decreaseSize, so no tests were added or modified.
2017-01-04 13:13:31 -08:00
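A sketch of the single-atomic decrement idiom such a fix relies on: adding the two's complement of the delta subtracts it in one atomic op, with no load-then-store race window. The `cache` type here is a stand-in.

```
package main

import (
	"fmt"
	"sync/atomic"
)

type cache struct{ size uint64 }

// decreaseSize subtracts delta in one atomic operation; the ^(delta-1)
// form is the standard sync/atomic idiom for subtracting from a uint64.
func (c *cache) decreaseSize(delta uint64) {
	atomic.AddUint64(&c.size, ^(delta - 1))
}

func main() {
	c := &cache{size: 100}
	c.decreaseSize(40)
	fmt.Println(atomic.LoadUint64(&c.size)) // 60
}
```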
Cory LaNou 3c518f8927
panicing is bad -> error returns are good 2017-01-03 14:28:29 -06:00
Mark Rushakoff 07b87f2630 Miscellaneous lint cleanup 2017-01-03 09:47:32 -08:00
Mark Rushakoff 41415cf2fb Update godoc for tsm1 package 2017-01-02 07:30:18 -08:00
Gustav Westling 26b33307ae
Resolved PR comments on test files 2016-12-30 11:42:38 +01:00
Gustav Westling 56d98325da
Removed ineffective assignments, and added checks for errors that previously were not checked 2016-12-29 20:26:15 +01:00
Jason Wilder 2468347ffb Fix comment 2016-12-19 14:17:49 -07:00
Jason Wilder 326557e539 Fix race in partition.reset 2016-12-19 14:17:01 -07:00
Jason Wilder e91e45d71c Fix panic in cache benchmark 2016-12-19 14:17:01 -07:00
Jason Wilder 0b6b9ea1cb Use atomics for cache.snapshotSize stat 2016-12-19 14:17:01 -07:00
Jason Wilder 637a67ea35 Reduce lock contention on measurementFields 2016-12-19 14:17:01 -07:00
Jason Wilder b7c1e625b0 Move needSort tracking to Deduplicate
This eliminates some *UnixNano() calls and also simplifies the cache
logic so that it does not need to worry about whether entries are
sorted.
2016-12-19 14:17:01 -07:00
Jason Wilder dea87703cd Reduce UnixNano pointer call 2016-12-19 14:17:01 -07:00
Mark Rushakoff 722b6345fe Fix unchecked error in templated Read${TYPE}Block 2016-12-19 09:31:26 -08:00
Jonathan A. Sternberg ec57108520 Use proper uber-go/zap import path
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
2016-12-15 08:54:14 -06:00
Edd Robinson ec27c57127 Further optimisations and a race fix 2016-12-14 18:23:36 +00:00
Edd Robinson 05ec6ad9ad Add to index safely 2016-12-14 18:23:36 +00:00
Edd Robinson d78ca1a0f3 Fix some races 2016-12-14 18:23:36 +00:00
Edd Robinson d2923c7bf9 Add hints as to how to pre-allocate entry values
Currently, whenever a snapshot occurs the Cache is reset and so many
allocations are repeated, as the same type of data is re-added to
the Cache.

This commit allows the stores to keep track of the number of values
within an entry, and use that size as a hint when the same entry needs
to be recreated after a snapshot.

To avoid hints persisting over a long period of time, they are deleted
after every snapshot and rebuilt using only the most recent entries.
2016-12-14 18:23:36 +00:00
Edd Robinson f2b5c7f5be Reduce contention when adding entries 2016-12-14 18:23:36 +00:00
Edd Robinson 98f0392ca6 Update size using atomic 2016-12-14 18:23:36 +00:00
Edd Robinson 66edb32182 Sharded Cache using a hash ring 2016-12-14 18:23:36 +00:00
Edd Robinson d3e6d4e7ca Add benchmarks 2016-12-14 18:21:50 +00:00
Jonathan A. Sternberg 21502a39e8 Switch logging to use structured logging everywhere
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
2016-12-14 10:45:15 -06:00
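A minimal example of the structured style that go.uber.org/zap enables; the message and fields are illustrative, not taken from the codebase:

```
package main

import "go.uber.org/zap"

func main() {
	logger, err := zap.NewProduction()
	if err != nil {
		panic(err)
	}
	defer logger.Sync() // flush any buffered entries on exit

	// Fields are typed key/value pairs instead of a formatted string,
	// so downstream tooling can index them.
	logger.Info("compaction finished",
		zap.String("path", "000001.tsm"),
		zap.Int("tsm_files", 4),
	)
}
```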
Jason Wilder 4f28c90b54 Optimize Value.Deduplicate
Deduplicate is called from various places in the engine and can cause
a lot of garbage to get created.  It first creates a map and then
adds each value to the map in order (1st alloc).  It then creates a
new slice (2nd alloc) and appends everything from the map to the slice.
Finally, it sorts the new slice (3rd alloc).

This switches the algorithm to use stable sorting and reuse the existing
slice to avoid allocations.
2016-12-08 21:10:56 -07:00
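A sketch of the sort-and-compact approach, under the assumption that the later write wins for duplicate timestamps; the Value struct is a stand-in for the engine's type:

```
package main

import (
	"fmt"
	"sort"
)

// Value is a stand-in for the engine's timestamped value type.
type Value struct {
	unixNano int64
	v        float64
}

// Deduplicate stably sorts values by timestamp and compacts them in
// place: the last value written for a timestamp wins, and the existing
// backing array is reused instead of allocating a map and a new slice.
func Deduplicate(a []Value) []Value {
	if len(a) <= 1 {
		return a
	}
	sort.SliceStable(a, func(i, j int) bool {
		return a[i].unixNano < a[j].unixNano
	})
	out := a[:1]
	for _, v := range a[1:] {
		if v.unixNano == out[len(out)-1].unixNano {
			out[len(out)-1] = v // later write wins
			continue
		}
		out = append(out, v)
	}
	return out
}

func main() {
	vals := []Value{{2, 1.0}, {1, 2.0}, {2, 3.0}}
	fmt.Println(Deduplicate(vals)) // [{1 2} {2 3}]
}
```

The stable sort keeps equal timestamps in insertion order, which is what lets the compaction step pick the most recently written value without a map.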
Hrvoje Marjanovic 9483b8b409 gofmt 2016-12-03 22:06:38 +01:00
Hrvoje Marjanovic 6ed708e3fd Reduce pool size, change WAL writers default
A big pool can lead to huge memory usage under certain loads.

See #7640 for detailed discussion.
2016-12-02 18:45:43 +01:00
Allen Petersen 31129ab0e9 Use slash separator for filenames in tar archives
NO-OP on platforms with unix path separator.
On Windows, paths are converted to slashes before being added to the archive and back to backslashes during restore.
2016-11-29 09:44:08 -08:00
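A sketch of the normalization step using the standard library's filepath helpers; archiveName is a hypothetical helper, not the backup code itself:

```
package main

import (
	"fmt"
	"path/filepath"
)

// archiveName converts an OS path to the slash-separated form stored in
// the tar header. On unix this is a no-op; on Windows backslashes become
// slashes. filepath.FromSlash reverses the mapping at restore time.
func archiveName(osPath string) string {
	return filepath.ToSlash(osPath)
}

func main() {
	fmt.Println(archiveName(`shard\000\000001.tsm`)) // shard/000/000001.tsm on Windows
}
```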
Jason Wilder e8a28cfbab Expose Shard.LastModified
This returns the LastModified time of the shard.  The LastModified
time is the wall time when a change to the shard's state occurred.
It uses the WAL or FileStore to determine the max mod time.
2016-11-23 10:04:07 -07:00
Jason Wilder 3a5a01181b Switch all Value types from pointers 2016-11-15 16:13:55 -07:00
Jason Wilder 0ee58c208a Switch time.Sleep to time.Ticker
Avoids an allocation when calling time.Sleep
2016-11-15 16:13:55 -07:00
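A minimal sketch of the loop shape; the interval and work are placeholders. The Ticker's timer is allocated once and reused, where time.Sleep sets up a new runtime timer on every call:

```
package main

import (
	"fmt"
	"time"
)

func main() {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop() // release the timer when the loop exits

	for i := 0; i < 3; i++ {
		<-ticker.C // wait for the next tick instead of calling time.Sleep
		fmt.Println("tick", i)
	}
}
```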
Jason Wilder 73b8f52ca0 Cache results of findGenerations
This allocates quite a bit and it's called multiple times per
second per shard.  The generations don't change until a compaction
has occurred so most of the time is re-calculating the same thing
and creating garbage.
2016-11-15 16:13:55 -07:00
Jason Wilder 0b6f5441b9 Add config option to messages when limits exceeded
When a limit is exceeded, we return errors and sometimes log (if appropriate)
that a limit was exceeded.  The messages don't always provide an indication
as to where or how they are configured.

Instead, return the config option (easily searchable) as well as the limit
currently set and the value that exceeded it when possible.
2016-10-28 14:54:45 -06:00
Jason Wilder b1ceb5e66d Add cache write OK, Dropped, Error stats
Adds a new dropped stat, as well as fixing OK and error stats not
actually getting collected and stored.
2016-10-28 12:15:50 -06:00
Jason Wilder 873189e0c2 Fix panic: interface conversion: tsm1.Value is *tsm1.FloatValue, not *tsm1.StringValue
If concurrent writes to the same shard occur, it's possible for different types to
be added to the cache for the same series.  The measurementFields map on the
shard would normally prevent this, but the way it is updated is racy in this scenario.
When this occurs, the snapshot compaction panics because it can't encode different types
in the same series.

To prevent this, we have the cache return an error if a different type is added to existing
values in the cache.

Fixes #7498
2016-10-28 12:15:50 -06:00
Jason Wilder e388912b6c Fix race in findGenerations
The file store stats slice is re-used which causes the race below:

WARNING: DATA RACE
Write at 0x00c42007e140 by goroutine 43:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Stats()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:511 +0x22e
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).findGenerations()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:461 +0x6f
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanLevel()

Previous read at 0x00c42007e140 by goroutine 40:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).findGenerations()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:463 +0x13d
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanOptimize()
2016-10-28 12:15:49 -06:00
Steven Hartland 3f16197243 Improve tsm1 cache performance
Reduce the cache lock contention by widening the cache lock scope in WriteMulti. While this sounds counterintuitive, the previous locking was:
* 1 x Read Lock to read the size
* 1 x Read Lock per values
* 1 x Write Lock per values on race
* 1 x Write Lock to update the size

We now have:
* 1 x Write Lock

This also reduces contention on the entries' Values lock, as we now hold the global cache lock.

Move the calculation of the added size before taking the lock as it takes time and doesn't need the lock.

This also fixes a race in WriteMulti due to the lock not being held across the entire operation, which could cause the cache size to have an invalid value if Snapshot ran in between the addition of the values and the size update.

Fix the cache benchmarks, which were benchmarking the creation of the cache rather than its operation, and add a parallel test for a more real-world scenario; this could still be improved.

Add a fast path, newEntryValues, for the new-entry case, which avoids taking the values lock and all the other calculations.

Drop the lock before performing the sort in Cache.Keys().
2016-10-25 15:24:51 -06:00
Jonathan A. Sternberg a515aeda39 Optimize first/last when no group by interval is present
The `first()` and `last()` functions' response time would increase linearly
with the number of points, even though it seems like it shouldn't. This
optimization greatly reduces the amount of time to return a response
when no `GROUP BY time(...)` clause is present in a query.
2016-10-25 09:57:31 -05:00
Jason Wilder 686d1a7ba4 Remove unused config options 2016-10-24 15:32:38 -06:00
Edd Robinson 0ee093f1fb Memoize output of FileStore.Stats 2016-10-24 10:23:20 -06:00
Jonathan A. Sternberg 3681bc8a43 Filter out series within shards that do not have data for that series
Previously, we would return a full tag set for every shard and the tag
set would include all series that existed in the database index
including series that didn't physically exist within that shard. This
led to the tag sets returned being incredibly huge when we had high
cardinality but sparse data. Since the data was sparse, most people
did not expect it to put such a large strain on the system.

Now we filter out the series ids that are not assigned to the current
shard when computing a tag set for that shard. This lowers the memory
usage for high cardinality sparse data drastically and allows queries on
those to complete successfully.

This does not resolve issues for high cardinality data in every shard
that is also spread out over a long series of time. That situation isn't
nearly as common as the above situation though.
2016-10-20 14:15:34 -05:00
Jason Wilder b50d9558cf Merge pull request #7479 from influxdata/jw-clean-err
Skip cleanup if dir does not exist
2016-10-18 15:49:09 -06:00
Jason Wilder f30b00c24f Skip cleanup if dir does not exist 2016-10-18 15:33:39 -06:00
Mark Rushakoff 377c40f122 Add stats for active compactions
Unify the logic around compaction execution in a single place.

Also report on the error stats that we track. Previously they were not
emitted in the stats output.
2016-10-18 14:12:21 -07:00
Joe LeGasse de9c743004 TSM: update comments for disabling level compactions 2016-10-18 14:14:59 -06:00
Joe LeGasse eda8f70372 TSM: Handle concurrent deletes for compaction 2016-10-18 14:14:59 -06:00
Jason Wilder 47b8049e48 Update comment 2016-10-18 14:14:53 -06:00
Jason Wilder ed7975874f Rename Enabled -> Enable 2016-10-18 12:22:00 -06:00
Jason Wilder f254b4f3ae Allow snapshot compactions during deletes
If a delete takes a long time to process while writes to the
shard are occurring, it was possible for the cache to fill up
and writes to be rejected.  This occurred because we disabled
all compactions while writing tombstone file to prevent deleted
data from re-appearing after a compaction completed.

Instead, we only disable the level compactions and allow snapshot
compactions to continue.  Snapshots already handle deleted data
with the cache and wal.

Fixes #7161
2016-10-18 12:14:51 -06:00
Jason Wilder 798fa0a9f8 Return error with unknown field type
Without this, snapshotting the value would just panic because an EmptyValue
can't be written to TSM files.
2016-10-03 16:30:21 -06:00
Jason Wilder 125f106956 Pre-size the values map when writing points 2016-10-03 16:30:21 -06:00
Joe LeGasse 743946fafb models: Add FieldIterator type
The FieldIterator is used to scan over the fields of a point, providing
information, and delaying parsing/decoding the value until it is needed.
This change uses this new type to avoid the allocation of a map for the
fields which is then thrown away as soon as the points get converted
into columns within the datastore.
2016-10-03 16:30:21 -06:00
Jason Wilder 20f1fb3f7f Replace gotos with anonymous functions 2016-10-03 12:08:53 -06:00
Jason Wilder 750c8b3932 Reduce lock contention in cache.Values
The cache read lock was held for the whole duration of the call when it
only needs to be held at the beginning since entries have their
own locks.
2016-10-03 10:21:54 -06:00
Jason Wilder 1b462312a9 Re-use decoder pools
The decoders were held onto each iterator to avoid creating them all
the time.  Some of them use quite a bit of memory so they can
be expensive to create when querying across many series.

Instead, move them to a re-usable pool where we create the minimum that
could actively be in use.  This reduces garbage as well as making the iterators
less expensive to create.
2016-10-03 10:21:54 -06:00
Jason Wilder a15a416eaa Fix decoding RLE integer blocks with negative deltas
Integer blocks that were run length encoded could produce the wrong
value when read back out because the deltas were not zig zag decoded
before scaling the final value.  If the deltas were negative, as would
be seen in a counter that decrements by a constant value, the results
would be random with some negative and positive values.

Fixes #7391
2016-10-02 23:51:29 -06:00
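A minimal sketch of the missing step: zig-zag decoding the RLE delta before it is scaled and accumulated. The function name is illustrative:

```
package main

import "fmt"

// zigZagDecode maps the unsigned zig-zag encoding back to a signed
// integer: 0, 1, 2, 3, ... decodes to 0, -1, 1, -2, ...
func zigZagDecode(v uint64) int64 {
	return int64(v>>1) ^ -int64(v&1)
}

func main() {
	// A counter that decrements by 5 stores its delta as the zig-zag
	// value 9; skipping the decode step would scale 9 directly and
	// produce garbage instead of -5.
	fmt.Println(zigZagDecode(9)) // -5
}
```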
Jason Wilder dcb65865a2 Merge pull request #7376 from influxdata/jw-revert
Revert re-using byte slices during compactions
2016-09-28 08:24:35 -06:00
joelegasse 87ecd97e7b Merge pull request #7371 from influxdata/2016-09-27--rw--use-gotos-for-encoding-cleanup
Gotos to simplify uses of the new encoder pools.
2016-09-28 08:57:33 -04:00
Jason Wilder 1755f20d2a Revert re-using byte slices during compactions
This was causing a fatal error: fault panic when packing blocks.
2016-09-27 23:41:06 -06:00
rw c3fc87b619 Remove dangling named return value. 2016-09-27 14:18:32 -07:00
rw fcd425c8c6 Incorporate style feedback from Joe. 2016-09-27 14:07:06 -07:00
rw 47c1c6763c Use encoder reset to save on allocs. 2016-09-27 13:31:35 -07:00
rw 9429a2f96a Gotos to simplify uses of the new encoder pools.
For maintainability.
2016-09-27 11:47:25 -07:00
rw f131d3cc77 Fix off-by-one error that could panic. 2016-09-26 17:03:03 -07:00
Jason Wilder 4b5d989905 Merge pull request #7335 from influxdata/jw-tsm-syscalls
Avoid stat syscall when planning compactions
2016-09-26 12:30:05 -06:00
Jason Wilder 139ef8062e Simplify encoder buffer usage 2016-09-26 12:19:16 -06:00
Jason Wilder 658149a6ff Removed commented out code 2016-09-26 12:19:15 -06:00
Jason Wilder 7f96d78b79 Make encoder re-usable
This allows encoders to be re-used and maintained in a pool to
avoid allocating new ones on every compaction and write of an encoded
block.  The pool used is not a sync.Pool to ensure that the encoders
will not be garbage collected.
2016-09-26 12:19:15 -06:00
Jason Wilder 0401527093 Pre-allocate cache store and entries
These were not sized, so they always had to be grown, causing
garbage to be created.
2016-09-26 12:19:15 -06:00
Jason Wilder 730ceeea46 Re-used allocated byte slices during compactions 2016-09-26 12:19:15 -06:00
Jason Wilder c2cfd63091 Avoid stat syscall when planning compactions
When the planner runs, it needs to determine if any files have tombstones.
The code to determine if a tombstone existed involved stating the .tombstone
file.  Since the planner runs very frequently when there are many shards, this
causes a lot of unnecessary system calls.

Instead, cache the results of the stats calls and only refresh them when we
haven't checked at least once or we write new tombstone data.

This also caches the results of the TSMReader.Stats call to avoid creating
garbage.
2016-09-24 15:53:28 -06:00
Jason Wilder 95682faec2 Merge branch '1.0' into jw-merge-10 2016-09-08 09:00:51 -06:00
Jason Wilder 1a35c0a3fc Fix neverending full compactions
The full compaction planner could return a plan that only included
one generation.  If this happened, a full compaction would run on that
generation producing just one generation again.  The planner would then
repeat the plan.

This could happen if there were two generations that were both over
the max TSM file size and the second one happened to be in level 3 or
lower.

When this situation occurs, one cpu is pegged running a full compaction
continuously and the disks become very busy basically rewriting the
same files over and over again.  This can eventually cause disk and CPU
saturation if it occurs with more than one shard.

Fixes #7074
2016-09-03 17:35:14 -06:00
Jason Wilder a6f6fda415 Fix DeleteSeries when multiple fields exist
The logic for determining whether a series key was already in
the set of TSM series was too restrictive.  It allowed only the first
field of a series to be added, leaving out all the remaining fields.
2016-08-31 20:53:10 -06:00
Jason Wilder 190537a557 Fix DeleteSeries when multiple fields exist
The logic for determining whether a series key was already in
the set of TSM series was too restrictive.  It allowed only the first
field of a series to be added, leaving out all the remaining fields.
2016-08-31 20:35:35 -06:00
Jonathan A. Sternberg dc2527ce86 Merge branch '1.0' 2016-08-31 14:45:57 -05:00
Jason Wilder 3d411371f2 Merge pull request #7233 from influxdata/jw-stats2
Write path stats
2016-08-29 10:15:23 -06:00
Jason Wilder d878d30d18 Fix shard write stats
* Rename *Fail to *Err for consistency with other metrics
* Use index Series count instead of separate counter
2016-08-29 09:46:11 -06:00
Jason Wilder e203323776 Add wal write success/error stats 2016-08-29 09:38:48 -06:00
Jason Wilder 83ca8c3867 Decrement cache memory stat when deleting series 2016-08-29 09:38:41 -06:00
Jason Wilder 03326f993f Add cache write success/error stats 2016-08-29 09:38:32 -06:00
Jason Wilder b31bf798f1 Fix runtime: goroutine stack exceeds 1000000000-byte limit
Fixes #7225
2016-08-29 09:26:48 -06:00
Jonathan A. Sternberg 8b234546a8 Merge pull request #7204 from influxdata/1.0
Merge 1.0 branch to master
2016-08-25 15:20:30 -05:00
Jonathan A. Sternberg 10029caf2f Support negative timestamps in the query engine
Negative timestamps are now supported. We also now refuse two
nanoseconds that are at the edge of the minimum time window. We do not
accept one of the nanoseconds because we need MinInt64 to be used for
some internal comparisons in the TSM engine, and it was causing an
underflow when we subtracted one from the minimum time. The second is so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell the difference between if the
user provided it or if it was implicit without those values being
different.

If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
2016-08-25 12:52:41 -05:00
Ben Johnson cc628a1097
Fix mmap dereferencing
Adds a missing dereference call to `Close()` and fixes
a tag copy issue.
2016-08-24 10:48:07 -06:00
Edd Robinson 90ff713f21 Fix base64 encoding issue in stats
Fixes #7177.
2016-08-22 15:21:31 +01:00
Ben Johnson 8aa224b22d
reduce memory allocations in index
This commit changes the index to point to index data in the shards
instead of keeping it in-memory on the heap.
2016-08-16 14:09:00 -06:00
Jason Wilder 0ea645642b Remove compaction assert that should not be there
This assert was not removed when the issue that caused the assert
to trigger was fixed in 0f5e994.

Fixes #7121
2016-08-08 09:59:45 -06:00
Jason Wilder 19546faab3 Release cursor/iterator resources aggressively 2016-08-03 00:21:39 -06:00
Jason Wilder e8e6bc44a7 Remove defers in TSM reader read path 2016-08-02 16:39:45 -06:00
Jason Wilder 5576e7fedb Simplifications 2016-07-28 20:25:37 -06:00
Jason Wilder 8367771d35 Fix go vet 2016-07-28 20:25:37 -06:00
Jason Wilder 030f1ef622 Include full path for tombstone files
The path info only contained the file name which caused tombstone
files to not be removed if there were queries running against
a file that was compacted.

This is now consistent with the TSMReader.Path which returns the
full path info.
2016-07-28 20:25:37 -06:00
Jason Wilder c3fda24cf9 Make sure all in-use files are tracked
A break caused only the first one to be tracked; all others would
leak as temp files that were not removed until the server
restarted.
2016-07-28 20:25:37 -06:00
Jason Wilder c1a94e8861 Remove temp TSM files when disabling compactions
If they were left around, re-enabling them again could cause
future compactions to continuously fail.  A restart of the
server would clean them up correctly though.
2016-07-28 20:25:37 -06:00
Jason Wilder 602a2e80ce Ensure aux and cond cursors are closed when iterator is closed 2016-07-28 20:25:37 -06:00
Jason Wilder 5764a730d5 Prevent tombstoning series keys more than once
If there were multiple TSM files and a delete/drop was run,
we would write the deleted series to the tombstone file N
times, once per file.  This occurred because FileStore.WalkKeys walks
every key in every TSM file which can return duplicate keys.

This issue caused TSM files to be much larger than they should be
and also caused large memory usage during the delete.
2016-07-28 20:25:36 -06:00
Jason Wilder ef8ecf0e90 Apply reloaded tombstones in batches
This bounds memory use when reloading a TSM file's tombstones
so that the heap does not grow exceedingly fast and stay there
after the deletes are applied.
2016-07-28 20:25:36 -06:00
Jason Wilder 4436e65fb9 Apply deletes to TSM files concurrently 2016-07-28 20:25:36 -06:00
Jason Wilder a8c69e222a Use scanner for reading v1 tombstones
Use a bufio.Scanner to read v1 tombstones instead of reading in
the whole file and parsing it from memory.
2016-07-28 20:25:36 -06:00
Jason Wilder 7b8959f6f2 Apply tombstones iteratively at startup
Tombstones were read fully into memory at startup, which could consume
a lot of RAM and OOM the process if there were a lot of deleted
series and many TSM files.

This now walks the tombstone file and iteratively applies the tombstone
which uses significantly less RAM.  This may be slightly slower in the
general case, but should scale better.
2016-07-28 20:25:36 -06:00
Jason Wilder 7c3d1aac68 Simplify purger.add logic 2016-07-26 13:02:08 -06:00
Jason Wilder cab84ae279 Prevent concurrent compactions from stepping on each other
Normally, compactions do not conflict on the files they are compacting.
If the full cold threshold is set very low, it can cause conflicts where
two compactions compact the same files.  The full compaction was the
only place this could happen, as its planning is greedy.

To make this safer for concurrent execution, the compaction tracks which
files are currently being compacted and prevents any new compactions from
starting if the file set overlaps.

Fixes #6595
2016-07-26 12:58:25 -06:00
Jason Wilder ded6e40d47 Remove lastPlanCheck var
This variable caused full compactions to not run while the server was running,
though after a restart they would run.
2016-07-26 12:58:25 -06:00
Jason Wilder 2f78c4ec83 Fix race when creating temp file
Using os.O_EXCL is safer than checking and then creating the file.
2016-07-26 12:58:25 -06:00
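A minimal sketch of the create-exclusively pattern; the path is a placeholder. With O_EXCL the open fails if the file already exists, so there is no window between the existence check and the create:

```
package main

import (
	"log"
	"os"
)

func main() {
	f, err := os.OpenFile("000001.tsm.tmp", os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0666)
	if err != nil {
		// Another writer created the file first; back off instead of
		// clobbering it.
		log.Fatal(err)
	}
	defer f.Close()
}
```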
Cory LaNou 063675b928 updates to make snappy compression tests work again 2016-07-22 14:33:20 -05:00
Cory LaNou 968d322d6d finish tsm file exporter 2016-07-21 17:20:51 -05:00
Jason Wilder fb5a143b08 Fix typos 2016-07-21 12:13:04 -06:00
Jason Wilder 13147efb24 Close underlying cursors when closing iterators
If a query is interrupted via kill query, the tsm files managed
by the file store purger would never get removed because
KeyCursor.Close was never called.

KeyCursor.Close should always be called now.
2016-07-21 12:13:04 -06:00
Jason Wilder 822f409b31 Allow queries to complete before closing TSM files
If a query was running against a file being compacted, we closed the file
and the query would end wherever it had read up to.  This could result
in queries that randomly lost data, but running them again showed the
full results.

We now use a reference counting approach and move the in-use files out
of the way in the filestore and allow the queries to complete against
the old tsm files.  The new files are installed and new queries will
use them.

Fixes #5501
2016-07-21 12:13:04 -06:00
Edd Robinson f37e726869 Add trace logging statements to tsdb 2016-07-21 11:14:29 +01:00
Edd Robinson 44231abcbd Add trace logger controlled via DataLoggingEnabled 2016-07-21 11:14:29 +01:00
Edd Robinson 83cc580ff8 Tidy up logging 2016-07-21 11:14:29 +01:00
Mark Rushakoff 518bd3b565 Micro-optimize BooleanDecoder for 20% speedup
benchmark                          old ns/op     new ns/op     delta
BenchmarkBooleanDecoder_2048-4     9954          7846          -21.18%

benchmark                          old allocs     new allocs     delta
BenchmarkBooleanDecoder_2048-4     0              0              +0.00%

benchmark                          old bytes     new bytes     delta
BenchmarkBooleanDecoder_2048-4     0             0             +0.00%
2016-07-20 08:43:05 -07:00
Mark Rushakoff 523aea715a Protect against bounds errors in FloatDecoder 2016-07-19 15:59:27 -07:00
Mark Rushakoff e483689563 Protect against bounds errors in BooleanDecoder 2016-07-19 15:59:27 -07:00
Mark Rushakoff 35e3adc890 Protect against bounds errors in IntegerDecoder 2016-07-19 15:43:27 -07:00
Mark Rushakoff 42b35ca068 Protect against bounds errors in TimeDecoder 2016-07-19 15:43:27 -07:00
Mark Rushakoff be589a6760 Protect against bounds errors in StringDecoder 2016-07-19 15:43:27 -07:00
Mark Rushakoff 5b549ffdfe Handle bounds errors in UnpackBlock 2016-07-19 15:43:27 -07:00
Mark Rushakoff 39f12e376c Defend against some boundary errors in TSM reading 2016-07-19 15:43:27 -07:00
Mark Rushakoff 28f31b4a0c Add test cases to repro corruption panics 2016-07-19 15:36:17 -07:00
Jason Wilder b692ef4f48 Rename throttle package to limiter 2016-07-18 12:00:58 -06:00
Jason Wilder c2370b437b Limit in-flight wal writes/encodings
A slower disk can cause excessive allocations to occur when
writing to the WAL because the slower encoding and compression occurs
before taking the write lock.  The encoding/compression grabs a large
byte slice from a pool and ultimately waits until it can acquire the
write lock.

This adds a throttle to limit how many inflight WAL writes can be queued
up, to prevent OOMing the process with slower disks and heavy writes.
2016-07-17 23:53:12 -06:00
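A sketch of a counting semaphore in the shape the throttle/limiter suggests; the capacity of 16 and the type are assumptions:

```
package main

import "sync"

// limiter bounds how many goroutines may be in the guarded section.
type limiter chan struct{}

func (l limiter) Take()    { l <- struct{}{} }
func (l limiter) Release() { <-l }

func main() {
	lim := make(limiter, 16) // at most 16 in-flight WAL writes
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			lim.Take()
			defer lim.Release()
			// Encode and compress the entry here, then take the WAL
			// write lock to append it.
		}()
	}
	wg.Wait()
}
```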
Jason Wilder 46fdcba6e3 Remove compaction enabled logging
Too verbose
2016-07-17 23:53:12 -06:00
Jason Wilder 2fa28ba1d3 Don't log error when compactions are aborted 2016-07-17 23:53:12 -06:00
Jason Wilder b48d88ce9e Abort running compactions when series are deleted
If a delete is issued while a compaction is running, a newly
deleted series could re-appear after the compaction completed.  This
could occur if the compaction had already written the blocks for series
that were just deleted.  When the compaction completed, the newly
written tombstone files would be deleted, essentially undeleting the
series.
2016-07-17 23:53:12 -06:00
Jason Wilder 0f5e994383 Fix panic in full compactions due to duplicate data in blocks
Due to a bug in compactions, it's possible some blocks may have duplicate
points stored.  If those blocks are decoded and re-compacted, an assertion
panic could trigger.

We now dedup those blocks if necessary to remove the duplicate points
and avoid the panic.
2016-07-14 11:32:36 -06:00
Jason Wilder 0264966f5c Add index optimize planning step
For larger datasets, it's possible for shards to get into a state where
many large, dense TSM files exist.  While the shard is still hot for
writes, full compactions will skip these files since they are already
fairly optimized and full compactions are expensive.  If the write volume
is large enough, the shard can accumulate lots of these files.  When
a file is in this state, its index can contain every series, which
causes startup times to increase since each file must parse the full
set of series keys for every file.  If the number of series is high,
the index can be quite large, causing a large amount of disk IO at startup.

To fix this, an optimize compaction is run when a full compaction planning
step decides there is nothing to do.  The optimize compaction combines
and spreads the data and series keys across all files resulting in each
file containing the full series data for that shard and a subset of the
total set of keys in the shard.

This allows a shard to only store a series key once in the shard, reducing
storage size, as well as allowing a shard to load each key only once at startup.
2016-07-14 11:32:36 -06:00
Jason Wilder 5ee20e04a8 Fix compaction level planner
Large files created early in the leveled compactions could cause
a shard to get into a bad state.  This reworks the level planner
to handle those cases as well as splits large compactions up into
multiple groups to leverage more CPUs when possible.
2016-07-14 11:14:09 -06:00
Jonathan A. Sternberg 12a33fe0d3 Add stats and diagnostics to the TSM engine
Track the number of TSM files in the file store and keep engine
statistics related to the number of TSM compactions.
2016-07-07 19:35:55 -05:00
Jonathan A. Sternberg 837a9804cf Refactoring the monitor service to avoid expvar
Truncate the time interval output of the monitor service to be on even
time intervals rather than on every minute based on the start time. This
normalizes the output from the monitor service.
2016-07-07 11:13:58 -05:00
Jason Wilder 2f82d9a525 Truncate the slice when merging the caches 2016-07-05 12:12:21 -05:00
Jason Wilder fdf0bac717 Fix panic: runtime error: index out of range
Fixes #6829
2016-06-27 18:50:48 -06:00
Jason Wilder ca6bfac01a Fix out of order blocks returned during query
If there were blocks in later TSM files that were for overwritten
points or writes into the past, they could be returned more than
once or out of order causing the cursor values to be unsorted.

One effect of this is that graphs in Grafana would render with
the line going all over the place in spots.

This might also cause duplicate data to be returned.

Fixes #6738
2016-06-22 17:34:44 -06:00
Jonathan A. Sternberg 7bdcd669a8 Merge pull request #6879 from influxdata/js-prune-deadcode
Removing dead code from every package except influxql
2016-06-22 08:12:19 -05:00
Jonathan A. Sternberg 497db2a6d3 Removing dead code from every package except influxql
The tsdb package had a substantial amount of dead code related to the
old query engine still in there. It is no longer used, so it was removed
since it was left unmaintained. There is likely still more code like
this, but it wasn't found as part of this code cleanup.

influxql has dead code show up because of the code generation so it is
not included in this pruning.
2016-06-20 22:41:07 -05:00
Jonathan A. Sternberg 8812bc8a93 Remove a double lock in the tsm1 index writer 2016-06-20 17:32:34 -05:00
Jonathan A. Sternberg 6e205ce135 Set the condition cursor instead of aux iterator when creating a nil condition cursor
A copy/paste error caused nil cursors destined for a condition cursor to be
set to the auxiliary cursor instead. When the number of conditions
exceeded the number of auxiliary fields, this would result in a stack
trace in some situations. When the number of conditions was less than or
equal to the number of auxiliary fields, it means that an auxiliary
cursor may have been overwritten with a nil cursor accidentally and a
leak might have happened since it was never closed.

Fixes #6859.
2016-06-17 14:54:48 -05:00
Jason Wilder ac6addd0b5 Ensure restore doesn't write broken files
Restore would try to open the shard even if there was an error.  In that
case, the files written are very likely to be partially written
and can cause the server to panic.

To prevent a shard from trying to open broken files, we now write to
a temp file and rename it to the actual name only after fully writing
and fsyncing the file.
2016-06-07 14:36:46 -06:00
Jason Wilder 838a29cca8 Fix race in cache
If cache.Deduplicate is called while writes are in-flight on the cache, a data race
could occur.

WARNING: DATA RACE
Write by goroutine 15:
  runtime.mapassign1()
      /usr/local/go/src/runtime/hashmap.go:429 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).entry()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:482 +0x27e
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).WriteMulti()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:207 +0x3b2
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func1()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:421 +0x73

Previous read by goroutine 16:
  runtime.mapiterinit()
      /usr/local/go/src/runtime/hashmap.go:607 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Deduplicate()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:272 +0x7c
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func2()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:429 +0x69

Goroutine 15 (running) created at:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:423 +0x3f2
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:473 +0xdc

Goroutine 16 (finished) created at:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:431 +0x43b
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:473 +0xdc
2016-06-06 15:45:01 -06:00
Jason Wilder bc76048371 Fix panic in cache.DeleteRange
Deleting keys that did not exist in the cache could cause a panic
because the entry returned would be nil and was not checked.
2016-06-06 14:48:53 -06:00
Jason Wilder a74ea4cbf4 Allow creating shards in a disable state
For restoring a shard, we need to be able to have the shard open,
but disabled.  It was racy to open it and then disable it separately
since writes/queries could occur in between.
2016-06-01 16:17:18 -06:00
Jason Wilder d0023dee5d Convert inline errors to constants 2016-05-31 10:51:54 -06:00
Jason Wilder 1ff8ecf4fb Add ability to disable shards
Disabling a shard causes all writes and queries to a shard to return
an error.  This also disables compactions for the shard.
2016-05-31 10:51:54 -06:00
Edd Robinson baf5d505e6 Merge pull request #6754 from influxdata/er-fs
Prevent ReadFloatBlock from panicking when no values
2016-05-31 16:41:29 +01:00
Edd Robinson 003c30989a Check for no values 2016-05-31 16:28:17 +01:00
rw dcec206f2e Dedup `.RUnlock` between two conditionals. 2016-05-29 10:20:58 -07:00
rw 1b160d1af0 Low-contention path for pre-existing cache entries.
This change appears to increase bulk ingestion throughput by 2x-3x in
multiprocessor environments.
2016-05-28 23:50:11 -07:00
Jason Wilder 11959005f4 Switch backup to use shard.Snapshot
This switches the backup shard call to use the shard Snapshot, which
internally creates a snapshot by hardlinking all of the TSM and
tombstone files instead.  This reduces the time that the FileStore
is locked and will allow larger shards to be backed up more easily.
2016-05-27 09:30:25 -06:00
David Norton 381059a55c Merge pull request #6736 from influxdata/benchmark-write-points-allocs
Benchmarks to count allocs in WritePoints.
2016-05-27 10:13:17 -04:00
Edd Robinson 6a7f9527e3 Revert d2672a3 and 1e0a4e9 2016-05-27 10:34:14 +01:00
rw 92e7fec5cf Benchmarks to count allocs in WritePoints. 2016-05-26 17:13:14 -07:00
Edd Robinson d2672a3280 Update Go version 2016-05-26 15:26:09 +01:00
Edd Robinson 1e0a4e9119 Move fields under mutex 2016-05-26 12:00:46 +01:00
Jason Wilder d6661060a3 Merge pull request #6719 from shurcooL/fix-tombstone-open-error-check
tsdb/engine/tsm1: Check os.Open error before using file.
2016-05-25 12:11:26 -06:00
Jason Wilder a77dd4fe4c Merge pull request #6725 from influxdata/jw-tsm-query
Fix pathological TSM query case
2016-05-25 11:23:38 -06:00
Jason Wilder 7d50970631 Fix continuous compaction edge case
The level planner would keep including the same TSM files to be
recompacted even if they were already quite compacted and split
across several TSM files.

Fixes #6683
2016-05-25 10:36:24 -06:00
Jason Wilder 0b481ff627 Fix pathological TSM query case
This fixes a pathological query condition caused by a problematic
structuring of TSM files based on how points were written.  The
condition can occur when there are multiple TSM files and a large
number of points are written into the past.  The earlier existing
TSM files must also have points in the past and close to the present
causing their time range to eclipse the later files.

When this condition occurs, some queries can spend an excessive amount
of time merging all the overlapping blocks.

The fix was to constrain the window of overlapping blocks based on
the first one we ran into.  There was also a simple case in the Merge
where we could skip the binary search path and just append the two
inputs.
2016-05-25 09:14:17 -06:00
Dmitri Shuralyov c03ebf896b tsdb/engine/tsm1: Check os.Open error before using file.
os.Open is documented as:

> Open opens the named file for reading. If successful, methods on
> the returned file can be used for reading;

That suggests the file's methods should only be called if opening
was successful. The original code would defer f.Close() right after
os.Open, before ensuring that err is nil, so f.Close() would run
even if os.Open did not return successfully.

Apply https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow
suggestion to keep the normal path at minimal indentation, and indent
the error handling code instead. This improves code readability.
2016-05-24 21:08:35 -07:00
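The shape of the corrected pattern, as a minimal sketch; the file name is a placeholder:

```
package main

import (
	"log"
	"os"
)

func main() {
	f, err := os.Open("000001.tombstone")
	if err != nil {
		// f may be nil here, so deferring f.Close() before this check
		// is unsafe; handle the error first.
		log.Fatal(err)
	}
	defer f.Close()

	// Normal path stays at minimal indentation: read from f here.
}
```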
Jason Wilder f48a106860 Optimized timestamp run-length decoding
Removes the up-front allocation of decoded values and returns them
as needed.
2016-05-23 14:05:25 -06:00
Edd Robinson 40732a35d0 Merge pull request #6660 from influxdata/er-vet
Fix vet issues
2016-05-20 11:12:25 +01:00
Jonathan A. Sternberg 5621ccc2ce Remove limit optimization when using an aggregate
The limit optimization was put into the wrong place and caused only part
of the shard to be read when a limit was used. The optimization is
possible, but requires a bit of refactoring to the code here so the call
iterator is created per series before handed to the limit iterator.

Fixes #6661.
2016-05-19 10:29:38 -04:00
Jason Wilder 4c089a56f4 Fix read tombstones: EOF
Due to a bug in TSM tombstone files, it was possible to create
empty tombstone files.  At startup, reading the tombstone would error out
and the TSM file would not load.

Instead, treat it as an empty v1 file so the TSM file can load
correctly.

Fixes #6641
2016-05-18 23:29:25 -06:00
Jason Wilder 7fb7faaaca Fix points already read from being returned more than once
If there were duplicate points in multiple blocks, we would correctly
dedup the points and mark the regions of the blocks we've read.
Unfortunately, we were not excluding the already-read points as the cursor
moved to points in the later blocks, which could cause points to be
returned twice incorrectly.

Fixes #6611
2016-05-18 17:21:10 -06:00
Jason Wilder f2bcf9d9ab Code review fixes 2016-05-18 15:25:56 -06:00
Jason Wilder d32ad26d27 Fix data not getting reloaded
The optimization to speed up shard loading had the side effect of
skipping adding series to the index that already exist.  The skipping
was in the wrong location and also skipped the shard's measurementFields
index, which is required in order to query that series in the shard.
2016-05-18 15:25:56 -06:00
Jason Wilder e859141b75 Speed up tests
Switched the max keys test to write int64s of the same value so RLE
would kick in and the file size would be smaller (84MB vs 3.8MB).

Removed the chunking test which was skipped because the code will
not downsize a block into smaller chunks now.

Skip the MaxKeys tests in various environments because they need to
write too much data to run reliably.
2016-05-18 15:25:56 -06:00
Jason Wilder eff71cbe23 Rollover to new TSM file when max blocks exceeded
Fixes #6406
2016-05-18 15:25:55 -06:00
Jason Wilder 8fda621d8b Fix memory spike when compacting overwritten points
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.

Fixes #6557
2016-05-18 15:25:55 -06:00
Edd Robinson f78e67d09c Fix concurrent map access panic 2016-05-18 17:56:50 +01:00
Edd Robinson f680ab0f0d Fix vet issues 2016-05-18 13:34:11 +01:00
Jonathan A. Sternberg 42cdaf0365 Merge pull request #6529 from influxdata/js-6519-select-tag-key-specifier
Support cast syntax for selecting a specific type
2016-05-16 12:30:14 -04:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Jason Wilder 23fc9ff748 Revert "Fix memory spike when compacting overwritten points"
This reverts commit d99c5e26f6.
2016-05-16 09:30:34 -06:00
Jason Wilder 0dbd4893da Optimize shard index loading
On data sets with many series and potentially large series keys,
the cost of parsing the key and re-indexing can be high.

Loading the TSM keys into the index was being done repeatedly for
series that were already indexed by an earlier TSM file.  This was
wasted work and slowed down shard loading.

Parsing the key was also inefficient and allocated a new string
slice.  This was simplified to remove that allocation.
2016-05-12 14:02:42 -06:00
Ben Johnson 668bae57df
parallelize query planning
This commit changes the `tsm1.Engine` to create individual series
iterators in batches so that it can be parallelized. Iterators
are combined at the end so they can be redistributed to the
parallelized merge iterator.
2016-05-11 10:38:11 -06:00
Cory LaNou c32906a366 Merge pull request #6593 from influxdata/cjl-copyshard
create shard snapshot
2016-05-10 20:01:59 -05:00
Jason Wilder d8490f1170 Merge pull request #6587 from influxdata/jw-validate-fields
Fix for merge values
2016-05-10 11:56:07 -06:00
Cory LaNou f415cf89ad wip 2016-05-10 11:01:03 -05:00
Jason Wilder 9b86bfea2a Merge pull request #6582 from eleme/fix_engine_cache_size
fix cache size of engine
2016-05-10 09:01:03 -06:00
Jason Wilder 8839cabd41 Add benchmark for Merge 2016-05-10 08:39:55 -06:00
Cory LaNou 4d30ea1eb3 minor PR feedback refactor 2016-05-10 08:14:51 -05:00
Cory LaNou a3bf3e2ef1 added baseline backup/restore plumbing 2016-05-10 08:14:51 -05:00
Jason Wilder 4f39cb2f97 Fix case where Merge returns unsorted values 2016-05-09 15:40:34 -06:00
Ben Johnson 078e561820
parallelize iterators 2016-05-09 10:25:30 -06:00
thbourlove 22c2e7e1c5 fix cache memory size of engine 2016-05-09 21:29:34 +08:00
Jason Wilder d99c5e26f6 Fix memory spike when compacting overwritten points
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.
2016-05-05 22:31:30 -06:00
Ben Johnson 4c45f8ec32 Merge pull request #6560 from benbjohnson/optimize-tsm1-call-iterator
Move call iterator to series level
2016-05-05 11:13:53 -06:00
Ben Johnson fdf34d4356
move call iterator to series level
This commit moves the `CallIterator` to wrap the individual series
instead of wrapping a shard. This allows individual points to be
aggregated before being merged.

This will cause a small increase in memory usage per series but
it shows a 20% decrease in query time when there are a moderate
number of points per series.
2016-05-05 09:59:03 -06:00
Jason Wilder a0ac754802 Fix loading huge series into RAM when points are overwritten
In some query scenarios, if there are a lot of points on disk spread
across many blocks in TSM files and a point is overwritten near the
beginning of the shard's time range, the full series could be loaded
into RAM triggering OOMs and huge allocations.

The issue was that the KeyCursor code that handles overwriting points
had a simple implementation that just deduped the whole series in this
case.  This falls over when the series is quite large.

Instead, the KeyCursor has been changed to only decode blocks with
updated points.  It then keeps track of what section of the blocks
have been read so they are not re-read when the later points are
decoded.

Since the points in a block are always sorted, the code was also changed
to remove the Deduplicate calls since they end up
reallocating the slice.  Instead, we do a sorted merge and re-use
the slice as much as we can.
2016-05-05 09:34:44 -06:00
Jason Wilder 57cb3fdbc0 Merge pull request #6522 from influxdata/tp-tsm-dump
Dump TSM files to line protocol
2016-05-03 10:44:33 -06:00
Jason Wilder 4196554f51 Fix overwriting points returning wrong value
The cursors were returning the wrong value in the case when points
existed in both the cache and tsm files with the same timestamp. The
cache value should have been returned, but the tsm value was returned
incorrectly.

Fixes #6439
2016-05-03 09:21:31 -06:00
Edd Robinson fd77dbe648 Merge pull request #6546 from influxdata/er-build-tag
Fix invalid build tag
2016-05-03 16:00:39 +01:00
Jonathan A. Sternberg a2a5c32770 Merge pull request #6539 from influxdata/js-6495-fix-aggregates-with-empty-shards
Fix aggregate returns when data is missing from some shards
2016-05-03 10:56:21 -04:00
Jonathan A. Sternberg d6d0addcec Fix aggregate returns when data is missing from some shards
If a shard is empty for a specific field and the field type is something
other than a float, a nil iterator would get returned from one of the
empty shards and cause the combined iterators to be cast to the float
type and all other iterator types to be discarded (or for integers, to
be cast).

This is rare since most aggregates don't accept strings or booleans, but
for queries like:

    SELECT distinct(string) FROM mydata

It would result in nothing getting returned if one of the shards didn't
have a value for `string`.

This change modifies the query engine to return nil for the shards
instead of a fake iterator and then to only use the fake iterator if the
final aggregate iterator is nil (meaning that no iterators could be
constructed for the field from any shard).

Fixes #6495.
2016-05-03 10:41:22 -04:00
Edd Robinson d35fa1ec97 Remove redundant windows build tags 2016-05-03 14:22:02 +01:00
Jason Wilder e0304ae3d5 Fix shards not getting assigned to series on restart
Also, simplifies the LoadMetaDataIndex func to not require a *Shard
2016-05-02 11:36:05 -06:00
Jason Wilder 2d09937fd2 Fix removing fully deleted index blocks
If multiple tombstone entries happen to exist for the same key in a
tombstone file, it was possible to panic.  The first application
would remove all index entries and the second time around the code
still assumed entries would exist and would index into the nil slice.

Also fixes a case where the range of time would fully delete all index
entries, but it did not align with math.MinInt64 and math.MaxInt64.  This
would cause the index locations to still exist in the offset slice.  This
is inefficient because the BlockIterator would still scan and decode the block
only to discover that all the values are deleted.  We now just remove it from
the offsets slice in this case, since the whole range of values is deleted.
2016-05-02 11:36:05 -06:00
Jason Wilder 58aa65d5a8 Optimize applyTombstones
When a large tombstone file existed on disk, this code was slow since
it would apply each tombstone to the index one at a time causing the
index to be scanned for each key.

Instead, we group all the tombstones together by timestamp and apply
in bulk so that the index is scanned once for each set of tombstones.

If we change to immutable tombstone files, it might be better to just
write a file where all the keys have the same tombstone so we can re-apply
them efficiently.
2016-05-02 11:36:05 -06:00
Jason Wilder c73c7cea25 Revert filtering index entries in BlockIterator
This was the wrong fix.  The real issue was the tombstones were
being read incorrectly and also applied incorrectly at times.  This
code is slower and not necessary so reverting it.
2016-05-02 11:36:04 -06:00
Jason Wilder f9ace932c0 Fix V2 tombstone reading file position
Each iteration of the loop was incrementing the position by 4 incorrectly.
The position should start at four since the header is 4 bytes.  This
caused tombstones at the end of the file to not be read because the counter
was out of sync with the actual file position, which caused the loop to exit early.

Probably better to refactor this to check for io.EOF instead of using the counter.
2016-05-02 11:36:04 -06:00
Jason Wilder bd1009080e Prevent writing empty tombstone files
If you delete from a measurement with a tag that does not match
any series, we would write an empty tombstone file and fail to load
it back.
2016-05-02 11:36:04 -06:00
Jason Wilder 8082fc61ba Fix parsing keys when loading database index
The code for parsing a key out of the WAL or TSM files in the engine
was naive and didn't account for measurements with escape chars. This
uses the correct parsing code to parse and load them correctly.

Fixes #6496
2016-04-30 14:47:19 -06:00
Todd Persen 9eb4c1ec57 Fix typo in comment. 2016-04-29 16:26:27 -07:00
Jason Wilder abcb559b09 Remove index meta data when series and measurements are gone
This removes the dropMeta param from the tsdb.Store.DeleteSeries and
lets the shard determine when to remove the meta data from the index
based on what series still have data in the shard.

This uncovered a nasty bug in compactions where a fully deleted series would
prematurely end the compactions and not carry forward the rest of the data
in the TSM file.  This is now fixed as well.
2016-04-29 16:31:57 -06:00
Jason Wilder 4e353867d5 Fix first block not getting purged when deleting series 2016-04-27 17:08:00 -06:00
Ben Johnson f7af787aef
add DELETE query support
This commit adds query language support for deleting series with a
`DELETE` query.
2016-04-27 15:16:23 -06:00
Jason Wilder aefd2ad08b Add DeleteSeries and DeleteSeriesRange 2016-04-27 13:09:53 -06:00
Jason Wilder c306090361 Fix tombstone rename on windows 2016-04-27 13:09:53 -06:00
Jason Wilder 86d37614e4 Remove debugging from test output 2016-04-27 13:09:53 -06:00
Jason Wilder bf3aa5857d Don't add tombstone for timerange not contained by file 2016-04-27 13:09:53 -06:00
Jason Wilder 6042e114a1 Remove tombstoned values during compaction
This will skip blocks that are fully tombstoned as well as remove
points that have been removed within a block.
2016-04-27 13:09:53 -06:00
Jason Wilder 23bbfb2192 Prevent truncated WAL entries from panicking 2016-04-27 13:09:53 -06:00
Jason Wilder 0de21ade40 Add delete range of values support to WAL and cache loader 2016-04-27 13:09:53 -06:00
Jason Wilder d13d01b516 Allow deleting series by time on a shard 2016-04-27 13:09:53 -06:00
Jason Wilder 4d71d2b01f Add support for deleting cache values using time range 2016-04-27 13:09:52 -06:00
Jason Wilder c154cd4b4a Remove TSMReaderOptions
Not used
2016-04-27 13:09:52 -06:00
Jason Wilder c8bd41c2d8 Remove TSM reader Keys func
It's very inefficient and should never be used.
2016-04-27 13:09:52 -06:00
Jason Wilder 7e06d558d5 Update ContainsValue to handle tombstones 2016-04-27 13:09:52 -06:00
Jason Wilder 97504a552c Support time range tombstones in FileStore/KeyCursor 2016-04-27 13:09:52 -06:00
Jason Wilder 27c2bc3f15 Separate IndexWriter from TSMIndex
Allows for future versioning of the TSMIndex as well as removing
a lot of unnecessary code.
2016-04-27 13:09:52 -06:00
Jason Wilder bb82331db7 Move TSMIndex defn to reader.go 2016-04-27 13:09:52 -06:00
Jason Wilder 1ac0b01c5a Remove fileAccessor
No longer used
2016-04-27 13:09:52 -06:00
Jason Wilder a789e819a3 Remove NewTSMReaderWithOptions
There are two TSMIndex implementations, the directIndex and the
indirectIndex.  Originally, we only had the directIndex and later
added the indirectIndex and NewTSMReaderWithOptions in order to
allow both indexes to be used in tests and code.  This has created
a problem since we really only use the directIndex for writing and
always use the indirectIndex for reading.

This change removes the NewTSMReaderWithOptions func so that it is
no longer possible to create a TSMReader with a directIndex.  This
will allow a lot of the block reading code used by the directIndex
to be removed and simplifies maintenance.  It also gives better test
coverage of the code that is actually used by the TSM engine now.
2016-04-27 13:09:52 -06:00
Jason Wilder bc6328d196 Add time range support to tombstone files
This adds support for a time range to tombstone files to allow a subset
of points to be deleted instead of the whole series.  It changes the
tombstone file format to a binary format and maintains backwards compatibility
with the old text format tombstone files.
2016-04-27 13:09:52 -06:00
Ben Johnson 286072f65a
update dep: simple8b @ b421ab40 2016-04-22 09:46:05 -06:00
Ben Johnson d204a8b683
optimize tsm1.FloatDecoder
This commit changes the `FloatDecoder.val` from a `float64` type
to a `uint64` to avoid an additional type conversion during read.
Now the type gets converted to a `float64` only on call to `Values()`.
2016-04-21 08:49:12 -06:00
Jason Wilder 87ceb7426a Don't lock the cache while adding entries
Entries have their own locking, so the cache doesn't need to be locked
when adding to them.
2016-04-20 16:08:58 -06:00
Jason Wilder fbaa7db54f Don't lock entry when scanning new values to add 2016-04-20 16:00:26 -06:00
Jason Wilder bfa225f149 Merge pull request #6430 from influxdata/jw-cache-load-size
Disable cache max memory size when reloading the cache
2016-04-20 14:35:23 -06:00
Stephen Gutekanst 9dc09c5257 Make logging output location more programmatically configurable (#6213)
This has various benefits:

- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
2016-04-20 21:07:08 +01:00
Jason Wilder f679787080 Disable cache max memory size when reloading the cache
The cache max memory size is an approximate size and can prevent a
shard from loading at startup.  This change disables the max size
at startup to prevent this problem and sets the limit back after
reloading.

Fixes #6109
2016-04-20 10:41:30 -06:00
Jonathan A. Sternberg c8c38e15cd Merge pull request #6386 from influxdata/js-iterator-next-error
Modify all of the iterators to allow returning an error on Next()
2016-04-20 10:39:53 -04:00
Ben Johnson 54454e1e5b Merge pull request #6424 from benbjohnson/optimize-bit-reader
Optimize tsm1.BitReader
2016-04-20 08:28:24 -06:00
Seif Lotfy c6e3c87e00 Add Block checksum validation and "influx_inspect verify" tool
Fixes #5502
2016-04-19 22:33:03 +02:00
Ben Johnson 1d2238c642
optimize tsm1.BitReader
This commit rewrites the `tsm1.BitReader` to use an 8-byte buffer
instead of a 1-byte buffer and provide an inlineable fast bit read.
2016-04-19 11:34:17 -06:00
Jason Wilder f841a90d35 Use int64 instead of time.Time in timestamp encoder/decoder 2016-04-19 10:25:27 -06:00
Jason Wilder 61beeca426 Update timestamp benchmarks 2016-04-19 10:17:32 -06:00