1994 Commits (cc22134d8f8b15418f1492d84c000693d61422af)

Author SHA1 Message Date
Jason Wilder fb7135ddc8 Fix corrupted wal segment panic on 32 bit systems 2017-10-16 09:41:20 -06:00
lrita 2f0aa4a420 remove duplicated code in cacheKeyIterator.encode() 2017-10-13 20:39:15 +08:00
Stuart Carnie a0848eac8c remove unnecessary err value
readKey never sets error, so it is always nil
2017-10-12 08:28:53 -07:00
Jason Wilder 1401950b10 Only schedule one compaction per shard at a time
The scheduling logic ended up favoring more backlogged shards
too much and would starve active, less backed up shards.  This
occurred because the scheduling kicks in once a second.  When it
runs, it schedules as many compactions as it can.  A backed up shard
would end up having more compactions to run during the loop and would
generally get to schedule them more frequently.

This now allows each shard to try and schedule one compaction at a time
which provides a more balanced approach.  At some point, we'll probably
want to more directly balance each shard's backlog vs letting it happen
somewhat randomly.
2017-10-09 11:40:32 -06:00
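The scheduling change described in this commit can be illustrated with a minimal sketch. It is not the code from the commit: the `shard` type, the `scheduleTick` function, and the `limit` parameter are hypothetical names chosen for illustration, and a "compaction" here is only a counter decrement.

```go
package main

import "fmt"

// shard is a hypothetical stand-in for a storage shard with pending work.
type shard struct {
	name    string
	backlog int // number of compactions waiting to run
}

// scheduleTick models one pass of the scheduling loop. Each shard may start
// at most one compaction per pass, and the pass stops once limit compactions
// have been started, so a heavily backlogged shard cannot take every slot.
func scheduleTick(shards []*shard, limit int) []string {
	var started []string
	for _, s := range shards {
		if len(started) >= limit {
			break
		}
		if s.backlog == 0 {
			continue
		}
		s.backlog--
		started = append(started, s.name)
	}
	return started
}

func main() {
	shards := []*shard{{name: "busy", backlog: 10}, {name: "quiet", backlog: 1}}
	fmt.Println(scheduleTick(shards, 4)) // prints [busy quiet]
}
```

With the earlier "schedule as many as possible" behavior, the busy shard in this toy example would have claimed all four slots in a single pass.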
Jason Wilder 00a403f60e Reduce allocation in tsmKeyIterator.Next
This reuses some intermediate buffers and structs while compacting
files.
2017-10-04 17:35:56 -06:00
Jason Wilder 6b6ccf1a40 Wait for compaction goroutines to finish 2017-10-04 10:01:44 -06:00
Jason Wilder 06226d6fd3 Handle orphan lower level TSM files during full planning
Some files seem to get orphaned behind higher levels.  This causes
the compactions to get blocked as the lower level files will not
get picked up by their lower level planners.  This allows the full
plan to identify them and pull them into their plans.
2017-10-04 08:13:14 -06:00
Jason Wilder a1d0b52897 Allow lower priority compactions to use excess capacity
If there is a backlog of level 3 and 4 compactions, but few level 1
and 2 compactions, allow them to use some excess capacity.
2017-10-04 08:11:44 -06:00
Jason Wilder f2a681c4cf Unconditionally remove file when calling Remove 2017-10-03 10:49:17 -06:00
Jason Wilder 0c0505881f Remove multiple file skipping for full compaction planning
This check doesn't make sense for high cardinality data as the files
typically get big and sparse very quickly.  This causes a lot of extra
disk space to be used which is taken up by large indexes and sparse
data.
2017-10-03 10:48:14 -06:00
Jason Wilder 90df803802 Prevent infinite scheduling loop
One shard might be able to run a compaction, but could fail due to
limits being hit.  This loop would continue indefinitely as the
same task would continue to be rescheduled.
2017-10-03 10:48:14 -06:00
Jason Wilder 4ff4ba0841 Use first file in generation for level
With higher cardinality or larger series keys, the files can roll
over early which causes them to take longer to be compacted by higher
levels.  This causes larger disk usage and higher numbers of tsm files
at times.
2017-10-03 10:48:14 -06:00
Jason Wilder 71071ed67a Add compaction backlog stat
This gives an indication as to whether compactions are backed up
or not.
2017-10-03 10:48:14 -06:00
Jason Wilder 16ece490ef Reduce allocation in tsmKeyIterator.Next
The chunked slice is unnecessary and we can re-use k.blocks throughout
the compaction.
2017-10-03 10:48:14 -06:00
Jason Wilder 2c5006fccc Rework snapshotting concurrency
This switches the thresholds that are used for writing snapshots
concurrently.  This scales better than the prior model.
2017-10-03 10:48:14 -06:00
Jason Wilder 3af9c7df37 Remove a defer allocation
Shows up under high cardinality compactions.
2017-10-03 10:48:14 -06:00
Jason Wilder 70817350b7 Ensure temp index files are cleaned up on error 2017-10-03 10:48:14 -06:00
Jason Wilder a5afaf7499 Fix cache mem size not including key size 2017-10-03 10:48:14 -06:00
Jason Wilder ae821f4e2d Rework compaction scheduling
This changes the compaction scheduling to better utilize the available
cores that are free.  Previously, a level was planned in its own goroutine
and would kick off a number of compaction groups.  The problem with this
model was that if there were 4 groups, and 3 completed quickly, the planning
would be blocked for that level until the last group finished.  If the compactions
at the prior level are running more quickly, a large backlog could accumulate.

This now moves the planning to a single goroutine that plans each level in
succession and starts as many groups as it can.  When one group finishes,
the planning will start the next group for the level.
2017-10-03 10:48:13 -06:00
Jason Wilder f668b0cc3f Only use O_SYNC for tsm file writing
Doing this for the WAL reduces throughput quite a bit.
2017-10-03 10:48:13 -06:00
Jason Wilder 1610ae5727 Don't return tsm files part of a compaction plan 2017-10-03 10:48:13 -06:00
Joe LeGasse 1525069213 Merge pull request #8892 from influxdata/jl-tag-values
auth: add series auth to 'show tag values'
2017-10-03 08:47:39 -04:00
Lyon Hill 7e5fd14e8a add in some optimization 2017-10-02 12:02:38 -06:00
Lyon Hill a6cbce0d3e fix issues brought up by joe 2017-10-02 11:41:03 -06:00
Lyon Hill 38dc837910 Fix a minor memory leak when batching points for some services.
fixes #8895
2017-10-02 11:26:25 -06:00
Joe LeGasse 1443b22379 auth: add series auth to 'show tag values' 2017-09-27 20:01:18 -04:00
Edd Robinson e0cba4477c Merge pull request #8885 from influxdata/er-entry-race
Fix race on Cache entry
2017-09-27 18:42:45 +01:00
Edd Robinson d0b81c1e6c Fix race on Cache entry 2017-09-27 18:10:23 +01:00
Edd Robinson a1b67160f6 Use math/bits in encoder 2017-09-26 12:51:08 +01:00
Jason Wilder 7fed382dbf Merge pull request #8872 from influxdata/jw-mmap
Fix long process stalls
2017-09-25 14:49:35 -06:00
Jason Wilder 122a74c692 Use synchronous IO for wal and tsm writing
The fsyncs due to large writes when writing to TSM files and the
WAL can eventually cause large pauses.  Since we already buffer
writes, using synchronous IO reduces fsync latency by ensuring
the individual writes hit disk.  This spreads out the latency
across multiple writes better.
2017-09-25 12:44:57 -06:00
Edd Robinson 2def219f09 Refactor Shard to further protect Engine 2017-09-25 17:43:30 +01:00
Edd Robinson 4a67f92acc Prevent store from directly accessing Shard's engine 2017-09-25 17:43:01 +01:00
Edd Robinson 8e9cabbb9c Fix race in TagValues when reaching into engine 2017-09-25 17:43:01 +01:00
Edd Robinson 7739ff749a Ensure engine protected by shard mutex 2017-09-25 17:42:30 +01:00
Jason Wilder 5774b44a4c Remove MADV_RANDOM
This was inadvertently added when merging the solaris and unix
mmap files.  This causes large delays due to major page faults.
2017-09-25 10:25:06 -06:00
Edd Robinson ea104596f0 Implement TSI index versioning
This commit adds a basic TSI versioning scheme, by adding a Version field
to an index's MANIFEST file.

Existing TSI indexes will not have this field present in their MANIFEST
files, and thus will be deemed incompatible with the current version.

Users with existing TSI indexes will be able to remove them, and convert the
resulting inmem indexes to the current version of a TSI index using the
influx_inspect tooling.
2017-09-22 17:59:39 +01:00
Jason Wilder 1e345aa7a1 Merge pull request #8856 from influxdata/jw-cache
Snapshot compaction improvements
2017-09-22 10:45:54 -06:00
Edd Robinson 44691847e9 Merge branch 'master' into er-8678-tsi1-where 2017-09-22 16:54:49 +01:00
Jason Wilder 94aba64b88 Re-use index entries slice when writing TSM index 2017-09-21 12:48:16 -06:00
Jason Wilder db204f3eb7 Default concurrent compactions to 50% of available cores 2017-09-21 12:48:11 -06:00
Jason Wilder deef0c5649 Fix 32bit alignment 2017-09-20 10:00:20 -06:00
Jason Wilder 61ca1243c7 Increase index disk writer buffer 2017-09-20 09:05:30 -06:00
Jason Wilder 796de3dcea Reduce encoder pool checkout contention
With higher cardinalities, the encoder pools become a bottleneck.
This changes the snapshot compactions to check out one encoder of each
type and re-use it while writing the snapshots as opposed to repeatedly
checking it out and in.
2017-09-19 15:27:26 -06:00
Jason Wilder 391a6288c6 Write parallel snapshot for higher cardinalities 2017-09-19 15:27:26 -06:00
Jason Wilder 0d52b060df Skip onFileStoreReplace with tsi 2017-09-19 15:27:25 -06:00
Jason Wilder 4fe81aeee6 Remove manual Gosched from compactions
At higher cardinalities, this dramatically slows down compaction throughput.
2017-09-19 15:27:25 -06:00
Jason Wilder 31e785d676 Don't deduplicate a single value 2017-09-19 15:27:25 -06:00
Jason Wilder 2ca9ccee1f Reset snapshot cache outside of write lock 2017-09-19 15:27:25 -06:00
Jason Wilder ddeba2c86b Split large snapshots and write concurrently 2017-09-19 15:27:25 -06:00
Jason Wilder 9ee305f6f5 Periodically re-allocate cache store
This periodically re-allocates the cache store to avoid memory
fragmentation and gradual slow down of the store after repeated
deletes and inserts into the map.
2017-09-19 15:27:25 -06:00
Jason Wilder 2885b9b310 Remove entrySizeHints map
There is a lot of overhead for calculating the hints for larger
cardinalities.  This slows down resetting the partitions in the ring.
2017-09-19 15:27:25 -06:00
Jason Wilder 4124a8ed97 Simplify cache ring
The continuum slice is not needed since the number of partitions
doesn't change.  This removes the slice to make the mapping simpler.
2017-09-19 15:27:25 -06:00
Stuart Carnie ed7bc9d825 fix FindValues panic for empty array 2017-09-19 14:23:32 -07:00
Stuart Carnie 92756ec0ad Reduce allocations, improve readEntries performance by simplifying loop
* callers of ReadEntries and Key API can cache allocated slice
2017-09-19 11:57:10 -07:00
Stuart Carnie baa05de3f8 add benchmarks 2017-09-19 11:47:48 -07:00
Stuart Carnie cfc6a1cd9f implement optimization for Include function
```
benchmark                                            old ns/op     new ns/op     delta
BenchmarkIntegerValues_IncludeNone_1000-8            651           6.69          -98.97%
BenchmarkIntegerValues_IncludeMiddleHalf_1000-8      1131          114           -89.92%
BenchmarkIntegerValues_IncludeFirst_1000-8           638           33.9          -94.69%
BenchmarkIntegerValues_IncludeLast_1000-8            1269          32.2          -97.46%
BenchmarkIntegerValues_IncludeNone_10000-8           7751          6.76          -99.91%
BenchmarkIntegerValues_IncludeMiddleHalf_10000-8     11582         1378          -88.10%
BenchmarkIntegerValues_IncludeFirst_10000-8          7911          43.8          -99.45%
BenchmarkIntegerValues_IncludeLast_10000-8           12442         38.4          -99.69%
```

(cherry picked from commit fb93ad5)
2017-09-19 09:53:28 -07:00
Stuart Carnie ca40c1ad3c <type>Values.Exclude function uses binary search and copy builtin
```
± benchcmp old.txt new.txt
benchmark                                            old ns/op     new ns/op     delta
BenchmarkIntegerValues_ExcludeNone_1000-8            1285          7.34          -99.43%
BenchmarkIntegerValues_ExcludeMiddleHalf_1000-8      1258          148           -88.24%
BenchmarkIntegerValues_ExcludeFirst_1000-8           1268          7.51          -99.41%
BenchmarkIntegerValues_ExcludeLast_1000-8            1125          27.7          -97.54%
BenchmarkIntegerValues_ExcludeNone_10000-8           12665         7.31          -99.94%
BenchmarkIntegerValues_ExcludeMiddleHalf_10000-8     12039         976           -91.89%
BenchmarkIntegerValues_ExcludeFirst_10000-8          12663         7.29          -99.94%
BenchmarkIntegerValues_ExcludeLast_10000-8           10990         34.9          -99.68%
```

(cherry picked from commit d7a3c23)
2017-09-19 09:53:26 -07:00
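The binary search plus `copy` approach named in the commit above can be sketched as follows. This is an illustration under assumptions, not the generated `<type>Values.Exclude` code: it operates on a bare sorted slice of `int64` timestamps, and the function name `exclude` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// exclude removes every timestamp in the closed range [min, max] from a
// sorted slice. Two binary searches locate the bounds of the range, and the
// copy builtin shifts the tail down in place, so nothing is visited one
// element at a time when the range is empty or sits at either end.
func exclude(a []int64, min, max int64) []int64 {
	lo := sort.Search(len(a), func(i int) bool { return a[i] >= min }) // first value >= min
	hi := sort.Search(len(a), func(i int) bool { return a[i] > max })  // first value > max
	if lo >= hi {
		return a // nothing falls inside the range
	}
	n := copy(a[lo:], a[hi:]) // move the surviving tail over the removed span
	return a[:lo+n]
}

func main() {
	ts := []int64{1, 2, 3, 4, 5, 6}
	fmt.Println(exclude(ts, 3, 4)) // prints [1 2 5 6]
}
```

The slice is modified in place, so callers should use the returned slice and treat the original as invalid.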
Jason Wilder 940da04a34 Merge pull request #8829 from influxdata/jw-mmap
Release mmap pages when shard is cold
2017-09-18 12:08:37 -06:00
Jason Wilder 31646aae3a Release mmap pages when shard is cold
This instructs the kernel that it can release memory used by mmap'd
TSM files when they are not actively being used.  If the mappings are
used, the kernel will fault the pages back in.  On linux, this causes
RES memory to drop immediately when run.
2017-09-18 11:51:51 -06:00
Edd Robinson e39de3e427 Merge pull request #8782 from oiooj/pr-shard-fix
Correctly check if the Shard is ready for queries or writes
2017-09-18 18:17:19 +01:00
Jonathan A. Sternberg 2228b91b0d Unsigned data type parsing and prioritization 2017-09-14 12:28:13 -05:00
Jason Wilder 7d467c2047 Fix windows unmapping of anonymous index slice 2017-09-12 10:30:10 -06:00
Jason Wilder b4b3c159cc Fixup rebase 2017-09-11 17:04:10 -06:00
Jason Wilder d5d9f9acfe Remove debug line 2017-09-11 15:31:28 -06:00
Jason Wilder 26f92ce6ac Remove commented out code 2017-09-11 15:30:05 -06:00
Jason Wilder 820856347c Don't use disk temp file for snapshots 2017-09-11 15:29:26 -06:00
Jason Wilder 4ed9c75896 Fix unmapping anonymous memory slice 2017-09-11 15:29:26 -06:00
Jason Wilder 97f7857715 Remove mutex on TSMWriter
This isn't used by more than one goroutine so locks are unnecessary.
2017-09-11 15:29:26 -06:00
Jason Wilder a93a5e9bdf Include the size of the key in the cache size 2017-09-11 15:29:26 -06:00
Jason Wilder 38460ec37e Re-enable compactions during writes
A cold shard that suddenly receives a lot of writes could get a very
big cache that takes a long time to snapshot or causes the cache
max memory limit to be hit more quickly.  This re-enables the compactions
if necessary during writes so we don't have to wait for the shard monitor
goroutine to re-enable them.
2017-09-11 15:29:26 -06:00
Ben Johnson ee4d3c7b3d Invalidate all bloom filters. 2017-09-11 15:29:26 -06:00
Ben Johnson 3c2487b97a Clean up tsi bloom filter invalidation. 2017-09-11 15:29:26 -06:00
Ben Johnson 6af936ee61 Fix bloom filter invalidation. 2017-09-11 15:29:26 -06:00
Ben Johnson a40b2bb210 Simplify bloom filter invalidation. 2017-09-11 15:29:26 -06:00
Edd Robinson 408a78d904 Increase size of SeriesBlock partition 2017-09-11 15:29:26 -06:00
Jason Wilder 7388eb9499 Use disk when writing TSM index 2017-09-11 15:29:25 -06:00
Ben Johnson 0ec2736f23 Incrementally rebuild tsi bloom filters. 2017-09-11 15:29:25 -06:00
Jason Wilder a5a2957567 Reduce allocation in log_file 2017-09-11 15:29:25 -06:00
Jason Wilder d3e832b462 Use offheap memory for indirect index offsets slice 2017-09-11 15:29:25 -06:00
Jason Wilder 91eb9de341 Use existing TSMReader from file store during compactions
Compactions would create their own TSMReaders for simplicity. With
very high cardinality compactions, creating the reader and indirectIndex
can start to use a significant amount of memory.

This changes the compactions to use a reader that is already allocated
and managed by the FileStore.
2017-09-11 15:29:25 -06:00
Jason Wilder 739ecd2ebd Fix a compaction planning bug
There was a race where the plan returned was for files that were just
compacted so the compaction would immediately abort.
2017-09-11 15:26:25 -06:00
Jason Wilder bc4fb0ea10 Sort index entries if necessary
These are already sorted during compaction, so switch to sorting lazily
to avoid the CPU and allocations.  This would only occur when
using the writer directly.
2017-09-11 15:26:25 -06:00
Jason Wilder a9e89ede75 Reduce lock contention on Index
Stat and Size are read-only and can take an RLock.
2017-09-11 15:26:25 -06:00
Jason Wilder f18dec6a4a Use sorted slice for writing TSM index
The directIndex used by the TSMWriter maintained a map of series keys
to index entries.  When the index is written to the TSM file, the keys
are sorted and then written out in order.

The reason for this is because directIndex used to be the only index
and it was optimized more for reading.  The reading has been replaced
by the indirectIndex so the map of keys ends up wasting space.

During compactions, the series keys (and index entries) are already sorted
so this change uses the sorting to avoid the map and sort when writing the
index.  This reduces allocations and CPU usage quite a bit for larger cardinality
TSM files.
2017-09-11 15:26:24 -06:00
Jason Wilder 2a0d7935d7 Switch level 3 compactions to use fast compaction strategy
This leaves the slower compactions that create full blocks to only
the full compaction.  This helps reduce CPU usage and memory while shards
are hot, but increases disk usage (reduced compression) slightly.
2017-09-11 15:26:24 -06:00
Jason Wilder 94e229ff59 Merge branch 'master' into jw-drop-series 2017-09-08 15:34:32 -06:00
Jason Wilder 44e1d3f185 Merge pull request #8804 from influxdata/jw-wal-oom
Fix increased memory usage in cache and wal
2017-09-08 15:10:53 -06:00
Jason Wilder 78922f9821 Set rc to nil when closing WALSegmentReader 2017-09-08 14:55:02 -06:00
Joe LeGasse 4fb35b373b auth: apply series auth to TSI 2017-09-08 09:09:53 -04:00
Jason Wilder b9b648e2a0 Dynamically allocate cache store
The cache store can be memory intensive with many shards.  This
lazily allocates it when needed and frees it when the cache is
empty and cold.
2017-09-07 16:35:08 -06:00
Jason Wilder 5581f8b4ae Re-use WALSegmentReaders at startup 2017-09-07 12:56:17 -06:00
Jason Wilder e39276b96f Skip reading 0 byte wal segments 2017-09-07 12:24:54 -06:00
Jason Wilder a8d9eeef36 Reduce lock contention when deleting high cardinality series
Deleting high cardinality series could take a very long time, cause
write timeouts, and deadlock the process.  This fixes these
issues by changing the approach for cleaning up the indexes and
reducing lock contention.

The prior approach deleted each series and updated every index (inmem)
during the delete.  This was very slow and caused the index to be locked
while items in a slice were removed one by one.  This has been changed
to mark series as deleted and then rebuild the index asynchronously which
speeds up the process.

There was also a deadlock that could occur when deleting the field set.
Deleting the field set held a write lock and the function it invoked under
the lock could try to take a read lock on the field set.  This would then
deadlock.  This approach was also very slow and caused timeouts for writes.
It now uses a faster approach that checks for the existence of the measurement
in the cache and filestore which does not take write locks.
2017-09-07 11:36:02 -06:00
Jonathan A. Sternberg e18425757d Merge pull request #8791 from influxdata/js-explain-cached-values
Include the number of scanned cached values in the iterator cost
2017-09-06 16:00:30 -05:00
Jonathan A. Sternberg 590be193e5 Include the number of scanned cached values in the iterator cost 2017-09-06 15:41:07 -05:00
Stuart Carnie 4a6114028c exported UnloadIndex checks for ready state 2017-09-05 11:22:13 -07:00
kun 8a283e248c Correctly check if the Shard is ready for queries or writes 2017-09-03 15:14:58 +08:00
Jonathan A. Sternberg 091ea5f9a5 Merge pull request #8776 from influxdata/js-explain-plan
Initial implementation of explain plan
2017-09-01 16:19:37 -05:00
Edd Robinson 51e886ba66 Merge pull request #8757 from oiooj/pr-cl
Fix panic when the engine already closed in a shard
2017-09-01 16:59:12 +01:00
Jonathan A. Sternberg 50d404e690 Initial implementation of explain plan
It prints the statistics of each iterator that will access the storage
engine. For each access of the storage engine, it will print the number
of shards that will potentially be accessed, the number of files that
may be accessed, the number of series that will be created, the number
of blocks, and the size of those blocks.
2017-09-01 09:01:10 -05:00
Jonathan A. Sternberg 466fc9026e Reduce how long it takes to walk the varrefs in an expression
This is used quite a bit to determine which fields are needed in a
condition. When the condition gets large, the memory usage begins to
slow it down considerably and it doesn't take care of duplicates.
2017-08-31 09:33:45 -05:00
Joe LeGasse 732a0c2eaa Merge pull request #8769 from influxdata/jl-map-cleanup
cleanup: remove poor usage of ',ok' with maps
2017-08-31 09:18:42 -04:00
Ben Johnson 1dbe0662d8 Use system cursors for measurement, series, and tag key meta queries. 2017-08-30 08:35:20 -06:00
Joe LeGasse a95647b720 cleanup: remove poor usage of ',ok' with maps
There are several places in the code where comma-ok map retrieval was
being used poorly. Some were benign, like checking existence before
issuing an unconditional delete with no cleanup. Others were potentially
far more serious: assuming that if 'ok' was true, then the resulting
pointer retrieved from the map would be non-nil. `nil` is a perfectly
valid value to store in a map of pointers, and the comma-ok syntax is
meant for when membership is distinct from having a non-zero value.
There were only one or two cases that I saw being used correctly for
maps of pointers.
2017-08-30 09:49:31 -04:00
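The distinction drawn in the commit above, between a key being present and its pointer value being non-nil, can be shown with a small sketch. The `measurement` type and `nameOf` function are hypothetical and are not taken from the codebase.

```go
package main

import "fmt"

// measurement is a hypothetical value type stored by pointer in a map.
type measurement struct{ name string }

// nameOf returns the measurement's name and whether a usable entry exists.
// A missing key yields the zero value (nil) and a stored nil is also nil,
// so one nil check covers both cases. Relying on comma-ok alone would
// dereference a nil pointer for the "stale" key below.
func nameOf(m map[string]*measurement, key string) (string, bool) {
	if mm := m[key]; mm != nil {
		return mm.name, true
	}
	return "", false
}

func main() {
	m := map[string]*measurement{"cpu": {name: "cpu"}, "stale": nil}

	_, ok := m["stale"]
	fmt.Println(ok) // prints true: the key is present even though the value is nil

	_, usable := nameOf(m, "stale")
	fmt.Println(usable) // prints false
}
```

For maps of pointers, the comma-ok form is only needed when a present-but-nil entry has to be told apart from an absent one.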
Stuart Carnie 51eb85193c release lock to avoid deadlock when calling WalkWhereForSeriesIDs
* WalkWhereForSeriesIDs may call SeriesIDs, which may attempt to
  upgrade from a `RLock` to a `Lock`, causing the deadlock
2017-08-29 16:12:51 -07:00
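Go's `sync.RWMutex` does not support upgrading a read lock to a write lock; a goroutine that calls `Lock` while still holding `RLock` waits on itself. The sketch below shows the general release-then-reacquire pattern that avoids this. It is an assumption-laden illustration, not the change made in the commit: the `index` type and `seriesID` method are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// index is a hypothetical structure guarded by a read/write mutex.
type index struct {
	mu  sync.RWMutex
	ids map[string]int
}

// seriesID returns the ID for key, assigning one if the key is new.
// The read lock is released before the write lock is taken, because
// calling Lock while holding RLock on the same mutex would deadlock.
func (i *index) seriesID(key string) int {
	i.mu.RLock()
	id, ok := i.ids[key]
	i.mu.RUnlock() // release first; there is no lock upgrade
	if ok {
		return id
	}

	i.mu.Lock()
	defer i.mu.Unlock()
	// Re-check: another goroutine may have added the key between the locks.
	if id, ok := i.ids[key]; ok {
		return id
	}
	id = len(i.ids) + 1
	i.ids[key] = id
	return id
}

func main() {
	idx := &index{ids: make(map[string]int)}
	fmt.Println(idx.seriesID("cpu"), idx.seriesID("cpu"), idx.seriesID("mem")) // prints 1 1 2
}
```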
kun 5d5225e77d Fix panic when engine closed in a shard 2017-08-29 17:22:45 +08:00
Stuart Carnie 0ced270197 fix race condition reading map 2017-08-28 13:36:49 -07:00
Edd Robinson d011e43a1b Address feedback 2017-08-23 10:47:01 +01:00
Edd Robinson a5f4b929c9 Ensure Skip is called in test goroutine 2017-08-23 10:47:01 +01:00
Edd Robinson 9be7c5aaa6 Run relevant engine tests on both indexes 2017-08-23 10:47:01 +01:00
Edd Robinson 9c12607c3e Ensure shard tests run with both indexes 2017-08-23 10:46:59 +01:00
Edd Robinson e732cb7a39 Update benchmarks to use sub-benchmarks 2017-08-22 17:51:48 +01:00
Edd Robinson dd808bb77a Ensure TSI tests run with TSI index 2017-08-22 17:51:48 +01:00
Edd Robinson bca4393494 Run most tests for both indexes 2017-08-22 17:51:48 +01:00
Jason Wilder d305b89f74 Merge pull request #8726 from influxdata/jw-tsm-file-leak
Fix leaking tmp file when large compaction aborted
2017-08-22 09:59:23 -05:00
Stuart Carnie 2ef9b489f0 Merge pull request #8727 from influxdata/sgc-finalizer
log message when iterator is closed by finalizer
2017-08-22 07:29:38 -07:00
Stuart Carnie d189621d07 log message when iterator closed by finalizer 2017-08-21 16:46:24 -07:00
Jason Wilder e265d150be Fix leaking tmp file when large compaction aborted
If a large compaction was running and was aborted, it could leave
some tmp files around for files that it had fully written.  The current
active file was cleaned up, but already completed ones would not.  This
would occur when a TSM file needed to rollover due to size.
2017-08-21 17:04:57 -06:00
Jonathan A. Sternberg 5ce6007347 Merge pull request #8724 from influxdata/js-remove-unused-cursor
This cursor implementation appears to be completely unused
2017-08-21 17:44:51 -05:00
Jonathan A. Sternberg c0f7a8af5b This cursor implementation appears to be completely unused
Remove it so that its existence doesn't mislead someone into thinking this is
actually the cursor. The real cursors appear to be in file_store.gen.go.
2017-08-21 16:27:23 -05:00
Stuart Carnie 25edd7bfdf naming 2017-08-17 15:47:47 -07:00
Stuart Carnie c86dc0d103 redundant allocation is overwritten by line 1769 2017-08-17 11:12:41 -07:00
Stuart Carnie 823f903cc6 inputs are closed if Merge returns error and use <type>FinalizerIterator
* <type>FinalizerIterator sets a runtime finalizer and calls Close
  when garbage collected. This will ensure any associated cursors
  are closed and the associated TSM files released
* `query.Iterators#Merge` call could return an error and the inputs
  would not be closed, causing a cursor leak
2017-08-17 11:12:18 -07:00
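The finalizer technique mentioned in the commit above can be sketched with `runtime.SetFinalizer`. This is a simplified illustration, not the generated `<type>FinalizerIterator`: the `cursor` and `finalizerIterator` types and the `newFinalizerIterator` constructor are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"runtime"
	"sync"
)

// cursor is a hypothetical resource that must be closed to release files.
type cursor struct{ closed bool }

func (c *cursor) Close() error { c.closed = true; return nil }

// finalizerIterator wraps an input and closes it at most once, either
// through an explicit Close or when the wrapper is garbage collected.
type finalizerIterator struct {
	input io.Closer
	once  sync.Once
}

func newFinalizerIterator(in io.Closer) *finalizerIterator {
	it := &finalizerIterator{input: in}
	// Safety net only: the runtime does not guarantee when, or whether,
	// a finalizer runs, so callers should still call Close themselves.
	runtime.SetFinalizer(it, (*finalizerIterator).closeGC)
	return it
}

// closeGC is invoked by the garbage collector if Close was never called.
func (it *finalizerIterator) closeGC() { _ = it.Close() }

// Close releases the input and clears the finalizer so it cannot fire later.
func (it *finalizerIterator) Close() error {
	var err error
	it.once.Do(func() {
		runtime.SetFinalizer(it, nil)
		err = it.input.Close()
	})
	return err
}

func main() {
	c := &cursor{}
	it := newFinalizerIterator(c)
	_ = it.Close()
	fmt.Println(c.closed) // prints true
}
```

The finalizer is a backstop against leaks rather than a replacement for closing iterators explicitly, which is consistent with the related commit in this log that logs a message when an iterator is closed by its finalizer.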
Jason Wilder 85842503be Fix deadlock in engine/measurement fields
The OnReplace func ends up trying to acquire locks on MeasurementFields.  When
it's called via snapshotting, this can deadlock because the snapshotting goroutine
also holds an RLock on the engine.  If a delete measurement call is run at the
right time, it will lock the MeasurementFields and try to acquire a lock on the engine
to disable compactions.  This creates a deadlock.

To fix this, the OnReplace callback is moved to a function param to allow only Replace
calls as part of a compaction to invoke it as opposed to both snapshotting and compactions.

Fixes #8713
2017-08-16 16:43:40 -06:00
Jonathan A. Sternberg 697759613c Remove time comparisons from the inner sections of the storage engine 2017-08-16 16:51:13 -05:00
Jonathan A. Sternberg 8bd04ebe39 Remove TimeRange function and replace with a more accurate ConditionExpr function
The ConditionExpr function is more accurate because it parses the
condition and ensures that time conditions are actually used correctly.
That means that attempting to combine conditions with OR will not result
in the query silently pretending it's an AND and nested conditions work
correctly so there is only one way to read the query.

It also extracts the non-time conditions into a separate condition so we
can stop attempting to parse around the time conditions in lower layers
of the storage engine. This change does not remove those hacks, but a
following commit should be able to sanitize the condition and remove
them.
2017-08-16 16:45:35 -05:00
Jonathan A. Sternberg 9a2357c2c0 Separate the query engine into a separate package
This change provides a clear separation between the query engine
mechanics and the query language so that the language can be parsed and
dealt with separately from the query engine itself.
2017-08-16 13:38:43 -05:00
Stuart Carnie 3caeee8a24 fix: cursor leak when cur == nil and aux or conds is not empty 2017-08-16 09:17:20 -07:00
Ben Johnson e0d8cb0ef3 Cardinality AST, parser, & rewriter fixes. 2017-08-16 09:27:29 -06:00
Ben Johnson 60ab1282ea Refactor system iterators.
Previously pseudo iterators could be created for meta data such
as series, measurement, and tag data. These iterators were created
at a higher level and lacked a lot of the power of the query engine.

This commit moves system iterators down to the series level and
supports the following:

	- _name
	- _seriesKey
	- _tagKey
	- _tagValue
	- _fieldKey

These can be used as normal fields such as:

	SELECT _seriesKey FROM cpu

This will return all the series keys for `cpu`.
2017-08-16 09:27:29 -06:00
Ben Johnson c9b5d60753 Parse SHOW CARDINALITY. 2017-08-16 09:27:15 -06:00
Ben Johnson c4e2ba25c3 Merge pull request #8669 from benbjohnson/1392-tsi-index-migration
TSI Index Migration Tool
2017-08-16 09:16:03 -06:00
David Norton 1d8d739418 fix #8677: check for snapshot size == 0 2017-08-16 09:43:56 -04:00
Jason Wilder 186e44d227 Merge pull request #8702 from influxdata/jw-monitor-cpu
Reduce CPU usage when checking series cardinality
2017-08-15 16:02:17 -06:00
Jason Wilder c74932de94 Limit shard cardinality checks to 1 per database
The tag cardinality checks were run for all inmem shards.  Since inmem
shards share the same index, a lot of the work is redundant.  Inmem shards
also need to sort their measurement and tag keys which can be CPU intensive
with many shards or higher cardinality.

This changes the monitoring to just check one shard in each database which
should lower CPU usage due to excessive sorting.  The longer term solution
is to use TSI which would not have this check or required sorting.
2017-08-15 12:17:18 -06:00
Ben Johnson 06bc3b6fbf TSI Index Migration 2017-08-15 11:40:24 -06:00
Jason Wilder 90e2cadeb6 Fix drop measurement not dropping all data
If there were multiple shards, drop measurement could update the index
and remove the measurement before the other shards ran their deletes.
This causes the later shards to not see any series to delete.

The fix is to allow deleteSeries to handle the index delete which already
accounts for removing the measurement when it is fully removed from the
index.
2017-08-15 11:19:45 -06:00
Jason Wilder 61b13eb12b Fix partiallyRead logic
The partiallyRead func didn't account for the initial values and would
return true for blocks that had not been read at all.  This causes a
slower path during compactions that forces a block to be decoded when
it could just be merged as is without being decoded.  This causes compactions
to consume more CPU and run slower at times.
2017-08-14 16:44:32 -06:00
Edd Robinson 45969ef3c6 Allow tag filtering when using DELETE with tsi1 2017-08-14 19:09:36 +01:00
Joe LeGasse 1121b69a9e auth: apply FGA to SHOW SERIES 2017-08-09 14:56:53 -04:00
Edd Robinson 0f648e5170 Remove unsafe shenanigans 2017-08-03 16:38:05 +01:00
Edd Robinson 2d57b599e9 Remove debugging statement 2017-08-02 17:24:00 +01:00
Edd Robinson da676a79ae Implement TSI iterator 2017-08-02 16:29:14 +01:00
Edd Robinson befae864bd Add tests for merge function 2017-08-02 14:10:52 +01:00
Edd Robinson aa7095be5a Use a merge-based approach for TagValues 2017-08-02 14:10:52 +01:00
Jason Wilder 94a48774b7 Pull in new index filter 2017-08-02 14:10:52 +01:00
Edd Robinson 1e9ce8e0a7 Add test for TagValues 2017-08-02 14:10:52 +01:00
Stuart Carnie 5449285c4c Merge pull request #8652 from influxdata/sgc-literal-cursor
Reduce allocations using nil cursors and literal value cursors
2017-08-01 10:20:24 -07:00
Jason Wilder 173276a409 Remove unused filestore reference
Reduces cursor struct size from 119 bytes to 111.
2017-08-01 09:41:16 -06:00