1994 Commits (cc22134d8f8b15418f1492d84c000693d61422af)

Author SHA1 Message Date
Edd Robinson fb646549f4
Index files -> partition files 2017-11-09 09:26:06 -07:00
Ben Johnson 328bffd658
Convert series ids to 64-bits. 2017-11-09 09:26:06 -07:00
Ben Johnson 0ffd94a37a
Fix rebase 2017-11-09 09:25:10 -07:00
Ben Johnson 08e459357a
Fix tsi race conditions. 2017-11-09 09:18:33 -07:00
Ben Johnson c75f1127aa
intermediate 2017-11-09 09:18:33 -07:00
Ben Johnson f223153078
Initial working version of series file. 2017-11-09 09:18:33 -07:00
Ben Johnson e05d4fdeeb
intermediate 2017-11-09 09:18:33 -07:00
Ben Johnson 9ad2b53881
intermediate 2017-11-09 09:18:33 -07:00
Ben Johnson 7259589241
intermediate 2017-11-09 09:18:33 -07:00
Ben Johnson 48b48a8927
intermediate 2017-11-09 09:13:46 -07:00
Edd Robinson 59c4e4b1bc Skip shards we don't have 2017-11-08 13:33:52 +00:00
Ben Johnson 156f25ac23
Improve SHOW TAG KEYS performance. 2017-11-07 10:59:19 -07:00
Edd Robinson e762da9aca Fix race on store close
There was a very small window where it was possible to deadlock during
the close of the Store. When closing, the Store waited on its Waitgroup
under a `Lock`. Naturally, all other goroutines must have been in a
position to call `Done` on the `Waitgroup` before the `Wait` call in
`Close` would return.

For the goroutine running the `monitorShards` method it was possible
that it would be unable to do this. Specifically, if the `monitorShards`
goroutine was jumping into the `t.C` case as the `Close()` goroutine was
acquiring the `Lock` then the `monitorShards` goroutine would be unable
to acquire the `RLock`. Since it would also be unable to progress around
its loop to jump into the `s.closing` case, it would be unable to call
`Done` on the `WaitGroup` and we would have a deadlock.

This was identified during an AppVeyor CI run, though I was unable to
reproduce this locally.
2017-11-07 15:26:46 +00:00
Edd Robinson 07c4fdc1ed Fix data race on SeriesPointIterator 2017-11-07 10:48:23 +00:00
Edd Robinson 88e2ea822d Add inmem shard optimisation to SHOW MEASUREMENTS 2017-11-06 19:15:01 +00:00
Edd Robinson f8353bf300 Check shard index type correctly
Previously we used the EngineOptions to determine which shard index
type we were using. However, these options are set once at runtime
initialisation. Therefore if you're running with TSI enabled but then
accessing a legacy database with the inmem index, TagValues would not
have taken advantage of the inmem index.

This change ensures we always check the actual index of the shard(s).
2017-11-06 19:15:01 +00:00
Edd Robinson fbcb299b8a Support WHERE time clause in SHOW TAG VALUES
This commit adds time support to SHOW TAG VALUES. Time can be used as
both a lower and upper boundary. However, there are some caveats.

For the `inmem` index, filtering by time will still return all results
because the index data is shared across shards.

For the `tsi1` index, filtering by time will only work down to the shard
level. Specifically, when querying by time all shards within that time
range will be used to generate the results.
2017-11-06 19:15:01 +00:00
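
The shard-level granularity described for `tsi1` can be pictured as an overlap test between the query's time bounds and each shard's time range. The sketch below assumes a hypothetical `shardInfo` type with a half-open `[start, end)` range and inclusive query bounds; it is an illustration and does not reproduce the store's actual code.

```go
package main

import "fmt"

// shardInfo is a hypothetical description of a shard covering [start, end).
type shardInfo struct {
	id         uint64
	start, end int64 // nanosecond timestamps; end is exclusive
}

// shardsForRange returns the ids of all shards whose time range overlaps the
// inclusive query bounds [min, max]. Every series in a selected shard is
// considered, which is why filtering is only accurate down to shard level.
func shardsForRange(shards []shardInfo, min, max int64) []uint64 {
	var ids []uint64
	for _, s := range shards {
		if s.start <= max && s.end > min {
			ids = append(ids, s.id)
		}
	}
	return ids
}

func main() {
	shards := []shardInfo{
		{id: 1, start: 0, end: 100},
		{id: 2, start: 100, end: 200},
		{id: 3, start: 200, end: 300},
	}
	fmt.Println(shardsForRange(shards, 150, 160))
	fmt.Println(shardsForRange(shards, 50, 250))
}
```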
Edd Robinson 98d584b63f Use index for SHOW X meta queries
When a meta query does not include a time component then it can be
answered exclusively by the index. This should result in a much faster
query execution than if the TSM engine was engaged.

This commit rewrites the following queries such that they make use
of the index where no time component is present:

  - SHOW MEASUREMENTS
  - SHOW SERIES
  - SHOW TAG KEYS
  - SHOW FIELD KEYS
2017-11-06 19:15:00 +00:00
Stuart Carnie 7cb25ecbff optimized slice when outside timerange
find position then update both slices **once**
2017-11-03 16:31:01 -07:00
Stuart Carnie 295acd6920 also slice values 2017-11-03 15:50:16 -07:00
Stuart Carnie c1da95442c
Merge pull request #9054 from influxdata/js-update-influxql-path-in-templates
Update the influxql path inside of the template files
2017-11-03 09:44:02 -07:00
Jonathan A. Sternberg 748fc4ae79 Update the influxql path inside of the template files 2017-11-03 10:57:17 -05:00
Jonathan A. Sternberg 87ed89ee74 Implement pull request feedback for human readable sizes 2017-11-01 13:08:51 -05:00
Andrew Hare ecb3952fa9 Allow human-readable byte sizes in config
Update support in the `toml` package for parsing human-readable byte sizes.
Supported size suffixes are "k" or "K" for kibibytes, "m" or "M" for
mebibytes, and "g" or "G" for gibibytes. If a size suffix isn't specified
then bytes are assumed.

In the config, `cache-max-memory-size` and `cache-snapshot-memory-size` are
now typed as `toml.Size` and support the new syntax.
2017-11-01 11:09:09 -05:00
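
The suffix rules above can be sketched as a small parser. `parseSize` is a hypothetical helper written for illustration; it is not the `toml.Size` implementation, and the error handling and overflow check are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseSize converts strings such as "1024", "25m" or "1G" into a byte count.
// Suffixes: k/K = kibibytes, m/M = mebibytes, g/G = gibibytes, none = bytes.
func parseSize(s string) (int64, error) {
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	mult := int64(1)
	switch s[len(s)-1] {
	case 'k', 'K':
		mult = 1 << 10
	case 'm', 'M':
		mult = 1 << 20
	case 'g', 'G':
		mult = 1 << 30
	}
	digits := s
	if mult > 1 {
		digits = s[:len(s)-1] // strip the suffix before parsing the number
	}
	n, err := strconv.ParseInt(digits, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %v", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("invalid size %q: negative value", s)
	}
	if n > math.MaxInt64/mult {
		return 0, fmt.Errorf("invalid size %q: value too large", s)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"1024", "25m", "1G", "1x"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```

Under these rules a config value such as `cache-max-memory-size = "1g"` would resolve to 1073741824 bytes.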
Stuart Carnie 9a43c14653
Merge pull request #9041 from influxdata/sgc-influxql
influxdata/influxdb/influxql -> influxdata/influxql
2017-10-31 07:31:31 -07:00
Stuart Carnie f3d45ba301 influxdata/influxdb/influxql -> influxdata/influxql 2017-10-30 14:40:26 -07:00
Jason Wilder 48ebc53154 Revert "Fix race in disableLevelCompactions"
This reverts commit 4f8580fbaa.
2017-10-30 14:14:50 -06:00
Ben Johnson 49c1fca036
Handle nil MeasurementIterator. 2017-10-26 11:25:46 -06:00
Stuart Carnie dc04eaa8f3 Amendments based on feedback
* Fprint* functions
* No nakedness
* clarify panic messages
* spacing between case statements
* remove break in favor of return
* remove goto in favor of for { continue }
2017-10-25 13:38:07 -07:00
Stuart Carnie c39f1ad748 Add batch cursor support to tsdb and tsm1
* batch cursors return slices of timestamps and values to reduce call
  overhead. Significantly improved iteration.
* added CreateCursor API to Shard, Engine
* moved build*Cursor to code gen
2017-10-25 13:38:07 -07:00
Stuart Carnie 3e28323a10 Simplified Decode*Block functions
* array has already been sized correctly
* eliminates bounds checking for each element access
* reduces decoding of 30,000,000 points via storage API from
  584ms to 540ms on average
2017-10-25 13:38:07 -07:00
Edd Robinson 47bd069315 Fix race in Measurement index
Fixes #8989 and #8633.

Previously when issuing commands involving a regex check, walking
through the tag keys/values on a measurement, using the measurement's
index, would be racy.

This commit adds a new `TagKeyValue` type that abstracts away the
multi-layer map we were using as an inverted index from tag keys and
values to series ids. With this abstraction we can also make concurrent
access to this inverted index goroutine safe.

Finally, this commit fixes a very old bug in the index which will affect
any query using a regex. Previously we would always check _every_ tag
against a regex for a measurement, even when we had found a match.
2017-10-25 13:34:21 +01:00
Stuart Carnie b7579340fe return query.ErrQueryInterrupted for read on InterruptCh 2017-10-24 14:10:28 -07:00
Jason Wilder 955829e7c3 Merge pull request #9003 from influxdata/jw-delete-regression
Delete series in batches
2017-10-24 13:54:33 -06:00
Jason Wilder cbbbe8bedb Delete series in batches
This fixes a regression where deleting series keys would happen
one at a time instead of in bulk.
2017-10-24 11:06:21 -06:00
Stuart Carnie 02a05e86ee Add missing template changes for EXPLAIN ANALYZE 2017-10-23 14:46:36 -07:00
Ben Johnson 5a77238f30
Sort & validate TSI key value insertion. 2017-10-23 10:46:01 -06:00
Stuart Carnie e9313876ab EXPLAIN ANALYZE
* Introduces EXPLAIN ANALYZE command, which
  produces a detailed tree of operations used to
  execute the query.

introduce context.Context to APIs

metrics package

* create groups of named measurements
* safe for concurrent access

tracing package

EXPLAIN ANALYZE implementation for OSS

Serialize EXPLAIN ANALYZE traces from remote nodes

use context.Background for tests

group with other stdlib packages

additional documentation and remove unused API

use influxdb/pkg/testing/assert

remove testify reference
2017-10-20 08:01:37 -07:00
Jason Wilder 05131f4453 Fix indirectIndex not removing fully deleted series
If multiple tombstones exist for a series and together cause all of its
data to be deleted, the blocks were not removed from the offsets
in the index.  This causes the TSMReader to report that a key exists
but does not have any data.

During a compaction, every key should have at least one value.  Since
this invariant was broken, the compaction aborted early and ended up
dropping all series keys that are lexicographically greater than where
the breakage occurred.  This would cause data to be dropped during the
compaction.
2017-10-18 18:16:41 -06:00
Jason Wilder 9f102adabe Abort BlockIterator iteration if deletes detected
This fixes a potential bug where the BlockIterator would skip blocks
if the underlying TSMReader had deletes on it concurrently.  This
could possibly occur due to changes in 91eb9de3 that now use the
existing TSMReaders from the FileStore instead of creating new ones
during compaction.
2017-10-18 18:16:37 -06:00
Jason Wilder 4d171f3f40 Fix data deleted outside of time range 2017-10-18 13:39:47 -06:00
Ben Johnson 62093d2641 Merge pull request #8975 from benbjohnson/tsi-copy-returned-bytes
Copy returned bytes from TSI meta functions.
2017-10-18 09:26:02 -06:00
Ben Johnson 8ad2048a6b
TSI byte copy usage comments. 2017-10-18 07:21:54 -06:00
Ben Johnson d17d0f18e0
Move copyBytes() and copyByteSlices() to bytesutil. 2017-10-18 07:19:46 -06:00
Jason Wilder a6f4069ca7 Fix max select series limit for tsi
TSI did not check the max select series limit during planning
the same way that inmem did.  This means that the limit could be
set but the planning of a high cardinality query would still OOM
the server.  This fixes that limit and also makes the query interruptible
during planning.
2017-10-17 15:24:41 -06:00
Ben Johnson dceb88eb30
Copy returned bytes from TSI meta functions. 2017-10-17 14:05:35 -06:00
Jason Wilder 4f8580fbaa Fix race in disableLevelCompactions
There was a race on the WaitGroup where we could end up calling Add
while another goroutine was still waiting.  The functions were confusing
so they have been simplified a bit since the compaction goroutines
have been reworked a lot already.
2017-10-16 10:50:16 -06:00
Jason Wilder 5033783a33 Handle deleted series when rebuilding measurement index 2017-10-16 10:50:16 -06:00
Jason Wilder e683502dd6 Merge pull request #8961 from lrita/master
remove duplicated code in cacheKeyIterator.encode()
2017-10-16 10:17:32 -06:00
Jason Wilder bc360ccfd5 Merge pull request #8970 from influxdata/jw-wal-panic
Fix corrupted wal segment panic on 32 bit systems
2017-10-16 10:00:02 -06:00
Jason Wilder fb7135ddc8 Fix corrupted wal segment panic on 32 bit systems 2017-10-16 09:41:20 -06:00
lrita 2f0aa4a420 remove duplicated code in cacheKeyIterator.encode() 2017-10-13 20:39:15 +08:00
Stuart Carnie a0848eac8c remove unnecessary err value
readKey never sets error, so it is always nil
2017-10-12 08:28:53 -07:00
Jason Wilder 1401950b10 Only schedule one compaction per shard at a time
The scheduling logic ended up favoring more backlogged shards
too much and would starve active, less backed up shards.  This
occurred because the scheduling kicks in once a second.  When it
runs, it schedules as many compactions as it can.  A backed up shard
would end up having more compactions to run during the loop and would
generally get to schedule them more frequently.

This now allows each shard to try and schedule one compaction at a time
which provides a more balanced approach.  At some point, we'll probably
want to more directly balance each shard's backlog vs letting it happen
somewhat randomly.
2017-10-09 11:40:32 -06:00
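
The "one compaction per shard per pass" rule can be illustrated with a small scheduling loop. The `shard` type, `schedulePass`, and the notion of a fixed `capacity` are assumptions made for this sketch; the real scheduler is considerably more involved.

```go
package main

import "fmt"

// shard is a hypothetical shard with a count of pending compactions.
type shard struct {
	name    string
	backlog int
}

// schedulePass runs one scheduling pass. Each shard may start at most one
// compaction per pass, so a heavily backlogged shard cannot consume all of
// the available capacity before the other shards get a turn.
func schedulePass(shards []*shard, capacity int) []string {
	var started []string
	for _, s := range shards {
		if capacity == 0 {
			break
		}
		if s.backlog > 0 {
			s.backlog--
			capacity--
			started = append(started, s.name)
		}
	}
	return started
}

func main() {
	shards := []*shard{
		{name: "backlogged", backlog: 10},
		{name: "active", backlog: 1},
	}
	// With capacity for two compactions, each shard gets one slot.
	fmt.Println(schedulePass(shards, 2))
}
```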
Jason Wilder 00a403f60e Reduce allocation in tsmKeyIterator.Next
This reuses some intermediate buffers and structs while compacting
files.
2017-10-04 17:35:56 -06:00
Jason Wilder 6b6ccf1a40 Wait for compaction goroutines to finish 2017-10-04 10:01:44 -06:00
Jason Wilder 06226d6fd3 Handle orphan lower level TSM files during full planning
Some files seem to get orphaned behind higher levels.  This causes
the compactions to get blocked as the lower level files will not
get picked up by their lower level planners.  This allows the full
plan to identify them and pull them into their plans.
2017-10-04 08:13:14 -06:00
Jason Wilder a1d0b52897 Allow lower priority compactions to use excess capacity
If there is a backlog of level 3 and 4 compactions, but few level 1
and 2 compactions, allow them to use some excess capacity.
2017-10-04 08:11:44 -06:00
Jason Wilder f2a681c4cf Unconditionally remove file when calling Remove 2017-10-03 10:49:17 -06:00
Jason Wilder 0c0505881f Remove multiple file skipping for full compaction planning
This check doesn't make sense for high cardinality data as the files
typically get big and sparse very quickly.  This causes a lot of extra
disk space to be used which is taken up by large indexes and sparse
data.
2017-10-03 10:48:14 -06:00
Jason Wilder 90df803802 Prevent infinite scheduling loop
One shard might be able to run a compaction, but could fail due to
limits being hit.  This loop would continue indefinitely as the
same task would continue to be rescheduled.
2017-10-03 10:48:14 -06:00
Jason Wilder 4ff4ba0841 Use first file in generation for level
With higher cardinality or larger series keys, the files can roll
over early which causes them to take longer to be compacted by higher
levels.  This causes larger disk usage and higher numbers of tsm files
at times.
2017-10-03 10:48:14 -06:00
Jason Wilder 71071ed67a Add compaction backlog stat
This gives an indication as to whether compactions are backed up
or not.
2017-10-03 10:48:14 -06:00
Jason Wilder 16ece490ef Reduce allocation in tsmKeyIterator.Next
The chunked slice is unnecessary and we can re-use k.blocks throughout
the compaction.
2017-10-03 10:48:14 -06:00
Jason Wilder 2c5006fccc Rework snapshotting concurrency
This switches the thresholds that are used for writing snapshots
concurrently.  This scales better than the prior model.
2017-10-03 10:48:14 -06:00
Jason Wilder 3af9c7df37 Remove a defer allocation
Shows up under high cardinality compactions.
2017-10-03 10:48:14 -06:00
Jason Wilder 70817350b7 Ensure temp index files are cleaned up on error 2017-10-03 10:48:14 -06:00
Jason Wilder a5afaf7499 Fix cache mem size not including key size 2017-10-03 10:48:14 -06:00
Jason Wilder ae821f4e2d Rework compaction scheduling
This changes the compaction scheduling to better utilize the available
cores that are free.  Previously, a level was planned in its own goroutine
and would kick off a number of compaction groups.  The problem with this
model was that if there were 4 groups, and 3 completed quickly, the planning
would be blocked for that level until the last group finished.  If the compactions
at the prior level are running more quickly, a large backlog could accumulate.

This now moves the planning to a single goroutine that plans each level in
succession and starts as many groups as it can.  When one group finishes,
the planning will start the next group for the level.
2017-10-03 10:48:13 -06:00
Jason Wilder f668b0cc3f Only use O_SYNC for tsm file writing
Doing this for the WAL reduces throughput quite a bit.
2017-10-03 10:48:13 -06:00
Jason Wilder 1610ae5727 Don't return tsm files part of a compaction plan 2017-10-03 10:48:13 -06:00
Joe LeGasse 1525069213 Merge pull request #8892 from influxdata/jl-tag-values
auth: add series auth to 'show tag values'
2017-10-03 08:47:39 -04:00
Lyon Hill 7e5fd14e8a add in some optimization 2017-10-02 12:02:38 -06:00
Lyon Hill a6cbce0d3e fix issues brought up by joe 2017-10-02 11:41:03 -06:00
Lyon Hill 38dc837910 Fix a minor memory leak when batching points for some services.
fixes #8895
2017-10-02 11:26:25 -06:00
Joe LeGasse 1443b22379 auth: add series auth to 'show tag values' 2017-09-27 20:01:18 -04:00
Edd Robinson e0cba4477c Merge pull request #8885 from influxdata/er-entry-race
Fix race on Cache entry
2017-09-27 18:42:45 +01:00
Edd Robinson d0b81c1e6c Fix race on Cache entry 2017-09-27 18:10:23 +01:00
Edd Robinson a1b67160f6 Use math/bits in encoder 2017-09-26 12:51:08 +01:00
Jason Wilder 7fed382dbf Merge pull request #8872 from influxdata/jw-mmap
Fix long process stalls
2017-09-25 14:49:35 -06:00
Jason Wilder 122a74c692 Use synchronous IO for wal and tsm writing
The fsyncs due to large writes when writing to TSM files and the
WAL can eventually cause large pauses.  Since we already buffer
writes, using synchronous IO reduces fsync latency by ensuring
the individual writes hit disk.  This spreads out the latency
across multiple writes better.
2017-09-25 12:44:57 -06:00
Edd Robinson 2def219f09 Refactor Shard to further protect Engine 2017-09-25 17:43:30 +01:00
Edd Robinson 4a67f92acc Prevent store from directly accessing Shard's engine 2017-09-25 17:43:01 +01:00
Edd Robinson 8e9cabbb9c Fix race in TagValues when reaching into engine 2017-09-25 17:43:01 +01:00
Edd Robinson 7739ff749a Ensure engine protected by shard mutex 2017-09-25 17:42:30 +01:00
Jason Wilder 5774b44a4c Remove MADV_RANDOM
This was inadvertently added when merging the solaris and unix
mmap files.  This causes large delays due to major page faults.
2017-09-25 10:25:06 -06:00
Edd Robinson ea104596f0 Implement TSI index versioning
This commit adds a basic TSI versioning scheme, by adding a Version field
to an index's MANIFEST file.

Existing TSI indexes will not have this field present in their MANIFEST
files, and thus will be deemed incompatible with the current version.

Users with existing TSI indexes will be able to remove them, and convert the
resulting inmem indexes to the current version of a TSI index using the
influx_inspect tooling.
2017-09-22 17:59:39 +01:00
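
A version check of this kind can be sketched as below. The JSON layout, the field names, `currentVersion`, and `readManifest` are all assumptions for illustration and are not taken from the actual MANIFEST format. The point is that a manifest without the field decodes to version 0 and is rejected.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// currentVersion is a hypothetical version number for this sketch.
const currentVersion = 1

var errIncompatibleVersion = errors.New("incompatible index version")

// manifest is a hypothetical MANIFEST layout with a version field.
type manifest struct {
	Version int      `json:"version,omitempty"`
	Files   []string `json:"files,omitempty"`
}

// readManifest decodes a manifest and rejects any version other than the
// current one. A manifest written before versioning has no version field,
// so it decodes as 0 and is treated as incompatible.
func readManifest(data []byte) (*manifest, error) {
	var m manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("decode manifest: %v", err)
	}
	if m.Version != currentVersion {
		return nil, errIncompatibleVersion
	}
	return &m, nil
}

func main() {
	_, err := readManifest([]byte(`{"files":["L0-00000001.tsl"]}`))
	fmt.Println("legacy manifest:", err)

	m, err := readManifest([]byte(`{"version":1,"files":["L0-00000001.tsl"]}`))
	fmt.Println("versioned manifest:", m.Version, err)
}
```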
Jason Wilder 1e345aa7a1 Merge pull request #8856 from influxdata/jw-cache
Snapshot compaction improvements
2017-09-22 10:45:54 -06:00
Edd Robinson 44691847e9 Merge branch 'master' into er-8678-tsi1-where 2017-09-22 16:54:49 +01:00
Jason Wilder 94aba64b88 Re-use index entries slice when writing TSM index 2017-09-21 12:48:16 -06:00
Jason Wilder db204f3eb7 Default concurrent compactions to 50% of available cores 2017-09-21 12:48:11 -06:00
Jason Wilder deef0c5649 Fix 32bit alignment 2017-09-20 10:00:20 -06:00
Jason Wilder 61ca1243c7 Increase index disk writer buffer 2017-09-20 09:05:30 -06:00
Jason Wilder 796de3dcea Reduce encoder pool checkout contention
With higher cardinalities, the encoder pools were becoming a bottleneck.
This changes the snapshot compactions to check out one encoder of each
type and re-use it while writing the snapshots as opposed to repeatedly
checking it out and in.
2017-09-19 15:27:26 -06:00
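
The checkout-once pattern can be sketched with `sync.Pool`. Here a `bytes.Buffer` stands in for an encoder, and `encoderPool` and `encodeAll` are hypothetical names; the real encoders and snapshot writer differ.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// encoderPool is a hypothetical pool; a bytes.Buffer stands in for an encoder.
var encoderPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encodeAll checks out a single encoder for the whole batch and resets it
// between series, instead of calling Get and Put once per series. That
// reduces contention on the pool when there are many series.
func encodeAll(series [][]int64) [][]byte {
	enc := encoderPool.Get().(*bytes.Buffer)
	defer encoderPool.Put(enc)

	out := make([][]byte, 0, len(series))
	for _, values := range series {
		enc.Reset()
		for _, v := range values {
			fmt.Fprintf(enc, "%d,", v)
		}
		// Copy the bytes: the buffer is reused on the next iteration.
		out = append(out, append([]byte(nil), enc.Bytes()...))
	}
	return out
}

func main() {
	for _, b := range encodeAll([][]int64{{1, 2}, {3}}) {
		fmt.Println(string(b))
	}
}
```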
Jason Wilder 391a6288c6 Write parallel snapshot for higher cardinalities 2017-09-19 15:27:26 -06:00
Jason Wilder 0d52b060df Skip onFileStoreReplace with tsi 2017-09-19 15:27:25 -06:00
Jason Wilder 4fe81aeee6 Remove manual Gosched from compactions
At higher cardinalities, this dramatically slows down compaction throughput.
2017-09-19 15:27:25 -06:00
Jason Wilder 31e785d676 Don't deduplicate a single value 2017-09-19 15:27:25 -06:00
Jason Wilder 2ca9ccee1f Reset snapshot cache outside of write lock 2017-09-19 15:27:25 -06:00
Jason Wilder ddeba2c86b Split large snapshots and write concurrently 2017-09-19 15:27:25 -06:00