Commit Graph

760 Commits (7374e4e8a457350ab4ba8cb89bdc3cd7d5c2f905)

Author SHA1 Message Date
Jason Wilder 84cbee227a Fix file store not close all TSM files
Regression added via #8192
2017-04-04 10:58:51 -06:00
Jason Wilder 4f850b5cff Skip TestCache_Deduplicate_Concurrent on windows 2017-04-04 08:48:55 -06:00
Jason Wilder 8da84e6144 Merge branch 'master' into tsi 2017-04-03 11:21:02 -06:00
Jason Wilder 32c4d43952 Speed up drop measurement
This reworks drop measurement to use a sorted list of series keys
instead of creating an intermediate map.  It remove allocations
and some extra garbage that is created during drop measurement.
2017-04-03 08:57:53 -06:00
Jason Wilder a78da51b7c Use buffered writer when writing tombstones
When deleting many series, the many small writes flood the disks
and consume a lot of CPU time.
2017-04-03 08:57:52 -06:00
Jason Wilder 6232d5e56d Remove defer allocations in TSMReader 2017-04-03 08:57:52 -06:00
Jason Wilder 920c8396c6 Use sorted merge in FileStore.WalkKeys
WalkKeys serially walked each TSM file and invoked fn for each key.
Caller needed to handle duplicate calls to fn with the same key
because the same key could exist in multiple TSM files.  The serial
execution was also slower.

Since the series keys are already sorted, we can iterate over all
files in parallel and skip duplicates using a sorted merge.  This
fixes the duplicate invocation issue as well as speeds up walking
all keys.

This can significant improve startup performance when many TSM files
exists that may not have been fully compacted.  This also has benefits
for deletes (measurements/series) since duplicates are removed saving
extra allocations and work.  This may also allow for the optimize
compaction to be removed provided startup times are fast enough.
2017-04-03 08:57:52 -06:00
Edd Robinson fddaff2cc8 Merge master in 2017-03-29 18:00:28 +01:00
Ben Johnson 2edfb1c92d
Ignore series limit on database load. 2017-03-24 16:27:16 -06:00
Ben Johnson 9fb8f1ec1d
Fix database and tag limits. 2017-03-24 09:48:10 -06:00
Jason Wilder 631681796d Remove tsl file committed by mistake 2017-03-23 16:18:27 -06:00
Jason Wilder 7119ef8f29 Merge pull request #8193 from influxdata/jw-123-backports
1.2.3 backports
2017-03-23 13:31:35 -06:00
Jason Wilder ca1919e5de Use standard merge algorithm for merging values
The previous version was very innefficient due to the benchmarks used
to optimize it having a bug.  This version always allocates a new
slice, but is O(n).
2017-03-23 12:53:59 -06:00
Jason Wilder ba2571903d Fix broken Values.Merge benchmark
Merge had the side effect of modifying the original values so
the results are wrong because they always hit the fast path
after the first run.
2017-03-23 12:53:50 -06:00
Jason Wilder 890ffb4ce8 Generate encode*Values funcs 2017-03-23 12:53:29 -06:00
Jason Wilder ced953ae89 Use typed values to avoid allocations
This switches compactions to use type values (FloatValues) from the
generic Values type.  It avoids a bunch of allocations where each value
much be converted from a specific type to an interface{}.
2017-03-23 12:53:17 -06:00
Jason Wilder a1c84ae6f3 Add block type for BlockIterator 2017-03-23 12:49:17 -06:00
Jason Wilder 2972a3f223 Remove MMAP derefencing code
This code was added to address some slow startup issues.  It is believed
to be the cause of some segfault panic's that occur at query time when
the underlying MMAP array has been unmapped.  The current structure of
code makes this change unnecessary now.
2017-03-23 12:46:23 -06:00
Jason Wilder 61f80db1b9 Skip cardinaltiy dups on circle race test 2017-03-22 15:20:38 -06:00
Jason Wilder c443e639b0 Fix 32bit alignment issue in wal.sync 2017-03-22 11:21:29 -06:00
Ben Johnson afe41f1c80
Fix tsm1/tsi1 broken tests. 2017-03-21 12:21:48 -06:00
Jason Wilder 8f7b251afd Merge branch 'master' into jw-tsi 2017-03-20 17:17:26 -06:00
Jason Wilder 8177df2dab Simplify Measurement.TagSets signature 2017-03-17 16:19:10 -06:00
Jason Wilder 2d5d899ac2 Allow queries to be interrupted during planning
If a bad query is run, kill query and limits would not kick in until
after it started executing.  Some bad queries that involve high
cardinality can cause the server to OOM just from planning which
defeats the purpose of the max-select-series limit.

This change primarily fixes max-select-series limit so that the query
is killed earlier and has the side effect that kill query now can kill
a query while it's being planned.
2017-03-17 16:00:54 -06:00
Jason Wilder bc4aeefbed Check max-series-limit in shard iterator creation
The limit waited until all the iterators had been created which still
allows problem queries to be planned.  This allows the queries to be
aborted much earlier in some cases.
2017-03-17 16:00:25 -06:00
Jason Wilder e9eb925170 Coalesce multiple WAL fsyncs
Fsyncs to the WAL can cause higher IO with lots of small writes or
slower disks.  This reworks the previous wal fsyncing to remove the
extra goroutine and remove the hard-coded 100ms delay.  Writes to
the wal still maintain the invariant that they do not return to the
caller until the write is fsync'd.

This also adds a new config options wal-fsync-delay (default 0s)
which can be increased if a delay is desired.  This is somewhat useful
for system with slower disks, but the current default works well as
is.
2017-03-15 16:31:03 -06:00
Ben Johnson 1807772388
Fix tsi tests. 2017-03-15 11:23:58 -06:00
Ben Johnson cf7ba96377
Merge branch 'tsi-log-compact' into tsi 2017-03-15 10:18:40 -06:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Jason Wilder 65464ea0d1 Merge pull request #8131 from influxdata/jw-values-merge
Use standard merge algorithm when merging Values
2017-03-15 09:51:21 -06:00
Jason Wilder a4cfeacedb Use standard merge algorithm for merging values
The previous version was very innefficient due to the benchmarks used
to optimize it having a bug.  This version always allocates a new
slice, but is O(n).
2017-03-15 08:59:41 -06:00
Jason Wilder 4d37c9dc9e Fix broken Values.Merge benchmark
Merge had the side effect of modifying the original values so
the results are wrong because they always hit the fast path
after the first run.
2017-03-14 14:20:24 -06:00
Jason Wilder ca9c67a877 Generate encode*Values funcs 2017-03-14 11:54:53 -06:00
Jason Wilder 2f7d4995b4 Use typed values to avoid allocations
This switches compactions to use type values (FloatValues) from the
generic Values type.  It avoids a bunch of allocations where each value
much be converted from a specific type to an interface{}.
2017-03-09 16:27:07 -07:00
Jason Wilder 78b7815c49 Add block type for BlockIterator 2017-03-09 09:16:59 -07:00
Jason Wilder b9e5375043 Merge branch '1.2' into jw-merge-12 2017-03-08 13:16:50 -07:00
Jason Wilder 37187cbe6d Delete series under fields lock
Still seeing the panic that switching this logic around was supposed
to fix.  We now delete the bulk of data outside of the fields lock
and then again, under the write lock, to ensure that the field mapping
is accurate.  We don't do the full delete under the lock because it
can block writes and queries that require a read lock.
2017-03-06 14:19:55 -07:00
Jason Wilder 675d7c9d65 Merge branch '1.2' into jw-merge12 2017-03-06 11:09:05 -07:00
Jason Wilder eab012ef61 Fix points missing after compaction
If blocks containing overlapping ranges of time where partially
recombined, it was possible for the some points to get dropped
during compactions.  This occurred because the window of time of
the points we need to merge did not account for the partial blocks
created from a prior merge.

Fixes #8084
2017-03-06 10:17:11 -07:00
Jason Wilder 3c70abf061 Delete series before remove from field index
There is a race where the field type can be deleted while a new type
is written and during a query.  When this happens, an iterator for
the new type is created but old data make still exist in the cache
for TSM files causing a panic.
2017-03-06 09:38:27 -07:00
Jason Wilder 29f8d8de76 Fix race in WALEntry.Encode and Value.Deduplicate
Under high query load, a race exists in the cache and the WAL.  Since
writes currently hit the cache first, they are availble for query before
they hit the WAL.  If the WAL is writing and accessign the Value slice
at the same time that a query is run that needs to dedup the same slice,
a race occurs.

To fix this, the cache now just copies the values instead of storing the
slice passed in.  Another way to fix this might be to have the writes go
to the wal before the cache.  I think the latter would be better, but it
introduces some larger write path issues that we'd need to also address.
e.g. if the cache was full, writes to the WAL would need to be rejected
to avoid filling the disk.

Copying the slice in the cache is simpler for now and does not appear to
dramatically affect performance.
2017-03-06 09:38:22 -07:00
Jason Wilder a024003f2c Merge branch '1.2' into jw-merge-12 2017-02-22 12:13:29 -07:00
Ben Johnson 78a9bb2527 Remove Tags.shouldCopy, replace with forceCopy on series creation.
Previously, tags had a `shouldCopy` flag to indicate if those tags
referenced an underlying buffer and should be copied to allow GC.
Unfortunately, this prevented tags from being copied that were
created and referenced the mmap which caused segfaults.

This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
2017-02-21 11:13:35 -07:00
Mark Rushakoff 601cbcd084 Merge branch '1.2' into mr-merge-12 2017-02-17 16:14:22 -08:00
Jonathan A. Sternberg 2fe48d6781 Rename zap import back to github.com/uber-go/zap
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
2017-02-17 17:17:22 -06:00
Ben Johnson 673143a0ad
Remove .tsl file. 2017-02-15 08:44:01 -07:00
Jason Wilder 4b6289ce58 Merge pull request #7942 from influxdata/jw-cache-partitions
Reduce write timeouts
2017-02-10 10:07:08 -07:00
Edd Robinson 38eb6d5994 Don't load meta data for tsi 2017-02-09 18:04:23 +00:00
Edd Robinson a6a2f9d5f0 Don't load meta data for tsi 2017-02-09 17:59:14 +00:00
Jason Wilder 2f74e3f3d5 Use simple8b.CountBytes to avoid allocations 2017-02-09 10:47:03 -07:00
Jason Wilder 1bc0f68490 Merge branch '1.2' into jw-merge-12 2017-02-07 12:48:36 -07:00
Jonathan A. Sternberg e1fa48d0dd Fix ORDER BY time DESC with ordering series keys
The order of series keys is in ascending alphabetical order, not
descending alphabetical order, when it is ordered by descending time.
This fixes the ordering so points are returned in descending order. The
emitter also had the conditions for choosing which iterator to use in
the wrong direction (which only affects aggregates with `FILL(none)`).
2017-02-06 15:49:12 -06:00
Jonathan A. Sternberg 95831b3307 Fix LIMIT and OFFSET when they are used in a subquery
This fixes LIMIT and OFFSET when they are used in a subquery where the
grouping of the inner query is different than the grouping of the outer
query. When organizing tag sets, the grouping of the outer query is
used so the final result is in the correct order. But, unfortunately,
the optimization incorrectly limited the number of points based on the
grouping in the outer query rather than the grouping in the inner query.

The ideal solution would be to use the outer grouping to further
organize it by the grouping for the inner subquery, but that's more
difficult to do at the moment. As an easier fix, the query engine now
limits the output of each series. This may result in these types of
queries being slower in some situations like this one:

    SELECT mean(value) FROM (SELECT value FROM cpu GROUP BY host LIMIT 1)

This will be slower in a situation where the `cpu` measurement has a
high cardinality and many different tags.

This also fixes `last()` and `first()` when they are used in a subquery
because those functions use `LIMIT 1` as an internal optimization.
2017-02-06 14:04:34 -06:00
Jason Wilder 93a9d01643 Increase default waiting WAL writes 2017-02-06 11:48:51 -07:00
Jason Wilder 38a649fc40 Batch multiple WAL fsyncs
Every write to the WAL current runs and fsync before returning.  When
there are lot of concurrent writes, this can cause the WAL to bottleneck
write throughput since fsyncs are very expensive.

This changes the writeToLog to fsync on an interval to allow multiple fsyncs
calls to be batched up into one.  The writeToLog behavior is the same in that
it won't return until an fsync has been performed.
2017-02-06 11:48:45 -07:00
Ben Johnson d91e6eabac
Add max-values-per-tag to inmem index. 2017-02-06 11:14:13 -07:00
Ben Johnson 57f44d5f0c
Include index in snapshot. 2017-02-01 14:19:42 -07:00
Jason Wilder 54ab3a7a0a Don't write lock file store when opening new files
When replacing TSM files, the new files can be opened before
the write lock is taken to reduce lock contention in this code
path.
2017-02-01 11:11:26 -07:00
Jason Wilder 6eb46d2100 Remove unnecessary read lock on engine 2017-02-01 11:10:41 -07:00
Jason Wilder 784a851742 Release cpu during compactions 2017-01-31 17:04:36 -07:00
Jason Wilder 278c1449d6 Increase number of cache partitions 2017-01-31 16:49:57 -07:00
Ben Johnson 047c21f4d9
Merge remote-tracking branch 'upstream/master' into tsi 2017-01-24 09:28:58 -07:00
Edd Robinson feb7a2842c Use unbuffered error channels in tests 2017-01-17 10:53:15 -08:00
Edd Robinson fb7388cdfc Remove dead code from various pkgs 2017-01-17 09:47:34 -08:00
Edd Robinson 292b30b82b Fix subtle bugs and remove dead code from tsdb 2017-01-17 09:47:34 -08:00
Joe LeGasse bf58d9ffb7 Update backup to use ioutil.ReadDir 2017-01-12 16:28:01 -05:00
Jason Wilder 11f264563a Fix 32bit alignment 2017-01-12 12:01:49 -07:00
Jason Wilder 06a8fd6ca2 Simplifications and cleanup 2017-01-12 09:55:38 -07:00
Edd Robinson 73ed864e1d Add cache tests 2017-01-12 16:27:16 +00:00
Jason Wilder 1e56b5416b Fix compactions sometimes getting stuck
I ran into an issue where the cache snapshotting seemed to stop
completely causing the cache to fill up and never recover.  I believe
this is due to the the Timer being reused incorrectly.  Instead,
use a Ticker that will fire more regularly and not require the resetting
logic (which was wrong).
2017-01-11 17:57:40 -07:00
Jason Wilder 40b017f4a4 Fix Cache stats size collection
The memory stats as well as the size of the cache were not accurate.
There was also a problem where the cache size would be increased
optimisitically, but if the cache size limit was hit, it would not
be decreased.  This would cause the cache size to grow without
bounds with every failed write.
2017-01-11 17:54:51 -07:00
Jason Wilder c433ff331f Encode snapshots concurrently
The CacheKeyIterator (used for snapshot compactions), iterated over
each key and serially encoded the values for that key as the TSM
file is written.  With many series, this can be slow and will only
use 1 CPU core even if more are available.

This changes it so that the key space is split amongst a number of
goroutines that start encoding all keys in parallel to improve
throughput.
2017-01-11 17:54:27 -07:00
Jason Wilder ae838ef323 Simplify Cache.Snapshot
This simplifies the cache.Snapshot func to swap the hot cache to
the snapshot cache instead of copy and appending entries.  This
reduces the amount of time the cache is write locked which should
reduce cache contention for the read only code paths.
2017-01-11 11:12:02 -07:00
Jonathan A. Sternberg 3ba950b029 Fix for subqueries to use the parallel iterator correctly
Also, fix the `Iterators.Merge(IteratorOptions)` function so it consults
the `Ordered` attribute to determine which iterator it should use to
merge the input iterators.
2017-01-11 10:47:18 -06:00
Jonathan A. Sternberg b58d1778e2 Remove improper newlines from logging statements 2017-01-10 11:20:09 -06:00
Mark Rushakoff a135906b43 Merge pull request #7747 from influxdata/mr-lint-cleanup
Miscellaneous lint cleanup
2017-01-10 08:22:00 -08:00
Mark Rushakoff 3b3604e362 Fix race in (*tsm1.Cache).values
Without this read lock, this race would happen during a concurrent
snapshot compaction and query.
2017-01-09 14:48:28 -08:00
Jonathan A. Sternberg 4a559c4620 Merge pull request #7646 from influxdata/js-4619-subqueries
Support subquery execution in the query language
2017-01-09 14:14:01 -06:00
Jason Wilder eb4d311c0a Add retry/backup when backing up a shard fails
The backup command can fail if a snapshot is running which silently
closes the connection.  This causes the backup shard command to continue
on as if nothing failed.
2017-01-09 11:28:48 -07:00
Jason Wilder 194c5adfaf Fix race on t.refs
Read at 0x00c42018f620 by goroutine 58:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*TSMReader).Close()
      /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:330 +0x94
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Close()
      /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:464 +0x123

Previous write at 0x00c42018f620 by goroutine 63:
  sync/atomic.AddInt64()
      /usr/local/go/src/runtime/race_amd64.s:276 +0xb
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*TSMReader).Unref()
      /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/reader.go:352 +0x43
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*KeyCursor).Close()
2017-01-07 12:39:45 -07:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
2017-01-07 13:00:48 -06:00
Mark Rushakoff 153277c01d Merge pull request #7786 from influxdata/mr-cache-decrease-size
Use one atomic operation in (*Cache).decreaseSize
2017-01-06 10:17:01 -08:00
Ben Johnson 2b3cd415e2
Fixing rebase. 2017-01-06 09:52:16 -07:00
Ben Johnson d1f1e19591
Fixing rebase. 2017-01-06 09:31:25 -07:00
Ben Johnson c1c98223ec
Fix and optimize tsi1 FileSet. 2017-01-05 10:17:12 -07:00
Ben Johnson 9b1e8215e0
Remove dictionary encoding, add bulk series insertion. 2017-01-05 10:17:11 -07:00
Ben Johnson 9bd19cdc69
Fix inmem DELETE SERIES. 2017-01-05 10:17:11 -07:00
Ben Johnson f9efcb3365
Re-add shared in-memory index. 2017-01-05 10:17:09 -07:00
Edd Robinson 0f9b2bfe6a
Fix tests 2017-01-05 10:16:15 -07:00
Edd Robinson 4ccb8dbab1
Move series count check to shard 2017-01-05 10:16:13 -07:00
Ben Johnson 745b1973a8
tsi compaction 2017-01-05 10:15:37 -07:00
Ben Johnson 183418dcbd
Fix tsi TAG KEYS iterator. 2017-01-05 10:15:36 -07:00
Ben Johnson 9f8b206b51
Fix measurement system queries. 2017-01-05 10:15:34 -07:00
Ben Johnson 4aa78383d1
Fix tsi1 series deletion. 2017-01-05 10:14:48 -07:00
Ben Johnson e7940cc556
Add tsi1 series system iterator. 2017-01-05 10:14:00 -07:00
Ben Johnson 87f4e0ec0a
Add regex support in tsi1. 2017-01-05 10:12:29 -07:00
Jason Wilder 1ba64f3610
Disable max-value-per-tag option temporarily
This is too slow currently and causes all writes to timeout.
2017-01-05 10:11:47 -07:00
Ben Johnson fbe7f464ee
Improve insert performance. 2017-01-05 10:11:12 -07:00
Ben Johnson cb93f10120
Remove per-shard in-memory index. 2017-01-05 10:11:09 -07:00
Ben Johnson 409b0165f5
shared in-memory index 2017-01-05 10:09:57 -07:00