Commit Graph

1396 Commits (a801c9dea69789709d1661aef58ceb976383da6f)

Author SHA1 Message Date
Edd Robinson 98f0392ca6 Update size using atomic 2016-12-14 18:23:36 +00:00
Edd Robinson 66edb32182 Sharded Cache using a hash ring 2016-12-14 18:23:36 +00:00
Edd Robinson d3e6d4e7ca Add benchmarks 2016-12-14 18:21:50 +00:00
Jonathan A. Sternberg 21502a39e8 Switch logging to use structured logging everywhere
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
2016-12-14 10:45:15 -06:00
gunnaraasen 78b1a0e771 Add stats on dropped measurements and series; Fixes #7697 2016-12-13 15:17:31 -08:00
Jason Wilder 4f28c90b54 Optimize Value.Deduplicate
Deduplicate is called from various places in the engine and can cause
a lot of garbage to get created.  It first creates a map and then
adds each value to the map in order (1st alloc).  It then creates a
new slice (2nd alloc) and appends everything from the map to the slice.
Finally, it sorts the new slice (3rd alloc).

This switches the algorithm to use stable sorting and reusing the existing
slice to avoid allocations.
2016-12-08 21:10:56 -07:00
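As a rough illustration of the commit above (not the actual tsm1 code; the Value struct and deduplicate function below are simplified stand-ins), the sketch stable-sorts the existing slice by timestamp and compacts duplicates in place, so no intermediate map or second slice is allocated.

```go
package tsm1sketch

import "sort"

// Value is a minimal stand-in for the engine's value type; only the
// timestamp matters for this sketch.
type Value struct {
	UnixNano int64
	V        float64
}

// deduplicate stable-sorts by time and compacts duplicates in place,
// keeping the most recently written value for each timestamp.
func deduplicate(vs []Value) []Value {
	sort.SliceStable(vs, func(i, j int) bool { return vs[i].UnixNano < vs[j].UnixNano })
	out := vs[:0]
	for _, v := range vs {
		if n := len(out); n > 0 && out[n-1].UnixNano == v.UnixNano {
			out[n-1] = v // stable sort preserves write order, so the later write wins
			continue
		}
		out = append(out, v)
	}
	return out
}
```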
Hrvoje Marjanovic 9483b8b409 gofmt 2016-12-03 22:06:38 +01:00
Hrvoje Marjanovic 6ed708e3fd Reduce pool size, change WAL writers default
A big pool can lead to huge memory usage under certain loads.

See #7640 for detailed discussion.
2016-12-02 18:45:43 +01:00
Allen Petersen 31129ab0e9 Use slash separator for filenames in tar archives
NO-OP on platforms with unix path separator.
On Windows paths get converted to slashes before adding to archive and back to backslashes during restore.
2016-11-29 09:44:08 -08:00
Jason Wilder 27d157763a Merge pull request #7651 from influxdata/jw-shard-last-modified
Expose Shard.LastModified
2016-11-23 10:19:26 -07:00
Jason Wilder e8a28cfbab Expose Shard.LastModified
This returns the LastModified time of the shard.  The LastModified
time is the wall time when a change to the shard's state occurred.
It uses the WAL or FileStore to determine the max mod time.
2016-11-23 10:04:07 -07:00
Edd Robinson b83b8df32f Merge pull request #7635 from influxdata/er-msg
Fix incorrect error message
2016-11-23 13:58:33 +00:00
Edd Robinson 9e9719749f Sprinkle some golint 2016-11-17 16:31:38 +00:00
Edd Robinson 28ba8ced74 Fixes #7625 2016-11-17 16:31:36 +00:00
Jason Wilder 3a5a01181b Switch all Value types from pointers 2016-11-15 16:13:55 -07:00
Jason Wilder bf17074f58 Avoid allocation when counting tag keys
A new sorted slice was allocated by the monitor func every 10s.  The
tag keys don't need to be sorted, so this avoids the allocation of the
slice and another allocation during sorting.
2016-11-15 16:13:55 -07:00
Jason Wilder 0ee58c208a Switch time.Sleep to time.Ticker
Avoids an allocation when calling time.Sleep
2016-11-15 16:13:55 -07:00
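A minimal sketch of the pattern described above, assuming the periodic loop has roughly this shape (the done channel and work callback are hypothetical): a single reusable time.Ticker drives the check instead of calling time.Sleep on every iteration.

```go
package tsm1sketch

import "time"

// runPeriodic drives periodic work off one reusable Ticker rather than a
// time.Sleep at the top of each loop iteration.
func runPeriodic(done <-chan struct{}, work func()) {
	t := time.NewTicker(time.Second)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			work()
		case <-done:
			return
		}
	}
}
```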
Jason Wilder 73b8f52ca0 Cache results of findGenerations
This allocates quite a bit and it's called multiple times per
second per shard.  The generations don't change until a compaction
has occurred so most of the time is re-calculating the same thing
and creating garbage.
2016-11-15 16:13:55 -07:00
Jason Wilder 0b6f5441b9 Add config option to messages when limits exceeded
When a limit is exceeded, we return errors and sometimes log (if appropriate)
that a limit was exceeded.  The messages don't always provide an indication
as to where or how they are configured.

Instead, return the config option (easily searchable for) as well as the limit
currently set and the value that exceeded it when possible.
2016-10-28 14:54:45 -06:00
Jason Wilder b1ceb5e66d Add cache write OK, Dropped, Error stats
Adds a new dropped stat and fixes the OK and error stats not
actually getting collected and stored.
2016-10-28 12:15:50 -06:00
Jason Wilder 873189e0c2 Fix panic: interface conversion: tsm1.Value is *tsm1.FloatValue, not *tsm1.StringValue
If concurrent writes to the same shard occur, it's possible for different types to
be added to the cache for the same series.  The way the measurementFields map on the
shard is updated is racy in this scenario, defeating the check that would normally prevent this from occurring.
When this occurs, the snapshot compaction panics because it can't encode different types
in the same series.

To prevent this, we have the cache return an error if a different type is added to existing
values in the cache.

Fixes #7498
2016-10-28 12:15:50 -06:00
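The guard described above might look roughly like the sketch below (simplified types; addToEntry is a hypothetical name, not the actual cache API): incoming values whose concrete type differs from what the entry already holds are rejected with an error rather than being allowed to panic later during snapshot compaction.

```go
package tsm1sketch

import (
	"fmt"
	"reflect"
)

// addToEntry rejects values whose concrete type differs from the values the
// entry already holds, instead of letting the mismatch surface as a panic
// when the snapshot is encoded.
func addToEntry(existing, incoming []interface{}) ([]interface{}, error) {
	if len(existing) > 0 && len(incoming) > 0 &&
		reflect.TypeOf(existing[0]) != reflect.TypeOf(incoming[0]) {
		return existing, fmt.Errorf("cannot write value of type %T to series holding %T",
			incoming[0], existing[0])
	}
	return append(existing, incoming...), nil
}
```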
Jason Wilder e388912b6c Fix race in findGenerations
The file store stats slice is re-used which causes the race below:

WARNING: DATA RACE
Write at 0x00c42007e140 by goroutine 43:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Stats()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:511 +0x22e
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).findGenerations()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:461 +0x6f
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanLevel()

Previous read at 0x00c42007e140 by goroutine 40:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).findGenerations()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:463 +0x13d
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*DefaultPlanner).PlanOptimize()
2016-10-28 12:15:49 -06:00
Jason Wilder 96c9fb3648 Actually update the defaults for TSM
#7510 updated the defaults in the sample config, but did not update
the code.  This updates the defaults in the config that changed.
2016-10-26 09:49:25 -06:00
Steven Hartland 3f16197243 Improve tsm1 cache performance
Reduce cache lock contention by widening the cache lock scope in WriteMulti; while this sounds counterintuitive, the locking previously was:
* 1 x Read Lock to read the size
* 1 x Read Lock per values
* 1 x Write Lock per values on race
* 1 x Write Lock to update the size

We now have:
* 1 x Write Lock

This also reduces contention on the entries' Values lock, as we now hold the global cache lock.

Move the calculation of the added size before taking the lock as it takes time and doesn't need the lock.

This also fixes a race in WriteMulti due to the lock not being held across the entire operation, which could cause the cache size to have an invalid value if Snapshot ran in between the addition of the values and the size update.

Fix the cache benchmarks, which were benchmarking the creation of the cache rather than its operation, and add a parallel test for a more real-world scenario; this could still be improved.

Add a fast-path newEntryValues for the new-entry case, which avoids taking the values lock and all the other calculations.

Drop the lock before performing the sort in Cache.Keys().
2016-10-25 15:24:51 -06:00
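A simplified sketch of the lock widening described above (the cache type, field names, and size math here are illustrative, not the real tsm1 cache): the added size is computed before the lock is taken, and one write lock then covers the size check, the inserts, and the size update, so the size cannot go stale between steps.

```go
package tsm1sketch

import (
	"errors"
	"sync"
)

var errCacheFull = errors.New("cache maximum memory size exceeded")

// cache is a simplified stand-in; only the fields needed for the locking
// sketch are present.
type cache struct {
	mu      sync.RWMutex
	store   map[string][]float64
	size    uint64
	maxSize uint64
}

func newCache(maxSize uint64) *cache {
	return &cache{store: make(map[string][]float64), maxSize: maxSize}
}

// WriteMulti computes the added size outside the lock, then takes a single
// write lock for the size check, the inserts, and the size update.
func (c *cache) WriteMulti(values map[string][]float64) error {
	var added uint64
	for _, vs := range values {
		added += uint64(len(vs)) * 8 // size math stays outside the lock
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if c.maxSize > 0 && c.size+added > c.maxSize {
		return errCacheFull
	}
	for k, vs := range values {
		c.store[k] = append(c.store[k], vs...)
	}
	c.size += added
	return nil
}
```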
Jonathan A. Sternberg a515aeda39 Optimize first/last when no group by interval is present
The `first()` and `last()` functions' response time would increase linearly
with the number of points even though it seems like it shouldn't. This
optimization greatly reduces the amount of time to return a response
when no `GROUP BY time(...)` clause is present in a query.
2016-10-25 09:57:31 -05:00
Jason Wilder 686d1a7ba4 Remove unused config options 2016-10-24 15:32:38 -06:00
Edd Robinson 0ee093f1fb Memoize output of FileStore.Stats 2016-10-24 10:23:20 -06:00
Jonathan A. Sternberg 3681bc8a43 Filter out series within shards that do not have data for that series
Previously, we would return a full tag set for every shard and the tag
set would include all series that existed in the database index
including series that didn't physically exist within that shard. This
led to the tag sets returned being incredibly huge when we had high
cardinality but sparse data. Since the data was sparse, most people did
not expect it to cause such a large strain on the system.

Now we filter out the series ids that are not assigned to the current
shard when computing a tag set for that shard. This lowers the memory
usage for high cardinality sparse data drastically and allows queries on
those to complete successfully.

This does not resolve issues for high cardinality data in every shard
that is also spread out over a long series of time. That situation isn't
nearly as common as the above situation though.
2016-10-20 14:15:34 -05:00
Jason Wilder 2e473e9518 Fix panic in AppendSeriesKeyByID
Calling this function with a series ID that does not exist in
the measurement causes a panic.

Fixes #7334
2016-10-19 11:07:19 -06:00
Jason Wilder b50d9558cf Merge pull request #7479 from influxdata/jw-clean-err
Skip cleanup if dir does not exist
2016-10-18 15:49:09 -06:00
Jason Wilder f30b00c24f Skip cleanup if dir does not exist 2016-10-18 15:33:39 -06:00
Mark Rushakoff 377c40f122 Add stats for active compactions
Unify logic around compaction execution to a single place.

Also report on the error stats that we track. Previously they were not
emitted in the stats output.
2016-10-18 14:12:21 -07:00
Joe LeGasse de9c743004 TSM: update comments for disabling level compactions 2016-10-18 14:14:59 -06:00
Joe LeGasse eda8f70372 TSM: Handle concurrent deletes for compaction 2016-10-18 14:14:59 -06:00
Jason Wilder 47b8049e48 Update comment 2016-10-18 14:14:53 -06:00
Jason Wilder ed7975874f Rename Enabled -> Enable 2016-10-18 12:22:00 -06:00
Jason Wilder f254b4f3ae Allow snapshot compactions during deletes
If a delete takes a long time to process while writes to the
shard are occurring, it was possible for the cache to fill up
and writes to be rejected.  This occurred because we disabled
all compactions while writing tombstone file to prevent deleted
data from re-appearing after a compaction completed.

Instead, we only disable the level compactions and allow snapshot
compactions to continue.  Snapshots already handle deleted data
with the cache and wal.

Fixes #7161
2016-10-18 12:14:51 -06:00
Jonathan A. Sternberg 41e4e73d4e Reduce map allocations when computing the TagSets of a measurement
Instead of assigning a boolean value of true to the filter expressions
when there was no meaningful expression, this drops a boolean expression
of true from the filter expressions so we don't have to perform a map
assignment. This allows us to reduce allocations and assignments when a
`WHERE` clause only contains tag comparisons and no field comparisons.
2016-10-17 12:13:19 -05:00
Jason Wilder a5f871d62c Rework monitoring to avoid allocations 2016-10-10 11:42:15 -06:00
Jason Wilder bbecb3f03d Drop points that would exceed limits
This changes the behavior of the max-series-per-database and
max-values-per-tag limits to drop points that would exceed the limits
and allow the remaining points to be written.  Previously, the whole
batch would fail and return a 500 error to the client.

This will now write the allowed points and return a `partial write`
error indicating that some of the points were dropped, how many were
dropped, and one of the problem measurements and its tags.
2016-10-10 11:42:15 -06:00
Jason Wilder 8fce6bba48 Add tag value cardinality limit 2016-10-10 11:42:15 -06:00
Mark Rushakoff 5ae8cf8312 Speed up shutdown
On my machine with about 20 shards, it would take 10+ seconds to shut
down InfluxDB with SIGINT. After this change, it shuts down nearly
instantly.

(*tsdb.Store).Close was shutting down each of its shards sequentially.
Each shard's engine would signal to its compaction goroutines to quit,
and because each compaction goroutine has a hardcoded 1-second sleep in
between checks, waiting for the goroutines would often block for up to a
second.

This change closes all of the TSDB store's shards in parallel. This
means it's possible that multiple Close calls could error at once, but
we're still only returning the first error, consistent with previous
behavior. That being said, the return value of (*tsdb.Store).Close is
ignored in (*cmd/influxd/run.Server).Close anyway.
2016-10-10 09:18:47 -07:00
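A sketch of the parallel shutdown described above, assuming a hypothetical Shard type with a Close method: every shard is closed in its own goroutine and only the first error is returned, matching the previous sequential behaviour.

```go
package tsm1sketch

import "sync"

// Shard is a hypothetical stand-in with a Close method.
type Shard struct{}

func (s *Shard) Close() error { return nil }

// closeAll closes every shard concurrently and returns only the first error.
func closeAll(shards []*Shard) error {
	errs := make(chan error, len(shards))
	var wg sync.WaitGroup
	for _, sh := range shards {
		wg.Add(1)
		go func(sh *Shard) {
			defer wg.Done()
			errs <- sh.Close()
		}(sh)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return err // first error only, consistent with previous behaviour
		}
	}
	return nil
}
```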
Jason Wilder 798fa0a9f8 Return error with unknown field type
Previously, this would just panic when trying to snapshot the value because EmptyValue
can't be written to TSM files.
2016-10-03 16:30:21 -06:00
Jason Wilder 125f106956 Pre-size the values map when writing points 2016-10-03 16:30:21 -06:00
Joe LeGasse 743946fafb models: Add FieldIterator type
The FieldIterator is used to scan over the fields of a point, providing
information, and delaying parsing/decoding the value until it is needed.
This change uses this new type to avoid the allocation of a map for the
fields which is then thrown away as soon as the points get converted
into columns within the datastore.
2016-10-03 16:30:21 -06:00
Jason Wilder 20f1fb3f7f Replace gotos with anonymous functions 2016-10-03 12:08:53 -06:00
Jason Wilder 750c8b3932 Reduce lock contention in cache.Values
The cache read lock was held for the whole duration of the call when it
only needs to be held at the beginning since entries have their
own locks.
2016-10-03 10:21:54 -06:00
Jason Wilder 1b462312a9 Re-use decoder pools
The decoders were held onto by each iterator to avoid creating them all
the time.  Some of them use quite a bit of memory, so they can
be expensive to create when querying across many series.

Instead, move them to a re-usable pool where we create the minimum that
could actively be in use.  This reduces garbage as well as making the iterators
less expensive to create.
2016-10-03 10:21:54 -06:00
Jason Wilder f727effd7f Merge pull request #7385 from influxdata/jw-query-allocs
Reduce query planning allocations
2016-10-03 09:08:36 -06:00
Jason Wilder a15a416eaa Fix decoding RLE integer blocks with negative deltas
Integer blocks that were run length encoded could produce the wrong
value when read back out because the deltas were not zig zag decoded
before scaling the final value.  If the deltas were negative, as would
be seen in a counter that decrements by a constant value, the results
would be random with some negative and positive values.

Fixes #7391
2016-10-02 23:51:29 -06:00
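For reference, zig-zag coding maps signed deltas to unsigned integers so small negative deltas stay small; the bug above amounts to skipping the decode step before scaling the run's delta. The sketch below uses illustrative function names and layout, not the actual tsm1 block format.

```go
package tsm1sketch

// Zig-zag coding: small negative deltas map to small unsigned integers.
// Skipping zigZagDecode turns a negative delta into a huge positive one.
func zigZagEncode(x int64) uint64 { return uint64(x<<1) ^ uint64(x>>63) }
func zigZagDecode(v uint64) int64 { return int64(v>>1) ^ -int64(v&1) }

// expandRLE applies a run correctly: decode the stored delta first, then add
// it repeatedly to the first value.
func expandRLE(first int64, encodedDelta uint64, count int) []int64 {
	delta := zigZagDecode(encodedDelta)
	out := make([]int64, 0, count)
	v := first
	for i := 0; i < count; i++ {
		out = append(out, v)
		v += delta
	}
	return out
}
```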
Jason Wilder 68dd312bb1 Reduce allocations when calculating tagsets
The TagSets function was creating a lot of intermediate maps and
slices to calculate the sorted tag sets.  It first created a map
to group tag sets with their series, then created an equally
sized slice of the tag keys and sorted them.  Finally, it created
a new slice and added the tag sets from the original map in the order
of the sorted keys.  It was also recreating the tags map multiple times,
creating extra garbage in the loop.

This simplifies the code to create one map for grouping and then add
the distinct sets to a slice which is then sorted.  It also fixes the
multiple tag maps getting created.
2016-09-29 16:02:29 -06:00
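The simplified grouping might look like the sketch below (the types and the groupTagSets name are illustrative, not the actual TagSets code): a single map groups series by their tag-set key, the distinct keys go into one slice, and that slice is sorted once.

```go
package tsm1sketch

import "sort"

// groupTagSets groups series keys by tag set in one map, then sorts the
// distinct tag-set keys once.
func groupTagSets(seriesToTagSet map[string]string) (sortedSets []string, groups map[string][]string) {
	groups = make(map[string][]string, len(seriesToTagSet))
	for seriesKey, tagSet := range seriesToTagSet {
		groups[tagSet] = append(groups[tagSet], seriesKey)
	}
	sortedSets = make([]string, 0, len(groups))
	for tagSet := range groups {
		sortedSets = append(sortedSets, tagSet)
	}
	sort.Strings(sortedSets)
	return sortedSets, groups
}
```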
Mark Rushakoff 97c2f6f5c1 Add walPath tag to shard stats
Without the WAL path as a tag, the diskBytes field looked like it was
reporting the size of the data directory incorrectly.

Fixes #7382.
2016-09-29 10:19:11 -07:00
Jason Wilder dcb65865a2 Merge pull request #7376 from influxdata/jw-revert
Revert re-using byte slices during compactions
2016-09-28 08:24:35 -06:00
joelegasse 87ecd97e7b Merge pull request #7371 from influxdata/2016-09-27--rw--use-gotos-for-encoding-cleanup
Gotos to simplify uses of the new encoder pools.
2016-09-28 08:57:33 -04:00
Jason Wilder 1755f20d2a Revert re-using byte slices during compactions
This is causing a "fatal error: fault" panic when packing blocks.
2016-09-27 23:41:06 -06:00
Jonathan A. Sternberg e22e33d5fd Merge pull request #7374 from influxdata/merge-from-1.0.1
Merge tag 'v1.0.1'
2016-09-27 20:32:58 -05:00
Jonathan A. Sternberg 3afdf3cd94 Merge tag 'v1.0.1' 2016-09-27 17:53:33 -05:00
rw c3fc87b619 Remove dangling named return value. 2016-09-27 14:18:32 -07:00
rw fcd425c8c6 Incorporate style feedback from Joe. 2016-09-27 14:07:06 -07:00
rw 47c1c6763c Use encoder reset to save on allocs. 2016-09-27 13:31:35 -07:00
rw 9429a2f96a Gotos to simplify uses of the new encoder pools.
For maintainability.
2016-09-27 11:47:25 -07:00
Jason Wilder 5367372253 Merge pull request #7364 from influxdata/2016-09-26-fix-data-race-in-write-path
Fix data race in *tsdb.Shard write path.
2016-09-26 18:34:19 -06:00
rw f131d3cc77 Fix off-by-one error that could panic. 2016-09-26 17:03:03 -07:00
rw 3e0d3be461 Use pre-existing function. 2016-09-26 13:12:10 -07:00
rw bea010b5f3 Fix data race in *tsdb.Shard write path.
Ensure that the Shard's Index is read-locked before calculating the
count of its constituent series.
2016-09-26 12:42:35 -07:00
joelegasse a17d095aae Merge pull request #7350 from influxdata/2016-09-22-reduce-allocs-in-validate-series-and-fields
Remove a few short-lived string allocs. Thanks @rw
2016-09-26 15:01:53 -04:00
Jason Wilder 4b5d989905 Merge pull request #7335 from influxdata/jw-tsm-syscalls
Avoid stat syscall when planning compactions
2016-09-26 12:30:05 -06:00
rw 68c2212aac Shorten name of static-lifetime string var. 2016-09-26 11:26:24 -07:00
rw 02c86ea9db Remove unnecessary string constant. 2016-09-26 11:25:04 -07:00
Jason Wilder 139ef8062e Simplify encoder buffer usage 2016-09-26 12:19:16 -06:00
Jason Wilder 658149a6ff Removed commented out code 2016-09-26 12:19:15 -06:00
Jason Wilder 7f96d78b79 Make encoder re-usable
This allows encoders to be re-used and maintained in a pool to
avoid allocating new ones on every compactions and write of an encoded
block.  The pool used is not a sync.Pool to ensure that the encoders
will not be garbage collected.
2016-09-26 12:19:15 -06:00
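A sketch of a fixed-size pool built on a buffered channel rather than sync.Pool, so pooled encoders are never reclaimed by the garbage collector (Encoder and the pool type here are hypothetical stand-ins for the tsm1 encoders). Get falls back to allocating a fresh encoder when the pool is empty, and Put drops the encoder when the pool is full, which bounds memory while keeping the hot path allocation-free.

```go
package tsm1sketch

// Encoder is a hypothetical stand-in for one of the TSM block encoders.
type Encoder struct{ buf []byte }

func NewEncoder() *Encoder { return &Encoder{buf: make([]byte, 0, 1024)} }
func (e *Encoder) Reset()  { e.buf = e.buf[:0] }

// encoderPool is a fixed-size pool backed by a buffered channel, so its
// contents are never garbage collected between compactions.
type encoderPool struct{ ch chan *Encoder }

func newEncoderPool(n int) *encoderPool { return &encoderPool{ch: make(chan *Encoder, n)} }

func (p *encoderPool) Get() *Encoder {
	select {
	case e := <-p.ch:
		e.Reset()
		return e
	default:
		return NewEncoder() // pool empty: allocate a fresh one
	}
}

func (p *encoderPool) Put(e *Encoder) {
	select {
	case p.ch <- e:
	default: // pool already full; drop this encoder
	}
}
```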
Jason Wilder 0401527093 Pre-allocate cache store and entries
These were not pre-sized, so they always had to be grown, causing
garbage to be created.
2016-09-26 12:19:15 -06:00
Jason Wilder 730ceeea46 Re-used allocated byte slices during compactions 2016-09-26 12:19:15 -06:00
Jason Wilder 6671ef00f0 Reduce allocations in idsForExpr 2016-09-26 08:36:59 -06:00
Jason Wilder c2cfd63091 Avoid stat syscall when planning compactions
When the planner runs, it needs to determine if any files have tombstones.
The code to determine if a tombstone existed involved stat'ing the .tombstone
file.  Since the planner runs very frequently when there are many shards, this
caused a lot of unnecessary system calls.

Instead, cache the results of the stats calls and only refresh them when we
haven't checked at least once or we write new tombstone data.

This also caches the results of the TSMReader.Stats call to avoid creating
garbage.
2016-09-24 15:53:28 -06:00
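The caching could be sketched roughly as below (tombstoneInfo and its fields are illustrative, not the actual tsm1 types): the .tombstone file is stat'd at most once, the answer is remembered, and the cached result is invalidated only when new tombstone data is written.

```go
package tsm1sketch

import (
	"os"
	"sync"
)

// tombstoneInfo memoizes whether the tombstone file exists so the planner
// does not issue a stat syscall on every run.
type tombstoneInfo struct {
	mu      sync.Mutex
	path    string
	checked bool
	exists  bool
}

func (t *tombstoneInfo) HasTombstone() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if !t.checked {
		_, err := os.Stat(t.path)
		t.exists = err == nil
		t.checked = true
	}
	return t.exists
}

// Invalidate is called after new tombstone data is written so the next
// HasTombstone call re-stats the file.
func (t *tombstoneInfo) Invalidate() {
	t.mu.Lock()
	t.checked = false
	t.mu.Unlock()
}
```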
rw b86885c5cd Remove a few short-lived string allocs.
(*tsdb.Shard).validateSeriesAndFields uses fewer string allocs in some
hot spots.
2016-09-22 17:55:57 -07:00
Jason Wilder 39ade11944 Unload index before closing shard
When deleting a shard, the shard is locked and then removed from the
index.  Removal from the index can be slow if there are a lot of
series.  During this time, the shard is still expected to exist by
the meta store and tsdb store so stats collections, queries and writes
could all be run on this shard while it's locked.  This can cause everything
to lock up until the unindexing completes and the shard can be unlocked.

Fixes #7226
2016-09-22 11:16:45 -06:00
Jason Wilder d06b28992d Unload index before closing shard
When deleting a shard, the shard is locked and then removed from the
index.  Removal from the index can be slow if there are a lot of
series.  During this time, the shard is still expected to exist by
the meta store and tsdb store so stats collections, queries and writes
could all be run on this shard while it's locked.  This can cause everything
to lock up until the unindexing completes and the shard can be unlocked.

Fixes #7226
2016-09-16 12:01:50 -06:00
Edd Robinson ed41122ade Pre-allocate map for performance 2016-09-15 18:28:46 +01:00
Jonathan A. Sternberg 477d6231db Update source files to pass vet checks for go 1.7
The vet checks for some files did not pass for go 1.7. As part of a
preliminary start to making go 1.7 work with this software, go vet
should pass.

Also updated the gogo/protobuf dependency which fixed the code generator
to work with go 1.7 too. Ran `go generate` on the entire repository to
ensure every file was up to date.
2016-09-14 15:01:22 -05:00
Edd Robinson 2a99ef751d Emit fieldsCreated stat in shard measurement 2016-09-13 16:41:11 +01:00
Jonathan A. Sternberg 46508cb8c9 Fix engine tags in stats 2016-09-09 17:16:53 -05:00
Jason Wilder 95682faec2 Merge branch '1.0' into jw-merge-10 2016-09-08 09:00:51 -06:00
Edd Robinson 5023419adc Ensure ErrFieldTypeConflict value returned 2016-09-05 13:34:35 +01:00
Jason Wilder 1a35c0a3fc Fix neverending full compactions
The full compaction planner could return a plan that only included
one generation.  If this happened, a full compaction would run on that
generation producing just one generation again.  The planner would then
repeat the plan.

This could happen if there were two generations that were both over
the max TSM file size and the second one happened to be in level 3 or
lower.

When this situation occurs, one cpu is pegged running a full compaction
continuously and the disks become very busy basically rewriting the
same files over and over again.  This can eventually cause disk and CPU
saturation if it occurs with more than one shard.

Fixes #7074
2016-09-03 17:35:14 -06:00
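The fix implies a guard along these lines (a rough sketch with a hypothetical Generation type, not the actual planner): a full-compaction plan containing a single generation would only rewrite the same files again, so it is not worth returning.

```go
package tsm1sketch

// Generation is a hypothetical stand-in for a group of TSM files produced
// by one compaction.
type Generation struct{ files []string }

// planFull drops single-generation plans so a full compaction never spins
// rewriting the same generation over and over.
func planFull(generations []Generation) []Generation {
	if len(generations) <= 1 {
		return nil
	}
	return generations
}
```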
Jason Wilder a6f6fda415 Fix DeleteSeries when multiple fields exists
The logic for determining whether a series key was already in the
set of TSM series was too restrictive.  It allowed only the first
field of a series to be added, leaving out all the remaining fields.
2016-08-31 20:53:10 -06:00
Jason Wilder 190537a557 Fix DeleteSeries when multiple fields exists
The logic for determining whether a series key was already in the
set of TSM series was too restrictive.  It allowed only the first
field of a series to be added, leaving out all the remaining fields.
2016-08-31 20:35:35 -06:00
Jonathan A. Sternberg dc2527ce86 Merge branch '1.0' 2016-08-31 14:45:57 -05:00
Jonathan A. Sternberg 964341eb20 Optimize queries that compare a tag value to an empty string
The behavior for querying tag values with an empty string was originally
fixed in #6283, but it also added a performance problem when the
cardinality of the tag was high. Since a call to `Union()` or `Reject()`
would happen for every series key and it would be called N times for N
cardinality, the comparisons against a blank string were unnecessarily
slow with large memory allocations.

This optimizes these queries so it doesn't use those methods anymore.
Those methods are still useful and used when combining AND and OR
clauses, but they aren't useful when finding the series ids for a single
clause. These methods were unnecessary anyway because the series ids for
the tags were already unique and didn't have to be merged as a set.
2016-08-31 14:03:23 -05:00
Jonathan A. Sternberg f67558c2a7 Merge pull request #7236 from influxdata/js-7220-revert-limit-shard-concurrency
Revert "limit shard concurrency"
2016-08-29 13:41:46 -05:00
Jonathan A. Sternberg c05c7f6360 Revert "limit shard concurrency"
This reverts commit 6c7d56d4bc.
2016-08-29 12:39:52 -05:00
Jason Wilder 3d411371f2 Merge pull request #7233 from influxdata/jw-stats2
Write path stats
2016-08-29 10:15:23 -06:00
Jason Wilder d878d30d18 Fix shard write stats
* Rename *Fail to *Err for consistency with other metrics
* Use index Series count instead of separate counter
2016-08-29 09:46:11 -06:00
Jason Wilder e203323776 Add wal write success/error stats 2016-08-29 09:38:48 -06:00
Jason Wilder 83ca8c3867 Decrement cache memory stat when deleting series 2016-08-29 09:38:41 -06:00
Jason Wilder 03326f993f Add cache write success/error stats 2016-08-29 09:38:32 -06:00
Jason Wilder b31bf798f1 Fix runtime: goroutine stack exceeds 1000000000-byte limit
Fixes #7225
2016-08-29 09:26:48 -06:00
Jonathan A. Sternberg 8b234546a8 Merge pull request #7204 from influxdata/1.0
Merge 1.0 branch to master
2016-08-25 15:20:30 -05:00
Jonathan A. Sternberg 10029caf2f Support negative timestamps in the query engine
Negative timestamps are now supported. We also now refuse two
nanosecond values that are at the edge of the minimum time window. One of these
values is rejected because we need MinInt64 to be used for
some internal comparisons in the TSM engine and it was causing an
underflow when we subtracted one from the minimum time. The second is so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell the difference between if the
user provided it or if it was implicit without those values being
different.

If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
2016-08-25 12:52:41 -05:00