92 Commits (fb7388cdfc169921aaac976234d5b7063677afc1)

Author SHA1 Message Date
Jason Wilder 11f264563a Fix 32bit alignment 2017-01-12 12:01:49 -07:00
Jason Wilder 06a8fd6ca2 Simplifications and cleanup 2017-01-12 09:55:38 -07:00
Edd Robinson 73ed864e1d Add cache tests 2017-01-12 16:27:16 +00:00
Jason Wilder 40b017f4a4 Fix Cache stats size collection
The memory stats as well as the size of the cache were not accurate.
There was also a problem where the cache size would be increased
optimistically, but if the cache size limit was hit, it would not
be decreased.  This would cause the cache size to grow without
bounds with every failed write.
2017-01-11 17:54:51 -07:00
Jason Wilder ae838ef323 Simplify Cache.Snapshot
This simplifies the cache.Snapshot func to swap the hot cache to
the snapshot cache instead of copy and appending entries.  This
reduces the amount of time the cache is write locked which should
reduce cache contention for the read only code paths.
2017-01-11 11:12:02 -07:00
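A rough sketch of the swap described above, assuming a simplified cache with plain maps (`swapCache` and its fields are hypothetical, and the sketch assumes any earlier snapshot has already been cleared):

```go
package main

import (
	"fmt"
	"sync"
)

// swapCache is an illustrative stand-in for a cache with a hot store
// and a snapshot store.
type swapCache struct {
	mu       sync.RWMutex
	store    map[string][]float64 // hot entries receiving writes
	snapshot map[string][]float64 // entries handed off for compaction
}

func (c *swapCache) Write(key string, v float64) {
	c.mu.Lock()
	c.store[key] = append(c.store[key], v)
	c.mu.Unlock()
}

// Snapshot moves the hot map into the snapshot slot and installs an empty
// map in its place. The write lock covers two assignments instead of a
// copy of every entry.
func (c *swapCache) Snapshot() map[string][]float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.snapshot, c.store = c.store, make(map[string][]float64, len(c.store))
	return c.snapshot
}

func main() {
	c := &swapCache{store: map[string][]float64{}}
	c.Write("cpu", 1.5)
	snap := c.Snapshot()
	fmt.Println(len(snap), len(c.store))
}
```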
Mark Rushakoff a135906b43 Merge pull request #7747 from influxdata/mr-lint-cleanup
Miscellaneous lint cleanup
2017-01-10 08:22:00 -08:00
Mark Rushakoff 3b3604e362 Fix race in (*tsm1.Cache).values
Without this read lock, this race would happen during a concurrent
snapshot compaction and query.
2017-01-09 14:48:28 -08:00
Mark Rushakoff 153277c01d Merge pull request #7786 from influxdata/mr-cache-decrease-size
Use one atomic operation in (*Cache).decreaseSize
2017-01-06 10:17:01 -08:00
Mark Rushakoff 89a587e865 Use one atomic operation in (*Cache).decreaseSize
The previous implementation was susceptible to a race condition (of
correctness) since c.decreaseSize is called without a lock in
(*Cache).WriteMulti.

There were already tests which asserted the correctness of the result of
decreaseSize, so no tests were added or modified.
2017-01-04 13:13:31 -08:00
Mark Rushakoff 07b87f2630 Miscellaneous lint cleanup 2017-01-03 09:47:32 -08:00
Mark Rushakoff 41415cf2fb Update godoc for tsm1 package 2017-01-02 07:30:18 -08:00
Jason Wilder 2468347ffb Fix comment 2016-12-19 14:17:49 -07:00
Jason Wilder 0b6b9ea1cb Use atomics for cache.snapshotSize stat 2016-12-19 14:17:01 -07:00
Jason Wilder b7c1e625b0 Move needSort tracking to Deduplicate
This eliminates some *UnixNano() calls and also simplifies the cache
logic so that it does not need to worry about whether entries are
sorted.
2016-12-19 14:17:01 -07:00
Jonathan A. Sternberg ec57108520 Use proper uber-go/zap import path
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
2016-12-15 08:54:14 -06:00
Edd Robinson ec27c57127 Further optimisations and a race fix 2016-12-14 18:23:36 +00:00
Edd Robinson 05ec6ad9ad Add to index safely 2016-12-14 18:23:36 +00:00
Edd Robinson d78ca1a0f3 Fix some races 2016-12-14 18:23:36 +00:00
Edd Robinson d2923c7bf9 Add hints as to how to pre-allocate entry values
Currently, whenever a snapshot occurs the Cache is reset and so many
allocations are repeated, as the same type of data is re-added to
the Cache.

This commit allows the stores to keep track of the number of values
within an entry, and use that size as a hint when the same entry needs
to be recreated after a snapshot.

To avoid hints persisting over a long period of time they are deleted
after every snapshot, and rebuilt using the most recent entries only.
2016-12-14 18:23:36 +00:00
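The hinting idea can be sketched roughly as below. `hintedStore`, `add`, and `reset` are hypothetical names, and `reset` merely stands in for whatever the cache does at snapshot time.

```go
package main

import "fmt"

// hintedStore is an illustrative store that remembers how many values each
// key held before the last reset.
type hintedStore struct {
	entries map[string][]float64
	hints   map[string]int
}

func newHintedStore() *hintedStore {
	return &hintedStore{entries: map[string][]float64{}, hints: map[string]int{}}
}

// add appends v, pre-allocating a new entry using the hint for that key.
// A key without a hint starts with zero capacity.
func (s *hintedStore) add(key string, v float64) {
	vals, ok := s.entries[key]
	if !ok {
		vals = make([]float64, 0, s.hints[key])
	}
	s.entries[key] = append(vals, v)
}

// reset simulates a snapshot: old hints are dropped, new hints are built
// from the most recent entries only, and the entries are cleared.
func (s *hintedStore) reset() {
	hints := make(map[string]int, len(s.entries))
	for k, v := range s.entries {
		hints[k] = len(v)
	}
	s.hints = hints
	s.entries = make(map[string][]float64, len(s.entries))
}

func main() {
	s := newHintedStore()
	for i := 0; i < 3; i++ {
		s.add("cpu", float64(i))
	}
	s.reset()
	s.add("cpu", 9)
	fmt.Println(len(s.entries["cpu"]), cap(s.entries["cpu"]))
}
```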
Edd Robinson f2b5c7f5be Reduce contention when adding entries 2016-12-14 18:23:36 +00:00
Edd Robinson 98f0392ca6 Update size using atomic 2016-12-14 18:23:36 +00:00
Edd Robinson 66edb32182 Sharded Cache using a hash ring 2016-12-14 18:23:36 +00:00
Jonathan A. Sternberg 21502a39e8 Switch logging to use structured logging everywhere
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
2016-12-14 10:45:15 -06:00
Jason Wilder 3a5a01181b Switch all Value types from pointers 2016-11-15 16:13:55 -07:00
Jason Wilder 0b6f5441b9 Add config option to messages when limits exceeded
When a limit is exceeded, we return errors and sometimes log (if appropriate)
that a limit was exceeded.  The messages don't always provide an indication
as to where or how they are configured.

Instead, return the config option (easily searchable for) as well as the limit
currently set and the value that exceeded it when possible.
2016-10-28 14:54:45 -06:00
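As a small sketch of the message shape this commit describes (config option, value, and limit together), assuming a hypothetical helper `checkLimit` and a placeholder option name:

```go
package main

import "fmt"

// checkLimit is a hypothetical helper. It returns an error that names the
// config option, the value that exceeded it, and the configured limit.
// A limit of 0 is treated as "no limit" in this sketch.
func checkLimit(option string, limit, got int) error {
	if limit > 0 && got > limit {
		return fmt.Errorf("%s limit exceeded: (%d/%d)", option, got, limit)
	}
	return nil
}

func main() {
	// "example-max-option" is a placeholder, not a real config key.
	fmt.Println(checkLimit("example-max-option", 100, 101))
}
```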
Jason Wilder b1ceb5e66d Add cache write OK, Dropped, Error stats
Adds a new dropped stat and fixes the OK and error stats not
actually being collected and stored.
2016-10-28 12:15:50 -06:00
Jason Wilder 873189e0c2 Fix panic: interface conversion: tsm1.Value is *tsm1.FloatValue, not *tsm1.StringValue
If concurrent writes to the same shard occur, it's possible for different types to
be added to the cache for the same series.  The way the measurementFields map on the
shard is updated is racy in this scenario which would normally prevent this from occurring.
When this occurs, the snapshot compaction panics because it can't encode different types
in the same series.

To prevent this, we have the cache return an error if a different type is added to existing
values in the cache.

Fixes #7498
2016-10-28 12:15:50 -06:00
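The guard described above might be sketched like this. The `entry` type and `errFieldTypeConflict` are hypothetical, and the sketch compares dynamic types with `reflect` rather than using tsm1's own value types.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// errFieldTypeConflict is a hypothetical error for this sketch.
var errFieldTypeConflict = errors.New("field type conflict")

// entry holds the values for one series and must stay homogeneous.
type entry struct {
	values []interface{}
}

// add rejects the whole batch if any value's type differs from the type
// already stored for the series (or from the first value in the batch).
func (e *entry) add(vs ...interface{}) error {
	if len(vs) == 0 {
		return nil
	}
	want := reflect.TypeOf(vs[0])
	if len(e.values) > 0 {
		want = reflect.TypeOf(e.values[0])
	}
	for _, v := range vs {
		if reflect.TypeOf(v) != want {
			return errFieldTypeConflict
		}
	}
	e.values = append(e.values, vs...)
	return nil
}

func main() {
	e := &entry{}
	fmt.Println(e.add(1.0, 2.0))
	fmt.Println(e.add("three")) // rejected instead of panicking later
}
```

Returning an error at write time keeps mixed types out of the cache, so the later encoding step never sees them.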
Steven Hartland 3f16197243 Improve tsm1 cache performance
Reduce the cache lock contention by widening the cache lock scope in WriteMulti. While this sounds counterintuitive, it previously took:
* 1 x Read Lock to read the size
* 1 x Read Lock per values
* 1 x Write Lock per values on race
* 1 x Write Lock to update the size

We now have:
* 1 x Write Lock

This also reduces contention on the entries Values lock too as we have the global cache lock.

Move the calculation of the added size before taking the lock as it takes time and doesn't need the lock.

This also fixes a race in WriteMulti due to the lock not being held across the entire operation, which could cause the cache size to have an invalid value if Snapshot has been run between the addition of the values and the size update.

Fix the cache benchmark, which was benchmarking the creation of the cache rather than its operation, and add a parallel test for a more real-world scenario; however, this could still be improved.

Add a fast path, newEntryValues, for the new-entry case, which avoids taking the values lock and all the other calculations.

Drop the lock before performing the sort in Cache.Keys().
2016-10-25 15:24:51 -06:00
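The locking pattern from this commit can be sketched as below, under simplifying assumptions: `cache` here is a hypothetical type with int64 values, and 8 bytes per value is an assumed size. The added size is computed before the lock is taken, and a single write lock then covers both the value update and the size update.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is an illustrative stand-in, not the tsm1.Cache type.
type cache struct {
	mu    sync.Mutex
	size  int
	store map[string][]int64
}

// WriteMulti computes the added size outside the lock, then applies the
// values and the size change under one write lock so the two cannot be
// separated by a concurrent operation.
func (c *cache) WriteMulti(values map[string][]int64) {
	added := 0
	for _, v := range values {
		added += 8 * len(v) // assumed 8 bytes per int64 value
	}

	c.mu.Lock()
	for k, v := range values {
		c.store[k] = append(c.store[k], v...)
	}
	c.size += added
	c.mu.Unlock()
}

func main() {
	c := &cache{store: map[string][]int64{}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int64) {
			defer wg.Done()
			c.WriteMulti(map[string][]int64{"cpu": {i, i + 1}})
		}(int64(i))
	}
	wg.Wait()
	fmt.Println(len(c.store["cpu"]), c.size)
}
```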
Jason Wilder 750c8b3932 Reduce lock contention in cache.Values
The cache read lock was held for the whole duration of the call when it
only needs to be held at the beginning since entries have their
own locks.
2016-10-03 10:21:54 -06:00
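A minimal sketch of the narrower locking, assuming hypothetical `cache` and `entry` types where each entry carries its own lock: the cache read lock is held only for the map lookup, and the entry lock protects the copy.

```go
package main

import (
	"fmt"
	"sync"
)

// entry has its own lock, so readers do not need the cache lock to read it.
type entry struct {
	mu     sync.RWMutex
	values []float64
}

// cache is an illustrative stand-in, not the tsm1.Cache type.
type cache struct {
	mu    sync.RWMutex
	store map[string]*entry
}

// Values holds the cache read lock just long enough to find the entry,
// then relies on the entry's lock while copying its values.
func (c *cache) Values(key string) []float64 {
	c.mu.RLock()
	e := c.store[key]
	c.mu.RUnlock()

	if e == nil {
		return nil
	}
	e.mu.RLock()
	defer e.mu.RUnlock()
	out := make([]float64, len(e.values))
	copy(out, e.values)
	return out
}

func main() {
	c := &cache{store: map[string]*entry{"cpu": {values: []float64{1, 2, 3}}}}
	fmt.Println(c.Values("cpu"), c.Values("missing"))
}
```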
Jason Wilder 0401527093 Pre-allocate cache store and entries
These were not sized so they always had to be grown causing
garbage to be created.
2016-09-26 12:19:15 -06:00
Jason Wilder 83ca8c3867 Decrement cache memory stat when deleting series 2016-08-29 09:38:41 -06:00
Jason Wilder 03326f993f Add cache write success/error stats 2016-08-29 09:38:32 -06:00
Jonathan A. Sternberg 837a9804cf Refactoring the monitor service to avoid expvar
Truncate the time interval output of the monitor service to be on even
time intervals rather than on every minute based on the start time. This
normalizes the output from the monitor service.
2016-07-07 11:13:58 -05:00
Jason Wilder 2f82d9a525 Truncate the slice when merging the caches 2016-07-05 12:12:21 -05:00
Jason Wilder fdf0bac717 Fix panic: runtime error: index out of range
Fixes #6829
2016-06-27 18:50:48 -06:00
Jason Wilder 838a29cca8 Fix race in cache
If cache.Deduplicate is called while writes are in-flight on the cache, a data race
could occur.

WARNING: DATA RACE
Write by goroutine 15:
  runtime.mapassign1()
      /usr/local/go/src/runtime/hashmap.go:429 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).entry()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:482 +0x27e
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).WriteMulti()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:207 +0x3b2
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func1()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:421 +0x73

Previous read by goroutine 16:
  runtime.mapiterinit()
      /usr/local/go/src/runtime/hashmap.go:607 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Deduplicate()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:272 +0x7c
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func2()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:429 +0x69

Goroutine 15 (running) created at:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:423 +0x3f2
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:473 +0xdc

Goroutine 16 (finished) created at:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:431 +0x43b
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:473 +0xdc
2016-06-06 15:45:01 -06:00
Jason Wilder bc76048371 Fix panic in cache.DeleteRange
Deleting keys that did not exist in the cache could cause a panic
because the entry returned would be nil and was not checked.
2016-06-06 14:48:53 -06:00
rw dcec206f2e Dedup `.RUnlock` between two conditionals. 2016-05-29 10:20:58 -07:00
rw 1b160d1af0 Low-contention path for pre-existing cache entries.
This change appears to increase bulk ingestion throughput by 2x-3x in
multiprocessor environments.
2016-05-28 23:50:11 -07:00
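One common way to build such a low-contention path is a read-locked lookup with a write-locked fallback. The sketch below shows that shape with hypothetical `cache` and `entry` types; it is not taken from the commit itself.

```go
package main

import (
	"fmt"
	"sync"
)

type entry struct{}

// cache is an illustrative stand-in, not the tsm1.Cache type.
type cache struct {
	mu    sync.RWMutex
	store map[string]*entry
}

// entry returns the entry for key, creating it if needed. Pre-existing
// entries are found under the shared read lock; only a miss takes the
// exclusive write lock, and the lookup is repeated there because another
// goroutine may have created the entry in between.
func (c *cache) entry(key string) *entry {
	c.mu.RLock()
	e := c.store[key]
	c.mu.RUnlock()
	if e != nil {
		return e
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if e = c.store[key]; e == nil {
		e = &entry{}
		c.store[key] = e
	}
	return e
}

func main() {
	c := &cache{store: map[string]*entry{}}
	fmt.Println(c.entry("cpu") == c.entry("cpu"))
}
```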
thbourlove 22c2e7e1c5 fix cache memory size of engine 2016-05-09 21:29:34 +08:00
Jason Wilder a0ac754802 Fix loading huge series into RAM when points are overwritten
In some query scenarios, if there are a lot of points on disk spread
across many blocks in TSM files and a point is overwritten near the
beginning of the shard's timerange, the full series could be loaded
into RAM triggering OOMs and huge allocations.

The issue was that the KeyCursor code that handles overwriting points
had a simple implementation that just deduped the whole series in this
case.  This falls over when the series is quite large.

Instead, the KeyCursor has been changed to only decode blocks with
updated points.  It then keeps track of what section of the blocks
have been read so they are not re-read when the later points are
decoded.

Since the points in a block are always sorted, the code was also changed
to remove the Deduplicate calls since they end up
reallocating the slice.  Instead, we do a sorted merge and re-use
the slice as much as we can.
2016-05-05 09:34:44 -06:00
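The sorted-merge idea can be sketched as follows. `point` and `merge` are hypothetical, and unlike the change described above this sketch allocates one output slice rather than reusing an input. When both inputs hold the same timestamp, the point from the second (newer) slice is kept.

```go
package main

import "fmt"

// point is an illustrative timestamped value.
type point struct {
	ts int64
	v  float64
}

// merge combines two slices that are each sorted by ts, without any
// separate sort or dedup pass. On equal timestamps the point from b wins.
func merge(a, b []point) []point {
	out := make([]point, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].ts < b[j].ts:
			out = append(out, a[i])
			i++
		case a[i].ts > b[j].ts:
			out = append(out, b[j])
			j++
		default: // same timestamp: the newer point overwrites the older
			out = append(out, b[j])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

func main() {
	old := []point{{1, 1}, {2, 2}, {4, 4}}
	updated := []point{{2, 20}, {3, 3}}
	fmt.Println(merge(old, updated))
}
```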
Ben Johnson f7af787aef add DELETE query support
This commit adds query language support for deleting series with a
`DELETE` query.
2016-04-27 15:16:23 -06:00
Jason Wilder aefd2ad08b Add DeleteSeries and DeleteSeriesRange 2016-04-27 13:09:53 -06:00
Jason Wilder 0de21ade40 Add delete range of values support to WAL and cache loader 2016-04-27 13:09:53 -06:00
Jason Wilder 4d71d2b01f Add support for deleting cache values using time range 2016-04-27 13:09:52 -06:00
Jason Wilder 87ceb7426a Don't lock the cache while adding entries
Entries have their own locking so the cache doesn't need to be locked
when adding to them.
2016-04-20 16:08:58 -06:00
Jason Wilder fbaa7db54f Don't lock entry when scanning new values to add 2016-04-20 16:00:26 -06:00
Jason Wilder bfa225f149 Merge pull request #6430 from influxdata/jw-cache-load-size
Disable cache max memory size when reloading the cache
2016-04-20 14:35:23 -06:00
Stephen Gutekanst 9dc09c5257 Make logging output location more programmatically configurable (#6213)
This has various benefits:

- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
2016-04-20 21:07:08 +01:00
Jason Wilder f679787080 Disable cache max memory size when reloading the cache
The cache max memory size is an approximate size and can prevent a
shard from loading at startup.  This change disables the max size
at startup to prevent this problem and sets the limit back after
reloading.

Fixes #6109
2016-04-20 10:41:30 -06:00
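A minimal sketch of this approach, assuming a hypothetical `cache` type where a max size of 0 means unlimited and a `reload` helper that replays entry sizes:

```go
package main

import (
	"errors"
	"fmt"
)

// errOverLimit is a hypothetical error for this sketch.
var errOverLimit = errors.New("cache max memory size exceeded")

// cache is an illustrative stand-in, not the tsm1.Cache type.
type cache struct {
	size, maxSize uint64 // maxSize 0 means unlimited
}

func (c *cache) write(n uint64) error {
	if c.maxSize > 0 && c.size+n > c.maxSize {
		return errOverLimit
	}
	c.size += n
	return nil
}

// reload replays previously accepted entries with the limit disabled, then
// restores the limit, so an approximate size check cannot block startup.
func reload(c *cache, entrySizes []uint64) error {
	limit := c.maxSize
	c.maxSize = 0
	defer func() { c.maxSize = limit }()

	for _, n := range entrySizes {
		if err := c.write(n); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c := &cache{maxSize: 100}
	fmt.Println(reload(c, []uint64{60, 60}), c.size, c.maxSize)
	fmt.Println(c.write(1)) // the limit applies again after reloading
}
```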