The memory stats as well as the size of the cache were not accurate.
There was also a problem where the cache size would be increased
optimistically, but if the cache size limit was hit, it would not
be decreased. This caused the cache size to grow without bound
on every failed write.
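As a sketch (not the actual tsm1.Cache code, and using hypothetical names), the accounting needs to be symmetric: the size is bumped optimistically before the write and rolled back when the limit rejects it, so failed writes no longer leave the tracked size inflated.

```go
package tsm1sketch

import (
	"errors"
	"sync/atomic"
)

type sizeLimitedCache struct {
	size    uint64 // tracked cache size in bytes
	maxSize uint64 // configured limit
}

func (c *sizeLimitedCache) increaseSize(delta uint64) { atomic.AddUint64(&c.size, delta) }
func (c *sizeLimitedCache) decreaseSize(delta uint64) { atomic.AddUint64(&c.size, ^(delta - 1)) }

var errLimitExceeded = errors.New("cache maximum memory size exceeded")

func (c *sizeLimitedCache) write(sz uint64) error {
	c.increaseSize(sz) // optimistic bump
	if atomic.LoadUint64(&c.size) > c.maxSize {
		c.decreaseSize(sz) // roll back, otherwise the size grows on every failed write
		return errLimitExceeded
	}
	// ... actually store the values here ...
	return nil
}
```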
This simplifies the cache.Snapshot func to swap the hot cache to
the snapshot cache instead of copying and appending entries. This
reduces the amount of time the cache is write locked, which should
reduce cache contention for the read-only code paths.
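A minimal sketch of the swap, assuming a simple map-backed store rather than the real Cache internals: the write lock is held only long enough to hand the hot store to the snapshot and replace it with an empty one, rather than for a copy of every entry.

```go
package tsm1sketch

import "sync"

type swapCache struct {
	mu    sync.RWMutex
	store map[string][]byte // key -> encoded values (placeholder type)
}

func (c *swapCache) Snapshot() map[string][]byte {
	c.mu.Lock()
	defer c.mu.Unlock()

	snapshot := c.store               // the hot store becomes the snapshot as-is
	c.store = make(map[string][]byte) // fresh hot store for subsequent writes
	return snapshot
}
```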
The previous implementation was susceptible to a race condition (one
affecting correctness) since c.decreaseSize is called without a lock in
(*Cache).WriteMulti.
There were already tests which asserted the correctness of the result of
decreaseSize, so no tests were added or modified.
It looks like the canonical import path for the project is go.uber.org/zap
rather than github.com/uber-go/zap, since the example in the project
references that path.
Currently, whenever a snapshot occurs the Cache is reset and so many
allocations are repeated, as the same type of data is re-added to
the Cache.
This commit allows the stores to keep track of the number of values
within an entry, and use that size as a hint when the same entry needs
to be recreated after a snapshot.
To avoid hints persisting over a long period of time, they are deleted
after every snapshot and rebuilt using only the most recent entries.
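A minimal sketch of the hint idea, with hypothetical names: the store remembers how many values each key held at snapshot time and uses that count as the initial capacity when the entry is recreated, avoiding repeated slice growth.

```go
package tsm1sketch

type hintedStore struct {
	values map[string][]float64
	hints  map[string]int // key -> value count observed at the last snapshot
}

func (s *hintedStore) snapshot() map[string][]float64 {
	hints := make(map[string]int, len(s.values))
	for k, v := range s.values {
		hints[k] = len(v)
	}
	snap := s.values
	s.values = make(map[string][]float64, len(snap))
	s.hints = hints // old hints are dropped; only the most recent entries feed new ones
	return snap
}

func (s *hintedStore) write(key string, vals []float64) {
	e, ok := s.values[key]
	if !ok {
		e = make([]float64, 0, s.hints[key]) // use the hint as a capacity guess
	}
	s.values[key] = append(e, vals...)
}
```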
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
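For illustration, this is the structured style that zap enables (the field names here are made up, not the ones InfluxDB uses):

```go
package main

import "go.uber.org/zap"

func main() {
	logger, err := zap.NewProduction()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()

	// Structured fields instead of formatted strings.
	logger.Info("compaction finished",
		zap.String("path", "/var/lib/influxdb/data"),
		zap.Int("tsm_files", 4),
	)
}
```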
When a limit is exceeded, we return errors and sometimes log (if appropriate)
that a limit was exceeded. The messages don't always provide an indication
as to where or how they are configured.
Instead, return the config option (which is easily searchable), as well as the
limit currently set and, when possible, the value that exceeded it.
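A sketch of the intended message shape; the option name and values are illustrative rather than the exact strings used in InfluxDB:

```go
package main

import "fmt"

// checkLimit returns an error that names the config option, the configured
// limit, and the value that exceeded it.
func checkLimit(option string, limit, got int) error {
	if limit > 0 && got > limit {
		return fmt.Errorf("%s limit exceeded: (%d/%d)", option, got, limit)
	}
	return nil
}

func main() {
	if err := checkLimit("max-values-per-tag", 100000, 100001); err != nil {
		fmt.Println(err) // max-values-per-tag limit exceeded: (100001/100000)
	}
}
```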
If concurrent writes to the same shard occur, it's possible for different types to
be added to the cache for the same series. Updating the measurementFields map on the
shard would normally prevent this, but that update is racy in this scenario.
When this occurs, the snapshot compaction panics because it can't encode different types
in the same series.
To prevent this, the cache now returns an error if a value of a different type is added
to the existing values in the cache.
Fixes #7498
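A minimal sketch of the type check, with hypothetical types standing in for tsm1 Values: when a write would append a value of a different type to an entry's existing values, the cache rejects it with an error instead of letting the snapshot compaction panic later.

```go
package tsm1sketch

import (
	"fmt"
	"reflect"
)

type entry struct {
	values []interface{} // placeholder for the cache's per-series values
}

func (e *entry) add(v interface{}) error {
	if len(e.values) > 0 && reflect.TypeOf(e.values[0]) != reflect.TypeOf(v) {
		return fmt.Errorf("cannot write value of type %T to a series containing %T values",
			v, e.values[0])
	}
	e.values = append(e.values, v)
	return nil
}
```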
Reduce cache lock contention by widening the cache lock scope in WriteMulti. While this sounds counterintuitive, the previous pattern was:
* 1 x Read Lock to read the size
* 1 x Read Lock per values
* 1 x Write Lock per values on race
* 1 x Write Lock to update the size
We now have:
* 1 x Write Lock
This also reduces contention on the entries' values lock, since the global cache lock is already held.
Move the calculation of the added size before taking the lock as it takes time and doesn't need the lock.
This also fixes a race in WriteMulti caused by the lock not being held across the entire operation, which could leave the cache size with an invalid value if Snapshot ran in between the addition of the values and the size update.
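A minimal sketch of the widened lock, assuming a map-backed store: the batch size is computed before the lock is taken, and a single write lock then covers the limit check, the entry updates, and the size update.

```go
package tsm1sketch

import (
	"errors"
	"sync"
)

type wideLockCache struct {
	mu      sync.RWMutex
	store   map[string][]float64
	size    uint64
	maxSize uint64
}

func (c *wideLockCache) WriteMulti(values map[string][]float64) error {
	// Computed outside the lock: it takes time and doesn't need the lock.
	var addedSize uint64
	for _, v := range values {
		addedSize += uint64(8 * len(v))
	}

	c.mu.Lock()
	defer c.mu.Unlock()

	if c.maxSize > 0 && c.size+addedSize > c.maxSize {
		return errors.New("cache maximum memory size exceeded")
	}
	for k, v := range values {
		c.store[k] = append(c.store[k], v...)
	}
	c.size += addedSize // updated under the same lock, so Snapshot can't interleave
	return nil
}
```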
Fix the cache benchmarks, which were benchmarking the creation of the cache rather than its operation, and add a parallel benchmark for a more real-world scenario; this could still be improved.
Add a fast path to newEntryValues for the new-entry case, which avoids taking the values lock and all the other calculations.
Drop the lock before performing the sort in Cache.Keys().
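A sketch of dropping the lock before the sort in Keys(): the key set is copied under the read lock, but the sort itself runs unlocked.

```go
package tsm1sketch

import (
	"sort"
	"sync"
)

type keyedCache struct {
	mu    sync.RWMutex
	store map[string][]float64
}

func (c *keyedCache) Keys() []string {
	c.mu.RLock()
	keys := make([]string, 0, len(c.store))
	for k := range c.store {
		keys = append(keys, k)
	}
	c.mu.RUnlock()

	sort.Strings(keys) // sorting happens without holding the cache lock
	return keys
}
```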
Truncate the time interval output of the monitor service so that it falls on even
time intervals rather than on every minute relative to the start time. This
normalizes the output from the monitor service.
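For example, truncating to the interval aligns ticks on even boundaries instead of on offsets derived from the start time (a minimal illustration, not the monitor service code):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	interval := time.Minute
	start := time.Date(2016, 11, 7, 10, 32, 17, 0, time.UTC)

	// Waiting for the next even boundary yields 10:33:00, 10:34:00, ...
	// rather than 10:33:17, 10:34:17, ...
	first := start.Truncate(interval).Add(interval)
	fmt.Println(first) // 2016-11-07 10:33:00 +0000 UTC
}
```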
If cache.Deduplicate is called while writes are in-flight on the cache, a data race
could occur.
WARNING: DATA RACE
Write by goroutine 15:
runtime.mapassign1()
/usr/local/go/src/runtime/hashmap.go:429 +0x0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).entry()
/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:482 +0x27e
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).WriteMulti()
/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:207 +0x3b2
github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func1()
/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:421 +0x73
Previous read by goroutine 16:
runtime.mapiterinit()
/usr/local/go/src/runtime/hashmap.go:607 +0x0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Deduplicate()
/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:272 +0x7c
github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func2()
/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:429 +0x69
Goroutine 15 (running) created at:
github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent()
/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:423 +0x3f2
testing.tRunner()
/usr/local/go/src/testing/testing.go:473 +0xdc
Goroutine 16 (finished) created at:
github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent()
/Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:431 +0x43b
testing.tRunner()
/usr/local/go/src/testing/testing.go:473 +0xdc
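A sketch of the general remedy for this kind of race, not necessarily the exact change that was made: Deduplicate takes the cache write lock while it walks and rebuilds the store, so concurrent WriteMulti calls cannot mutate the map mid-iteration.

```go
package tsm1sketch

import (
	"sort"
	"sync"
)

type dedupCache struct {
	mu    sync.RWMutex
	store map[string][]float64
}

func (c *dedupCache) Deduplicate() {
	c.mu.Lock()
	defer c.mu.Unlock()

	for k, vals := range c.store {
		sort.Float64s(vals) // stand-in for per-entry deduplication work
		c.store[k] = vals
	}
}
```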
In some query scenarios, if there are a lot of points on disk spread
across many blocks in TSM files and a point is overwritten near the
beginning of the shard's time range, the full series could be loaded
into RAM, triggering OOMs and huge allocations.
The issue was that the KeyCursor code that handles overwriting points
had a simple implementation that just deduped the whole series in this
case. This falls over when the series is quite large.
Instead, the KeyCursor has been changed to only decode blocks with
updated points. It then keeps track of which sections of the blocks
have been read so they are not re-read when the later points are
decoded.
Since the points in a block are always sorted, the code was also changed
to remove the Deduplicate calls, which end up reallocating the slice.
Instead, we do a sorted merge and reuse the slice as much as we can.
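A minimal sketch of the sorted-merge idea, using bare timestamps in place of real values: because both inputs are already sorted, the merge walks them once, lets the newer input win on duplicate timestamps, and appends into a reusable scratch slice instead of deduplicating the whole series afterwards.

```go
package tsm1sketch

// mergeSorted merges two sorted timestamp slices into dst, which is assumed
// to be a scratch slice that does not alias a or b.
func mergeSorted(dst, a, b []int64) []int64 {
	dst = dst[:0] // reuse dst's backing array where possible
	for len(a) > 0 && len(b) > 0 {
		switch {
		case a[0] < b[0]:
			dst = append(dst, a[0])
			a = a[1:]
		case a[0] > b[0]:
			dst = append(dst, b[0])
			b = b[1:]
		default: // same timestamp: the newer input (b) wins
			dst = append(dst, b[0])
			a, b = a[1:], b[1:]
		}
	}
	dst = append(dst, a...)
	return append(dst, b...)
}
```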
This has various benefits:
- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
The cache max memory size is an approximate size and can prevent a
shard from loading at startup. This change disables the max size
at startup to prevent this problem and sets the limit back after
reloading.
Fixes #6109
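A minimal sketch of the idea, with hypothetical accessor names: the limit is lifted while the cache is reloaded at startup and restored once the reload finishes.

```go
package tsm1sketch

type reloadableCache struct {
	maxSize uint64
}

func (c *reloadableCache) SetMaxSize(sz uint64) { c.maxSize = sz }

// Reload disables the size limit for the duration of the load and then
// restores the configured value.
func (c *reloadableCache) Reload(load func() error) error {
	limit := c.maxSize
	c.SetMaxSize(0)           // 0 disables the limit while reloading
	defer c.SetMaxSize(limit) // restore the configured limit afterwards
	return load()
}
```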