Based on @jwilder's alternative to the 'dirty' slice that featured
in previous iterations of this fix.
Suggested-by: Jason Wilder <jason@influxdb.com>
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Currently two compactors can execute Engine.WriteSnapshot at once.
This isn't thread-safe, since both threads want to modify
Cache.snapshot at the same time.
This commit introduces a lock that is acquired during Snapshot() and
released during ClearSnapshot(), ensuring that at most one thread
executes within Engine.WriteSnapshot() at a time.
To ensure that we always release this lock, but only release the
snapshot resources on a successful commit, we modify ClearSnapshot() to
accept a boolean indicating whether the write succeeded, and we
guarantee that ClearSnapshot() is called whenever Snapshot() has been
called.
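A minimal sketch of the shape of this change (field names such as
snapshotter are illustrative, not the actual implementation):

    package sketch

    import "sync"

    type Cache struct {
        mu          sync.RWMutex
        snapshotter sync.Mutex // serializes Snapshot()/ClearSnapshot() pairs
    }

    // Snapshot acquires the snapshotter lock and holds it until
    // ClearSnapshot is called, so only one WriteSnapshot can be in
    // flight at a time.
    func (c *Cache) Snapshot() *Cache {
        c.snapshotter.Lock()
        c.mu.Lock()
        defer c.mu.Unlock()
        // ... move the current store into the snapshot ...
        return &Cache{}
    }

    // ClearSnapshot is guaranteed to be called after Snapshot; it always
    // releases the lock, but only releases snapshot resources when the
    // write succeeded.
    func (c *Cache) ClearSnapshot(success bool) {
        defer c.snapshotter.Unlock()
        c.mu.Lock()
        defer c.mu.Unlock()
        if success {
            // ... drop the snapshot's entries ...
        }
    }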
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
There are two tests that show two different manifestations of the same
vulnerability.
One test shows that Cache.Deduplicate modifies entries in a snapshot's
store without holding a lock, while correctly locked cache readers are
deduplicating those same entries.
A second test shows that two threads trying to execute the methods
that Engine.WriteSnapshot calls will cause concurrent, unsynchronized
mutating access to the snapshot's store and entries.
The tests fail at this commit and are fixed by subsequent commits.
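A rough sketch of the second scenario, not the actual test: the
constructor signature and loop bounds are assumptions, and it assumes
the tsm1 package context. Run with `go test -race` to surface the
unsynchronized access.

    func TestCache_ConcurrentWriteSnapshot(t *testing.T) {
        c := NewCache(1 << 20) // assumed constructor signature
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := 0; j < 1000; j++ {
                    snap := c.Snapshot() // both goroutines mutate the
                    snap.Deduplicate()   // snapshot's store and entries
                    c.ClearSnapshot(true)
                }
            }()
        }
        wg.Wait()
    }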
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Fix for #5804.
The commit for #5789 rendered the semantics of the snapshotCount
statistic useless. This commit restores semantics with diagnostic value
to this statistic.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
The Cache had support for taking multiple snapshots so that multiple
snapshots could be written to TSM files concurrently, if that happened
to be a bottleneck. In practice, this is never a bottleneck and we only
run one snapshotting goroutine continuously per shard, which has worked
well for all workloads.
The multiple-snapshot support introduces some unhandled failure
scenarios where WAL segments could be removed without being written to
TSM files. If a snapshot compaction fails to write due to transient
disk errors, subsequent snapshots will continue, but the failed one
will not be retried. When the subsequent ones succeed, all closed WAL
segments are removed, causing data loss.
This change simplifies the snapshotting capability to ensure that there
is only ever one snapshot. If one fails, the next snapshot will update
the existing snapshot and retry with both the old and the new data.
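A rough sketch of the single-snapshot model (types and field names are
illustrative, not the actual implementation):

    package sketch

    import "sync"

    type Value interface{}

    type Cache struct {
        mu       sync.Mutex
        store    map[string][]Value
        snapshot *Cache
    }

    // Snapshot folds the current store into the one and only snapshot. If a
    // previous snapshot compaction failed, its data is still present and the
    // retry will write both the old and the new data.
    func (c *Cache) Snapshot() *Cache {
        c.mu.Lock()
        defer c.mu.Unlock()

        if c.snapshot == nil {
            c.snapshot = &Cache{store: make(map[string][]Value)}
        }
        for k, vals := range c.store {
            c.snapshot.store[k] = append(c.snapshot.store[k], vals...)
        }
        c.store = make(map[string][]Value)
        return c.snapshot
    }

    // ClearSnapshot drops the snapshot's contents only after a successful
    // TSM write, so closed WAL segments are never removed for data that was
    // not persisted.
    func (c *Cache) ClearSnapshot(success bool) {
        c.mu.Lock()
        defer c.mu.Unlock()
        if success {
            c.snapshot.store = make(map[string][]Value)
        }
    }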
Fixes #5686
The cache had some incorrect logic for determining when a series needed
to be deduplicated. The logic was checking for unsorted points but not
considering duplicate points. This would manifest itself as many
duplicate points being returned from the cache; after a snapshot
compaction ran, the points would disappear because snapshot compaction
always deduplicates and sorts the points.
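A sketch of the kind of check involved (types and names are
illustrative, not the actual tsm1 code):

    package sketch

    type Value interface{ UnixNano() int64 }

    type entry struct {
        values   []Value
        needSort bool // set when values must be deduplicated/sorted on read
    }

    func (e *entry) add(vals []Value) {
        for _, v := range vals {
            // Comparing with "<" only detects out-of-order points; "<=" also
            // catches duplicate timestamps, so duplicates are removed when
            // the entry is read instead of surviving until a snapshot
            // compaction.
            if n := len(e.values); n > 0 && v.UnixNano() <= e.values[n-1].UnixNano() {
                e.needSort = true
            }
            e.values = append(e.values, v)
        }
    }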
Added a test that reproduces the issue.
Fixes #5719
The intent of this change is to avoid writing statistics for caches
created as snapshot instances into the tsm1_cache measurement. We can
do this by avoiding the NewCache constructor when creating snapshots.
All other methods are only intended to be called on the engine's cache,
never on a snapshot.
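A sketch of the idea; the unexported constructor is a hypothetical
name, and the expvar map stands in for however the statistics are
actually registered:

    package sketch

    import "expvar"

    type Value interface{}

    type Cache struct {
        maxSize uint64
        store   map[string][]Value
        statMap *expvar.Map // nil for snapshot caches
    }

    // NewCache is the only constructor that wires up statistics, so only
    // the engine's cache reports into the tsm1_cache measurement.
    func NewCache(maxSize uint64) *Cache {
        c := newCache(maxSize)
        c.statMap = new(expvar.Map).Init() // stand-in for registering tsm1_cache stats
        return c
    }

    // newCache builds a bare cache with no statistics; snapshots use this path.
    func newCache(maxSize uint64) *Cache {
        return &Cache{
            maxSize: maxSize,
            store:   make(map[string][]Value),
        }
    }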
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Since we are not locking but relying on atomic arithmetic, use Add
rather than Set. This also results in slightly less garbage being
created.
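For illustration with the standard library's expvar (assuming the
statistics live in an expvar-style map, which may not match the engine
exactly):

    package sketch

    import "expvar"

    // recordCachedBytes bumps the counter with a single atomic Add. Using
    // Set instead would require reading the current value and allocating a
    // fresh expvar.Int, which is racy without a lock and creates extra
    // garbage.
    func recordCachedBytes(statMap *expvar.Map, n int64) {
        statMap.Add("cachedBytes", n)
    }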
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
The intent of this change is to ensure that all statistic fields of the
resulting tsm1_cache measurement are initialized when the cache is
created. That way, any consumer of those measurements doesn't have to
deal with the null case.
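A sketch of the intent, under the same expvar-style assumption (the key
list here is illustrative and incomplete):

    package sketch

    import "expvar"

    // initCacheStats touches every statistic once at construction time so
    // the tsm1_cache measurement always carries a full set of fields, even
    // before any cache activity.
    func initCacheStats(statMap *expvar.Map) {
        for _, k := range []string{
            "snapshotCount", "cacheAgeMs", "cachedBytes",
            "WALCompactionTimeMs", "diskBytes",
        } {
            statMap.Add(k, 0) // creates the key with an initial value of zero
        }
    }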
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Complementing and extending the changes in #5758.
Add 2 level statistics:
* snapshotCount
* cacheAgeMs
Add 2 counter statistics:
* cachedBytes
* WALCompactionTimeMs
snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate.
cacheAgeMs can be used to gauge the level of write activity into the cache.
The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates.
The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used to calculate WAL compaction throughput.
The ratio of the difference between the first and last WAL compaction
times to the interval length is an estimate of the percentage of cache
throughput consumed.
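A quick worked example with made-up numbers: if, over a 60s sampling
interval, cachedBytes grows by 200 MB, diskBytes by 40 MB and
WALCompactionTimeMs by 12,000 ms, then cache throughput is roughly
200 MB / 60 s ≈ 3.3 MB/s, WAL compaction throughput is
(200 - 40) MB / 12 s ≈ 13.3 MB/s, and the ratio
12,000 ms / 60,000 ms = 20% estimates the share of cache throughput
consumed by WAL compaction.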
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
There was a fix in 5b1791, but it is not present in the current branch, likely due to a rebase issue.
The current code panics with a query like:
select value from cpu group by host order by time desc limit 1
This fixes the panic and also prevents #5193 from recurring. The issue is that aggressively
closing the cursors clears out the seeks slice, so re-seeking will fail.
Aux iterators now ask the iterator creator what series will be returned
and determine which aux fields to create based on the results.
The `tsdb.Shards` struct also creates a call iterator around the
iterators returned from each shard.
Fill requires an additional function for IteratorCreator to retrieve the
series that will be returned from the iterator. When fill is required
for an aggregate, the IteratorCreator will be asked what series will be
returned by the created iterator.
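A hypothetical sketch of the shape of that addition (the method name
and the placeholder types are illustrative, not the actual influxql
API):

    package sketch

    // Placeholder types standing in for the real query engine types.
    type (
        Iterator        interface{}
        IteratorOptions struct{}
        SeriesList      []string
    )

    // An IteratorCreator can be asked up front which series an iterator
    // would return, so the planner can decide which aux fields to create
    // and what fill() must emit before the iterator itself is built.
    type IteratorCreator interface {
        CreateIterator(opt IteratorOptions) (Iterator, error)
        SeriesKeys(opt IteratorOptions) (SeriesList, error)
    }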