Previously, we needed a write lock on the cache because it was the
only lock we had available to guard updates to entry.values and
entry.needSort.
However, now that we have an entry-scoped lock for this purpose, we no
longer need the cache write lock. Since merged() doesn't modify the
.store or c.snapshot.sort, there is no need for a write lock to protect
the cache itself.
So we don't need to escalate to a write lock here; we simply rely on the
entry lock to protect the entries we are iterating over.
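A minimal sketch of the locking split described above. It assumes a simplified, hypothetical cache in which an entry holds plain int64 timestamps. The cache RWMutex guards only the store map, and each entry's own mutex guards values and needSort, so merged() can run under the cache read lock without escalating to a write lock.

```go
package main

import (
	"sort"
	"sync"
)

// entry holds the values for one series key. Its own mutex guards values
// and needSort, so updating them no longer requires the cache write lock.
type entry struct {
	mu       sync.Mutex
	values   []int64 // hypothetical: plain timestamps stand in for real values
	needSort bool
}

// cache is a hypothetical stand-in for the tsm1 cache. Its RWMutex guards the
// store map itself, not the contents of each entry.
type cache struct {
	mu    sync.RWMutex
	store map[string]*entry
}

// merged returns a sorted copy of the values for key. It never modifies the
// store map, so it holds only the cache read lock (no escalation to a write
// lock); the entry lock protects the entry while it is sorted and copied.
func (c *cache) merged(key string) []int64 {
	c.mu.RLock()
	defer c.mu.RUnlock()

	e, ok := c.store[key]
	if !ok {
		return nil
	}

	e.mu.Lock()
	defer e.mu.Unlock()
	if e.needSort {
		sort.Slice(e.values, func(i, j int) bool { return e.values[i] < e.values[j] })
		e.needSort = false
	}
	out := make([]int64, len(e.values))
	copy(out, e.values)
	return out
}
```

Concurrent merged() calls on different keys therefore proceed in parallel. Calls on the same key serialize only on that entry's mutex.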
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Based on @jwilder's alternative to the 'dirty' slice that featured
in previous iterations of this fix.
Suggested-by: Jason Wilder <jason@influxdb.com>
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Currently, two compactors can execute Engine.WriteSnapshot at once.
This isn't thread-safe, since both threads want to modify
Cache.snapshot at the same time.
This commit introduces a lock which is acquired during Snapshot() and
released during ClearSnapshot(), ensuring that at most one thread
executes within Engine.WriteSnapshot() at once.
To ensure that we always release this lock, but release the snapshot
resources only on a successful commit, we modify ClearSnapshot() to
accept a boolean that indicates whether the write succeeded, and we
guarantee that this function is called whenever Snapshot() has been called.
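A minimal sketch of the Snapshot()/ClearSnapshot(bool) protocol, assuming a simplified, hypothetical Cache whose store and snapshot are string slices. writeSnapshot is an illustrative caller in the role of Engine.WriteSnapshot, not the real engine code. The lock is taken in Snapshot and always released in ClearSnapshot. The snapshot data is dropped only when the write succeeded.

```go
package main

import (
	"errors"
	"sync"
)

// errWrite is a hypothetical error used to simulate a failed snapshot write.
var errWrite = errors.New("snapshot write failed")

// Cache is a simplified, hypothetical cache. snapshotLock is acquired in
// Snapshot and released in ClearSnapshot, so at most one caller can be
// between those two calls at any time.
type Cache struct {
	mu           sync.Mutex // guards store and snapshot
	snapshotLock sync.Mutex // held from Snapshot until ClearSnapshot
	store        []string
	snapshot     []string
}

// Snapshot blocks until no other snapshot is in progress, then moves the
// current store into the snapshot and returns it.
func (c *Cache) Snapshot() []string {
	c.snapshotLock.Lock()
	c.mu.Lock()
	defer c.mu.Unlock()
	c.snapshot = append(c.snapshot, c.store...)
	c.store = nil
	return c.snapshot
}

// ClearSnapshot always releases snapshotLock, but discards the snapshot
// data only when the write succeeded.
func (c *Cache) ClearSnapshot(success bool) {
	defer c.snapshotLock.Unlock()
	if success {
		c.mu.Lock()
		c.snapshot = nil
		c.mu.Unlock()
	}
}

// writeSnapshot is a hypothetical caller in the role of
// Engine.WriteSnapshot: once Snapshot has returned, the deferred
// ClearSnapshot runs on every path, reporting whether write succeeded.
func writeSnapshot(c *Cache, write func([]string) error) (err error) {
	snap := c.Snapshot()
	defer func() { c.ClearSnapshot(err == nil) }()
	return write(snap)
}
```

Using a named return value lets the deferred call observe the write's outcome. A failed write still unlocks, so a second writeSnapshot does not deadlock.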
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
There are two tests that demonstrate two different vulnerabilities.
One test shows that Cache.Deduplicate modifies entries in a snapshot's
store without a lock, while cache readers, correctly holding the lock,
are deduplicating those same entries.
A second test shows that two threads trying to execute the methods
that Engine.WriteSnapshot calls will cause concurrent, unsynchronized
mutating access to the snapshot's store and entries.
The tests fail at this commit and are fixed by subsequent commits.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Fix for #5804.
The commit for #5789 rendered the snapshotCount statistic meaningless.
This commit restores semantics with diagnostic value to this
statistic.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
The Cache had support for taking multiple snapshots to support writing
multiple snapshots to TSM files concurrently if that happened to be
a bottleneck. In practice, this is never a bottleneck: we only
run one snapshotting goroutine continuously per shard, which has worked
well for all workloads.
The multiple-snapshot support introduces some unhandled failure scenarios
in which WAL segments could be removed without being written to TSM files. If
a snapshot compaction fails to write due to transient disk errors, subsequent
snapshots will continue, but the failed one will not be retried. When the
subsequent ones succeed, all closed WAL segments are removed, causing data
loss.
This change simplifies the snapshotting capability to ensure that there is only
ever one snapshot. If a snapshot fails, the next snapshot will update the existing
snapshot and retry writing both the old and the new data.
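A minimal sketch of the single-snapshot rule, assuming a hypothetical engine model with a live cache, at most one snapshot, and the list of closed WAL segments that snapshot covers. New data and newly closed segments are folded into the existing snapshot. Segments are released for removal only after a successful write, so a failed write is retried together with the next batch.

```go
package main

import "errors"

// errDisk is a hypothetical transient error used to simulate a failed write.
var errDisk = errors.New("transient disk error")

// engine is a hypothetical model of one shard: a live cache, at most one
// snapshot, and the closed WAL segments whose data the snapshot holds.
type engine struct {
	cache    map[string][]int64
	snapshot map[string][]int64
	segments []string // closed WAL segments covered by the snapshot
}

// snapshotCache folds the live cache (and the newly closed segments) into
// the single existing snapshot instead of creating a second one.
func (e *engine) snapshotCache(closed []string) map[string][]int64 {
	if e.snapshot == nil {
		e.snapshot = make(map[string][]int64)
	}
	for k, v := range e.cache {
		e.snapshot[k] = append(e.snapshot[k], v...)
	}
	e.cache = make(map[string][]int64)
	e.segments = append(e.segments, closed...)
	return e.snapshot
}

// compact writes the snapshot. WAL segments are returned for removal only
// after a successful write; on failure both the snapshot and the segment
// list are kept, so the next attempt retries the old data with the new.
func (e *engine) compact(closed []string, write func(map[string][]int64) error) ([]string, error) {
	snap := e.snapshotCache(closed)
	if err := write(snap); err != nil {
		return nil, err
	}
	removable := e.segments
	e.snapshot, e.segments = nil, nil
	return removable, nil
}
```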
Fixes #5686
This fixes a regression introduced in #5757 due to the node.ID getting
assigned by both the meta and data services. When both roles are active,
the data CreateDataNode path was not getting called because a node ID was
already assigned.
This fixes the issue by checking whether a DataNode already exists for our
node ID and, if it does not, creating one.
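A minimal sketch of the "create only if missing" check. The metaClient interface, ErrNodeNotFound and the in-memory memMeta fake are all hypothetical stand-ins, not the actual InfluxDB meta API. The point is that an already-assigned node ID is not treated as proof that a DataNode exists.

```go
package main

import "errors"

// ErrNodeNotFound, metaClient and memMeta are hypothetical stand-ins for the
// meta service API; they are not the actual InfluxDB types or signatures.
var ErrNodeNotFound = errors.New("node not found")

type metaClient interface {
	DataNode(id uint64) (addr string, err error)
	CreateDataNode(id uint64, addr string) error
}

// ensureDataNode creates a DataNode for id only if none exists yet. An
// assigned node ID alone is not taken as proof that a DataNode exists,
// because the meta service may have assigned that ID.
func ensureDataNode(c metaClient, id uint64, addr string) error {
	_, err := c.DataNode(id)
	switch {
	case err == nil:
		return nil // already registered; nothing to do
	case errors.Is(err, ErrNodeNotFound):
		return c.CreateDataNode(id, addr)
	default:
		return err
	}
}

// memMeta is an in-memory fake of metaClient for trying the logic out.
type memMeta struct{ nodes map[uint64]string }

func (m *memMeta) DataNode(id uint64) (string, error) {
	addr, ok := m.nodes[id]
	if !ok {
		return "", ErrNodeNotFound
	}
	return addr, nil
}

func (m *memMeta) CreateDataNode(id uint64, addr string) error {
	m.nodes[id] = addr
	return nil
}
```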
This fixes a couple of issues with starting meta-only nodes.
1. We were always calling CreateDataNode regardless of whether the
node is running data services. We now call it only when the node is
data-enabled.
2. The node.json file was created alongside the data node. Since
we no longer create a data node on meta-only nodes, it was no longer
being written. There wasn't a simple way to do this in one place, so it
is now handled when creating either a meta or a data node. Since the ID
assigned to the node is the same regardless of role, this works for all
combinations of roles.
3. JoinMetaServer didn't return the ID of the joining node, which
caused races when multiple nodes were joining. The join call now
returns that information to the caller.
Fixes #5754
The join option was incorrectly exposed on the meta config. It should
be at the top level as a string and propagate down to the meta config
as a slice.
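A minimal sketch of propagating a top-level join string into the meta config as a slice. The struct and field names (Config.Join, MetaConfig.JoinPeers) and the comma-separated format are assumptions for illustration, not the actual InfluxDB config definitions.

```go
package main

import "strings"

// MetaConfig and Config are hypothetical, trimmed-down config structs: Join
// is set at the top level as a (assumed comma-separated) string, and the
// meta config receives it as a slice.
type MetaConfig struct {
	JoinPeers []string
}

type Config struct {
	Join string
	Meta MetaConfig
}

// applyJoin propagates the top-level Join string into Meta.JoinPeers,
// trimming whitespace and skipping empty entries.
func (c *Config) applyJoin() {
	c.Meta.JoinPeers = nil
	for _, p := range strings.Split(c.Join, ",") {
		if p = strings.TrimSpace(p); p != "" {
			c.Meta.JoinPeers = append(c.Meta.JoinPeers, p)
		}
	}
}
```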
The cache had some incorrect logic for determining when a series needed
to be deduplicated. The logic was checking for unsorted points but
not considering duplicate points. This would manifest itself as many
(duplicate) points being returned from the cache; after a snapshot
compaction run, the points would disappear because snapshot compaction
always deduplicates and sorts the points.
Added a test that reproduces the issue.
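A minimal sketch of the corrected check, assuming a hypothetical entry that stores only timestamps. The key detail is comparing with `<=` rather than `<`, so an exact duplicate timestamp also marks the entry for deduplication.

```go
package main

import "sort"

// entry is a hypothetical series entry holding timestamps only.
type entry struct {
	times    []int64
	needSort bool
}

// add appends timestamps and flags the entry for deduplication whenever a
// new timestamp is not strictly greater than the previous one. A strict <
// comparison here would miss exact duplicates.
func (e *entry) add(ts ...int64) {
	for _, t := range ts {
		if n := len(e.times); n > 0 && t <= e.times[n-1] {
			e.needSort = true
		}
		e.times = append(e.times, t)
	}
}

// deduplicate sorts the timestamps and keeps one copy of each.
func (e *entry) deduplicate() {
	if !e.needSort || len(e.times) == 0 {
		return
	}
	sort.Slice(e.times, func(i, j int) bool { return e.times[i] < e.times[j] })
	out := e.times[:1]
	for _, t := range e.times[1:] {
		if t != out[len(out)-1] {
			out = append(out, t)
		}
	}
	e.times = out
	e.needSort = false
}
```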
Fixes #5719
The intent of this change is to avoid writing statistics for
snapshot cache instances into the tsm1_cache measurement. We can do
this by not using the NewCache constructor for snapshots. All other
methods are only intended to be called on the engine cache, never
on a snapshot.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Since we are not locking but relying on atomic arithmetic,
use Add rather than Set. This will also result in slightly less garbage
being created.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
The intent of this change is to ensure that all statistic fields of the
resulting tsm1_cache measurement are initialized on initialization of
the cache. That way, any consumer of those measurements doesn't
have to deal with the null case.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Influx does not support fields with empty names or points
with no fields.
NewPoint is changed to validate that all field names are non-empty.
AddField is removed because we now require that all fields are
specified on construction.
NewPointFromByte is changed to return an error if an unmarshaled
binary point does not have any fields.
newFieldsFromBinary is changed to prevent an infinite loop that
can arise while attempting to parse corrupt binary point data.
TestNewPointsWithBytesWithCorruptData is changed to reflect the
change in the behaviour of NewPointFromByte.
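A minimal sketch of the validation rule described above. validateFields and its two error values are hypothetical helpers written for illustration, not the actual models package code. A point must carry at least one field, and every field name must be non-empty.

```go
package main

import "errors"

// Hypothetical error values for the two unsupported cases.
var (
	errNoFields       = errors.New("points without fields are unsupported")
	errEmptyFieldName = errors.New("field names must be non-empty")
)

// validateFields rejects a point with no fields or with any empty field
// name, mirroring the rules stated above.
func validateFields(fields map[string]interface{}) error {
	if len(fields) == 0 {
		return errNoFields
	}
	for name := range fields {
		if name == "" {
			return errEmptyFieldName
		}
	}
	return nil
}
```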
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Complementing and extending the changes in #5758.
Add two level statistics:
* snapshotCount
* cacheAgeMs
Add two counter statistics:
* cachedBytes
* WALCompactionTimeMs
snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate.
cacheAgeMs can be used to gauge the level of write activity into the cache.
The differences between cachedBytes values sampled at different times can be used to calculate cache throughput rates.
The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used to calculate WAL compaction throughput.
The ratio of the difference between the first and last WALCompactionTimeMs samples in an interval to the
interval length is an estimate of the percentage of cache throughput consumed.
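A minimal sketch of the arithmetic above, assuming a hypothetical sample struct that records the relevant statistics together with a sample time in milliseconds. The formulas are the ones stated in this message. The last function returns a fraction (multiply by 100 for a percentage).

```go
package main

// sample is a hypothetical reading of the cache statistics taken at atMs
// milliseconds; field names follow the statistic names above.
type sample struct {
	atMs                int64
	cachedBytes         int64
	diskBytes           int64
	walCompactionTimeMs int64
}

// cacheThroughput returns bytes written into the cache per second, from
// the difference between two cachedBytes samples.
func cacheThroughput(a, b sample) float64 {
	return float64(b.cachedBytes-a.cachedBytes) / (float64(b.atMs-a.atMs) / 1000)
}

// walCompactionThroughput applies (cachedBytes-diskBytes)/WALCompactionTimeMs
// to one sample, in bytes per millisecond of compaction time.
func walCompactionThroughput(s sample) float64 {
	if s.walCompactionTimeMs == 0 {
		return 0 // no compaction time recorded yet
	}
	return float64(s.cachedBytes-s.diskBytes) / float64(s.walCompactionTimeMs)
}

// compactionShare divides the change in WALCompactionTimeMs by the
// interval length, as described in the last ratio above.
func compactionShare(a, b sample) float64 {
	return float64(b.walCompactionTimeMs-a.walCompactionTimeMs) / float64(b.atMs-a.atMs)
}
```

For example, with a zero-valued first sample and a second sample taken 10000 ms later with cachedBytes=5000, diskBytes=1000 and WALCompactionTimeMs=2000, these give 500 bytes/s, 2 bytes/ms and 0.2 respectively.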
Signed-off-by: Jon Seymour <jon@wildducktheories.com>