I believe this change addresses the issues with hinted-handoff not fully replicating all data to nodes that come back online after an outage. A detailed explanation follows.
During testing of hinted-handoff (HH) under various scenarios, HH stats showed that the HH Processor was occasionally encountering errors while unmarshalling hinted data. This error was not handled correctly, and in clusters with more than 3 nodes it could cause the HH service to stall until the node was restarted. This was the high-level reason why HH data was not being replicated.
Furthermore, by examining the hinted-handoff data at the byte level, it could be seen that HH segment block lengths were randomly being set to 0, even though the block data itself (which contains the hinted writes) was fine. This was the root cause of the unmarshalling errors outlined above. It was, in turn, tracked down to the HH system opening each segment file multiple times concurrently, which is not thread-safe at the file level, so these multiple open calls were corrupting the file.
Finally, the reason a segment file was being opened multiple times in parallel was that WriteShard on the HH Processor was checking for node queues in an unsafe manner. Since WriteShard can be called concurrently, this could add a queue for the same node more than once, and each queue addition opens segment files.
This change fixes the locking in WriteShard so that the check for an existing HH queue for a given node is performed in a synchronized manner.
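As a rough sketch of the synchronized check (the type, field, and method names below are illustrative and may not match the actual HH code), the queue lookup and creation now happen under a single mutex:

    package hh

    import "sync"

    // queue stands in for the real segment-backed per-node queue type.
    type queue struct{}

    type Processor struct {
        mu     sync.Mutex
        queues map[uint64]*queue // one hinted-handoff queue per node
    }

    // openNewQueue opens the node's segment files; it must run at most
    // once per node.
    func (p *Processor) openNewQueue(nodeID uint64) (*queue, error) {
        return &queue{}, nil
    }

    // queueForNode returns the existing queue for nodeID, creating it
    // under the lock so that concurrent WriteShard calls cannot open the
    // same segment files twice.
    func (p *Processor) queueForNode(nodeID uint64) (*queue, error) {
        p.mu.Lock()
        defer p.mu.Unlock()

        if q, ok := p.queues[nodeID]; ok {
            return q, nil
        }
        q, err := p.openNewQueue(nodeID)
        if err != nil {
            return nil, err
        }
        if p.queues == nil {
            p.queues = make(map[uint64]*queue)
        }
        p.queues[nodeID] = q
        return q, nil
    }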
Without this change, if hinted-handoff was disabled the service
would correctly reject writes, but it would still process any data
already sitting in hinted-handoff queues. With this change the
service is completely disabled.
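A minimal sketch of what "completely disabled" means here, with illustrative names rather than the real service and config fields: both incoming writes and the draining of existing queues are gated on the same flag.

    package hh

    import "errors"

    var ErrHintedHandoffDisabled = errors.New("hinted handoff disabled")

    type Service struct {
        Enabled bool
    }

    // WriteShard rejects hinted writes while the service is disabled.
    func (s *Service) WriteShard(shardID, ownerID uint64, data []byte) error {
        if !s.Enabled {
            return ErrHintedHandoffDisabled
        }
        // ... enqueue the hinted write for later delivery ...
        return nil
    }

    // Open previously started processing even when disabled, which drained
    // any data already sitting in the queues. Skipping it disables the
    // service entirely.
    func (s *Service) Open() error {
        if !s.Enabled {
            return nil
        }
        // ... open queues and start the processor goroutine ...
        return nil
    }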
With this change Graphite TCP connections are tracked on a
per-service basis. This allows a closing Graphite service to first
shut down any active connections, thereby unblocking the rest of
shutdown.
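A sketch of the per-service tracking, using illustrative names rather than the actual Graphite service fields:

    package graphite

    import (
        "net"
        "sync"
    )

    type Service struct {
        mu    sync.Mutex
        conns map[net.Conn]struct{} // active TCP connections for this service
        wg    sync.WaitGroup        // one entry per connection handler
    }

    func (s *Service) addConn(c net.Conn) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.conns == nil {
            s.conns = make(map[net.Conn]struct{})
        }
        s.conns[c] = struct{}{}
    }

    func (s *Service) removeConn(c net.Conn) {
        s.mu.Lock()
        defer s.mu.Unlock()
        delete(s.conns, c)
    }

    // Close first closes every active connection so that handler
    // goroutines blocked on reads return, then waits for them to finish.
    func (s *Service) Close() error {
        s.mu.Lock()
        for c := range s.conns {
            c.Close()
        }
        s.mu.Unlock()

        s.wg.Wait()
        return nil
    }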
This work exposed small shortcomings in the existing Diagnostics
system, so that code has also been tweaked.
Fixes issue #4017
With this change, the generic batcher used by many inputs can now be
buffered. Testing shows that this improves the performance of the
Graphite input by 10-100%, with the biggest improvements at lower
numbers of connections.
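The gist of the buffering, sketched with made-up names (the real batcher has a different API): the input channel is given a capacity so producers are rarely blocked while a batch is being assembled or flushed.

    package batcher

    import "time"

    type Batcher struct {
        size int
        in   chan string   // buffered: producers rarely block
        out  chan []string // completed batches
    }

    // New creates a batcher that emits batches of up to size items;
    // bufferLen is the capacity of the input channel, which is the new knob.
    func New(size, bufferLen int) *Batcher {
        return &Batcher{
            size: size,
            in:   make(chan string, bufferLen),
            out:  make(chan []string),
        }
    }

    func (b *Batcher) In() chan<- string    { return b.in }
    func (b *Batcher) Out() <-chan []string { return b.out }

    // Start assembles batches by size or by flush interval, whichever
    // comes first.
    func (b *Batcher) Start(flushInterval time.Duration) {
        go func() {
            ticker := time.NewTicker(flushInterval)
            defer ticker.Stop()

            var batch []string
            for {
                select {
                case item := <-b.in:
                    batch = append(batch, item)
                    if len(batch) >= b.size {
                        b.out <- batch
                        batch = nil
                    }
                case <-ticker.C:
                    if len(batch) > 0 {
                        b.out <- batch
                        batch = nil
                    }
                }
            }
        }()
    }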
If a timestamp larger than the max epoch value was sent via
Graphite, it would overflow when it was marshaled/unmarshaled back
from the raft log. The overflow caused the shard group to be created
with the wrong timestamp, which caused a panic when writing the
point. The panic occurred because the timestamp that was supposed to
exist in the map created by MapShards did not actually exist, so a
nil ShardGroup was used.
The change prevents creating a point with an invalid timestamp. Since
Graphite uses timestamps in seconds, the maximum valid value is known
and out-of-range values can be rejected. A check for the minimum
value is added as well.
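A sketch of the check, assuming timestamps arrive as whole seconds; the constant and function names here are illustrative:

    package graphite

    import (
        "fmt"
        "math"
        "time"
    )

    // The largest and smallest second values that still fit in an int64
    // once converted to the nanosecond resolution used internally.
    const (
        maxEpochSeconds = math.MaxInt64 / int64(time.Second)
        minEpochSeconds = math.MinInt64 / int64(time.Second)
    )

    // parseTimestamp rejects values that would overflow when converted to
    // nanoseconds, instead of creating a point with a bogus time.
    func parseTimestamp(secs int64) (time.Time, error) {
        if secs > maxEpochSeconds || secs < minEpochSeconds {
            return time.Time{}, fmt.Errorf("timestamp out of range: %d", secs)
        }
        return time.Unix(secs, 0), nil
    }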
Fixes #3785
This change adds support for diagnostics by decomposing the existing
interface into two interfaces -- one for stats, and the other for
diags. It also adds some basic monitoring of the system, network, and
the Go runtime.
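An illustrative decomposition (the actual interface and method names in the monitor code may differ):

    package monitor

    // StatsClient is implemented by components that expose runtime
    // statistics.
    type StatsClient interface {
        Statistics() (map[string]interface{}, error)
    }

    // DiagsClient is implemented by components that expose diagnostic
    // details such as system, network, or Go runtime information.
    type DiagsClient interface {
        Diagnostics() (map[string]interface{}, error)
    }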