This change adds support for diagnostics by decomposing the existing
interface into two interfaces -- one for stats, and the other for
diags. It also adds some basic monitoring of the system, the network, and
the Go runtime.
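As a rough sketch of the decomposition (the interface and method names here are illustrative assumptions, not the actual API):

```go
package monitor

// statsClient and diagsClient are hypothetical names illustrating the
// split: one interface for runtime statistics, one for diagnostics.
type statsClient interface {
	Statistics() (map[string]interface{}, error)
}

type diagsClient interface {
	Diagnostics() (map[string]interface{}, error)
}
```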
This commit converts meta.ShardInfo.OwnerIDs from a slice of ids
to a slice of objects. This is to support per-node statuses for a
shard. For example, a node may have a shard assigned to it but
still be copying the shard, and so is not yet ready to serve data
for it.
The old `OwnerIDs` is marked as deprecated; however, the code
still supports loading from older protobuf-encoded data.
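A minimal sketch of the new shape; the field names and the `Status` field are illustrative assumptions:

```go
package meta

// ShardOwner replaces a bare node ID so per-node state can be attached.
type ShardOwner struct {
	NodeID uint64
	Status string // hypothetical: e.g. "copying" vs. "ready"
}

// ShardInfo now carries owner objects; the old OwnerIDs slice is kept
// only so older protobuf-encoded metadata can still be decoded.
type ShardInfo struct {
	ID       uint64
	Owners   []ShardOwner
	OwnerIDs []uint64 // Deprecated: retained for reading old metadata.
}
```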
Running `show measurements` in a partially replicated cluster produces inconsistent
results due to the connection pooling. When running remote meta-data queries,
the cluster service ends up leaving the map shard request open but still checks the connection
back into the pool. This causes inconsistent results because leftover data from the last request
interferes with the new request.
This removes the connection pool, which fixes the issue. It also has the side effect of fixing
a node's pooled connections going bad when a node restarts. For example, in a 3 node cluster
that has been responding to queries correctly, restarting 1 node would cause all the others to fail
to query that node indefinitely. This is now fixed as well.
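A sketch of the pool-free approach (the helper name and timeout are illustrative, not the actual code): every remote call dials a fresh connection and discards it, so a half-read response can never bleed into the next request.

```go
package cluster

import (
	"io"
	"net"
	"time"
)

// remoteRequest opens a fresh connection per remote call instead of
// borrowing one from a shared pool.
func remoteRequest(addr string, req []byte) ([]byte, error) {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return nil, err
	}
	defer conn.Close() // connection is discarded, never reused

	if _, err := conn.Write(req); err != nil {
		return nil, err
	}
	return io.ReadAll(conn)
}
```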
Some nodes may receive requests for shards that are assigned to them but
that they do not have locally yet. Just return no results in this case
instead of panicking.
The TSDBStore was returning a nil mapper if the shard did not exist. The caller always
assumed the mapper would not be nil, causing a panic. Instead, have the mapper skip the mapping
phase if its shard reference is nil. This fixes queries against data-only nodes and against
shards that are not fully replicated in the cluster.
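A minimal sketch of the guard, covering both this fix and the previous one (the types and method names are hypothetical stand-ins): a missing local shard behaves as an empty result set rather than a nil dereference.

```go
package tsdb

// Shard is a stand-in for the real shard type.
type Shard struct{ /* ... */ }

// Mapper skips the mapping phase entirely when its shard reference is
// nil, which happens when this node is assigned a shard it does not
// have locally yet.
type Mapper struct {
	shard *Shard
}

func (m *Mapper) Open() error {
	if m.shard == nil {
		return nil // nothing to open; behave as an empty shard
	}
	// ... open cursors against the local shard ...
	return nil
}

func (m *Mapper) NextChunk() ([]byte, error) {
	if m.shard == nil {
		return nil, nil // no results instead of panicking
	}
	// ... return the next mapped chunk ...
	return nil, nil
}
```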
Fixes #3574
A write lock was being taken to read the memory size to determine if writes
should be paused. As a result, writers could get blocked indefinitely while
trying to acquire the write lock, which made writes pause (or stop) for long periods
of time.
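A sketch of the fix, with assumed field names: inspecting the size is a pure read, so the read half of an RWMutex is sufficient and writers are no longer serialized behind it.

```go
package wal

import "sync"

type Log struct {
	mu            sync.RWMutex
	memorySize    int
	maxMemorySize int
}

// shouldPause only reads state, so it takes the read lock; many callers
// can check it concurrently without blocking each other.
func (l *Log) shouldPause() bool {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.memorySize >= l.maxMemorySize
}
```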
The log was deferring the release of the read lock on the WAL. This had
the effect that the read lock was held until after the partition finished writing
(and partitions maintain their own locks). The read lock is only needed around the call
to pointsToPartitions so it can get a consistent copy of the points to write. After
that call returns, the lock is no longer needed, so release it immediately.
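Roughly, with hypothetical types:

```go
package wal

import "sync"

type Point struct{ /* ... */ }

type Partition struct{ /* ... */ }

func (p *Partition) Write(points []Point) error { return nil } // takes its own locks

type Log struct {
	mu sync.RWMutex
}

// pointsToPartitions needs a consistent view, so it runs under the read lock.
func (l *Log) pointsToPartitions(points []Point) map[*Partition][]Point {
	// ... group points by partition ...
	return nil
}

// WritePoints releases the read lock as soon as the partition map is
// built, instead of deferring the unlock past the slow partition writes.
func (l *Log) WritePoints(points []Point) error {
	l.mu.RLock()
	byPartition := l.pointsToPartitions(points)
	l.mu.RUnlock() // release before writing; partitions lock themselves

	for p, pts := range byPartition {
		if err := p.Write(pts); err != nil {
			return err
		}
	}
	return nil
}
```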
addToCache is called in a goroutine and can panic if the server is closed while opening. If
part of the open func errors, it returns an error and immediately calls close. close sets
p.cache to nil, which causes the goroutine trying to initialize the cache to panic as well. The
goroutine should run under a write lock to avoid this race/panic.
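A sketch of the locking (types and fields assumed): because close() also takes the write lock before nilling the cache, the two can no longer interleave, and the nil check backs out safely if close won the race.

```go
package wal

import "sync"

type Partition struct {
	mu    sync.RWMutex
	cache map[string][][]byte
}

// addToCache holds the write lock for its whole body, so it can never
// observe p.cache transitioning to nil mid-update.
func (p *Partition) addToCache(key string, value []byte) {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.cache == nil {
		return // partition was closed while opening; drop the work
	}
	p.cache[key] = append(p.cache[key], value)
}
```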
If LoadMetadataIndex() tries to log an error, it causes a panic because the
logger is not set until Open() is called, which is after LoadMetadataIndex() returns.
Instead, just set up the logger when the WAL is created.
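A minimal sketch, assuming a NewLog-style constructor:

```go
package wal

import (
	"log"
	"os"
)

type Log struct {
	path   string
	logger *log.Logger
}

// NewLog sets the logger at construction time, so any method invoked
// before Open (such as LoadMetadataIndex) can log safely.
func NewLog(path string) *Log {
	return &Log{
		path:   path,
		logger: log.New(os.Stderr, "[wal] ", log.LstdFlags),
	}
}
```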
Writes could time out when adding new measurement names to the
index if the sort took a long time. The names slice was never
actually used (except by a test), so keeping it in the index wastes memory,
and sorting it wastes CPU and increases lock contention. The sorting
was happening while the shard held a write lock.
Fixes #3869
If we've seeked a cursor, then we can be sure that there is no
data between its current position and the point that was seeked to. Take
advantage of this fact by seeking only when it would yield a value
different from the last.
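A sketch of the idea with a hypothetical cursor interface:

```go
package engine

// Cursor is a hypothetical stand-in for the engine's series cursor.
type Cursor interface {
	Pos() int64                   // timestamp of the last entry returned
	Current() (int64, []byte)     // the last entry returned
	Seek(t int64) (int64, []byte) // position at the first entry >= t
}

// seekIfNeeded skips the seek when the cursor is already at or past the
// target: a prior seek guarantees there is no data in between, so seeking
// again could not yield a different value.
func seekIfNeeded(c Cursor, t int64) (int64, []byte) {
	if c.Pos() >= t {
		return c.Current()
	}
	return c.Seek(t)
}
```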
In addition, only initialize the pointsheap when doing a raw query. For
aggregate queries, it is reinitialized on every time bucket anyway, so there is
no need to seek through all the cursors up front.
For a synthetic database where there were entries for only a tiny
slice of time, this cut queries from 112 seconds to 30 seconds for
`select mean(value) from cpu where time > now - 2h group by time(1h)`.
This commit changes the default block size from 64KB to 4KB for
bz1. This was lowered because small blocks were being uncompressed,
merged, recompressed, and re-inserted for a large portion of updates.
Each such update became slower and slower over time until the block
reached the 64KB threshold. We moved to the 4KB threshold to lower the
impact of this recompression.
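In code terms the change is just the default; the constant name here is an assumption:

```go
package bz1

// DefaultBlockSize is the threshold at which a block is flushed.
// Lowered from 64KB so far less data is re-read, merged, and
// recompressed on each small update.
const DefaultBlockSize = 4 * 1024 // bytes; previously 64 * 1024
```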
Since we already have a pointsHeapItem, just reuse it instead
of allocating a new one. This cuts the allocated memory of a 1-million-point
aggregate query from 4881.97MB to 4139.86MB.
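A sketch of the reuse pattern using container/heap (the types are simplified stand-ins): overwrite the popped item and push it back, rather than popping one item and pushing a freshly allocated one.

```go
package engine

import "container/heap"

type pointsHeapItem struct {
	time  int64
	value []byte
}

type pointsHeap []*pointsHeapItem

func (h pointsHeap) Len() int            { return len(h) }
func (h pointsHeap) Less(i, j int) bool  { return h[i].time < h[j].time }
func (h pointsHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *pointsHeap) Push(x interface{}) { *h = append(*h, x.(*pointsHeapItem)) }
func (h *pointsHeap) Pop() interface{} {
	old := *h
	item := old[len(old)-1]
	*h = old[:len(old)-1]
	return item
}

// pushNext overwrites the item we already popped and pushes it back,
// instead of allocating a fresh pointsHeapItem for every point.
func pushNext(h *pointsHeap, item *pointsHeapItem, t int64, v []byte) {
	item.time = t
	item.value = v
	heap.Push(h, item) // same allocation, new contents
}
```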
The buffer allocation in bz1 was unused, and I'm fairly certain that it
was harmful to performance if used. For queries that run through a bz1
block, needing to hold on to a 64KB block is expensive. Better to churn
on the allocator and have the blocks released when they are unused
than to have 64KB hanging around for each series regardless of size.
Thanks to @jwilder for brainstorming this issue with me.