If an aggregate derivative query did not have a value in the first
time bucket, it would abort early and return a single row with a value
of 0. Similarly, if either the current or previous value was nil,
it would skip the row and not append any values, causing gaps and
no data to show up.
Instead, this will append a nil value if either the current or previous
value is nil. This essentially allows nil values to carry through the
results and gives a more sensible value for rows where we cannot
compute a difference (instead of dropping the row as before).
Fixes #4237, #4263
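A minimal sketch of the new behaviour (the helper below is illustrative, not the actual influxdb map/reduce code): one result is emitted per adjacent pair of inputs, and a nil is appended whenever either side of the pair is nil instead of dropping the row.

```go
package main

import "fmt"

// derivativeValues returns one result per adjacent pair of inputs. If either
// the current or previous value is nil, a nil is appended so the row is kept.
func derivativeValues(values []interface{}) []interface{} {
	results := make([]interface{}, 0, len(values))
	for i := 1; i < len(values); i++ {
		prev, cur := values[i-1], values[i]
		if prev == nil || cur == nil {
			results = append(results, nil) // carry nil through instead of skipping
			continue
		}
		results = append(results, cur.(float64)-prev.(float64))
	}
	return results
}

func main() {
	fmt.Println(derivativeValues([]interface{}{1.0, nil, 4.0, 7.0}))
	// [<nil> <nil> 3]
}
```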
This commit changes `tsdb.mapFunc` to use `tsdb.MapInput` instead
of an iterator. This will make it easier and faster to pass blocks
of values from the new storage engine into the query engine.
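Roughly, the shape of the change looks like the sketch below; the exact fields of `tsdb.MapInput` aren't given in this commit message, so the names here (TMin, Items, Timestamp, Value) are assumptions for illustration.

```go
package tsdb

// NOTE: field names are assumptions, not the exact tsdb.MapInput definition.

type MapInput struct {
	TMin  int64     // start of the time bucket
	Items []MapItem // a block of decoded values for the interval
}

type MapItem struct {
	Timestamp int64
	Value     interface{}
}

// A map function now receives the whole block at once instead of pulling
// values one at a time from an iterator.
type mapFunc func(*MapInput) interface{}

// Example mapper: count() becomes a simple loop over the block.
func MapCount(input *MapInput) interface{} {
	n := 0
	for range input.Items {
		n++
	}
	if n > 0 {
		return n
	}
	return nil
}
```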
Also, fix heap inserts so that they sort on tags as well.
This makes it easier to implement `bottom` and also fixes the test that the previous change broke.
Instead of gathering up all the points, sorting, and then slicing, keep a
heap that allows us to quickly check whether a point needs to be in the
set. This cuts a top query on a dataset of 8 million points from 35
seconds to 11 seconds.
Instead of using closures and two different sort routines, use an
interface with a compare method that makes it easy to switch the
direction of comparisons.
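A condensed sketch of the two ideas together, under assumed type names (point, comparer, pointHeap): a bounded heap tracks the current top-N while streaming, and an interface supplies the comparison so the direction (and the tag tie-break) can be swapped without duplicating sort routines. This is not the influxdb code, just the technique.

```go
package main

import (
	"container/heap"
	"fmt"
)

type point struct {
	Value float64
	Tags  string
}

// comparer abstracts the sort direction; Less reports whether a should be
// evicted from the result set before b.
type comparer interface {
	Less(a, b point) bool
}

type byValueAsc struct{}

// For top-N we evict the smallest value first; ties fall back to tag order so
// results are deterministic.
func (byValueAsc) Less(a, b point) bool {
	if a.Value != b.Value {
		return a.Value < b.Value
	}
	return a.Tags < b.Tags
}

type pointHeap struct {
	cmp    comparer
	points []point
}

func (h pointHeap) Len() int            { return len(h.points) }
func (h pointHeap) Less(i, j int) bool  { return h.cmp.Less(h.points[i], h.points[j]) }
func (h pointHeap) Swap(i, j int)       { h.points[i], h.points[j] = h.points[j], h.points[i] }
func (h *pointHeap) Push(x interface{}) { h.points = append(h.points, x.(point)) }
func (h *pointHeap) Pop() interface{} {
	old := h.points
	n := len(old)
	p := old[n-1]
	h.points = old[:n-1]
	return p
}

// topN streams points through a bounded heap instead of sorting everything
// and slicing.
func topN(points []point, n int) []point {
	h := &pointHeap{cmp: byValueAsc{}}
	for _, p := range points {
		if h.Len() < n {
			heap.Push(h, p)
		} else if h.cmp.Less(h.points[0], p) { // p beats the current minimum
			h.points[0] = p
			heap.Fix(h, 0)
		}
	}
	return h.points
}

func main() {
	pts := []point{{1, "a"}, {5, "b"}, {3, "c"}, {5, "a"}, {2, "d"}}
	fmt.Println(topN(pts, 2)) // the two largest points, in heap order
}
```

Swapping in a comparer with the opposite direction gives `bottom` without a second sort routine.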
Note that this changes the sort order of distinct to match that of
top. While it is a change, I don't think it will break any code. The
important thing for distinct is just that the ordering is absolute,
not what the order is.
Don't declare a distinct stats map for partitions. It's more useful to see
the stats collated together per-WAL. This may need further change in the
future.
If memory grows to 5x the partition size threshold, the WAL will start returning write failures to clients. This will allow them to back off their write volume.
Also updated the stress script to track failed requests and to output messages on failure and when writes return to success.
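A hedged sketch of the backpressure check; the field and error names below are made up for illustration, but the idea is to reject writes outright once memory reaches 5x the partition size threshold so clients can back off and retry.

```go
package wal

import "errors"

var ErrMemoryExhausted = errors.New("WAL backed up flushing to index, hit max memory")

type partition struct {
	memorySize    uint64 // bytes currently buffered in memory
	sizeThreshold uint64 // flush threshold for the partition
}

func (p *partition) Write(points [][]byte) error {
	// Reject the write entirely once we are 5x over the threshold; the caller
	// should back off rather than pile more data into memory.
	if p.memorySize > 5*p.sizeThreshold {
		return ErrMemoryExhausted
	}
	for _, b := range points {
		p.memorySize += uint64(len(b))
		// ... append to the in-memory cache and the on-disk segment ...
	}
	return nil
}
```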
With this change, the generic batcher used by many inputs can now be
buffered. Testing shows that this improves the performance of the Graphite
input by 10-100%, with the biggest improvements at lower numbers of connections.
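Something like the sketch below is the idea; the constructor parameters and types are assumptions rather than the exact batcher API. Buffering the input channel lets the handler hand points off without blocking on the batching goroutine for every single point.

```go
package batcher

import "time"

// PointBatcher batches incoming points; the input channel can now be buffered.
type PointBatcher struct {
	size    int
	timeout time.Duration
	in      chan string // stand-in for a parsed point type
	out     chan []string
}

// NewPointBatcher creates a batcher whose input channel is buffered by
// `buffer` points; 0 keeps the old unbuffered behaviour.
func NewPointBatcher(size, buffer int, timeout time.Duration) *PointBatcher {
	return &PointBatcher{
		size:    size,
		timeout: timeout,
		in:      make(chan string, buffer),
		out:     make(chan []string),
	}
}

// Start launches the batching loop: emit a batch when it is full or when the
// timeout fires.
func (b *PointBatcher) Start() {
	go func() {
		var batch []string
		timer := time.NewTimer(b.timeout)
		for {
			select {
			case p := <-b.in:
				batch = append(batch, p)
				if len(batch) >= b.size {
					b.out <- batch
					batch = nil
					timer.Reset(b.timeout)
				}
			case <-timer.C:
				if len(batch) > 0 {
					b.out <- batch
					batch = nil
				}
				timer.Reset(b.timeout)
			}
		}
	}()
}
```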
* Only fire a goroutine to flush and compact if one isn't already running
* Have a sleep backoff time that scales up as the percentage of memory used goes up (see the sketch after this list)
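A minimal sketch of both bullets, with assumed field names: an atomic flag ensures only one flush/compaction goroutine runs at a time, and the sleep backoff grows with how far memory use is past the threshold.

```go
package wal

import (
	"sync/atomic"
	"time"
)

type partition struct {
	flushing      int32 // 1 while a flush/compaction goroutine is running
	memorySize    uint64
	sizeThreshold uint64
}

// maybeFlush only fires the goroutine if one isn't already running.
func (p *partition) maybeFlush() {
	if !atomic.CompareAndSwapInt32(&p.flushing, 0, 1) {
		return
	}
	go func() {
		defer atomic.StoreInt32(&p.flushing, 0)
		// ... flush and compact ...
	}()
}

// backoff returns how long a writer should sleep before retrying, scaling up
// as memory use climbs past the threshold.
func (p *partition) backoff() time.Duration {
	pct := float64(p.memorySize) / float64(p.sizeThreshold)
	if pct < 1 {
		return 0
	}
	return time.Duration(pct*100) * time.Millisecond
}
```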
This commit fixes `SelectMapper.Open()` so that it properly rolls back
transactions. Previously, transactions could stay open indefinitely,
which caused mmap resizes to hang.
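The pattern looks roughly like this (the types and the setup helper are illustrative, not the actual SelectMapper internals): any error after Begin() rolls the transaction back before returning, so a failed Open can't leave a transaction pinning the mmap.

```go
package tsdb

import "github.com/boltdb/bolt"

type SelectMapper struct {
	db *bolt.DB
	tx *bolt.Tx
}

func (m *SelectMapper) Open() (err error) {
	tx, err := m.db.Begin(false) // read-only bolt transaction
	if err != nil {
		return err
	}
	// Any error from here on must not leak an open transaction.
	defer func() {
		if err != nil {
			tx.Rollback()
		}
	}()

	if err = m.setup(tx); err != nil { // assumed helper for the rest of Open
		return err
	}
	m.tx = tx
	return nil
}

func (m *SelectMapper) setup(tx *bolt.Tx) error { return nil }
```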
Start of a lower-level file inspection tool. This currently dumps
summary statistics for the shards, index, and WAL that can be used to
understand the shape of the data in the local shards. This utility
operates on the shards directly, not through the server, and is intended
more for debugging/troubleshooting.
This change adds support for diagnostics by decomposing the existing
interface into two interfaces -- one for stats, and the other for
diags. It also adds some basic monitoring of the system, network, and the Go
runtime.
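A rough sketch of the decomposition; the interface and method names below are assumptions, not the exact monitor API. The point is that a subsystem can implement either interface independently.

```go
package monitor

// StatsClient is implemented by subsystems that expose counters/gauges.
type StatsClient interface {
	// Statistics returns the subsystem's stats, keyed by name.
	Statistics() (map[string]interface{}, error)
}

// DiagsClient is implemented by subsystems that expose descriptive
// diagnostics (build info, system, network, Go runtime, etc.).
type DiagsClient interface {
	Diagnostics() (cols []string, rows [][]interface{}, err error)
}
```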
This commit converts meta.ShardInfo.OwnerIDs from a slice of IDs
to a slice of objects. This is to support adding statuses for a
shard on a given node. For example, a node may have a shard
assigned to it but is currently copying the shard and is not yet
ready to serve data for it.
The old `OwnerIDs` field is marked as deprecated; however, the code
still supports loading from older protobuf-encoded data.
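Approximately, the new shape is the following; the field names are approximate, not a verbatim copy of the meta package.

```go
package meta

type ShardInfo struct {
	ID     uint64
	Owners []ShardOwner

	// Deprecated: only populated when decoding older protobuf-encoded data.
	OwnerIDs []uint64
}

// ShardOwner identifies an owning node; a status field (e.g. copying vs.
// ready) can later be added here per node.
type ShardOwner struct {
	NodeID uint64
}
```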
Running `show measurements` in a partially replicated cluster produces inconsistent
results due to connection pooling. When running remote meta-data queries,
the cluster service ends up keeping the map-shard request open but still checks the connection
back into the pool. This causes inconsistent results because data from the last request
interferes with the new request.
This removes the connection pool, which fixes the issue. It also has the side effect of fixing
a node's pooled connections that have gone bad when the node restarts. For example, in a 3-node cluster
that has been responding to queries correctly, restarting 1 node would cause all the others to fail
to query that node indefinitely. This is now fixed as well.
Some nodes may receive requests for shards that are assigned to them but
that they do not have locally yet. Just return no results in this case
instead of panicking.
The TSDBStore was returning a nil mapper if the shard did not exist. The caller always
assumed the mapper would not be nil, causing a panic. Instead, have the mapper skip the mapping
phase if its shard reference is nil. This fixes queries against data-only nodes and against
shards that are not fully replicated in the cluster.
Fixes #3574
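A hedged sketch of the guard (type and method names assumed): when the shard isn't present locally, the mapper simply yields no results instead of dereferencing a nil shard.

```go
package tsdb

type Shard struct{}

// SelectMapper's shard reference may be nil when the shard is assigned to
// this node but has not been copied here yet.
type SelectMapper struct {
	shard *Shard
}

func (m *SelectMapper) Open() error {
	if m.shard == nil {
		return nil // nothing to map locally; not an error
	}
	// ... normal open path ...
	return nil
}

func (m *SelectMapper) NextChunk() (interface{}, error) {
	if m.shard == nil {
		return nil, nil // no results rather than a panic
	}
	// ... read the next chunk ...
	return nil, nil
}
```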
A write lock was being taken just to read the memory size to determine if writes
should be paused. Writers would get blocked indefinitely when trying to acquire
that write lock, which made writes pause (or stop) for long periods of time.
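A minimal sketch of the fix, with assumed field names: only a read lock is needed to inspect the in-memory size, so writers are no longer serialized behind the check.

```go
package wal

import "sync"

type Log struct {
	mu            sync.RWMutex
	memorySize    uint64
	maxMemorySize uint64
}

// shouldPause only reads state, so a read lock is enough; taking mu.Lock()
// here would block writers for long periods.
func (l *Log) shouldPause() bool {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.memorySize > l.maxMemorySize
}
```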
The log was deferring the release of the read lock on the WAL. This had
the effect that the read lock was held until after the partition finished writing
(which maintains its own locks). The read lock is only needed around the call
to pointsToPartitions so it can get a consistent copy of the points to write. After
that call returns, the lock is not needed, so release it immediately.
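Roughly the narrowed lock scope, with approximate names (including pointsToPartitions): the read lock covers only the snapshot of which points go to which partition, and is released before the partitions do their own locked writes.

```go
package wal

import "sync"

type Log struct {
	mu sync.RWMutex
}

type Partition struct{}

func (p *Partition) Write(points []string) error { return nil }

func (l *Log) WritePoints(points []string) error {
	// Read lock only around building a consistent snapshot of the mapping.
	l.mu.RLock()
	partitionsToWrite := l.pointsToPartitions(points)
	l.mu.RUnlock() // do NOT defer this; partitions take their own locks below

	for p, pts := range partitionsToWrite {
		if err := p.Write(pts); err != nil {
			return err
		}
	}
	return nil
}

func (l *Log) pointsToPartitions(points []string) map[*Partition][]string {
	p := &Partition{}
	return map[*Partition][]string{p: points}
}
```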
addToCache is called in a goroutine and can panic if the server is closed while opening. If
part of the open func errors, it returns an error and immediately calls close. close sets
p.cache to nil, which causes the goroutine trying to initialize the cache to panic as well. The
goroutine should run under a write lock to avoid this race/panic.
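A small sketch of the idea (the real addToCache signature isn't shown here): the goroutine's cache population happens under the write lock and bails out if close has already torn the cache down.

```go
package wal

import "sync"

type Partition struct {
	mu    sync.RWMutex
	cache map[string][][]byte
}

func (p *Partition) addToCache(key string, data [][]byte) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.cache == nil {
		// close() already ran; nothing to initialize.
		return
	}
	p.cache[key] = append(p.cache[key], data...)
}
```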
If LoadMetadataIndex() tries to log an error, it causes a panic because the
logger is not set until Open() is called, which happens after LoadMetadataIndex() returns.
Instead, just set the logger up when the WAL is created.
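A sketch of the change, with an assumed constructor name: the logger is assigned at construction time so any method can log safely even before Open() runs.

```go
package wal

import (
	"log"
	"os"
)

type Log struct {
	path   string
	logger *log.Logger
}

func NewLog(path string) *Log {
	return &Log{
		path:   path,
		logger: log.New(os.Stderr, "[wal] ", log.LstdFlags), // set at creation, not in Open()
	}
}
```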