I ran into an issue where cache snapshotting seemed to stop
completely, causing the cache to fill up and never recover. I believe
this is due to the Timer being reused incorrectly. Instead,
use a Ticker that fires at a regular interval and does not require the
resetting logic (which was wrong).
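A minimal sketch of the Ticker-based loop described above. The names `snapshotLoop` and `check` are hypothetical and not taken from the engine; the point is only that a `time.Ticker` re-arms itself, so the loop has no reset logic to get wrong.

```go
package main

import (
	"fmt"
	"time"
)

// snapshotLoop calls check on every tick until done is closed.
// The Ticker keeps firing on its own, so nothing has to be reset
// after each iteration.
func snapshotLoop(interval time.Duration, done <-chan struct{}, check func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			check()
		case <-done:
			return
		}
	}
}

func main() {
	done := make(chan struct{})
	checks := 0
	snapshotLoop(10*time.Millisecond, done, func() {
		checks++
		if checks == 3 {
			close(done) // stop the loop after a few checks
		}
	})
	fmt.Println("snapshot checks run:", checks)
}
```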
The memory stats as well as the size of the cache were not accurate.
There was also a problem where the cache size would be increased
optimistically, but if the cache size limit was hit, it would not
be decreased. This would cause the cache size to grow without
bounds with every failed write.
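The accounting fix can be illustrated with a simplified cache. This is a sketch under assumptions: the `cache` type, `write` method, `errCacheFull` error and the 8-bytes-per-value estimate are all hypothetical stand-ins, not the engine's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errCacheFull is a hypothetical error used only in this sketch.
var errCacheFull = errors.New("cache maximum memory size exceeded")

// cache is a simplified stand-in for the real cache type.
type cache struct {
	mu      sync.Mutex
	size    uint64 // bytes currently accounted for
	maxSize uint64 // 0 means unlimited
	entries map[string][]float64
}

func newCache(maxSize uint64) *cache {
	return &cache{maxSize: maxSize, entries: make(map[string][]float64)}
}

// write grows the size optimistically and undoes the increase if the
// limit is exceeded, so a rejected write leaves the accounting unchanged.
func (c *cache) write(key string, values []float64) error {
	added := uint64(len(values)) * 8 // assume 8 bytes per float64 value

	c.mu.Lock()
	defer c.mu.Unlock()

	c.size += added
	if c.maxSize > 0 && c.size > c.maxSize {
		c.size -= added // roll back the optimistic increase
		return errCacheFull
	}
	c.entries[key] = append(c.entries[key], values...)
	return nil
}

func main() {
	c := newCache(16)
	fmt.Println(c.write("cpu", []float64{1, 2}), c.size)
	fmt.Println(c.write("cpu", []float64{3}), c.size)
}
```

Without the rollback line, every rejected write would leave `size` permanently inflated.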
The CacheKeyIterator (used for snapshot compactions) iterated over
each key and serially encoded the values for that key as the TSM
file was written. With many series, this can be slow and uses only
1 CPU core even if more are available.
This changes it so that the key space is split amongst a number of
goroutines that start encoding all keys in parallel to improve
throughput.
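One way to split a key space across goroutines is sketched below. `encodeAll` and the `encode` callback are hypothetical; the real iterator encodes TSM blocks, whereas this sketch only shows the partitioning, with each goroutine writing to its own range of the output slice.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// encodeAll encodes every key using up to `workers` goroutines. Each
// goroutine takes one contiguous chunk of keys and writes only to its own
// indexes in out, so no locking is needed on the result slice.
func encodeAll(keys []string, workers int, encode func(key string) []byte) [][]byte {
	out := make([][]byte, len(keys))
	if workers < 1 {
		workers = 1
	}
	chunk := (len(keys) + workers - 1) / workers // ceiling division

	var wg sync.WaitGroup
	for start := 0; start < len(keys); start += chunk {
		end := start + chunk
		if end > len(keys) {
			end = len(keys)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				out[i] = encode(keys[i])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	keys := []string{"cpu,host=a", "cpu,host=b", "mem,host=a"}
	blocks := encodeAll(keys, runtime.GOMAXPROCS(0), func(k string) []byte {
		return []byte(k) // placeholder for real block encoding
	})
	fmt.Println(len(blocks), "keys encoded")
}
```

The output keeps key order, so a writer can still consume the encoded blocks sequentially.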
This simplifies the cache.Snapshot func to swap the hot cache to
the snapshot cache instead of copying and appending entries. This
reduces the amount of time the cache is write-locked, which should
reduce cache contention for the read-only code paths.
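A rough sketch of the swap approach, using hypothetical `cache` and `store` types. It assumes the previous snapshot has already been flushed, so the snapshot store can simply be replaced; the work done under the write lock is a pointer swap plus one map allocation, regardless of how many entries the hot store holds.

```go
package main

import (
	"fmt"
	"sync"
)

// store maps a series key to its values (simplified for this sketch).
type store map[string][]float64

type cache struct {
	mu       sync.RWMutex
	hot      store // receives new writes
	snapshot store // handed to the snapshot writer
}

func newCache() *cache {
	return &cache{hot: make(store), snapshot: make(store)}
}

func (c *cache) write(key string, v float64) {
	c.mu.Lock()
	c.hot[key] = append(c.hot[key], v)
	c.mu.Unlock()
}

// snapshotSwap makes the hot store the snapshot and installs an empty hot
// store. Nothing is copied entry by entry while the lock is held.
func (c *cache) snapshotSwap() store {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.snapshot, c.hot = c.hot, make(store)
	return c.snapshot
}

func main() {
	c := newCache()
	c.write("cpu", 1)
	c.write("cpu", 2)
	snap := c.snapshotSwap()
	fmt.Println(len(snap["cpu"]), len(c.hot))
}
```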
Also, fix the `Iterators.Merge(IteratorOptions)` function so it consults
the `Ordered` attribute to determine which iterator it should use to
merge the input iterators.
The backup command can fail if a snapshot is running, which silently
closes the connection. This causes the backup shard command to continue
on as if nothing failed.
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.
Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).
The syntax is like this:
SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)
This will execute derivative and then sum the result of those derivatives.
Another example:
SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)
This would let you find the maximum minimum value of each host.
There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:
SELECT mean(value) FROM cpu
SELECT mean(value) FROM (SELECT value FROM cpu)
have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
The previous implementation was susceptible to a race condition (of
correctness) since c.decreaseSize is called without a lock in
(*Cache).WriteMulti.
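The message does not say whether the fix takes the lock or switches to atomic operations. As one possible shape, assuming atomics, a size counter that is safe to adjust without holding the cache lock could look like this (`sizeCounter` and `hammer` are hypothetical names):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sizeCounter tracks a byte count that may be updated from many goroutines.
type sizeCounter struct {
	size uint64
}

func (c *sizeCounter) increaseSize(delta uint64) {
	atomic.AddUint64(&c.size, delta)
}

// decreaseSize subtracts delta. Adding ^(delta - 1) is the idiom the
// sync/atomic documentation gives for subtracting an unsigned value.
func (c *sizeCounter) decreaseSize(delta uint64) {
	atomic.AddUint64(&c.size, ^(delta - 1))
}

func (c *sizeCounter) Size() uint64 {
	return atomic.LoadUint64(&c.size)
}

// hammer runs concurrent writers that each add 2 and remove 1, then
// returns the final size, which should equal the number of writers.
func hammer(writers int) uint64 {
	var c sizeCounter
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.increaseSize(2)
			c.decreaseSize(1)
		}()
	}
	wg.Wait()
	return c.Size()
}

func main() {
	fmt.Println("final size:", hammer(100))
}
```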
There were already tests which asserted the correctness of the result of
decreaseSize, so no tests were added or modified.
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
Currently, whenever a snapshot occurs the Cache is reset and so many
allocations are repeated, as the same type of data is re-added to
the Cache.
This commit allows the stores to keep track of the number of values
within an entry, and use that size as a hint when the same entry needs
to be recreated after a snapshot.
To avoid hints persisting over a long period of time, they are deleted
after every snapshot and rebuilt using the most recent entries only.
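The hinting idea can be sketched as follows. The `store` type and its `add` and `reset` methods are hypothetical; the sketch only shows how a per-key value count from the last cycle can pre-size the slice for the next one, and how stale hints drop out on each reset.

```go
package main

import "fmt"

// store holds values per key plus capacity hints from the previous cycle.
type store struct {
	entries map[string][]float64
	hints   map[string]int
}

func newStore(hints map[string]int) *store {
	return &store{entries: make(map[string][]float64), hints: hints}
}

// add appends a value. When a key is seen for the first time in this
// cycle, its slice is pre-sized from the hint (zero if there is none).
func (s *store) add(key string, v float64) {
	vals, ok := s.entries[key]
	if !ok {
		vals = make([]float64, 0, s.hints[key])
	}
	s.entries[key] = append(vals, v)
}

// reset returns an empty store whose hints come only from the entries
// of the current cycle, so old hints do not persist.
func (s *store) reset() *store {
	hints := make(map[string]int, len(s.entries))
	for k, vals := range s.entries {
		hints[k] = len(vals)
	}
	return newStore(hints)
}

func main() {
	s := newStore(nil)
	for i := 0; i < 3; i++ {
		s.add("cpu", float64(i))
	}
	next := s.reset()
	next.add("cpu", 1)
	fmt.Println("capacity after one add:", cap(next.entries["cpu"]))
}
```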
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
Deduplicate is called from various places in the engine and can cause
a lot of garbage to get created. It first creates a map and then
adds each value to the map in order (1st alloc). It then creates a
new slice (2nd alloc) and appends everything from the map to the slice.
Finally, it sorts the new slice (3rd alloc).
This switches the algorithm to use a stable sort and reuse the existing
slice to avoid allocations.
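A sketch of the in-place approach, assuming values are keyed by timestamp and that the last value written for a timestamp should win. The `point` type and `deduplicate` function are illustrative names only. The stable sort keeps equal timestamps in write order, and the compaction pass overwrites earlier duplicates within the same backing array instead of building a map and a second slice.

```go
package main

import (
	"fmt"
	"sort"
)

// point is a simplified timestamped value.
type point struct {
	ts  int64
	val float64
}

// deduplicate sorts a by timestamp and keeps one point per timestamp,
// reusing a's backing array. For equal timestamps the later write wins.
func deduplicate(a []point) []point {
	if len(a) == 0 {
		return a
	}
	sort.SliceStable(a, func(i, j int) bool { return a[i].ts < a[j].ts })

	i := 0
	for j := 1; j < len(a); j++ {
		if a[j].ts != a[i].ts {
			i++ // new timestamp: advance the write position
		}
		a[i] = a[j] // same timestamp: overwrite with the later value
	}
	return a[:i+1]
}

func main() {
	pts := []point{{2, 1.0}, {1, 5.0}, {2, 3.0}, {1, 7.0}}
	fmt.Println(deduplicate(pts))
}
```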
No-op on platforms with a Unix path separator.
On Windows, paths are converted to forward slashes before being added to the archive and back to backslashes during restore.
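The standard library already provides this conversion. The sketch below assumes `filepath.ToSlash` and `filepath.FromSlash` are a suitable fit; the commit may implement it differently, and the wrapper names `archiveName` and `restoreName` are hypothetical. Both calls return the input unchanged when the OS separator is already a slash.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// archiveName converts an OS-specific path to the slash-separated form
// stored in the archive.
func archiveName(p string) string {
	return filepath.ToSlash(p)
}

// restoreName converts an archive entry name back to the OS-specific form.
func restoreName(name string) string {
	return filepath.FromSlash(name)
}

func main() {
	p := filepath.Join("db", "rp", "1", "000001.tsm")
	name := archiveName(p)
	fmt.Println(name, restoreName(name) == p)
}
```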
This returns the LastModified time of the shard. The LastModified
time is the wall time when a change to the shard's state occurred.
It uses the WAL or FileStore to determine the max mod time.