This allows encoders to be re-used and maintained in a pool to
avoid allocating new ones on every compaction and write of an encoded
block. The pool used is not a sync.Pool to ensure that the encoders
will not be garbage collected.
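As a sketch of the idea (names here are illustrative, not the actual code), a fixed-size channel works as such a pool: parked encoders stay strongly referenced, so unlike sync.Pool the GC never reclaims them between compactions:

```go
package pool

// Encoder is a stand-in for the TSM block encoders being pooled.
type Encoder struct{ buf []byte }

// Pool holds up to n encoders for reuse. Unlike sync.Pool, encoders
// parked in the channel are strongly referenced, so the garbage
// collector never reclaims them between compactions.
type Pool struct {
	free chan *Encoder
}

func New(n int) *Pool {
	return &Pool{free: make(chan *Encoder, n)}
}

// Get returns a pooled encoder, or allocates a new one if none are free.
func (p *Pool) Get() *Encoder {
	select {
	case e := <-p.free:
		return e
	default:
		return &Encoder{}
	}
}

// Put returns an encoder to the pool, dropping it if the pool is full.
func (p *Pool) Put(e *Encoder) {
	select {
	case p.free <- e:
	default:
	}
}
```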
When the planner runs, it needs to determine if any files have tombstones.
The code to determine if a tombstone existed involved calling os.Stat on
the .tombstone file. Since the planner runs very frequently when there are
many shards, this caused a lot of unnecessary system calls.
Instead, cache the results of the stat calls and only refresh them when we
haven't checked at least once or when new tombstone data has been written.
This also caches the results of the TSMReader.Stats call to avoid creating
garbage.
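A minimal sketch of the caching, with assumed field and method names:

```go
package tsm1

import "os"

// tombstoneStat caches whether the .tombstone file exists so the
// planner does not issue a syscall on every planning pass.
// (Sketch; field and method names are assumptions.)
type tombstoneStat struct {
	path    string
	checked bool // stat'd since the last tombstone write?
	exists  bool
}

// HasTombstone returns the cached answer when available and only falls
// back to os.Stat when the cache has been invalidated.
func (t *tombstoneStat) HasTombstone() bool {
	if t.checked {
		return t.exists
	}
	_, err := os.Stat(t.path)
	t.exists = err == nil
	t.checked = true
	return t.exists
}

// invalidate is called whenever new tombstone data is written, forcing
// the next HasTombstone call to re-stat the file.
func (t *tombstoneStat) invalidate() { t.checked = false }
```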
When deleting a shard, the shard is locked and then removed from the
index. Removal from the index can be slow if there are a lot of
series. During this time, the shard is still expected to exist by
the meta store and tsdb store, so stats collection, queries, and writes
could all run against this shard while it is locked. This can cause everything
to lock up until the unindexing completes and the shard can be unlocked.
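One way to avoid the stall, sketched below with stand-in types rather than the actual code, is to drop the shard from the store's map first so new operations can no longer reach it, and only then do the slow unindexing outside the locks:

```go
package tsdb

import "sync"

// Shard is a stand-in; UnloadIndex removes the shard's series from the
// in-memory index, the slow step when cardinality is high.
type Shard struct{}

func (sh *Shard) UnloadIndex() {}
func (sh *Shard) Close() error { return nil }

type Store struct {
	mu     sync.Mutex
	shards map[uint64]*Shard
}

// DeleteShard removes the shard from the store's map first, so stats
// collection, queries, and writes immediately see it as gone, and only
// then performs the slow unindexing without holding any lock those
// operations need.
func (s *Store) DeleteShard(id uint64) error {
	s.mu.Lock()
	sh, ok := s.shards[id]
	if !ok {
		s.mu.Unlock()
		return nil
	}
	delete(s.shards, id)
	s.mu.Unlock()

	sh.UnloadIndex()
	return sh.Close()
}
```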
Fixes #7226
Some files did not pass `go vet` with go 1.7. As a preliminary
step toward making go 1.7 work with this software, `go vet`
should pass.
Also updated the gogo/protobuf dependency which fixed the code generator
to work with go 1.7 too. Ran `go generate` on the entire repository to
ensure every file was up to date.
The full compaction planner could return a plan that only included
one generation. If this happened, a full compaction would run on that
generation producing just one generation again. The planner would then
repeat the plan.
This could happen if there were two generations that were both over
the max TSM file size and the second one happened to be in level 3 or
lower.
When this situation occurs, one CPU is pegged running a full compaction
continuously and the disks become very busy, essentially rewriting the
same files over and over again. This can eventually cause disk and CPU
saturation if it occurs with more than one shard.
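The guard is simple in principle; a sketch with assumed names:

```go
package tsm1

// Generation is a stand-in for a group of TSM files created together.
type Generation struct {
	Files []string
}

// planFull returns no plan when only one generation qualifies, since a
// full compaction over a single generation just reproduces it and the
// planner would otherwise repeat the same plan forever.
func planFull(generations []Generation) []Generation {
	if len(generations) <= 1 {
		return nil
	}
	return generations
}
```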
Fixes #7074
The logic for determining whether a series key was already in the
set of TSM series was too restrictive. It allowed only the first
field of a series to be added, leaving out all the remaining fields.
The behavior for querying tag values with an empty string was originally
fixed in #6283, but it also added a performance problem when the
cardinality of the tag was high. Since a call to `Union()` or `Reject()`
happened for every series key, N times for a cardinality of N, the
comparisons against a blank string were unnecessarily slow and caused
large memory allocations.
This optimizes these queries so it doesn't use those methods anymore.
Those methods are still useful and used when combining AND and OR
clauses, but they aren't useful when finding the series ids for a single
clause. These methods were unnecessary there because the series ids for
the tags were already unique and didn't need to be merged as a set.
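A sketch of the direct path (names hypothetical): walk the tag's value index once and append the matching series ids, with no intermediate sets to merge:

```go
package tsdb

// seriesIDsForTag sketches the optimized path for a single tag-value
// clause. The series ids stored under each tag value are already
// unique, so they can be appended directly instead of building per-key
// sets and merging them with Union()/Reject().
func seriesIDsForTag(valueIndex map[string][]uint64, value string, equal bool) []uint64 {
	var ids []uint64
	for v, series := range valueIndex {
		// For `tag = 'x'` keep matching values; for `tag != 'x'`
		// (including the empty string) keep the rest.
		if (v == value) == equal {
			ids = append(ids, series...)
		}
	}
	return ids
}
```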
Negative timestamps are now supported. We also now refuse two
nanosecond values at the edge of the minimum time window. One of them
we do not accept because we need MinInt64 to be used for some internal
comparisons in the TSM engine, and it was causing an underflow when we
subtracted one from the minimum time. The second is reserved so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell whether the user provided
the timestamp or it was implicit without those values being different.
If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
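In code form, the reserved values look roughly like this (constant and function names are illustrative):

```go
package models

import (
	"fmt"
	"math"
)

// MinNanoTime is the smallest timestamp users may write. math.MinInt64
// is kept for internal comparisons in the TSM engine (subtracting one
// from it would underflow), and math.MinInt64+1 is the implicit "no
// lower bound" default rewritten by aggregate queries, so both are
// refused on write.
const MinNanoTime = int64(math.MinInt64) + 2

// CheckTime rejects the two reserved nanoseconds at the minimum edge.
func CheckTime(ns int64) error {
	if ns < MinNanoTime {
		return fmt.Errorf("timestamp %d is below the minimum allowed time", ns)
	}
	return nil
}
```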
This commit fixes the `MaxSelectSeriesN` limit which was broken by
the implementation of lazy iterators. The setting previously limited
the total number of series but the new implementation limits the
concurrent number of series being processed.
This commit limits queries to only process one shard at a time.
However, within a shard, multiple series can still be processed in
parallel. Shard iterators are lazily instantiated during query
execution to limit the amount of memory a given query uses.
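A minimal sketch of a lazy iterator wrapper (types are stand-ins for the engine's own):

```go
package query

// Point and Iterator are stand-ins for the engine's own types.
type Point struct {
	Time  int64
	Value float64
}

type Iterator interface {
	Next() (*Point, error)
	Close() error
}

// lazyIterator defers building the underlying shard iterator until the
// first Next call. Combined with processing shards one at a time, a
// query only holds resources for the shard it is currently reading.
type lazyIterator struct {
	create func() (Iterator, error)
	itr    Iterator
}

func (l *lazyIterator) Next() (*Point, error) {
	if l.itr == nil {
		itr, err := l.create()
		if err != nil {
			return nil, err
		}
		l.itr = itr
	}
	return l.itr.Next()
}

func (l *lazyIterator) Close() error {
	if l.itr == nil {
		return nil
	}
	return l.itr.Close()
}
```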
The path info only contained the file name, which caused tombstone
files not to be removed if there were queries running against
a file that was compacted.
This is now consistent with TSMReader.Path, which returns the
full path info.
If those files were left around, re-enabling compactions again could cause
future compactions to continuously fail. A restart of the
server would clean them up correctly though.
If there were multiple TSM files and a delete/drop was run,
we would write the deleted series to the tombstone file N
times, once for each file. This occurred because FileStore.WalkKeys walks
every key in every TSM file, which can return duplicate keys.
This issue caused tombstone files to be much larger than they should be
and also caused large memory usage during the delete.
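A sketch of the deduplication, assuming keys are visited in sorted order so duplicates from different TSM files are adjacent:

```go
package tsm1

import "bytes"

// walkUniqueKeys remembers the previous key and skips duplicates, so
// each deleted series is written to the tombstone file only once.
func walkUniqueKeys(keys [][]byte, fn func(key []byte) error) error {
	var prev []byte
	for _, key := range keys {
		if bytes.Equal(key, prev) {
			continue // same series seen in an earlier TSM file
		}
		prev = key
		if err := fn(key); err != nil {
			return err
		}
	}
	return nil
}
```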
This keeps memory bounded when reloading a TSM file's tombstones
so that the heap does not grow exceedingly fast and stay there
after the deletes are applied.
Tombstones were read fully into memory at startup, which could consume
a lot of RAM and OOM the process if there were a lot of deleted
series and many TSM files.
This now walks the tombstone file and iteratively applies each tombstone,
which uses significantly less RAM. This may be slightly slower in the
general case, but should scale better.
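A sketch of the streaming shape of the fix, with the entry decoding elided:

```go
package tsm1

import (
	"io"
	"os"
)

// Tombstone is a single deleted key and time range.
type Tombstone struct {
	Key      string
	Min, Max int64
}

// Tombstoner manages a TSM file's .tombstone file.
type Tombstoner struct{ path string }

// readTombstone decodes one entry from r; decoding details elided.
func readTombstone(r io.Reader) (Tombstone, error) { return Tombstone{}, io.EOF }

// Walk streams the tombstone file one entry at a time to fn, so memory
// use is bounded by a single entry no matter how many series were
// deleted.
func (t *Tombstoner) Walk(fn func(Tombstone) error) error {
	f, err := os.Open(t.path)
	if os.IsNotExist(err) {
		return nil // no tombstones for this TSM file
	} else if err != nil {
		return err
	}
	defer f.Close()

	for {
		ts, err := readTombstone(f)
		if err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := fn(ts); err != nil {
			return err
		}
	}
}
```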
The `SHOW MEASUREMENTS` and `SHOW TAG VALUES` statements cannot go
through the query engine and still get the speed they need. They also
only need access to
the database index and do not need access to specific shards. This
removes the query rewriting that was done to turn these two queries into
a select statement and reimplements them inside of the coordinator as an
interface on the TSDBStore.
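The added methods look roughly like this (signatures approximate, types sketched):

```go
package coordinator

// KeyValue is a single tag key/value pair.
type KeyValue struct{ Key, Value string }

// TagValues holds the tag values found for one measurement.
type TagValues struct {
	Measurement string
	Values      []KeyValue
}

// Expr stands in for the parsed influxql condition (WHERE clause).
type Expr interface{}

// TSDBStore lets the coordinator answer SHOW MEASUREMENTS and SHOW TAG
// VALUES from the database index without planning a SELECT or opening
// any shards.
type TSDBStore interface {
	// Measurements returns the measurement names in database matching cond.
	Measurements(database string, cond Expr) ([]string, error)
	// TagValues returns the tag values in database matching cond.
	TagValues(database string, cond Expr) ([]TagValues, error)
}
```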
Normally, compactions do not conflict on the files they are compacting.
If the full cold threshold is set very low, it can cause conflicts where
two compactions compact the same files. The full compaction was the
only place this could happen, as its planning is greedy.
To make this safer for concurrent execution, the planner tracks which
files are currently being compacted and prevents any new compactions from
starting if the file set overlaps.
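A sketch of the bookkeeping (names assumed):

```go
package tsm1

import (
	"fmt"
	"sync"
)

// inuse tracks files claimed by running compactions so that two plans
// can never overlap.
type inuse struct {
	mu    sync.Mutex
	files map[string]struct{}
}

func newInuse() *inuse {
	return &inuse{files: make(map[string]struct{})}
}

// acquire claims every file in a plan, or fails without claiming any
// if one of them is already being compacted.
func (u *inuse) acquire(files []string) error {
	u.mu.Lock()
	defer u.mu.Unlock()
	for _, f := range files {
		if _, ok := u.files[f]; ok {
			return fmt.Errorf("%s is already being compacted", f)
		}
	}
	for _, f := range files {
		u.files[f] = struct{}{}
	}
	return nil
}

// release frees the files once the compaction completes or fails.
func (u *inuse) release(files []string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	for _, f := range files {
		delete(u.files, f)
	}
}
```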
Fixes #6595
If a query is interrupted via kill query, the tsm files managed
by the file store purger would never get removed because
KeyCursor.Close was never called.
KeyCursor.Close should always be called now.
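The pattern the fix enforces, sketched with stand-in types:

```go
package tsm1

// KeyCursor is a stand-in for the file store's cursor; Close releases
// its file references so the purger can remove replaced TSM files.
type KeyCursor struct{}

func (c *KeyCursor) Close() {}

// readKey defers Close immediately after creating the cursor, so the
// references are released even when the query is killed mid-read.
func readKey(newCursor func() *KeyCursor) {
	cur := newCursor()
	defer cur.Close()

	// ... read blocks from cur until done or interrupted ...
}
```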
If a query was running against a file being compacted, we would close
the file and the query would end wherever it had read up to. This could result
in queries that randomly lost data, but running them again showed the
full results.
We now use a reference counting approach, moving the in-use files out
of the way in the filestore and allowing the queries to complete against
the old tsm files. The new files are installed and new queries will
use them.
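A sketch of the reference counting (names assumed; the real reader keeps similar state):

```go
package tsm1

import "sync/atomic"

// TSMFile is a stand-in for the reader of a single TSM file.
type TSMFile struct {
	refs int64
}

// Ref and Unref are called as queries start and finish reading the file.
func (f *TSMFile) Ref()   { atomic.AddInt64(&f.refs, 1) }
func (f *TSMFile) Unref() { atomic.AddInt64(&f.refs, -1) }

// InUse reports whether a query still holds a reference, in which case
// the purger leaves the replaced file on disk until the count reaches
// zero.
func (f *TSMFile) InUse() bool { return atomic.LoadInt64(&f.refs) > 0 }
```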
Fixes #5501
benchmark                         old ns/op     new ns/op     delta
BenchmarkBooleanDecoder_2048-4    9954          7846          -21.18%

benchmark                         old allocs    new allocs    delta
BenchmarkBooleanDecoder_2048-4    0             0             +0.00%

benchmark                         old bytes     new bytes     delta
BenchmarkBooleanDecoder_2048-4    0             0             +0.00%
There was a race where the same series would get added to the in-memory
index for a measurement more than once. This would result in the same
series being returned more than once during queries causing duplicate
results. The issue was that we checked for the series under the read
lock, but did not check again under the write lock, leaving a small
window in which the series could be added by another goroutine.
We now check for the series under the write lock.
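The fix is the classic double-checked pattern; a sketch with stand-in types:

```go
package tsdb

import "sync"

// measurement is a stand-in for the in-memory index entry.
type measurement struct {
	mu     sync.RWMutex
	series map[string]struct{}
}

func newMeasurement() *measurement {
	return &measurement{series: make(map[string]struct{})}
}

// AddSeries checks under the read lock for the common case, then checks
// again under the write lock to close the window where another
// goroutine adds the same series between the two acquisitions.
func (m *measurement) AddSeries(key string) {
	m.mu.RLock()
	_, ok := m.series[key]
	m.mu.RUnlock()
	if ok {
		return
	}

	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.series[key]; ok {
		return // lost the race; the series is already indexed
	}
	m.series[key] = struct{}{}
}
```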
Fixes #6946
A slower disk can cause excessive allocations to occur when
writing to the WAL because the slower encoding and compression occurs
before taking the write lock. The encoding/compression grabs a large
byte slice from a pool and ultimately waits until it can acquire the
write lock.
This adds a throttle to limit how many in-flight WAL writes can be queued
up, to prevent OOMing the process with slower disks and heavy writes.
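A sketch of the throttle using a buffered channel as a semaphore (names and the limit are illustrative):

```go
package tsm1

// WAL is a stand-in; limiter caps how many writes may be encoding and
// compressing (and therefore holding large pooled buffers) at once.
type WAL struct {
	limiter chan struct{}
}

func NewWAL(concurrency int) *WAL {
	return &WAL{limiter: make(chan struct{}, concurrency)}
}

func (w *WAL) WriteEntry(entry []byte) error {
	w.limiter <- struct{}{}        // block while too many writes are in flight
	defer func() { <-w.limiter }() // release the slot when done

	// Encode and compress entry, then append it under the write lock
	// (elided in this sketch).
	return nil
}
```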
If a delete is issued while a compaction is running, a newly
deleted series could re-appear after the compaction completed. This
could occur if the compaction had already written the blocks for series
that were just deleted. When the compaction completed, the newly
written tombstone files would be deleted, essentially undeleting the
series.