`SHOW MEASUREMENTS` and `SHOW TAG VALUES` cannot go through the
query engine and still get the speed they need. They also only need access to
the database index and do not need access to specific shards. This
removes the query rewriting that turned these two statements into
a SELECT statement and reimplements them inside the coordinator as an
interface on the TSDBStore.
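As a rough sketch of the shape this takes (the method names and signatures here are illustrative, not the exact InfluxDB API), the coordinator can depend on a small index-only interface rather than rewriting the statements into SELECTs:

```go
package coordinator

import "github.com/influxdata/influxql"

// TSDBStore is a sketch of the index-only methods these meta queries need;
// they read the database index directly instead of building a SELECT plan.
type TSDBStore interface {
	// MeasurementNames returns the measurement names in a database,
	// optionally filtered by a tag condition.
	MeasurementNames(database string, cond influxql.Expr) ([][]byte, error)

	// TagValues returns the values for a set of tag keys, optionally
	// filtered by a tag condition.
	TagValues(database string, cond influxql.Expr) ([]string, error)
}
```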
Normally, compactions do not conflict on the files they are compacting.
If the full cold threshold is set very low, it can cause conflicts where
two compactions compact the same files. The full compaction was the
only place this could happen since its planning is greedy.

To make this safer for concurrent execution, the compactor now tracks which
files are currently being compacted and prevents any new compaction from
starting if its file set overlaps.
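A minimal sketch of the tracking idea, assuming hypothetical names (the real planner's bookkeeping may differ):

```go
package tsm1

import "sync"

// compactionTracker records which TSM files are currently being compacted.
type compactionTracker struct {
	mu    sync.Mutex
	files map[string]struct{} // paths of files in an in-flight compaction
}

func newCompactionTracker() *compactionTracker {
	return &compactionTracker{files: make(map[string]struct{})}
}

// acquire reserves a file set for a new compaction. It returns false if any
// file overlaps a compaction that is already running.
func (t *compactionTracker) acquire(group []string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, f := range group {
		if _, ok := t.files[f]; ok {
			return false // overlapping compaction already in progress
		}
	}
	for _, f := range group {
		t.files[f] = struct{}{}
	}
	return true
}

// release frees the file set once the compaction finishes or fails.
func (t *compactionTracker) release(group []string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, f := range group {
		delete(t.files, f)
	}
}
```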
Fixes #6595
If a query is interrupted via kill query, the TSM files managed
by the file store purger would never get removed because
KeyCursor.Close was never called.

KeyCursor.Close is now always called.
If a query was running against a file being compacted, we closed the file
and the query would end wherever it had read up to. This could result
in queries that randomly lost data, but running them again returned the
full results.
We now use a reference-counting approach and move the in-use files out
of the way in the file store, allowing queries to complete against
the old TSM files. The new files are then installed and new queries
use them.
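Roughly, the reference counting looks like the sketch below (field and method names are illustrative); a query cursor calls Ref before reading and Unref when it closes, and the purger only removes a replaced file once InUse reports false:

```go
package tsm1

import "sync/atomic"

// TSMReader sketch: refs counts the queries currently reading this file.
type TSMReader struct {
	refs int64
	path string
}

// Ref marks the file as in use by a query cursor.
func (r *TSMReader) Ref() { atomic.AddInt64(&r.refs, 1) }

// Unref releases the cursor's hold on the file.
func (r *TSMReader) Unref() { atomic.AddInt64(&r.refs, -1) }

// InUse reports whether any query is still reading from the file; the
// purger keeps the old file on disk until this returns false.
func (r *TSMReader) InUse() bool { return atomic.LoadInt64(&r.refs) > 0 }
```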
Fixes #5501
benchmark                        old ns/op     new ns/op     delta
BenchmarkBooleanDecoder_2048-4   9954          7846          -21.18%

benchmark                        old allocs    new allocs    delta
BenchmarkBooleanDecoder_2048-4   0             0             +0.00%

benchmark                        old bytes     new bytes     delta
BenchmarkBooleanDecoder_2048-4   0             0             +0.00%
There was a race where the same series could get added to the in-memory
index for a measurement more than once. This would result in the same
series being returned more than once during queries, causing duplicate
results. The issue was that we checked for the series under the read
lock but did not check again under the write lock, leaving a small
window where the series could be added by another goroutine.

We now check for the series again under the write lock.
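The fix is the standard double-checked locking pattern; a simplified sketch with placeholder types:

```go
package tsdb

import "sync"

type series struct{ key string }

type measurement struct {
	mu     sync.RWMutex
	series map[string]*series
}

func (m *measurement) addSeriesIfNotExists(s *series) {
	// Fast path: check for the series under the read lock.
	m.mu.RLock()
	_, ok := m.series[s.key]
	m.mu.RUnlock()
	if ok {
		return
	}

	// Slow path: re-check under the write lock, since another goroutine may
	// have added the series between releasing the read lock and acquiring
	// the write lock.
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.series[s.key]; ok {
		return
	}
	m.series[s.key] = s
}
```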
Fixes #6946
A slower disk can cause excessive allocations when writing to the WAL
because the encoding and compression happen before taking the write
lock. The encoding/compression grabs a large byte slice from a pool
and then waits until it can acquire the write lock.

This adds a throttle to limit how many in-flight WAL writes can be queued
up, to prevent OOMing the process on slower disks under heavy writes.
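The throttle can be thought of as a semaphore on WAL writes; a sketch using a buffered channel (the limit and names here are illustrative, not the engine's actual values):

```go
package tsm1

// WAL sketch: limiter bounds the number of in-flight writes so slow disks
// can't queue up an unbounded number of large encode/compress buffers.
type WAL struct {
	limiter chan struct{}
}

func NewWAL() *WAL {
	return &WAL{limiter: make(chan struct{}, 16)} // allow 16 queued writes
}

func (w *WAL) writeToLog(entry []byte) error {
	w.limiter <- struct{}{}        // block if too many writes are in flight
	defer func() { <-w.limiter }() // release the slot when done

	// ... encode, compress, and append the entry under the write lock ...
	_ = entry
	return nil
}
```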
If a delete is issued while a compaction is running, a newly
deleted series could re-appear after the compaction completed. This
could occur if the compaction had already written the blocks for series
that were just deleted. When the compaction completed, the newly
written tombstone files would be deleted, essentially undeleting the
series.
Reduce the lock contention on tsdb.Store by taking a short-lived
read lock instead of a long write lock. Also close shards in parallel
and drop the whole RP dir in bulk instead of each shard dir.
Reduces the lock contention on the tsdb.Store by taking a short
read lock instead of a long write lock. Also processes shards
in parallel instead of serially.
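The pattern in both changes is the same: snapshot the shard list under a brief read lock, then work on the shards in parallel without holding the store lock. A simplified sketch with placeholder names:

```go
package tsdb

import "sync"

type Shard struct{ /* ... */ }

type Store struct {
	mu     sync.RWMutex
	shards map[uint64]*Shard
}

// shardsSnapshot copies the shard list under a short read lock.
func (s *Store) shardsSnapshot() []*Shard {
	s.mu.RLock()
	defer s.mu.RUnlock()
	a := make([]*Shard, 0, len(s.shards))
	for _, sh := range s.shards {
		a = append(a, sh)
	}
	return a
}

// forEachShardConcurrently runs fn on every shard in parallel, without
// holding the store lock while fn executes.
func (s *Store) forEachShardConcurrently(fn func(*Shard)) {
	var wg sync.WaitGroup
	for _, sh := range s.shardsSnapshot() {
		wg.Add(1)
		go func(sh *Shard) {
			defer wg.Done()
			fn(sh)
		}(sh)
	}
	wg.Wait()
}
```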
Due to a bug in compactions, it's possible for some blocks to have duplicate
points stored. If those blocks are decoded and re-compacted, an assertion
panic could be triggered.
We now dedup those blocks if necessary to remove the duplicate points
and avoid the panic.
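The dedup itself is the usual pass over timestamp-sorted values; a simplified sketch (the real tsm1 Value type is different, this is a placeholder):

```go
package tsm1

// Value placeholder: a point with its timestamp in nanoseconds.
type Value struct {
	UnixNano int64
	// ... value payload ...
}

// dedup assumes values is sorted by timestamp and keeps the last value seen
// for each timestamp, mirroring write semantics, so re-encoding the block
// never sees two points at the same time.
func dedup(values []Value) []Value {
	if len(values) <= 1 {
		return values
	}
	out := values[:1]
	for _, v := range values[1:] {
		if v.UnixNano == out[len(out)-1].UnixNano {
			out[len(out)-1] = v // later value wins
			continue
		}
		out = append(out, v)
	}
	return out
}
```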
For larger datasets, it's possible for shards to get into a state where
many large, dense TSM files exist. While the shard is still hot for
writes, full compactions will skip these files since they are already
fairly optimized and full compactions are expensive. If the write volume
is large enough, the shard can accumulate many of these files. When
a file is in this state, its index can contain every series, which
increases startup times since the full set of series keys must be
parsed for every file. If the number of series is high, the index can
be quite large, causing a large amount of disk IO at startup.
To fix this, an optimize compaction is run when the full compaction planning
step decides there is nothing to do. The optimize compaction combines
and spreads the data and series keys across all files, resulting in each
file containing the complete data for a subset of the total set of
series keys in the shard.

This allows a shard to store each series key only once, reducing
storage size, and to load each key only once at startup.
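In outline, the planner change is just a fallback (method names here are illustrative, not the exact planner API):

```go
package tsm1

// CompactionGroup is a set of TSM file paths compacted together.
type CompactionGroup []string

type CompactionPlanner struct{ /* ... */ }

func (p *CompactionPlanner) PlanFull() []CompactionGroup     { /* planning elided */ return nil }
func (p *CompactionPlanner) PlanOptimize() []CompactionGroup { /* planning elided */ return nil }

// plan prefers a normal full compaction; if the shard is cold but already
// made up of large, dense files, nothing is planned, so fall back to an
// optimize pass that spreads series keys across the files.
func (p *CompactionPlanner) plan() []CompactionGroup {
	if groups := p.PlanFull(); len(groups) > 0 {
		return groups
	}
	return p.PlanOptimize()
}
```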
Large files created early in the leveled compactions could cause
a shard to get into a bad state. This reworks the level planner
to handle those cases and also splits large compactions up into
multiple groups to leverage more CPUs when possible.
Truncate the time interval output of the monitor service so that it falls
on even time intervals rather than on every minute relative to the start
time. This normalizes the output from the monitor service.
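For example, with Go's time package (the interval and timestamps here are just illustrative), truncating to the interval aligns output to even boundaries instead of boundaries derived from the start time:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	interval := 10 * time.Second
	start := time.Date(2016, 7, 28, 10, 3, 17, 0, time.UTC)

	// Without truncation, intervals tick at :17, :27, :37, ... after start.
	// Truncating aligns them to even boundaries: :10, :20, :30, ...
	aligned := start.Truncate(interval)
	fmt.Println(aligned) // 2016-07-28 10:03:10 +0000 UTC
}
```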