This speeds up time encoding and decoding by skipping divisor scaling
when the divisor is 1. Division and multiplication are expensive CPU
operations, and scaling by 1 has no effect, so it only slows encoding
and decoding down.
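A minimal sketch of the idea, using hypothetical names rather than the
engine's actual encoder:

```go
package main

import "fmt"

// scaleTimestamps divides each timestamp by div, but skips the work
// entirely when div is 1, since scaling by 1 is a no-op.
func scaleTimestamps(ts []int64, div int64) {
	if div <= 1 {
		return // nothing to do; avoid a division per value
	}
	for i := range ts {
		ts[i] /= div
	}
}

func main() {
	ts := []int64{1000, 2000, 3000}
	scaleTimestamps(ts, 1) // returns immediately, ts unchanged
	fmt.Println(ts)
}
```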
Tombstone files would be written to all TSM files even if the deleted
keys or time range did not exist in the TSM file. This had the side
effect of causing shards to get recompacted back to the same state.
With many shards or large numbers of TSM files, disk usage and CPU
utilization would spike, causing issues.
This prevents tombstones from being written for TSM files that could
not possibly contain the series keys being deleted or whose time range
does not overlap the deleted time range.
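A sketch of the guard, with a hypothetical type standing in for the TSM
reader:

```go
package main

import "fmt"

// tsmFile is a stand-in for a TSM reader exposing its time bounds and a
// key-containment check.
type tsmFile struct {
	minTime, maxTime int64
	keys             map[string]struct{}
}

func (f *tsmFile) contains(key string) bool {
	_, ok := f.keys[key]
	return ok
}

// needsTombstone reports whether a delete for keys in [min, max] could
// touch this file; if not, no tombstone needs to be written for it.
func needsTombstone(f *tsmFile, keys []string, min, max int64) bool {
	if max < f.minTime || min > f.maxTime {
		return false // delete range is entirely outside the file's time range
	}
	for _, k := range keys {
		if f.contains(k) {
			return true
		}
	}
	return false // none of the deleted keys exist in this file
}

func main() {
	f := &tsmFile{minTime: 0, maxTime: 100, keys: map[string]struct{}{"cpu": {}}}
	fmt.Println(needsTombstone(f, []string{"mem"}, 0, 50))    // false
	fmt.Println(needsTombstone(f, []string{"cpu"}, 200, 300)) // false
}
```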
When monitoring shards, a slice of measurements is allocated for
each shard. With many shards and measurements, these allocations
can be large. Since inmem shards share the same index, the resulting
slices are all the same, so the allocation only needs to be done once.
This reduces memory usage when monitoring shard cardinality.
Since this is called more frequently now, the cleanup func was invoked
quite often, making several syscalls per shard. It should only be
invoked the first time compactions are disabled.
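A sketch of that guard; the field and method names here are assumptions,
not the actual engine implementation:

```go
package main

import "sync"

// engine is a stand-in for the shard engine.
type engine struct {
	mu                  sync.Mutex
	compactionsDisabled bool
}

func (e *engine) cleanup() {
	// remove temp compaction files, etc. (several syscalls per shard)
}

// disableCompactions runs the cleanup only on the first transition from
// enabled to disabled, so repeated calls are cheap.
func (e *engine) disableCompactions() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.compactionsDisabled {
		return // already disabled; skip the cleanup syscalls
	}
	e.compactionsDisabled = true
	e.cleanup()
}

func main() {
	e := &engine{}
	e.disableCompactions() // runs cleanup
	e.disableCompactions() // no-op
}
```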
Index.ForEachMeasurementTagKey held an RLock while calling fn. If fn
made another call into the index that acquired an RLock after another
goroutine had tried to acquire a write Lock, it would deadlock.
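One way to avoid this is to copy the keys under the RLock and invoke fn
with no lock held. A sketch with hypothetical fields, not the index's
real layout:

```go
package main

import (
	"fmt"
	"sync"
)

type index struct {
	mu      sync.RWMutex
	tagKeys map[string]map[string]struct{} // measurement -> set of tag keys
}

// ForEachMeasurementTagKey copies the tag keys under the RLock, releases
// it, and only then invokes fn, so a callback that re-enters the index
// cannot deadlock against a goroutine waiting for the write lock.
func (i *index) ForEachMeasurementTagKey(name string, fn func(key string) error) error {
	i.mu.RLock()
	keys := make([]string, 0, len(i.tagKeys[name]))
	for k := range i.tagKeys[name] {
		keys = append(keys, k)
	}
	i.mu.RUnlock() // released before fn runs

	for _, k := range keys {
		if err := fn(k); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	idx := &index{tagKeys: map[string]map[string]struct{}{"cpu": {"host": {}}}}
	_ = idx.ForEachMeasurementTagKey("cpu", func(k string) error {
		fmt.Println(k)
		return nil
	})
}
```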
This was causing a shard to appear idle when in fact a snapshot
compaction was running. If the timing was right, compactions would be
disabled and the snapshot compaction would be aborted.
The monitor goroutine ran for each shard and updated disk stats
as well as logged cardinality warnings. This goroutine has been
removed by making the disk stats more lightweight and callable
directly from Statistics, and by moving the logging to the tsdb.Store.
The latter allows one goroutine to handle all shards.
Each shard has a number of goroutines for compacting different levels
of TSM files. When a shard goes cold and is fully compacted, these
goroutines keep running even though there is no work left to do.
This change stops the background shard goroutines when the shard goes
cold and starts them back up if new writes arrive.
The compactor prevents the same file from being compacted by different
compaction runs, but this can produce confusing warnings in the logs.
This adds compaction plan tracking to the planner so that a file is
only part of one plan at a given time.
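A sketch of the tracking, with a hypothetical planner type rather than
the actual planner API:

```go
package main

import (
	"fmt"
	"sync"
)

// planner tracks which TSM files are already claimed by an active plan.
type planner struct {
	mu    sync.Mutex
	inUse map[string]struct{}
}

// acquire claims the files for a new plan, or returns false if any file
// is already part of another plan.
func (p *planner) acquire(files []string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, f := range files {
		if _, ok := p.inUse[f]; ok {
			return false
		}
	}
	for _, f := range files {
		p.inUse[f] = struct{}{}
	}
	return true
}

// release frees the files once the plan's compaction has finished or failed.
func (p *planner) release(files []string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, f := range files {
		delete(p.inUse, f)
	}
}

func main() {
	p := &planner{inUse: make(map[string]struct{})}
	fmt.Println(p.acquire([]string{"000001-02.tsm"})) // true
	fmt.Println(p.acquire([]string{"000001-02.tsm"})) // false: already planned
	p.release([]string{"000001-02.tsm"})
}
```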
This limit allows the number of concurrent level and full compactions
to be throttled. Snapshot compactions are not affected by this limit
as they need to run continuously.
This limit can be used to control how much CPU is consumed by
compactions. The default is to limit to the number of CPUs available.
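A sketch of throttling with a counting semaphore; the names and the
GOMAXPROCS-based default are assumptions matching the behavior
described above, not the actual implementation:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// Default the limit to the number of CPUs available.
	limit := runtime.GOMAXPROCS(0)
	sem := make(chan struct{}, limit) // counting semaphore

	var wg sync.WaitGroup
	for i := 0; i < 16; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			sem <- struct{}{}        // blocks once 'limit' compactions are running
			defer func() { <-sem }() // release the slot when done
			fmt.Println("compacting group", id)
		}(i)
	}
	wg.Wait()
	// Snapshot compactions would bypass 'sem' entirely, since they must
	// run continuously regardless of this limit.
}
```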
Compactions are enabled as soon as the shard is opened. This can
slow down startup or cause CPU usage to spike at startup if many
shards need to be compacted.
This now delays compactions until after the shards are loaded.
The lazy sorting of series caused a deadlock, since it cannot take a
write Lock when a caller may already hold an RLock.
filters should be called without any locks held, as the function
already acquires locks as needed.
Under high write load, the check for each series was done
sequentially, which spent a lot of CPU time acquiring and releasing
the RLock on LogFile. This switches the code to check multiple series
at once under a single RLock, similar to the change for inmem.
The current bytes.Pool will hold onto byte slices indefinitely. Large
writes can cause the pool to hold onto very large buffers over time.
Testing with sync.Pool shows similar performance now, and using
sync.Pool allows these buffers to be GC'd when necessary.
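A sketch of the sync.Pool approach; the buffer size is an arbitrary
example, not the engine's actual sizing:

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out byte buffers; anything left in the pool can be
// reclaimed by the garbage collector, unlike a pool that pins slices
// forever.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 4096) },
}

func getBuf() []byte  { return bufPool.Get().([]byte)[:0] }
func putBuf(b []byte) { bufPool.Put(b) }

func main() {
	b := getBuf()
	b = append(b, "some wal entry"...)
	fmt.Println(len(b), cap(b))
	putBuf(b) // may be reused, or dropped by the next GC
}
```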
The inmem index would call CreateSeriesIfNotExist for each series,
which takes and releases an RLock to see if the series exists. Under
high write load, this lock shows up in profiles quite a bit. This
adds a filtering step that obtains a single RLock, checks all the
series, and returns the non-existent series to continue through the
slow path.
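A sketch of the batch check; the index type here is a stand-in, not the
real inmem index:

```go
package main

import (
	"fmt"
	"sync"
)

// seriesIndex is a stand-in for the inmem index.
type seriesIndex struct {
	mu     sync.RWMutex
	series map[string]struct{}
}

// filterMissing checks a whole batch of keys under one RLock and returns
// only the keys that do not exist yet; only those continue through the
// slower create path.
func (i *seriesIndex) filterMissing(keys []string) []string {
	i.mu.RLock()
	defer i.mu.RUnlock()
	var missing []string
	for _, k := range keys {
		if _, ok := i.series[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	idx := &seriesIndex{series: map[string]struct{}{"cpu,host=a": {}}}
	fmt.Println(idx.filterMissing([]string{"cpu,host=a", "cpu,host=b"}))
	// Output: [cpu,host=b]
}
```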
Under high write load, the sync goroutine would start up and exit
very frequently. Starting a new goroutine so often adds a small
amount of latency, which causes writes to take longer and sometimes
time out. This changes the goroutine to loop until there are no more
waiters, which reduces the churn and latency.
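A simplified sketch of the looping behavior; the field names and the
channel-of-channels shape are assumptions, and the start/stop race
handling of the real WAL is omitted:

```go
package main

import (
	"fmt"
	"sync"
)

type wal struct {
	mu          sync.Mutex
	syncWaiters chan chan error // writers waiting for an fsync
	syncRunning bool
}

func (w *wal) syncToDisk() error { return nil } // stand-in for the fsync

// syncLoop keeps servicing waiters until the queue is empty, then exits;
// the next write that finds no loop running starts a new one, so a
// goroutine is no longer created and torn down for every write.
func (w *wal) syncLoop() {
	for {
		select {
		case done := <-w.syncWaiters:
			err := w.syncToDisk()
			done <- err // unblock the waiting writer
		default:
			w.mu.Lock()
			w.syncRunning = false
			w.mu.Unlock()
			return // no more waiters; exit until new writes arrive
		}
	}
}

func main() {
	w := &wal{syncWaiters: make(chan chan error, 1024)}
	done := make(chan error, 1)
	w.syncWaiters <- done
	w.syncRunning = true
	go w.syncLoop()
	fmt.Println(<-done)
}
```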
If the sync waiters channel was full, it would block sending to the
channel while holding the WAL write lock. The sync goroutine would
then be stuck acquiring the write lock and could not drain the channel.
This increases the buffer to 1024, which would require a very high
write load to fill, and returns an error if the channel is full to
prevent the blocking.
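A sketch of the non-blocking enqueue; the names are illustrative, only
the 1024 capacity mirrors the description above:

```go
package main

import (
	"errors"
	"fmt"
)

// wal holds the generously sized waiter channel so only a very heavy
// write load can fill it.
type wal struct {
	syncWaiters chan chan error
}

var errSyncQueueFull = errors.New("sync queue full")

// queueSync never blocks: if the waiter channel is full it returns an
// error instead of stalling while the WAL write lock is held.
func (w *wal) queueSync(done chan error) error {
	select {
	case w.syncWaiters <- done:
		return nil
	default:
		return errSyncQueueFull
	}
}

func main() {
	w := &wal{syncWaiters: make(chan chan error, 1024)}
	fmt.Println(w.queueSync(make(chan error, 1))) // <nil>
}
```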
The Point is intended to be immutable after being parsed since it
is shared by several goroutines. When dropping a field (e.g. time),
corrupted data can result if one goroutine is deleting the field
while another is marshaling the underlying byte slices.
To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
Removing series while trying to maintain the sorted series list
does not perform well when removing many series. This causes
dropping databases, retention policies, and series to be very slow in
some cases.
Instead, lazily create a sorted series list when first requested and
invalidate it when dropping series.
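A sketch of the lazy list, with a stand-in measurement type rather than
the actual inmem structures:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// measurement is a stand-in for the inmem measurement type.
type measurement struct {
	mu         sync.Mutex
	series     map[string]struct{}
	sortedKeys []string // built lazily, nil when invalidated
}

// SeriesKeys returns the sorted key list, building it only when first
// requested after an invalidation.
func (m *measurement) SeriesKeys() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.sortedKeys == nil {
		m.sortedKeys = make([]string, 0, len(m.series))
		for k := range m.series {
			m.sortedKeys = append(m.sortedKeys, k)
		}
		sort.Strings(m.sortedKeys)
	}
	return m.sortedKeys
}

// DropSeries removes keys without resorting; it just invalidates the
// cached sorted list so the next SeriesKeys call rebuilds it.
func (m *measurement) DropSeries(keys ...string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, k := range keys {
		delete(m.series, k)
	}
	m.sortedKeys = nil
}

func main() {
	m := &measurement{series: map[string]struct{}{"b": {}, "a": {}}}
	fmt.Println(m.SeriesKeys()) // [a b]
	m.DropSeries("a")
	fmt.Println(m.SeriesKeys()) // [b]
}
```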
This reworks drop measurement to use a sorted list of series keys
instead of creating an intermediate map. It removes allocations
and some of the extra garbage created during drop measurement.
WalkKeys serially walked each TSM file and invoked fn for each key.
Callers needed to handle duplicate calls to fn with the same key
because the same key could exist in multiple TSM files. The serial
execution was also slower.
Since the series keys are already sorted, we can iterate over all
files in parallel and skip duplicates using a sorted merge. This
fixes the duplicate invocation issue and also speeds up walking
all keys.
This can significantly improve startup performance when many TSM
files exist that may not have been fully compacted. This also has
benefits for deletes (measurements/series) since duplicates are
removed, saving extra allocations and work. This may also allow the
optimize compaction to be removed, provided startup times are fast
enough.
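A sketch of the duplicate-skipping merge over already-sorted key lists;
the real code reads keys from TSM readers in parallel, but the merge
logic is the same idea:

```go
package main

import "fmt"

// mergeKeys walks several already-sorted key lists in lockstep and
// invokes fn exactly once per distinct key, skipping duplicates that
// appear in more than one list (i.e. in more than one TSM file).
func mergeKeys(fn func(key string) error, lists ...[]string) error {
	idx := make([]int, len(lists))
	var last string
	first := true
	for {
		// Find the list whose next key is smallest.
		best := -1
		for i, l := range lists {
			if idx[i] >= len(l) {
				continue
			}
			if best == -1 || l[idx[i]] < lists[best][idx[best]] {
				best = i
			}
		}
		if best == -1 {
			return nil // all lists exhausted
		}
		key := lists[best][idx[best]]
		idx[best]++
		if !first && key == last {
			continue // same key exists in another file; already visited
		}
		first, last = false, key
		if err := fn(key); err != nil {
			return err
		}
	}
}

func main() {
	a := []string{"cpu,host=a", "mem,host=a"}
	b := []string{"cpu,host=a", "disk,host=a"}
	_ = mergeKeys(func(k string) error { fmt.Println(k); return nil }, a, b)
	// Prints: cpu,host=a, disk,host=a, mem,host=a
}
```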
When using the inmem index, if one drops a database and then creates
it again, the previous index object will be reused. This includes the
previous cardinality estimation sketches, leading to inaccurate
cardinality estimations.
The previous version was very inefficient because the benchmarks used
to optimize it had a bug. This version always allocates a new
slice, but is O(n).
This switches compactions to use typed values (e.g. FloatValues)
instead of the generic Values type. It avoids a bunch of allocations
where each value must be converted from a specific type to an
interface{}.
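An illustrative sketch of the boxing cost; FloatValue here is a stand-in
for the engine's typed value, not its actual definition:

```go
package main

import "fmt"

// FloatValue is a stand-in for the engine's typed float value.
type FloatValue struct {
	UnixNano int64
	Value    float64
}

// boxAll mimics the generic path: every concrete value is converted to
// an interface{}, which forces a per-value heap allocation.
func boxAll(vals []FloatValue) []interface{} {
	out := make([]interface{}, 0, len(vals))
	for _, v := range vals {
		out = append(out, v) // v escapes to the heap here
	}
	return out
}

// copyAll mimics the typed path: values stay as concrete structs end to
// end, so no boxing or per-value allocation is needed.
func copyAll(vals []FloatValue) []FloatValue {
	out := make([]FloatValue, len(vals))
	copy(out, vals)
	return out
}

func main() {
	vals := []FloatValue{{1, 1.5}, {2, 2.5}}
	fmt.Println(len(boxAll(vals)), len(copyAll(vals)))
}
```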
This code was added to address some slow startup issues. It is
believed to be the cause of some segfault panics that occur at query
time when the underlying MMAP array has been unmapped. The current
structure of the code makes this change unnecessary now.