Removing series while trying to maintain the sorted series list
does not perform well when removing many series. This causes
drop DB, RP, series, to be very slow in some cases.
Instead, lazily create a sorted series list when first requested and
invalidate it when dropping series.
This reworks drop measurement to use a sorted list of series keys
instead of creating an intermediate map. It remove allocations
and some extra garbage that is created during drop measurement.
WalkKeys serially walked each TSM file and invoked fn for each key.
Caller needed to handle duplicate calls to fn with the same key
because the same key could exist in multiple TSM files. The serial
execution was also slower.
Since the series keys are already sorted, we can iterate over all
files in parallel and skip duplicates using a sorted merge. This
fixes the duplicate invocation issue as well as speeds up walking
all keys.
This can significant improve startup performance when many TSM files
exists that may not have been fully compacted. This also has benefits
for deletes (measurements/series) since duplicates are removed saving
extra allocations and work. This may also allow for the optimize
compaction to be removed provided startup times are fast enough.
This commits adds a caching mechanism to the Data object, such that
when large numbers of users exist in the system, the cost of determining
if there is at least one admin user will be low.
To ensure that previously marshalled Data objects contain the correct
cached admin user value, we exhaustively determine if there is an admin
user present whenever we unmarshal a Data object.
The Window function will now check before it adjusts the offset whether
it is going to overflow or underflow. If it is going to do either, it
sets the start or end time to MinTime or MaxTime.
When using the inmem index, if one drops a database, and then creates it
again, the previous index object will be reused. This includes the
previous cardinality estimation sketches, leading to inaccurate
cardinality estimations.
The following functions require ordered input but were not guaranteed to
received ordered input:
* `distinct()`
* `sample()`
* `holt_winters()`
* `holt_winters_with_fit()`
* `derivative()`
* `non_negative_derivative()`
* `difference()`
* `moving_average()`
* `elapsed()`
* `cumulative_sum()`
* `top()`
* `bottom()`
These function calls have now been modified to request that their input
be ordered by the query engine. This will prevent the improper output
that could have been caused by multiple series being merged together or
multiple shards being merged together potentially incorrectly when no
time grouping was specified.
Two additional functions were already correct to begin with (so there
are no bugs with these two, but I'm including their names for
completeness).
* `median()`
* `percentile()`
If there were multiple selectors and math, the query engine would
mistakenly think it was the only selector in the query and would not
match their timestamps.
Fixed the query engine to pass whether the selector should be treated as
a selector so queries like `max(value) * 1, min(value) * 1` will match
the timestamps of the result.