This instructs the kernel that it can release memory used by mmap'd
TSM files when they are not actively being used. If the mappings are
used again, the kernel will fault the pages back in. On Linux, this causes
resident (RES) memory to drop immediately when run.
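For reference, a minimal Linux-only sketch of the idea using golang.org/x/sys/unix; the file name and helper are illustrative, not the actual TSM reader code:

    package main

    import (
        "log"
        "os"

        "golang.org/x/sys/unix"
    )

    // releaseMappedPages hints to the kernel that the mmap'd region can be
    // reclaimed. Pages are faulted back in transparently on the next access.
    func releaseMappedPages(b []byte) error {
        return unix.Madvise(b, unix.MADV_DONTNEED)
    }

    func main() {
        f, err := os.Open("example.tsm") // illustrative file name
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        fi, err := f.Stat()
        if err != nil {
            log.Fatal(err)
        }
        b, err := unix.Mmap(int(f.Fd()), 0, int(fi.Size()), unix.PROT_READ, unix.MAP_SHARED)
        if err != nil {
            log.Fatal(err)
        }
        defer unix.Munmap(b)

        // ... read from b ...

        // When the file goes cold, let the kernel drop the resident pages.
        if err := releaseMappedPages(b); err != nil {
            log.Fatal(err)
        }
    }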
A cold shard that suddenly receives a lot of writes could build up a very
large cache that takes a long time to snapshot, or could hit the cache
maximum memory limit more quickly. This re-enables compactions during
writes, if necessary, so we don't have to wait for the shard monitor
goroutine to re-enable them.
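A rough sketch of the write-path check, with a heavily simplified Engine type and hypothetical field and method names standing in for the real engine API:

    package main

    import "sync"

    // Point is a stand-in for a parsed point; the real type lives in the models package.
    type Point struct{}

    // Engine is a heavily simplified stand-in for the shard's storage engine.
    type Engine struct {
        mu                 sync.Mutex
        compactionsEnabled bool
    }

    // enableCompactions would restart the background compaction goroutines.
    func (e *Engine) enableCompactions() { e.compactionsEnabled = true }

    // WritePoints re-enables compactions, if necessary, before accepting writes
    // so a cold shard doesn't accumulate a huge cache while waiting for the
    // shard monitor goroutine to notice it is active again.
    func (e *Engine) WritePoints(points []Point) error {
        e.mu.Lock()
        if !e.compactionsEnabled {
            e.enableCompactions()
        }
        e.mu.Unlock()

        // ... write points into the in-memory cache ...
        return nil
    }

    func main() {
        e := &Engine{}
        _ = e.WritePoints(nil)
    }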
The ConditionExpr function is more accurate because it parses the
condition and ensures that time conditions are actually used correctly.
That means attempting to combine time conditions with OR will no longer
result in the query silently treating it as an AND, and nested conditions
work correctly so there is only one way to read the query.
It also extracts the non-time conditions into a separate condition so we
can stop attempting to parse around the time conditions in lower layers
of the storage engine. This change does not remove those hacks, but a
following commit should be able to sanitize the condition and remove
them.
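To illustrate the idea, here is a standalone sketch, not the actual ConditionExpr implementation, that separates the time conditions of a parsed WHERE clause from everything else and rejects OR'd time conditions; it uses the influxql AST (import path shown as the standalone influxql module):

    package main

    import (
        "fmt"

        "github.com/influxdata/influxql"
    )

    // isTimeCondition reports whether expr compares against the "time" column,
    // e.g. `time > now() - 1h`.
    func isTimeCondition(expr influxql.Expr) bool {
        be, ok := expr.(*influxql.BinaryExpr)
        if !ok {
            return false
        }
        if ref, ok := be.LHS.(*influxql.VarRef); ok && ref.Val == "time" {
            return true
        }
        if ref, ok := be.RHS.(*influxql.VarRef); ok && ref.Val == "time" {
            return true
        }
        return false
    }

    // splitCondition walks a condition tree and separates time conditions from
    // the rest. Combining time conditions with OR is rejected instead of being
    // silently treated as an AND.
    func splitCondition(expr influxql.Expr) (timeCond, other influxql.Expr, err error) {
        if be, ok := expr.(*influxql.BinaryExpr); ok && (be.Op == influxql.AND || be.Op == influxql.OR) {
            lt, lo, err := splitCondition(be.LHS)
            if err != nil {
                return nil, nil, err
            }
            rt, ro, err := splitCondition(be.RHS)
            if err != nil {
                return nil, nil, err
            }
            if be.Op == influxql.OR && (lt != nil || rt != nil) {
                return nil, nil, fmt.Errorf("cannot use OR with time conditions")
            }
            return combine(lt, rt, be.Op), combine(lo, ro, be.Op), nil
        }
        if isTimeCondition(expr) {
            return expr, nil, nil
        }
        return nil, expr, nil
    }

    // combine joins two optional expressions with op.
    func combine(a, b influxql.Expr, op influxql.Token) influxql.Expr {
        if a == nil {
            return b
        }
        if b == nil {
            return a
        }
        return &influxql.BinaryExpr{Op: op, LHS: a, RHS: b}
    }

    func main() {
        expr, _ := influxql.ParseExpr("time > now() - 1h AND host = 'server01'")
        timeCond, other, _ := splitCondition(expr)
        fmt.Println(timeCond, "|", other)
    }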
The tag cardinality checks were run for all inmem shards. Since inmem
shards share the same index, a lot of the work is redundant. Inmem shards
also need to sort their measurement and tag keys, which can be CPU-intensive
with many shards or higher cardinality.
This changes the monitoring to check just one shard in each database, which
should lower CPU usage caused by excessive sorting. The longer-term solution
is to use TSI, which would not need this check or the sorting at all.
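A small sketch of the per-database selection; Store and Shard here are simplified stand-ins for the real tsdb types:

    package main

    import "fmt"

    // Shard is a simplified stand-in for tsdb.Shard.
    type Shard struct {
        ID       uint64
        Database string
    }

    // Store is a simplified stand-in for tsdb.Store.
    type Store struct {
        shards []*Shard
    }

    // shardsToMonitor returns one representative shard per database. Because
    // all inmem shards in a database share the same index, checking one of
    // them is enough for the cardinality warnings.
    func (s *Store) shardsToMonitor() []*Shard {
        seen := make(map[string]bool)
        var out []*Shard
        for _, sh := range s.shards {
            if seen[sh.Database] {
                continue
            }
            seen[sh.Database] = true
            out = append(out, sh)
        }
        return out
    }

    func main() {
        st := &Store{shards: []*Shard{
            {ID: 1, Database: "db0"}, {ID: 2, Database: "db0"}, {ID: 3, Database: "db1"},
        }}
        fmt.Println(len(st.shardsToMonitor())) // 2
    }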
There was a change to speed up deleting and dropping measurements
that executed the deletes in parallel for all shards at once. #7015
When TSI was merged in #7618, the series keys passed into Shard.DeleteMeasurement
were removed and are instead expanded lower down. This causes memory to blow up
when a delete across many shards occurs, as the set of series keys is now
expanded N times instead of just once as before.
While running the deletes in parallel would be ideal, there have been a number
of optimizations in the delete path that make running deletes serially pretty
good. This change just limits the concurrency of the deletes, which keeps memory
usage more stable.
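A sketch of the throttling pattern using a buffered channel as a counting semaphore; the Shard type, delete call, and concurrency limit are simplified stand-ins:

    package main

    import (
        "fmt"
        "sync"
    )

    // Shard is a simplified stand-in for tsdb.Shard.
    type Shard struct{ ID uint64 }

    // DeleteMeasurement stands in for the real per-shard delete.
    func (s *Shard) DeleteMeasurement(name string) error {
        fmt.Printf("shard %d: deleting %s\n", s.ID, name)
        return nil
    }

    // deleteMeasurement runs the per-shard deletes with bounded concurrency so
    // the expanded series keys for only a few shards are in memory at any time.
    func deleteMeasurement(shards []*Shard, name string, limit int) error {
        sem := make(chan struct{}, limit) // counting semaphore
        errC := make(chan error, len(shards))
        var wg sync.WaitGroup

        for _, sh := range shards {
            wg.Add(1)
            sem <- struct{}{} // acquire a slot
            go func(sh *Shard) {
                defer wg.Done()
                defer func() { <-sem }() // release the slot
                if err := sh.DeleteMeasurement(name); err != nil {
                    errC <- err
                }
            }(sh)
        }
        wg.Wait()
        close(errC)
        return <-errC // nil if no errors were reported
    }

    func main() {
        shards := []*Shard{{ID: 1}, {ID: 2}, {ID: 3}, {ID: 4}}
        if err := deleteMeasurement(shards, "cpu", 2); err != nil {
            fmt.Println("error:", err)
        }
    }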
When monitoring shards, a slice of measurements is allocated for
each shard. With many shards and measurements, these allocations
can be large. Since inmem shards share the same index, we only
need to do this once since the resulting slices are all the same.
This reduces memory usage when monitoring shard cardinality.
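A sketch of doing the work once per shared index rather than once per shard; Index and Shard are simplified stand-ins:

    package main

    import "fmt"

    // Index stands in for the shared inmem index.
    type Index struct{ measurements []string }

    func (i *Index) Measurements() []string { return i.measurements }

    // Shard stands in for tsdb.Shard; inmem shards in the same database point
    // at the same Index value.
    type Shard struct{ index *Index }

    // measurementsByIndex builds the measurement slice once per distinct index
    // instead of once per shard.
    func measurementsByIndex(shards []*Shard) map[*Index][]string {
        cache := make(map[*Index][]string)
        for _, sh := range shards {
            if _, ok := cache[sh.index]; ok {
                continue // another shard with the same index already did the work
            }
            cache[sh.index] = sh.index.Measurements()
        }
        return cache
    }

    func main() {
        idx := &Index{measurements: []string{"cpu", "mem"}}
        shards := []*Shard{{index: idx}, {index: idx}, {index: idx}}
        fmt.Println(len(measurementsByIndex(shards))) // 1
    }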
The monitor goroutine ran for each shard and updated disk stats
as well as logged cardinality warnings. This goroutine has been
removed by making the disk stats more lightweight and callable
directly from Statistics, and by moving the logging into the tsdb.Store. The
latter allows one goroutine to handle all shards.
Each shard has a number of goroutines for compacting different levels
of TSM files. When a shard goes cold and is fully compacted, these
goroutines are still running.
This change will stop background shard goroutines when the shard goes
cold and start them back up if new writes arrive.
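A sketch of the start/stop pattern using a quit channel and a WaitGroup; the Engine type and compaction loop are stand-ins for the real shard engine:

    package main

    import (
        "sync"
        "time"
    )

    // Engine is a simplified stand-in for the shard's storage engine.
    type Engine struct {
        mu   sync.Mutex
        wg   sync.WaitGroup
        done chan struct{}
    }

    // startCompactions launches the background compaction goroutines.
    func (e *Engine) startCompactions() {
        e.mu.Lock()
        defer e.mu.Unlock()
        if e.done != nil {
            return // already running
        }
        e.done = make(chan struct{})
        e.wg.Add(1)
        go e.compactLoop(e.done)
    }

    // stopCompactions signals the goroutines to exit and waits for them.
    // Called when the shard has gone cold and is fully compacted.
    func (e *Engine) stopCompactions() {
        e.mu.Lock()
        if e.done == nil {
            e.mu.Unlock()
            return
        }
        close(e.done)
        e.done = nil
        e.mu.Unlock()
        e.wg.Wait()
    }

    func (e *Engine) compactLoop(done chan struct{}) {
        defer e.wg.Done()
        ticker := time.NewTicker(time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-done:
                return
            case <-ticker.C:
                // ... plan and run level/full compactions ...
            }
        }
    }

    func main() {
        e := &Engine{}
        e.startCompactions()
        // new writes arrive -> keep running; shard goes cold -> stop
        e.stopCompactions()
    }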
This limit allows the number of concurrent level and full compactions
to be throttled. Snapshot compactions are not affected by this limit,
as they need to run continuously.
This limit can be used to control how much CPU is consumed by compactions.
The default is to limit to the number of CPUs available.
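A sketch of how such a limit could be enforced with a counting semaphore that snapshot compactions bypass; the names and default shown here illustrate the description above rather than the actual configuration API:

    package main

    import (
        "fmt"
        "runtime"
    )

    // limiter is a simple counting semaphore for level and full compactions.
    type limiter chan struct{}

    func newLimiter(n int) limiter {
        if n <= 0 {
            n = runtime.NumCPU() // default: limit to the number of CPUs available
        }
        return make(limiter, n)
    }

    func (l limiter) take()    { l <- struct{}{} }
    func (l limiter) release() { <-l }

    func runCompaction(l limiter, snapshot bool, work func()) {
        // Snapshot compactions must run continuously, so they bypass the limit.
        if !snapshot {
            l.take()
            defer l.release()
        }
        work()
    }

    func main() {
        l := newLimiter(0)
        runCompaction(l, false, func() { fmt.Println("level compaction") })
        runCompaction(l, true, func() { fmt.Println("snapshot compaction") })
    }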
Compactions are enabled as soon as the shard is opened. This can
slow down startup or cause the system to spike in CPU usage at startup
if many shards need to be compacted.
This now delays compactions until after the shards are loaded.
Removing series while trying to maintain the sorted series list
does not perform well when removing many series. This causes dropping
databases, retention policies, and series to be very slow in some cases.
Instead, lazily create a sorted series list when first requested and
invalidate it when dropping series.
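A sketch of the lazy approach with a simplified index type: dropping series only invalidates the cached slice, and the sorted list is rebuilt on the next request:

    package main

    import (
        "fmt"
        "sort"
        "sync"
    )

    // seriesIndex is a simplified stand-in for the inmem measurement index.
    type seriesIndex struct {
        mu     sync.RWMutex
        series map[string]struct{} // set of series keys
        sorted []string            // lazily built, nil when invalidated
    }

    // DropSeries removes keys and simply invalidates the sorted list rather
    // than splicing each key out of it.
    func (i *seriesIndex) DropSeries(keys ...string) {
        i.mu.Lock()
        defer i.mu.Unlock()
        for _, k := range keys {
            delete(i.series, k)
        }
        i.sorted = nil // rebuild on next request
    }

    // SortedSeriesKeys builds the sorted slice on demand and caches it until
    // the next invalidation.
    func (i *seriesIndex) SortedSeriesKeys() []string {
        i.mu.Lock()
        defer i.mu.Unlock()
        if i.sorted == nil {
            i.sorted = make([]string, 0, len(i.series))
            for k := range i.series {
                i.sorted = append(i.sorted, k)
            }
            sort.Strings(i.sorted)
        }
        return i.sorted
    }

    func main() {
        idx := &seriesIndex{series: map[string]struct{}{
            "cpu,host=a": {}, "cpu,host=b": {}, "mem,host=a": {},
        }}
        idx.DropSeries("cpu,host=b")
        fmt.Println(idx.SortedSeriesKeys()) // [cpu,host=a mem,host=a]
    }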
This reworks drop measurement to use a sorted list of series keys
instead of creating an intermediate map. It removes allocations
and some of the extra garbage created during drop measurement.
When using the inmem index, if one drops a database, and then creates it
again, the previous index object will be reused. This includes the
previous cardinality estimation sketches, leading to inaccurate
cardinality estimations.
They rebased a revision we were previously relying upon that allowed us
to use the vanity name, so we are reverting back to an older version with
the old import path.
Adds a `tsdb.Series.ForEachTag()` function for safely iterating
over a series' tags within the context of a lock. This prevents
tags from being dereferenced during iteration, which can cause
a segfault.
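A sketch of what such an iterator looks like; the Series and Tag types here are simplified stand-ins for the real tsdb and models types:

    package main

    import (
        "fmt"
        "sync"
    )

    // Tag is a simplified stand-in for models.Tag.
    type Tag struct{ Key, Value string }

    // Series is a simplified stand-in for tsdb.Series.
    type Series struct {
        mu   sync.RWMutex
        tags []Tag
    }

    // ForEachTag calls fn for each tag while holding the series lock, so the
    // tags slice cannot be swapped out from underneath the caller mid-iteration.
    func (s *Series) ForEachTag(fn func(Tag)) {
        s.mu.RLock()
        defer s.mu.RUnlock()
        for _, t := range s.tags {
            fn(t)
        }
    }

    func main() {
        s := &Series{tags: []Tag{{"host", "server01"}, {"region", "us-west"}}}
        s.ForEachTag(func(t Tag) { fmt.Println(t.Key, "=", t.Value) })
    }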
This change adds some very basic name validation with the following
plain-English description: names must be a non-zero-length sequence of printable
characters that do not contain slashes ('/' or '\') and are not equal to
either "." or "..".
The intent is that, since we currently just use database and retention
policy names directly as path elements, these rules will hopefully leave
us with names that should be at least close to valid directory names.
Ideally, we would restrict names even further or not use them as path
elements directly, but this should be a step towards the former without
restricting names "too much".
Fixes #7822
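A sketch of a validator implementing the rules described above; the function name is illustrative:

    package main

    import (
        "fmt"
        "strings"
        "unicode"
    )

    // validName reports whether a database or retention policy name satisfies
    // the rules described above: a non-empty sequence of printable characters
    // that contains no slashes and is not "." or "..".
    func validName(name string) bool {
        if name == "" || name == "." || name == ".." {
            return false
        }
        if strings.ContainsAny(name, `/\`) {
            return false
        }
        for _, r := range name {
            if !unicode.IsPrint(r) {
                return false
            }
        }
        return true
    }

    func main() {
        for _, n := range []string{"mydb", "..", "bad/name", "ok name"} {
            fmt.Printf("%q -> %v\n", n, validName(n))
        }
    }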
This change first ensures that databases and retention policies exist
before attempting to remove them from the Store. It also adds some
checks in `DeleteDatabase` and `DeleteRetentionPolicy` to ensure
that maliciously named entries won't remove anything outside of the
configured data directory.
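A sketch of the containment idea using path/filepath to verify that the directory to be removed stays under the configured data directory; the helper name is illustrative:

    package main

    import (
        "fmt"
        "path/filepath"
        "strings"
    )

    // pathWithinDataDir reports whether the directory that would be removed
    // for the given database or retention policy name stays inside the data
    // directory.
    func pathWithinDataDir(dataDir, name string) bool {
        target := filepath.Clean(filepath.Join(dataDir, name))
        rel, err := filepath.Rel(dataDir, target)
        if err != nil {
            return false
        }
        return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
    }

    func main() {
        dataDir := "/var/lib/influxdb/data"
        fmt.Println(pathWithinDataDir(dataDir, "mydb"))    // true
        fmt.Println(pathWithinDataDir(dataDir, "../meta")) // false
    }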
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.
Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).
The syntax is like this:
SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)
This will execute derivative and then sum the result of those derivatives.
Another example:
SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)
This would let you find the maximum minimum value of each host.
There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:
SELECT mean(value) FROM cpu
SELECT mean(value) FROM (SELECT value FROM cpu)
have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.