tsdb.Engine.IsIdle and tsdb.Engine.Digest now return a reason string for why the engine & shard are not idle.
Callers can then use this string for logging, if desired. The returned reason does not allocate memory, so the
caller may want to add the shard ID and path for more information in the log. This is intended to be used in
calls from the anti-entropy service in Enterprise.
(cherry picked from commit bf45841359)
fixes https://github.com/influxdata/influxdb/issues/21448
(cherry picked from commit c8da9bafbf)
closes https://github.com/influxdata/influxdb/issues/21894
Under heavy write load creating new fields and measurements
the rewrite of the fields.idx file is a bottleneck. This
enhancement combines multiple writes into a single one and
shares any error return value with all of the combined
invocations. MeasurementFieldSet and the new
MeasurementFieldSetWriter must both now be explicitly
closed.
Closes#21577
(cherry picked from commit f64be286be)
Closes https://github.com/influxdata/influxdb/issues/21598
fields.idx frequent writes cause lock contention and fields.idx is recreated
when a field or measurement is added in a WritePointsWithContext()
This eliminates locking during the actual file rewrite, and limits it to
the times when the MeasurementFieldSet is actually being read or written
in memory and when the new file is being renamed.
Test verification of correct behavior by checking the fields.idx
file matches the in-memory copy after heavily parallel measurement addition.
Co-authored-by: davidby-influx <72418212+davidby-influx@users.noreply.github.com>
This fixes multi measurement queries that go through the storage service
to correctly pick up all series that apply with the filter. Previously,
negative queries such as `!=`, `!~`, and predicates attempting to match
empty tags did not work correctly with the storage service when multiple
measurements or `OR` conditions were included.
This was because these predicates would be categorized as "multiple
measurements" and then it would attempt to use the field keys iterator
to find the fields for each measurement. The meta queries for these did
not correctly account for negative equality operators or empty tags when
finding appropriate measurements and those could not be changed because
it would cause a breaking change to influxql too.
This modifies the storage service to use new methods that correctly
account for the above situations rather than the field keys iterator.
Some queries that appeared to be single measurement queries also get
considered as multiple measurement queries. Any query with an `OR`
condition will be considered a multiple measurement query.
This bug did not apply to single measurement queries where one
measurement was selected and all of the logical operators were `AND`
values. This is because it used a different code path that correctly
handled these situations.
Removes cloning measurement fields on writes, instead atomically swaps out
measurement field sets when fields are added (with new overhead of copying
existing fields whenever a new one is added).
This commit adds an `indexType` key to the shard sections of the
`/debug/vars` endpoint, as well as the `_internal` shard statistics.
The tag will be reported as `"indexType": "inmem"` or `"indexType":
"tsi1"`.
If it's known that the read request only needs to use a single
measurement, then we can avoid the need to get field keys via the query
engine.
However, that means that a new method of getting the field keys for a
measurement would be needed. This commit exposes a method to efficiently
get field key names for a measurement across multiple shards.
name
This is the start of per-series validation that occurs in the
Engine write path. It uses an in-memory radix tree to reduce
memory usage and is re-built on demand the first time a series
is written.
No appreciable changes in benchmark results. It seems like this
function is less than 4% of cpu time in the write workloads in the
benchmarks at least.
This moves the time range to delete to be returned by the predicate
func in DeleteSeriesRangeWithPredicate. It allows for a single delete
to delete different ranges of times per series instead of a single
range of time for all series.
This change makes it so that we simplify the math engine so it doesn't
use a complicated set of nested iterators. That way, we have to change
math in one fewer place.
It also greatly simplifies the query engine as now we can create the
necessary iterators, join them by time, name, and tags, and then use the
cursor interface to read them and use eval to compute the result. It
makes it so the auxiliary iterators and all of their complexity can be
removed.
This also makes use of the new eval functionality that was recently
added to the influxql package.
No math functions have been added, but the scaffolding has been included
so things like trigonometry functions are just a single commit away.
This also introduces a small breaking change. Because of the call
optimization, it is now possible to use the same selector multiple times
as a selector. So if you do this:
SELECT max(value) * 2, max(value) / 2 FROM cpu
This will now return the timestamp of the max value rather than zero
since this query is considered to have only a single selector rather
than multiple separate selectors. If any aspect of the selector is
different, such as different selector functions or different arguments,
it will consider the selectors to be aggregates like the old behavior.
This commit adds initial empty sketches back to the tsi1 index, as well
as ensuring that ephemeral sketches in the index `LogFile` are updated
accordingly.
The commit also adds a test that verifies that the merged sketches at
the store level produce the correct results under writes, deletions and
re-opening of the store.
This commit does not provide working sketches for post-compaction on the
tsi1 index.
If the fields.idx was corrupted in someway, it would cause the shard
to fail to load. Deleting the file will allow it to be rebuilt.
This change handles this automatically so it's rebuilt if necessary
without user intervention.
Series should only be removed from the series file when they're no
longer present in any shard. This commit ensures that during a shard
rollover, the series local to the shard are checked against all other
series in the database.
Series that are no longer present in any other shards' bitsets, are then
marked as deleted in the series file.