This is a backport of #14262 to the 1.x storage engine. The 1.x storage
engine is now the primary engine for open source, so when we switched we
regressed to the old behavior.
This also fixes `go generate` for the tsm1 package by running `tmpl`
with `go run` instead of assuming the correct one is installed in the
path.
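For reference, a directive of roughly this shape (the template file names here are illustrative) lets `go generate` fetch and run `tmpl` through the module system instead of relying on a binary already installed in the path:

```go
//go:generate go run github.com/benbjohnson/tmpl -data=@iterator.gen.go.tmpldata iterator.gen.go.tmpl
```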
This fixes multi measurement queries that go through the storage service
to correctly pick up all series that apply with the filter. Previously,
negative queries such as `!=`, `!~`, and predicates attempting to match
empty tags did not work correctly with the storage service when multiple
measurements or `OR` conditions were included.
This was because these predicates would be categorized as "multiple
measurements" and then it would attempt to use the field keys iterator
to find the fields for each measurement. The meta queries for these did
not correctly account for negative equality operators or empty tags when
finding the appropriate measurements, and they could not be changed
because doing so would also be a breaking change to influxql.
This modifies the storage service to use new methods that correctly
account for the above situations rather than the field keys iterator.
Some queries that appear to be single measurement queries are also
treated as multiple measurement queries: any query with an `OR`
condition falls into this category.
This bug did not apply to single measurement queries where one
measurement was selected and all of the logical operators were `AND`
values. This is because it used a different code path that correctly
handled these situations.
* refactor: Replace ctx.Done() with ctx.Err()
Prior to this commit we checked for context cancellation with a select
block and context.Context.Done() without multiplexing over any other
channel like:
    select {
    case <-ctx.Done():
        // handle cancellation
    default:
        // fallthrough
    }
This commit replaces those types of blocks with a simple check of
ctx.Err(). This has the following benefits:
* Calling ctx.Err() is much faster than entering a select block.
* ctx.Done() allocates a channel when called for the first time.
* Testing the result of ctx.Err() is a reliable way of determining if
a context.Context value has been canceled.
* fix: Fix data race in execDeleteTagValueEntry()
This commit adds `mincore.Limiter`, which throttles page faults caused
by mmap()'d data. It works by periodically calling `mincore()` to
determine which pages are not resident in memory, and uses
`rate.Limiter` to throttle access with a token bucket algorithm.
* chore: remove tsi1 testdata and add go generate file to download
* chore: fix testdata url and rename gen file
* fix: add testdata generate command to Makefile
* chore: add testdata dir to gitignore
* refactor(tsdb): improve error message when missing testdata
* refactor(tsdb): tagged testdata and avoid stacktrace when missing
Checking a channel too regularly can cause context switches to other
goroutines. In tight loops it is prudent to check, but to do so less
frequently so as to avoid thrashing.
* feat(tsdb): SHOW TAG KEYS (no time) query using only TSI data.
* fix(tsdb): Allow for earlier return when scanning during show tag keys.
* fix(tsdb): Speed things up by using the key merger to reduce allocs.
* chore(tsm1): Fix golint.
* fix(tsdb): Remove sorting, because these keys should already be sorted.
* fix(tsdb): Remove dead code to placate the linter.
* fix(tsm1): delimit tsmKeyPrefix with appended comma
Fixes #7589.
Append a comma to the TSM key prefix when matching a full measurement name to avoid erroneously matching other measurement names that include the prefix in their own name. For example, this prevents matching a measurement "cpu1" when targeting "cpu" by updating the prefix to "cpu,". This relies on the fact that tag key-value pairs are separated by commas.
* fix(tsm1): regression tests for tsmKeyPrefix comma delimiting
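The fix can be illustrated with a small sketch (names are illustrative, not the actual tsm1 code):

```go
package main

import (
	"fmt"
	"strings"
)

// Series keys store tag key-value pairs after the measurement name,
// separated by commas, e.g. "cpu,host=a". Appending a comma to the
// measurement prefix avoids matching "cpu1,host=a" when targeting "cpu".
func matchesMeasurement(seriesKey, measurement string) bool {
	return strings.HasPrefix(seriesKey, measurement+",")
}

func main() {
	fmt.Println(matchesMeasurement("cpu,host=a", "cpu"))  // true
	fmt.Println(matchesMeasurement("cpu1,host=a", "cpu")) // false
}
```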
* fix(storage): Push-down a predicate to match tags for SHOW MEASUREMENTS calls.
* chore: Address feedback.
* fix(tsm1): Split behavior based on existence of predicate for show measurements.
* fix(tsm1): Allow parenthesis expression on the LHS of a predicate.
* fix(tsm1): Create a separate tag predicate verifier that rejects negative comparisons.
* fix(tsm1): Additional test cases for show measurements with predicate.
This commit adds ref counting for files that we pull tag keys from.
Previously, files were only ref counted during the time we extracted
tag keys but this commit adds additional ref counting for the life of
the `Engine.tagKeysNoPredicate()` function.
This commit
* adds new request and response data types for schema gRPC calls
* adds fmt.Stringer implementation to cursors.FieldType
* adds APIs to sort a slice of MeasurementField values,
* upgrades the gogo protobuf package to v1.3.1, which
includes improvements to serialization.
This commit adds a new API to `Cache` to address data races
with the `TagKeys` and `TagValues` APIs.
`Cache` and `entry` provide `AppendTimestamps`, which
appends the current timestamps to the provided slice
to reduce allocations. As noted in the documentation,
it is the responsibility of the caller to sort and deduplicate
the values, if required.
The `cursors.TimestampArray` type was extended to permit
use of the `sort.Sort` API.
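A rough stdlib sketch of the caller's sort-and-deduplicate responsibility, with stand-in types (the real `cursors.TimestampArray` differs):

```go
package main

import (
	"fmt"
	"sort"
)

// TimestampArray is a stand-in for cursors.TimestampArray, extended
// here to satisfy sort.Interface.
type TimestampArray struct{ Timestamps []int64 }

func (a TimestampArray) Len() int           { return len(a.Timestamps) }
func (a TimestampArray) Less(i, j int) bool { return a.Timestamps[i] < a.Timestamps[j] }
func (a TimestampArray) Swap(i, j int) {
	a.Timestamps[i], a.Timestamps[j] = a.Timestamps[j], a.Timestamps[i]
}

// sortDedup sorts the appended timestamps and removes duplicates in
// place, as the AppendTimestamps documentation asks the caller to do.
func sortDedup(a TimestampArray) []int64 {
	if a.Len() == 0 {
		return a.Timestamps
	}
	sort.Sort(a)
	out := a.Timestamps[:1]
	for _, ts := range a.Timestamps[1:] {
		if ts != out[len(out)-1] {
			out = append(out, ts)
		}
	}
	return out
}

func main() {
	fmt.Println(sortDedup(TimestampArray{Timestamps: []int64{30, 10, 20, 10}})) // [10 20 30]
}
```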
This commit introduces a new API for finding the maximum
timestamp of a series when iterating over the keys in a
set of TSM files.
This API will be used to determine the field type of a single
field key by selecting the series with the maximum timestamp.
It has also refactored the common functionality for iterating
TSM keys into `timeRangeBlockReader`, which is shared
between `TimeRangeIterator` and `TimeRangeMaxTimeIterator`.
These APIs require a measurement, permitting an additional optimization
to reduce the search space against the TSM index. Specifically, the
search key prefix is extended from `org+bucket` to
`org+bucket,\x00=<measurement>`
* MeasurementNames
* MeasurementTagKeys
* MeasurementTagValues
* Adds an API to the models package for efficiently parsing the
measurement tag (\x00) from a normalized series key
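A sketch of how such a prefix might be assembled (purely illustrative; the real series key encoding in tsdb is more involved):

```go
package main

import "fmt"

// seriesKeyPrefix extends the search prefix from the org+bucket pair
// to include the measurement tag (\x00), narrowing the TSM index scan
// to a single measurement.
func seriesKeyPrefix(orgBucket, measurement string) string {
	return orgBucket + ",\x00=" + measurement
}

func main() {
	fmt.Printf("%q\n", seriesKeyPrefix("orgbkt", "cpu")) // "orgbkt,\x00=cpu"
}
```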
The root cause is that the Unsigned data type has no representation
in the valueType function in the cache and falls back to the default
case of 0.
0 is also a sentinel value in the entry#add function that will
result in skipping the value type check.
It is therefore possible for unsigned values followed by some other
data type to be stored in the cache.
It is suspected that the write may be rejected before reaching the
cache, and therefore may not occur in practice. Specifically, the
series file stores the data types on a per-series basis and would
reject the write.
This commit turns the value types into explicit constants and
ensures all existing block types are represented. In addition,
it adds a mapping function to convert these to a known Block type,
which will be used by the `MeasurementFields` schema request to
determine the type of a series in the cache.
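An illustrative sketch of the idea; the constant names are placeholders, though the block-type values follow tsm1's float/integer/boolean/string/unsigned ordering:

```go
package main

import "fmt"

// Explicit value-type constants: every supported type, including
// unsigned, has a non-zero representation, so the sentinel 0 now
// unambiguously means "undefined".
const (
	valueTypeUndefined = iota
	valueTypeFloat64
	valueTypeInteger
	valueTypeString
	valueTypeBoolean
	valueTypeUnsigned
)

// blockTypeForValueType maps a cache value type to a block type byte;
// it reports false for the undefined sentinel instead of silently
// skipping the type check.
func blockTypeForValueType(vt int) (byte, bool) {
	switch vt {
	case valueTypeFloat64:
		return 0, true
	case valueTypeInteger:
		return 1, true
	case valueTypeBoolean:
		return 2, true
	case valueTypeString:
		return 3, true
	case valueTypeUnsigned:
		return 4, true
	default:
		return 0, false
	}
}

func main() {
	bt, ok := blockTypeForValueType(valueTypeUnsigned)
	fmt.Println(bt, ok) // 4 true
}
```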
* refactor(storage): move type ByTagKey to the only package that uses it
* refactor(tsdb): use types in tsdb/cursors
* refactor(tsdb): remove unused type SeriesIDElems
* refactor(tsdb): inline only use of tsdb.ReadAllSeriesIDIterator
* refactor(tsdb): move series file to its own package
* refactor(storage): remove platform->influxdb aliases
* feat(backup): `influx backup` creates data backup
* feat(backup): initial restore work
* feat(restore): initial restore impl
Adds a restore tool which does offline restore of data and metadata.
* fix(restore): pr cleanup
* fix(restore): fix data dir creation
* fix(restore): pr cleanup
* chore: amend CHANGELOG
* fix: restore to empty dir fails differently
* feat(backup): backup and restore credentials
Saves the credentials file to backups and restores it from backups.
Additionally adds some logging for errors when fetching backup files.
* fix(restore): add missed commit
* fix(restore): pr cleanup
* fix(restore): fix default credentials restore path
* fix(backup): actually copy the credentials file for the backup
* fix: dirs get 0777, files get 0666
* fix: small review feedback
Co-authored-by: tmgordeeva <tanya@influxdata.com>
This commit adds numerous tests for ascending and descending cursors
that generate merged blocks across multiple files, which exceed the
default fixed buffer size used by the array cursors (MaxPointsPerBlock).
Tests cover two scenarios
1. Each file has one block and the block from the second file is
entirely contained within the first block of the first file.
When merging, the new block is 1200 values, which exceeds the
MaxPointsPerBlock.
2. Each file has multiple blocks, and the blocks have a mixture of
values which interleave and overwrite.
This commit prevents multiple blocks for the same series key having
values truncated when they are being read into an empty buffer.
The current cursor reader code has an optimisation that incorrectly
assumes the incoming array will be limited to 1,000 values (the maximum
block size), but arrays can contain values from multiple matching
blocks.
Fixes #15817
This commit addresses several data-races on the `tsm1.Predicate` type
that were causing a live-lock or similar in rare cases during a delete.
Because `tsm1/FileStore.Apply` executes concurrently across TSM files
the state of the delete's predicate was being unsafely mutated.
This commit adds a `Clone` method to the `influxdb.Predicate` type,
which should be used whenever an `influxdb.Predicate` implementation
needs to be used concurrently.
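The intended usage can be sketched as follows (the `Predicate` type here is a stand-in for `influxdb.Predicate`):

```go
package main

import (
	"fmt"
	"sync"
)

// Predicate is a stand-in for influxdb.Predicate; Clone returns an
// independent copy so each goroutine can mutate its own state.
type Predicate struct{ state []byte }

func (p *Predicate) Clone() *Predicate {
	s := make([]byte, len(p.state))
	copy(s, p.state)
	return &Predicate{state: s}
}

func main() {
	pred := &Predicate{state: []byte{0}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ { // one goroutine per TSM file, as in FileStore.Apply
		wg.Add(1)
		go func(id byte) {
			defer wg.Done()
			p := pred.Clone() // each worker mutates only its own copy
			p.state[0] = id
		}(byte(i))
	}
	wg.Wait()
	fmt.Println(pred.state[0]) // 0: the original is untouched
}
```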
* chore: Remove several instances of WithLogger
* chore: unexport Logger fields
* chore: unexport some more Logger fields
* chore: go fmt
chore: fix test
chore: s/logger/log
chore: fix test
chore: revert http.Handler.Handler constructor initialization
* refactor: integrate review feedback, fix all test nop loggers
* refactor: capitalize all log messages
* refactor: rename two logger to log
Fixes #15916.
If a predicate was passed in with multiple key/value matches for the
same tag key, then the value index would be incorrect. This ensures that
each tag key can only be added to the location map once.
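A minimal sketch of the dedup rule (illustrative names, not the actual storage code):

```go
package main

import "fmt"

// buildLocations records each tag key in the location map only once,
// so later key/value matches for the same key cannot shift the value
// index assigned on first sight.
func buildLocations(keys []string) map[string]int {
	locs := make(map[string]int)
	for _, k := range keys {
		if _, ok := locs[k]; ok {
			continue // key already has a location; keep the first index
		}
		locs[k] = len(locs)
	}
	return locs
}

func main() {
	fmt.Println(buildLocations([]string{"host", "region", "host"}))
}
```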
Fixes#15859
This commit fixes a defect in the TSI index where a filter using the
negated equality operator would result in no matching series being
returned for series stored within the `IndexFile` portions of the index.
The root cause of this was due to missing legacy-handling code in the
index for this particular iterator.
* fix(storage): add failing test for array cursor iterator stats
* fix(storage): make arrayCursorIterator.Stats() return stats of in-focus cursor
* fix(storage): add failing test to assert arrayCursorIterator.Stats() returns accumulated result
* fix(storage): accumulate stats in arrayCursorIterator.Stats() call across all observed cursors
By default this feature is disabled; the full compaction behaviour does
not change. When this feature is enabled compactions can be limited
across multiple storage engines running in multiple processes.
The mechanism by which this happens is not part of the abstraction added
here.
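One plausible shape for such an abstraction (a hypothetical interface, not the actual API added):

```go
package main

import "fmt"

// CompactionLimiter is a hypothetical sketch: compactions acquire a
// slot before running; the cross-process coordination behind TryTake
// is deliberately left outside the interface.
type CompactionLimiter interface {
	TryTake() bool // false if the shared compaction budget is spent
	Release()
}

// unlimited is the default, feature-disabled implementation: full
// compaction behaviour is unchanged.
type unlimited struct{}

func (unlimited) TryTake() bool { return true }
func (unlimited) Release()      {}

func main() {
	var l CompactionLimiter = unlimited{}
	if l.TryTake() {
		defer l.Release()
		fmt.Println("compaction allowed")
	}
}
```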
Previously the TSI partition would panic if a compaction was
started while `Wait()` was waiting. This commit removes the previous
wait group and replaces it with a simple counter. The `Wait()`
function now polls the counter until it reaches zero.
The cache is essentially a set of maps, where a key in each map is a
series key, and the value is a slice of values associated with that key.
The cache is sharded and series keys are hashed to determine which shard
(map) they live in.
When deleting from the cache we have to check each key to see if it
matches the delete command (predicate and timestamp). If it does then
the entries for that range are removed. As part of this work we check if
the entries are already empty (already removed) and if so we don't check
if the key is valid.
This involved a lot of mutex grabbing, which has now been replaced with
atomic operations.
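A simplified sketch of replacing a mutex-guarded emptiness check with an atomic counter (illustrative, not the actual cache code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// entry is a simplified cache entry: n tracks the number of stored
// values so emptiness can be tested with an atomic load rather than
// taking the entry's mutex on every key during a delete scan.
type entry struct{ n int64 }

func (e *entry) add(delta int64) { atomic.AddInt64(&e.n, delta) }
func (e *entry) empty() bool     { return atomic.LoadInt64(&e.n) == 0 }

func main() {
	e := &entry{}
	e.add(2)
	e.add(-2) // values deleted
	if e.empty() {
		fmt.Println("skip predicate check for empty entry")
	}
}
```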
Benchmarking this commit against the previous commit in this branch
shows a 9% improvement:
name                                          old time/op    new time/op    delta
Engine_DeletePrefixRange_Cache/exists-24         113ms ± 8%     102ms ±11%   -9.40%  (p=0.000 n=10+10)
Engine_DeletePrefixRange_Cache/not_exists-24    95.6ms ± 2%    97.1ms ± 4%     ~     (p=0.089 n=10+10)

name                                          old alloc/op   new alloc/op   delta
Engine_DeletePrefixRange_Cache/exists-24        29.6MB ± 1%    25.5MB ± 1%  -13.71%  (p=0.000 n=10+10)
Engine_DeletePrefixRange_Cache/not_exists-24    24.3MB ± 2%    23.9MB ± 1%   -1.48%  (p=0.000 n=10+10)

name                                          old allocs/op  new allocs/op  delta
Engine_DeletePrefixRange_Cache/exists-24          334k ± 0%      305k ± 1%   -8.67%  (p=0.000 n=8+10)
Engine_DeletePrefixRange_Cache/not_exists-24      302k ± 1%      299k ± 1%   -1.25%  (p=0.000 n=10+9)
Raw benchmarks on a 24T / 32GB / NVME machine:
goos: linux
goarch: amd64
pkg: github.com/influxdata/influxdb/tsdb/tsm1
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 200 91035525 ns/op 25557809 B/op 305258 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 200 99416796 ns/op 25385052 B/op 303584 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 100 100149484 ns/op 25570062 B/op 305761 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 100 100222516 ns/op 25474372 B/op 303089 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 200 101868258 ns/op 25531572 B/op 304736 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 100 106268683 ns/op 25648213 B/op 306768 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 100 102905477 ns/op 25572314 B/op 305798 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 100 108742857 ns/op 25483068 B/op 304788 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 100 103292149 ns/op 25401388 B/op 303401 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24 100 107178026 ns/op 25573602 B/op 305821 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 200 95082692 ns/op 23942491 B/op 299116 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 200 96088487 ns/op 23957028 B/op 298545 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 200 94279165 ns/op 23620981 B/op 294536 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 200 94509000 ns/op 23989593 B/op 299453 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 200 98530062 ns/op 23935846 B/op 299237 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 200 98008093 ns/op 23821683 B/op 297875 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 200 97603172 ns/op 23878336 B/op 298350 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 200 96867920 ns/op 23782588 B/op 296236 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 200 99148908 ns/op 23997702 B/op 299277 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24 100 100866840 ns/op 24019916 B/op 300339 allocs/op
PASS
ok github.com/influxdata/influxdb/tsdb/tsm1 1144.213s
This command performs verification of TSM blocks:
* expected and actual CRC-32 checksums match
* expected and actual min and max timestamps match decoded data