When using queries like 'select count(_seriesKey) from bigmeasurement`, we
should iterate over the tsi structures to serve the query instead of loading
all the series into memory up front.
Closes#20543
Loop with backoff in (*Engine).CreateSnapshot() to retry
(*Engine).WriteSnapshot() up to 3 times if
ErrSnapshotInPrgress is returned. Then continue
on no error or on SnapshotInProgress if skipCacheOk is
true.
https://github.com/influxdata/plutonium/issues/3227
(cherry picked from commit dfa6aa8cea)
Test the skipCacheOk flag to tsdb.Shard.CreateSnapshot() and
tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache
snapshot cannot be taken.
https://github.com/influxdata/plutonium/issues/3227
This fix adds a skipCacheOk flag to
tsdb.Store.CreateShardSnapshot() and tsdb.Shard.CreateSnapshot()
to pass to tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
This flag is set to false in tsm1.Engine.Export()
https://github.com/influxdata/plutonium/issues/3227
When an InfluxDB database is very busy writing new points the backup
the process can fail because it can not write a new snapshot.
The error is: operation timed out with error: create snapshot: snapshot in progress.
This happens because InfluxDB takes almost "continuously" a snapshot
from the cache caused by the high number of points ingested.
The fix for this was https://github.com/influxdata/influxdb/pull/16627
but it was for OSS only, and was not in the code path for backups
in clusters.
This fix adds a skipCacheOk flag to tsdb.Engine.CreateSnapshot().
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
and in tsdb.Shard.CreateSnapshot(), the cluster backup code path.
This flag is set to false in tsm1.Engine.Export()
https://github.com/influxdata/plutonium/issues/3227
This feature allows compaction to be disabled on a per-shard basis by
creating a file named do_not_compact in a shard's directory. When
disabled, a message is logged every 15 minutes with the reason for
compaction being disabled (existance of the file). This makes it easy to
know if compaction has been disabled for any shards by searching the log
for "compaction disabled" or running "find path/to/data -type f -name
do_not_compact".
The original version of verifyVersion() reads into a byte slice,
manually ensures its byte order, then converts it to a type comparable
with Version and MagicNumber.
This patch hides those details by calling binary.Read() and reading
values into properly typed variables.
This adds a bit of overhead but this code isn't in the hot-path and this
patch greatly simplifies the code.
verifyVersion() originally accepted an io.ReadSeeker. It is only called
in once place and that function immediately calls seek after
verifyVersion(), therefore it is probably safe to call Seek() BEFORE
verifyVersion().
The benefit is that verifyVersion() is easier to test since we can pass
it a bytes.Buffer.
This patch adds a test for verifyVersion() as well as a benchmark.
benchmark old ns/op new ns/op delta
BenchmarkVerifyVersion-8 73.5 123 +67.35%
Finally, this commit moves verifyVersion() from writer.go to reader.go
which is where it is actually used.
* feat(engine/tsm1): Add WritePointsWithContext()
Add WritePontsWithContext() and make WritePoints() a thin wrapper for
it.
The purpose is to add statistics context values that we'll use to
propagate the number of fields and points written to calls up the call
chain.
* feat(tsdb): Add WriteToShardWithContext()
When applied, this patch adds WriteToShardWithContext() and wraps it
with WriteToShard() to preserve the API.
The the purpose of this addition is to propagate a context.Context value
to Shard.WritePointsWithContext().
* feat(tsdb/shard): Add WritePointsWithContext()
The purpose of adding WritePointsWithContext() is to propage context
values down to engine code and propage statistics via the context.Value
up to callers.
This patch also adds values written statistics to the shard.
* feat(http): Gather values written stats
WritePointsWithContext() was added to propagate context values down to
the engine and communicate stats to the caller.
* feat(http): Gather values written stats
WritePointsWithContext() was added to propagate context values down to
the engine and communicate stats to the caller.
* refactor: Change MetricKey to ContextKey
This patch gives the type we're useing for context keys a better name.
When applied this patch will:
* log snapshot directory removal errors
Prior to this patch, errors when removing temporary snapshot
directories happens silently.
This patch ensures that errors are logged when os.RemoveAll() fails.
* refactor tsm1: Declare error value in condition
Save a line of code and limits the scope of an error value.
* refactor tsm1: Add MakeSnapshotLinks()
This commit adds (*FileStore).MakeSnapshotLinks(). The code in this
function was originally part of CreateSnapshot().
That code was hoisted out and into MakeSnapshotLinks() becuase there
are two points of failure that require cleanup -- we have to delete a
temporary directory on failure.
Placing the code in one function allows us to check its returned error
value and perform cleanup in only once place.
In short, we hoisted code out of CreateSnapshot() to simplify error
handling.
On error, we remove any directories we created.
* fix(tsdb): address staticcheck ST1006
This patch addresses staticcheck warning "receiver name should not be an
underscore, omit the name if it is unused (ST1006)" for 6 methods.
* fix: verify precision parameter in write requests
This change updates the HTTP endpoints that service v1 and v2 writes to
verify the values passed in the precision parameter.
* fix(tsm1): Fix temp directory search bug
The original code's intention is to scan a directory for the directory
with the higest value when converted to an integer.
So directories may be in the form:
0.tmp
1.tmp
2.tmp
30.tmp
...
100.tmp
The loop should scan the directory, strip the basename and extension
from the file name to leave just a number, then store the higest number
it finds.
Before this patch, there is a bug that has the code only store the
higest value if there is an error converting the numeric value into an
integer.
This patch primarily fixes that logic.
In addition, this patch will save an indent level by inverting logic in
two places:
Instead if checkig if a file is a directory and has a suffix of ".tmp",
it is probably better to test if a file is NOT a directory OR does NOT
have an extension of ".tmp" then continue.
Also, instead of testig if len(ss) == 2, we can test if len(ss) != 2 and
continue if so.
Both of these save an indent level and keeps our "happy path" to the
left.
Finally, this patch will use string concatination instead of calling
fmt.Sprintf() to add periods to "tmp" and "tsm" extension.
Co-authored-by: David Norton <dgnorton@gmail.com>
fixes#17440
While encoding or decoding corrupt data, the current behaviour is to `panic`.
This commit replaces the `panic` with `error` to be propagated up to the calling `iterator`.
To avoid overwriting other `error`, iterators now wraps a `TSMErrors` which contains ALL the encountered errors.
TSMErrors itself implements `Error()`, the returned string contains all the error msgs, separated by "," delimiter.
When an InfluxDB database is very busy writing new points the backup
the process can fail because it can not write a new snapshot.
The error is: `operation timed out with error: create snapshot: snapshot in progress`.
This happens because InfluxDB takes almost "continuously" a snapshot
from the cache caused by the high number of points ingested.
This PR skips snapshots if the `snapshotter` does not come available
after three attempts when a backup is requested.
The backup won't contain the data in the cache or WAL.
Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
This commit prevents multiple blocks for the same series key having
values truncated when they are being read into an empty buffer.
The current cursor reader code has an optimisation that incorrectly
assumes the incoming array will be limited to 1,000 values (the maximum
block size), but arrays can contain values from multiple matching
blocks.
* fix(storage): skip TSM files with block read errors
When we find a bad TSM file during compaction, propagate the error up and move
the bad file aside. The engine will disregard the file so the next compaction
will not hit the same error.
This change adds a lock around digest creation so that it is safe for
concurrent calls. Prior to this change, calls from multiple goroutines
resulted in "Digest aborted, problem renaming tmp digest" errors.
StringArrayEncodeAll will panic if the total length of strings
contained in the src slice is > 0xffffffff. This change adds a unit
test to replicate the issue and an associated fix to return an error.
This also raises an issue that compactions will be unable to make
progress under the following condition:
* multiple string blocks are to be merged to a single block and
* the total length of all strings exceeds the maximum block size that
snappy will encode (0xffffffff)
The observable effect of this is errors in the logs indicating a
compaction failure.
Fixes#13687
StringArrayEncodeAll will panic if the total length of strings
contained in the src slice is > 0xffffffff. This change adds a unit
test to replicate the issue and an associated fix to return an error.
This also raises an issue that compactions will be unable to make
progress under the following condition:
* multiple string blocks are to be merged to a single block and
* the total length of all strings exceeds the maximum block size that
snappy will encode (0xffffffff)
The observable effect of this is errors in the logs indicating a
compaction failure.
Fixes#13687
Scanner objects and iterators often need a ValuerEval. This
object is created, often with a function call, and has at
least one interface in it, so it allocates storage. Then it's
dropped again right away. The only part of it that might be
subject to change is usually a map. While the map's contents
change over time, the actual map doesn't change for the
lifetime of the object.
So, in both iterators and scanners, stash the ValuerEval
and continue reusing it. On a query returning a fair number
of data points, this produces a small (<5% in practice)
improvement in observed performance, visible as a significant
reduction in time spent in runtime (mallocgc, newobject,
etcetera).
The performance improvement isn't big, but it's reasonably
easy to evaluate it and establish that it's a safe change
to make.
Signed-off-by: seebs <seebs@seebs.net>
This commit limits the number of files that can be compacted in
a single group when forcing a full compaction or when a shard
becomes cold. This is to prevent too many files being compacted
at the same time.
Before this, if you deleted everything with `delete where true`
for example, then you would be left with all of your measurements
in the fields index. That would cause ghost fields to reappear
if someone reinserted to the measurement.
This fixes that by making it so the deepest most delete code
checks if the measurement was removed from the index, and if so
cleaning it up out of the fields index.
Additionally, it fixes bugs in that cleanup code where if you had
a measurement like "m1" and "m10", when iterating over the cache
or file store, "m1" would match "m10" due to it only checking the
prefix. This also has it check the character right after the
measurement to be either a comma because tags started, or the first
character of the field separator.
This change fixes#10511 that manifests when a shard is considered cold
faster than its cache is snapshotted. This can happen if WAL is enabled
because previously the code only considered the last modification of
compacted tsm1 files. Instead Engine.LastModified() also takes the WAL
into account if necessary.
This change makes the digest reader read and discard the manifest if
needed. Not all readers of a digest are interested in the manifest.
This change also makes it a requirement for the writer to write a
manifest because it is a non-optional part of a digest file.
Encode the compressed data at the start internal buffer. This ensures
the returned slice maintains the entire capacity and is available for
subsequent use.
When we pool / reuse string buffers, this will help considerably.
Improvements over previous commit:
```
name old time/op new time/op delta
EncodeStrings/10/batch-8 542ns ± 1% 355ns ± 2% -34.53% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 5.29µs ± 1% 3.58µs ± 2% -32.20% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 48.6µs ± 0% 36.2µs ± 2% -25.40% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
EncodeStrings/10/batch-8 704B ± 0% 0B -100.00% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 9.47kB ± 0% 0.00kB -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 90.1kB ± 0% 0.0kB -100.00% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
EncodeStrings/10/batch-8 0.00 0.00 ~ (all equal)
EncodeStrings/100/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
```
This commit adds a tsm1 function for encoding a batch of booleans into a
provided buffer.
The following benchmarks compare the performance of the existing
iterator based encoders, and the new batch oriented encoders using
randomly generated sets of booleans.
This commit adds a tsm1 function for encoding a batch of strings into a
provided buffer. The new function also shares the buffer between the
input data and the snappy encoded output, reducing allocations.
The following benchmarks compare the performance of the existing
iterator based encoders, and the new batch oriented encoders using
randomly generated strings.
name old time/op new time/op delta
EncodeStrings/10 2.14µs ± 4% 1.42µs ± 4% -33.56% (p=0.000 n=10+10)
EncodeStrings/100 12.7µs ± 3% 10.9µs ± 2% -14.46% (p=0.000 n=10+10)
EncodeStrings/1000 132µs ± 2% 114µs ± 2% -13.88% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
EncodeStrings/10 657B ± 0% 704B ± 0% +7.15% (p=0.000 n=10+10)
EncodeStrings/100 6.14kB ± 0% 9.47kB ± 0% +54.14% (p=0.000 n=10+10)
EncodeStrings/1000 61.4kB ± 0% 90.1kB ± 0% +46.66% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
EncodeStrings/10 3.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
EncodeStrings/100 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
EncodeStrings/1000 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
This commit adds a tsm1 function for encoding a batch of floats into a
buffer. Further, it replaces the `bitstream` library used in the
existing encoders (and all the current decoders) with inlined bit
expressions within the encoder, significantly reducing the function call
overhead for larger batches.
The following benchmarks compare the performance of the existing
iterator based encoders, and the new batch oriented encoders. They look
at a sequential input slice and a randomly generated input slice.
name old time/op new time/op delta
EncodeFloats/10_seq 1.14µs ± 3% 0.24µs ± 3% -78.94% (p=0.000 n=10+10)
EncodeFloats/10_ran 1.69µs ± 2% 0.21µs ± 3% -87.43% (p=0.000 n=10+10)
EncodeFloats/100_seq 7.07µs ± 1% 1.72µs ± 1% -75.62% (p=0.000 n=7+9)
EncodeFloats/100_ran 15.8µs ± 4% 1.8µs ± 1% -88.60% (p=0.000 n=10+9)
EncodeFloats/1000_seq 50.2µs ± 3% 16.2µs ± 2% -67.66% (p=0.000 n=10+10)
EncodeFloats/1000_ran 174µs ± 2% 16µs ± 2% -90.77% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
EncodeFloats/10_seq 0.00B 0.00B ~ (all equal)
EncodeFloats/10_ran 0.00B 0.00B ~ (all equal)
EncodeFloats/100_seq 0.00B 0.00B ~ (all equal)
EncodeFloats/100_ran 0.00B 0.00B ~ (all equal)
EncodeFloats/1000_seq 0.00B 0.00B ~ (all equal)
EncodeFloats/1000_ran 0.00B 0.00B ~ (all equal)
name old allocs/op new allocs/op delta
EncodeFloats/10_seq 0.00 0.00 ~ (all equal)
EncodeFloats/10_ran 0.00 0.00 ~ (all equal)
EncodeFloats/100_seq 0.00 0.00 ~ (all equal)
EncodeFloats/100_ran 0.00 0.00 ~ (all equal)
EncodeFloats/1000_seq 0.00 0.00 ~ (all equal)
EncodeFloats/1000_ran 0.00 0.00 ~ (all equal)
If there was an error after the cache has been snapshotted to one or
more TSM files, but before the cache and WAL are cleaned up, then the
cache would be repeatedly snapshotted, generated duplicate level 1 TSM
files.
This commit attempts to clean those files up by removing the temporary
TSM file(s). The snapshot will be retried.
FloatBatchDecodeAll behaves the same as the iterator-based float
decoder, returning an empty slice and no error when passed a buffer
with no encoded float values.
Fixes#10270