This commit quiets staticcheck's warnings about "unnecessary use of
fmt.Sprintf" and "unnecessary use of fmt.Sprint".
Prior to this commit, we were wrapping simple constant strings containing
no formatting verbs in fmt.Sprintf().
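A minimal sketch of the kind of change, with an illustrative string:

```go
package main

import "fmt"

func main() {
	// Before: fmt.Sprintf wraps a constant string that contains no
	// formatting verbs, which staticcheck flags as unnecessary.
	before := fmt.Sprintf("snapshot in progress")

	// After: the constant is used directly.
	after := "snapshot in progress"

	fmt.Println(before == after) // true
}
```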
* fix(tsdb): address staticcheck ST1006
This patch addresses staticcheck warning "receiver name should not be an
underscore, omit the name if it is unused (ST1006)" for 6 methods.
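A minimal sketch of the ST1006 fix, with a hypothetical `engine` type
standing in for the actual receivers:

```go
package main

import "fmt"

type engine struct{}

// Before (flagged by ST1006):
//   func (_ *engine) Close() error { return nil }

// After: the receiver name is omitted entirely when it is unused.
func (*engine) Close() error { return nil }

func main() {
	fmt.Println((&engine{}).Close()) // <nil>
}
```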
Before this commit, the to and from variables were being re-declared in
a block in such a way that the values were not being used.
This patch uses regular assignment so that the values are visible
outside of the block where they're set.
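A sketch of the shadowing problem and the fix, with illustrative names
and values:

```go
package main

import "fmt"

func bounds() (int64, int64) { return 1, 10 }

func main() {
	var to, from int64

	// Before: ':=' inside the block declared new 'to' and 'from'
	// variables that shadowed the outer ones, so the assigned values
	// were lost outside the block:
	//
	//   if true {
	//       to, from := bounds() // shadows the outer variables
	//   }

	// After: plain assignment updates the outer variables, so the
	// values are visible outside of the block where they are set.
	if true {
		to, from = bounds()
	}

	fmt.Println(to, from) // 1 10
}
```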
Closes: #18128
* fix: verify precision parameter in write requests
This change updates the HTTP endpoints that service v1 and v2 writes to
verify the values passed in the precision parameter.
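A hedged sketch of the kind of validation added; `validPrecision`,
`handleWrite`, and the exact set of accepted values are illustrative, not
the production code:

```go
package main

import (
	"fmt"
	"net/http"
)

// validPrecision mirrors the kind of check described above; the accepted
// values here are illustrative rather than the exact set for each API
// version.
func validPrecision(p string) bool {
	switch p {
	case "", "ns", "us", "ms", "s", "m", "h":
		return true
	}
	return false
}

func handleWrite(w http.ResponseWriter, r *http.Request) {
	if p := r.URL.Query().Get("precision"); !validPrecision(p) {
		http.Error(w, fmt.Sprintf("invalid precision %q", p), http.StatusBadRequest)
		return
	}
	// ... parse and write the points ...
}

func main() {
	http.HandleFunc("/write", handleWrite)
	// http.ListenAndServe(":8086", nil) // left commented for the sketch
}
```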
* fix(tsm1): Fix temp directory search bug
The original code's intention is to scan a directory for the subdirectory
whose name has the highest value when converted to an integer.
So directories may be in the form:
0.tmp
1.tmp
2.tmp
30.tmp
...
100.tmp
The loop should scan the directory, strip the basename and extension
from each file name to leave just a number, then store the highest number
it finds.
Before this patch, a bug caused the code to store the highest value only
when there was an error converting the numeric value into an integer.
This patch primarily fixes that logic.
In addition, this patch saves an indent level by inverting logic in
two places:
Instead of checking whether a file is a directory and has a suffix of
".tmp", it is better to test whether a file is NOT a directory OR does
NOT have an extension of ".tmp", and continue if so.
Also, instead of testing if len(ss) == 2, we can test if len(ss) != 2 and
continue if so.
Both of these save an indent level and keep our "happy path" to the
left.
Finally, this patch uses string concatenation instead of calling
fmt.Sprintf() to add periods to the "tmp" and "tsm" extensions.
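A sketch of the corrected scan under these conventions; the function and
directory names are illustrative:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// maxTmpGeneration scans dir for subdirectories named like "0.tmp",
// "30.tmp", "100.tmp" and returns the highest numeric prefix found.
func maxTmpGeneration(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	max := -1
	for _, fi := range entries {
		// Inverted checks keep the happy path to the left.
		if !fi.IsDir() || filepath.Ext(fi.Name()) != "."+"tmp" {
			continue
		}
		ss := strings.Split(fi.Name(), ".")
		if len(ss) != 2 {
			continue
		}
		// Store the highest value only when the conversion succeeds.
		if i, err := strconv.Atoi(ss[0]); err == nil && i > max {
			max = i
		}
	}
	return max, nil
}

func main() {
	n, err := maxTmpGeneration(os.TempDir())
	fmt.Println(n, err)
}
```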
Co-authored-by: David Norton <dgnorton@gmail.com>
Fixes #17440
While encoding or decoding corrupt data, the current behaviour is to `panic`.
This commit replaces the `panic` with an `error` that is propagated up to the calling `iterator`.
To avoid overwriting other `error`s, iterators now wrap a `TSMErrors` value which contains ALL the encountered errors.
TSMErrors itself implements `Error()`; the returned string contains all the error messages, separated by a "," delimiter.
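A sketch of such an aggregate error type; `tsmErrors` is illustrative
rather than the exact `TSMErrors` implementation:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// tsmErrors collects every encountered error and implements Error() by
// joining the messages with a "," delimiter.
type tsmErrors []error

func (te tsmErrors) Error() string {
	msgs := make([]string, 0, len(te))
	for _, err := range te {
		msgs = append(msgs, err.Error())
	}
	return strings.Join(msgs, ",")
}

func main() {
	te := tsmErrors{
		errors.New("decode: corrupt block"),
		errors.New("encode: invalid length"),
	}
	fmt.Println(te.Error()) // decode: corrupt block,encode: invalid length
}
```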
We were seeing segfaults in Roaring bitmaps sometimes, under very
high load with networked drives. This may reduce the risk of segfaults by
forcing marshalling to copy the data.
* fix: access tsi active log file with READ lock
The activeLogFile pointer may be altered by another goroutine, so the READ
lock is needed.
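A sketch of the locking pattern, with a hypothetical `partition` type:

```go
package main

import "sync"

type logFile struct{ path string }

// partition sketches the pattern described above (names are
// illustrative): the READ lock guards the activeLogFile pointer, which a
// concurrent goroutine may swap out.
type partition struct {
	mu            sync.RWMutex
	activeLogFile *logFile
}

func (p *partition) ActiveLogFile() *logFile {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.activeLogFile
}

func main() {
	p := &partition{activeLogFile: &logFile{path: "L0-00000001.tsl"}}
	_ = p.ActiveLogFile()
}
```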
* Merge pull request #16384 from foobar/tsi-partition-lock
fix: access tsi active log file with READ lock
Co-authored-by: Tristan Su <sooqing@gmail.com>
Co-authored-by: David Norton <dgnorton@gmail.com>
When an InfluxDB database is very busy writing new points, the backup
process can fail because it cannot write a new snapshot.
The error is: `operation timed out with error: create snapshot: snapshot in progress`.
This happens because InfluxDB snapshots the cache almost continuously
due to the high number of points being ingested.
This PR skips the snapshot if the `snapshotter` does not become available
after three attempts when a backup is requested.
The backup won't contain the data in the cache or WAL.
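A sketch of the retry policy described above; the names, error matching,
and delay are illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errSnapshotInProgress = errors.New("snapshot in progress")

// backupWithRetry retries the snapshot a few times and, if it still
// cannot be taken, proceeds without it.
func backupWithRetry(trySnapshot func() error) error {
	for attempt := 0; attempt < 3; attempt++ {
		err := trySnapshot()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errSnapshotInProgress) {
			return err
		}
		time.Sleep(100 * time.Millisecond) // illustrative backoff
	}
	// Give up on the snapshot; the backup won't contain cache/WAL data.
	fmt.Println("snapshotter busy; backing up without cache or WAL contents")
	return nil
}

func main() {
	_ = backupWithRetry(func() error { return errSnapshotInProgress })
}
```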
Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
Prior to this change, new series would be added to the series file
before checking the series cardinality limit. If the limit was exceeded,
the write was rejected even though the series had already been added to
the series file.
This commit prevents multiple blocks for the same series key from having
their values truncated when they are read into an empty buffer.
The current cursor reader code has an optimisation that incorrectly
assumes the incoming array will be limited to 1,000 values (the maximum
block size), but arrays can contain values from multiple matching
blocks.
* fix(storage): skip TSM files with block read errors
When we find a bad TSM file during compaction, propagate the error up and move
the bad file aside. The engine will disregard the file so the next compaction
will not hit the same error.
This change adds a lock around digest creation so that it is safe for
concurrent calls. Prior to this change, calls from multiple goroutines
resulted in "Digest aborted, problem renaming tmp digest" errors.
Fixes #15859
This commit fixes a defect in the TSI index where a filter using the
negated equality operator would result in no matching series being
returned for series stored within the `IndexFile` portions of the index.
The root cause of this was due to missing legacy-handling code in the
index for this particular iterator.
This upgrades the flux version to v0.50.2.
The secret service, which is used for alerts, is not included. The
`to()` function is also still not included.
Fixes #10052
This commit fixes an issue where field keys would reappear in results
when querying previously dropped measurements.
The issue manifests itself when duplicates of a new series are inserted
into the `inmem` index. In this case, a map that tracks the number of
series belonging to a measurement was incorrectly incremented once for
each duplication of the series. Then, when it came time to drop the
measurement, the index assumed there were several series belonging to
the measurement left in the index (because the counter was higher than
it should be). The result of that was that the `fields.idx` file (which
stores a mapping between measurements and field keys) was not truncated
and rebuilt. This left old field keys in that file, which were then
returned in subsequent queries over all field keys.
The flux in influxdb has been upgraded to use v0.33.2. A lot of
interfaces for the storage engine changed during this upgrade, so code had
to change to accommodate the new interfaces and remove the old ones.
Included in this commit is a patch file for the changes that were made.
A patch was generated for the following packages:
* `flux/stdlib/influxdata/influxdb`
* `storage/reads`
* `tsdb/cursors`
These are the three packages that are in common with version 2 of the
database and the first of these packages contains the specific
implementations that are used for version 1.
It is very possible that the next time we upgrade this, the patch will
not apply cleanly just like it wouldn't have applied cleanly to this
update. The patch is mostly meant to document exactly what changed
during the copy over to help ensure we don't forget things when adapting
the interfaces.
Add a patch file to hopefully make this easier in the future.
StringArrayEncodeAll will panic if the total length of strings
contained in the src slice is > 0xffffffff. This change adds a unit
test to replicate the issue and an associated fix to return an error.
This also raises an issue that compactions will be unable to make
progress under the following condition:
* multiple string blocks are to be merged to a single block and
* the total length of all strings exceeds the maximum block size that
snappy will encode (0xffffffff)
The observable effect of this is errors in the logs indicating a
compaction failure.
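A sketch of the guard that replaces the panic; `checkStringBlockSize` is
illustrative, not the actual encoder:

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// checkStringBlockSize returns an error, instead of panicking, when the
// total string length exceeds the maximum size snappy can encode
// (0xffffffff).
func checkStringBlockSize(src []string) error {
	var total uint64
	for _, s := range src {
		total += uint64(len(s))
	}
	if total > math.MaxUint32 {
		return errors.New("string block too large: total length exceeds 0xffffffff")
	}
	return nil
}

func main() {
	fmt.Println(checkStringBlockSize([]string{"a", "b"})) // <nil>
}
```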
Fixes #13687
This integrates the influxdb 1.x series to the latest version of Flux
and updates the code to use it. It also removes the dependency on
platform and copies the necessary code from storage into the 1.x series
so the dependency is unneeded.
The flux functions specific to 1.x have been moved to the same structure
that flux changed to with having a `stdlib` directory instead of a
`functions` directory. It also adds a `databases()` function that
returns the databases from the meta client.
Previously it was possible to set IDs on a `nil` entry which would
in turn cause a panic. If this panic was recovered by the server
then it would result in a mutex in the `inmem` index staying locked
indefinitely.
We're not allowed to access the s.epochs map without holding the
mutex against shard creation and deletion, so create a copy of
all of the epoch trackers we will need while we hold the mutex.
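A sketch of the pattern, with illustrative names:

```go
package main

import "sync"

type epochTracker struct{}

// store sketches the fix: take the mutex guarding shard creation and
// deletion, copy out the epoch trackers we need, then release the lock
// before using them.
type store struct {
	mu     sync.RWMutex
	epochs map[uint64]*epochTracker
}

func (s *store) epochsForShards(ids []uint64) []*epochTracker {
	s.mu.RLock()
	defer s.mu.RUnlock()

	out := make([]*epochTracker, 0, len(ids))
	for _, id := range ids {
		if t, ok := s.epochs[id]; ok {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	s := &store{epochs: map[uint64]*epochTracker{1: {}}}
	_ = s.epochsForShards([]uint64{1})
}
```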
Scanner objects and iterators often need a ValuerEval. This
object is created, often with a function call, and has at
least one interface in it, so it allocates storage. Then it's
dropped again right away. The only part of it that might be
subject to change is usually a map. While the map's contents
change over time, the actual map doesn't change for the
lifetime of the object.
So, in both iterators and scanners, stash the ValuerEval
and continue reusing it. On a query returning a fair number
of data points, this produces a small (<5% in practice)
improvement in observed performance, visible as a significant
reduction in time spent in runtime (mallocgc, newobject,
etcetera).
The performance improvement isn't big, but it's reasonably
easy to evaluate it and establish that it's a safe change
to make.
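A sketch of the reuse pattern; `valuerEval` stands in for
influxql.ValuerEval and all names are illustrative:

```go
package main

import "fmt"

// valuer stands in for the interface held by the real ValuerEval;
// building a fresh value per point allocates, so the scanner stashes one
// and reuses it.
type valuer interface {
	Value(name string) (interface{}, bool)
}

type mapValuer map[string]interface{}

func (m mapValuer) Value(name string) (interface{}, bool) {
	v, ok := m[name]
	return v, ok
}

type valuerEval struct {
	Valuer valuer
}

type scanner struct {
	m    mapValuer
	eval valuerEval // created once; the map's contents change, the map doesn't
}

func newScanner() *scanner {
	s := &scanner{m: mapValuer{}}
	s.eval = valuerEval{Valuer: s.m}
	return s
}

func (s *scanner) scan(field string, v interface{}) (interface{}, bool) {
	s.m[field] = v // mutate contents; no new ValuerEval per point
	return s.eval.Valuer.Value(field)
}

func main() {
	s := newScanner()
	fmt.Println(s.scan("value", 42.0)) // 42 true
}
```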
Signed-off-by: seebs <seebs@seebs.net>
In the case of caching TSI bitmaps belonging to immutable .tsi files,
the underlying bitset data can be mmapped. It is possible, though rare,
for this data to be unmapped (e.g., via a TSI compaction) but for the
cached bitmap to be subsequently read. This leads to a segfault.
This only happens when copy-on-write is set to true on the roaring
bitmap, because in that case only the internal pointers are cloned.
This change reduces TSI cache performance by around 10%, which I have
deemed to amount to only a few microseconds typically.
This commit adds a config option to the tsdb Config allowing the size of
the bitset cached in the TSI index to be specified.
Setting the cache size to 0 will disable the cache.
This commit limits the number of files that can be compacted in
a single group when forcing a full compaction or when a shard
becomes cold. This is to prevent too many files being compacted
at the same time.
Before this, if you deleted everything with `delete where true`
for example, then you would be left with all of your measurements
in the fields index. That would cause ghost fields to reappear
if someone reinserted to the measurement.
This fixes that by making the deepest delete code check whether the
measurement was removed from the index, and if so, cleaning it up out
of the fields index.
Additionally, it fixes bugs in that cleanup code where if you had
a measurement like "m1" and "m10", when iterating over the cache
or file store, "m1" would match "m10" due to it only checking the
prefix. This also has it check the character right after the
measurement to be either a comma because tags started, or the first
character of the field separator.
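A sketch of the corrected check; the separator constant is illustrative
of the marker tsm1 places between the series key and the field key:

```go
package main

import (
	"fmt"
	"strings"
)

const fieldSeparator = "#!~#" // illustrative

// keyBelongsTo applies the corrected logic: after matching the
// measurement prefix, the next byte must be a ',' (tags follow) or the
// first byte of the field separator, so "m1" no longer matches "m10".
func keyBelongsTo(seriesKey, measurement string) bool {
	if !strings.HasPrefix(seriesKey, measurement) {
		return false
	}
	rest := seriesKey[len(measurement):]
	return len(rest) > 0 && (rest[0] == ',' || rest[0] == fieldSeparator[0])
}

func main() {
	fmt.Println(keyBelongsTo("m1,host=a", "m1"))  // true
	fmt.Println(keyBelongsTo("m10,host=a", "m1")) // false
}
```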
This change fixes #10511, which manifests when a shard is considered cold
faster than its cache is snapshotted. This can happen if WAL is enabled,
because previously the code only considered the last modification of
compacted tsm1 files. Instead, Engine.LastModified() also takes the WAL
into account if necessary.
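A sketch of the idea, with illustrative names:

```go
package main

import (
	"fmt"
	"time"
)

// lastModified considers both the newest TSM file and the WAL when
// deciding how recently a shard was written.
func lastModified(tsm, wal time.Time) time.Time {
	if wal.After(tsm) {
		return wal
	}
	return tsm
}

func main() {
	tsm := time.Now().Add(-2 * time.Hour)
	wal := time.Now().Add(-5 * time.Minute)
	fmt.Println(lastModified(tsm, wal).Equal(wal)) // true
}
```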
There are some problematic races that occur when deletes happen
against writes to the same points at the same time. This change
introduces guards and an epoch based system to coordinate these
modifications.
A guard matches a point based on the time, measurement name, and
some conditions loaded from an influxql expression. The intent
is to be as precise as possible without allowing any false
negatives: if a point would be deleted, the guard must match it.
We are allowed to match more points than necessary, at the cost
of slowing down writes.
The epoch based system keeps track of outstanding writes and
deletes and their associated guards. When a delete operation
is going to start, it waits until all current writes are
done, and installs its guard, blocking all future writes that
contain points that may conflict with the delete. This allows
writes to disjoint points to proceed uncontended, and the
implementation is optimized for assuming there are few
outstanding deletes. For example, in the case that there are no
deletes, a write just has to take a mutex, bump a counter, and
compare a value against zero. The epoch trackers are per shard,
so that different shards never have to contend with one another.
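A sketch of the write-side fast path, with guards elided and illustrative
names:

```go
package main

import "sync"

// epochTracker sketches the common case described above: with no
// outstanding deletes, a write takes a mutex, bumps a counter, and
// compares a value against zero.
type epochTracker struct {
	mu      sync.Mutex
	writes  int64
	deletes int64
}

// StartWrite reports whether the write can proceed without checking guards.
func (e *epochTracker) StartWrite() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.writes++
	return e.deletes == 0
}

func (e *epochTracker) EndWrite() {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.writes--
}

func main() {
	var e epochTracker
	if e.StartWrite() {
		// ... write points ...
		e.EndWrite()
	}
}
```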
TSI1 and inmem indexes have different properties during deletes.
Specifically, inmem shares a global index across all shards, where
every tsi1 index is contained to a specific shard. When deleting
a series, it may cause the last reference to the series across all
shards to be dropped, necessitating a removal from the series file.
Since the inmem index shares the index across all shards, removing
the series when it's removed from the series file is sufficient.
However, in the case of a mixed index database, if the last shard
is a TSI1 shard, the other inmem indexes are not available when we
discover that it was the last reference to the series. This ends
up leaving the series in the inmem index without a series id in
the series file, causing all sorts of misbehavior.
Rather than continue curling ourselves into a ball to try to fix
this unsupported mode, give a helpful error message to the user
that they must run their database in a non-mixed index mode to
allow deletes.
Removes cloning measurement fields on writes; instead, atomically swaps out
measurement field sets when fields are added (with the new overhead of
copying the existing fields whenever a new one is added).
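A sketch of the copy-on-write swap using atomic.Value; the names are
illustrative, and concurrent field creators would still serialize behind
a mutex (elided here):

```go
package main

import "sync/atomic"

// measurementFields: reads load the current field set atomically, while
// adding a field copies the map and swaps in the new version.
type measurementFields struct {
	fields atomic.Value // holds map[string]int
}

func newMeasurementFields() *measurementFields {
	mf := &measurementFields{}
	mf.fields.Store(map[string]int{})
	return mf
}

func (mf *measurementFields) Field(name string) (int, bool) {
	m := mf.fields.Load().(map[string]int)
	v, ok := m[name]
	return v, ok
}

func (mf *measurementFields) CreateField(name string, typ int) {
	old := mf.fields.Load().(map[string]int)
	next := make(map[string]int, len(old)+1)
	for k, v := range old { // copy existing fields: the new overhead
		next[k] = v
	}
	next[name] = typ
	mf.fields.Store(next)
}

func main() {
	mf := newMeasurementFields()
	mf.CreateField("value", 1)
	_, _ = mf.Field("value")
}
```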
We already make copies when no expression is provided, because
the backing slices may go away if the shard they came from is
closed. This fixes the other spot where some backing slices
would be returned.
This change makes the digest reader read and discard the manifest if
needed. Not all readers of a digest are interested in the manifest.
This change also makes it a requirement for the writer to write a
manifest because it is a non-optional part of a digest file.
This commit adds an `indexType` key to the shard sections of the
`/debug/vars` endpoint, as well as the `_internal` shard statistics.
The tag will be reported as `"indexType": "inmem"` or `"indexType":
"tsi1"`.
Encode the compressed data at the start of the internal buffer. This
ensures the returned slice maintains the entire capacity and is available
for subsequent use.
When we pool / reuse string buffers, this will help considerably.
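A sketch of the buffer-sharing pattern using the snappy package;
`encodeInto` is illustrative, not the tsm1 encoder itself:

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
)

// encodeInto compresses into the front of the caller's buffer so the
// returned slice keeps the buffer's full capacity for reuse.
func encodeInto(buf, src []byte) []byte {
	if n := snappy.MaxEncodedLen(len(src)); cap(buf) < n {
		buf = make([]byte, n)
	}
	// Encode writes into buf and returns buf[:encodedLen], so the result
	// shares buf's backing array and retains its capacity.
	return snappy.Encode(buf[:cap(buf)], src)
}

func main() {
	buf := make([]byte, 0, 4096)
	out := encodeInto(buf, []byte("hello,hello,hello,hello"))
	fmt.Println(len(out), cap(out)) // encoded length, original capacity
}
```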
Improvements over previous commit:
```
name old time/op new time/op delta
EncodeStrings/10/batch-8 542ns ± 1% 355ns ± 2% -34.53% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 5.29µs ± 1% 3.58µs ± 2% -32.20% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 48.6µs ± 0% 36.2µs ± 2% -25.40% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
EncodeStrings/10/batch-8 704B ± 0% 0B -100.00% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 9.47kB ± 0% 0.00kB -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 90.1kB ± 0% 0.0kB -100.00% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
EncodeStrings/10/batch-8 0.00 0.00 ~ (all equal)
EncodeStrings/100/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
```
This commit adds a tsm1 function for encoding a batch of booleans into a
provided buffer.
The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders, using
randomly generated sets of booleans.
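A sketch of batch boolean encoding into a provided buffer; the bit layout
is illustrative, not the actual tsm1 format:

```go
package main

import "fmt"

// booleanArrayEncodeAll packs each boolean as one bit into b, growing b
// only when its capacity is insufficient.
func booleanArrayEncodeAll(src []bool, b []byte) []byte {
	sz := (len(src) + 7) / 8
	if cap(b) < sz {
		b = make([]byte, sz)
	}
	b = b[:sz]
	for i := range b {
		b[i] = 0
	}
	for i, v := range src {
		if v {
			b[i/8] |= 1 << uint(7-i%8)
		}
	}
	return b
}

func main() {
	enc := booleanArrayEncodeAll([]bool{true, false, true}, nil)
	fmt.Printf("%08b\n", enc[0]) // 10100000
}
```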
This commit adds a tsm1 function for encoding a batch of strings into a
provided buffer. The new function also shares the buffer between the
input data and the snappy encoded output, reducing allocations.
The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders, using
randomly generated strings.
name old time/op new time/op delta
EncodeStrings/10 2.14µs ± 4% 1.42µs ± 4% -33.56% (p=0.000 n=10+10)
EncodeStrings/100 12.7µs ± 3% 10.9µs ± 2% -14.46% (p=0.000 n=10+10)
EncodeStrings/1000 132µs ± 2% 114µs ± 2% -13.88% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
EncodeStrings/10 657B ± 0% 704B ± 0% +7.15% (p=0.000 n=10+10)
EncodeStrings/100 6.14kB ± 0% 9.47kB ± 0% +54.14% (p=0.000 n=10+10)
EncodeStrings/1000 61.4kB ± 0% 90.1kB ± 0% +46.66% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
EncodeStrings/10 3.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
EncodeStrings/100 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
EncodeStrings/1000 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
This commit adds a tsm1 function for encoding a batch of floats into a
buffer. Further, it replaces the `bitstream` library used in the
existing encoders (and all the current decoders) with inlined bit
expressions within the encoder, significantly reducing the function call
overhead for larger batches.
The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders. They look
at a sequential input slice and a randomly generated input slice.
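A sketch of the inlined bit-writing idea that avoids per-bit library
calls; the layout and names are illustrative:

```go
package main

import "fmt"

// writeBits accumulates fixed-width values into a uint64 and flushes
// whole bytes to dst, replacing per-bit function calls with inlined bit
// expressions.
func writeBits(dst []byte, vals []uint64, width uint) []byte {
	var acc uint64 // bit accumulator
	var n uint     // bits currently held in acc
	for _, v := range vals {
		acc = acc<<width | (v & (1<<width - 1))
		n += width
		for n >= 8 {
			n -= 8
			dst = append(dst, byte(acc>>n))
		}
	}
	if n > 0 {
		dst = append(dst, byte(acc<<(8-n))) // flush the final partial byte
	}
	return dst
}

func main() {
	fmt.Printf("%08b\n", writeBits(nil, []uint64{1, 2, 3}, 4))
	// [00010010 00110000]
}
```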
name old time/op new time/op delta
EncodeFloats/10_seq 1.14µs ± 3% 0.24µs ± 3% -78.94% (p=0.000 n=10+10)
EncodeFloats/10_ran 1.69µs ± 2% 0.21µs ± 3% -87.43% (p=0.000 n=10+10)
EncodeFloats/100_seq 7.07µs ± 1% 1.72µs ± 1% -75.62% (p=0.000 n=7+9)
EncodeFloats/100_ran 15.8µs ± 4% 1.8µs ± 1% -88.60% (p=0.000 n=10+9)
EncodeFloats/1000_seq 50.2µs ± 3% 16.2µs ± 2% -67.66% (p=0.000 n=10+10)
EncodeFloats/1000_ran 174µs ± 2% 16µs ± 2% -90.77% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
EncodeFloats/10_seq 0.00B 0.00B ~ (all equal)
EncodeFloats/10_ran 0.00B 0.00B ~ (all equal)
EncodeFloats/100_seq 0.00B 0.00B ~ (all equal)
EncodeFloats/100_ran 0.00B 0.00B ~ (all equal)
EncodeFloats/1000_seq 0.00B 0.00B ~ (all equal)
EncodeFloats/1000_ran 0.00B 0.00B ~ (all equal)
name old allocs/op new allocs/op delta
EncodeFloats/10_seq 0.00 0.00 ~ (all equal)
EncodeFloats/10_ran 0.00 0.00 ~ (all equal)
EncodeFloats/100_seq 0.00 0.00 ~ (all equal)
EncodeFloats/100_ran 0.00 0.00 ~ (all equal)
EncodeFloats/1000_seq 0.00 0.00 ~ (all equal)
EncodeFloats/1000_ran 0.00 0.00 ~ (all equal)
This commit deletes most of the code to service reads from influxdb
and pulls it in from platform instead.
Of note, the models.Tag and models.Tags types are now aliases to the
platform models.Tag and models.Tags types. Additionally, many types
in the tsdb package relating to cursors are also aliases to the same
types in the platform cursors package.
This updates the platform and flux repos to the current master in the
Gopkg.lock.
This commit fixes an issue with the series file compaction process
where tombstones are lost after compaction and series existence
checks are not correct. This commit also fixes some smaller flushing
issues within the series file that mainly related to testing.
If there was an error after the cache had been snapshotted to one or
more TSM files, but before the cache and WAL were cleaned up, then the
cache would be repeatedly snapshotted, generating duplicate level 1 TSM
files.
This commit attempts to clean those files up by removing the temporary
TSM file(s). The snapshot will be retried.
Since all tag sets are materialised to strings before this method
returns, a large number of allocations can be avoided by carefully
reusing buffers and containers.
This commit reduces allocations by about 75%, which can be very
significant for high cardinality workloads.
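A sketch of the buffer-reuse pattern; the names and key layout are
illustrative:

```go
package main

import "fmt"

// materialiseTagSets grows one scratch buffer and reuses it while each
// tag set is materialised to a string, instead of allocating a fresh
// buffer per series key.
func materialiseTagSets(names, tags [][]byte) []string {
	out := make([]string, 0, len(names))
	var scratch []byte // reused across iterations
	for i := range names {
		scratch = scratch[:0]
		scratch = append(scratch, names[i]...)
		scratch = append(scratch, ',')
		scratch = append(scratch, tags[i]...)
		out = append(out, string(scratch)) // one unavoidable copy per result
	}
	return out
}

func main() {
	fmt.Println(materialiseTagSets(
		[][]byte{[]byte("m0"), []byte("m0")},
		[][]byte{[]byte("tag5=value0"), []byte("tag5=value1")},
	))
}
```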
The benchmark results shown below are for a benchmark that asks for all
series keys matching `tag5=value0'. There are 100K matching series keys.
benchmark old ns/op new ns/op delta
BenchmarkIndexSet_TagSets/1M_series/inmem-8 10959963 11144345 +1.68%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 23632757 18768888 -20.58%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 10496303 10380551 -1.10%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 24344359 19020234 -21.87%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 10359864 10818296 +4.43%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 23453357 19027445 -18.87%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 10479519 10400619 -0.75%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 26364965 19023749 -27.84%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 10437794 10557066 +1.14%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 23126946 19196955 -16.99%
benchmark old allocs new allocs delta
BenchmarkIndexSet_TagSets/1M_series/inmem-8 51 51 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 80067 20071 -74.93%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 51 51 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 80067 20071 -74.93%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 51 51 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 80067 20071 -74.93%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 51 51 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 80067 20071 -74.93%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 51 51 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 80067 20071 -74.93%
benchmark old bytes new bytes delta
BenchmarkIndexSet_TagSets/1M_series/inmem-8 3556728 3556728 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 12677328 5157992 -59.31%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 3556728 3556728 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 12677328 5157992 -59.31%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 3556728 3556728 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 12677328 5157992 -59.31%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 3556728 3556728 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 12677328 5157992 -59.31%
BenchmarkIndexSet_TagSets/1M_series/inmem-8 3556728 3556728 +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8 12677328 5157992 -59.31%