2670 Commits (1.8)

Author SHA1 Message Date
davidby-influx d3be25b251
fix: detect misquoted tag values and return an error (#22754) (#22787)
SHOW TAG KEYS FROM "foo" where bar="misquoted" is
erroneous, because the tag value must be enclosed
in single, not double, quotes. Although this
correctly returns no tag keys, it is very
inefficient and has caused out-of-memory failures
at a customer. This fix short-circuits the query.

closes https://github.com/influxdata/influxdb/issues/22755

(cherry picked from commit af9e89a4d4)

closes https://github.com/influxdata/influxdb/issues/22786
2021-10-27 14:28:42 -07:00
davidby-influx 1dd2c0c12f
fix(restore): parameter validation, Windows temp file deletion (#22561)
fix(restore): enforce the -db parameter when -newdb used

closes https://github.com/influxdata/influxdb/issues/15901

(cherry picked from commit 1dde65bb75)

closes https://github.com/influxdata/influxdb/issues/22560

fix: for Windows, copy snapshot files being backed up

On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes https://github.com/influxdata/influxdb/issues/16289

(cherry picked from commit 3702fe8e76)

closes https://github.com/influxdata/influxdb/issues/22559
2021-09-22 13:28:46 -07:00
Sam Arnold 51600c8300
chore: update protobuf library versions and remove influx_tsm (#21882) (#21891)
* chore: update protobuf library versions and remove influx_tsm (#21882)

* chore: update protobufs

* fix: run codegen during build

* fix: fully remove influx_tsm

* fix: codegen works

* fix: add tools.go for 1.8
2021-07-20 15:16:19 -04:00
davidby-influx bf45841359
chore(ae): add more logging (#21381)
tsdb.Engine.IsIdle and tsdb.Engine.Digest now return a reason string for why the engine & shard are not idle.
Callers can then use this string for logging, if desired. The returned reason does not allocate memory, so the
caller may want to add the shard ID and path for more information in the log. This is intended to be used in
calls from the anti-entropy service in Enterprise.
2021-05-07 12:55:58 -07:00
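A minimal sketch of the idea described in this commit, assuming a simplified engine: the idle check returns a constant reason string, and the caller adds the shard ID and path when logging. The `engine` type, its fields, and the reason texts are hypothetical and are not the actual `tsdb.Engine` API.

```go
package main

import "fmt"

// engine is a hypothetical, simplified stand-in for a storage engine.
type engine struct {
	cacheSize          int
	compactionsRunning int
}

// isIdle reports whether the engine is idle and, when it is not, why.
// The reasons are string literals, so returning them does not allocate.
func (e *engine) isIdle() (bool, string) {
	switch {
	case e.cacheSize > 0:
		return false, "not idle because cache size is nonzero"
	case e.compactionsRunning > 0:
		return false, "not idle because of active compactions"
	default:
		return true, ""
	}
}

func main() {
	e := &engine{cacheSize: 128}
	if idle, reason := e.isIdle(); !idle {
		// The reason carries no shard details, so the caller adds them.
		fmt.Printf("shard %d (%s): %s\n", 7, "/data/db/rp/7", reason)
	}
}
```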
davidby-influx e234aa7ff0
fix: Anti-Entropy loops endlessly with empty shard (#21275) (#21290)
The anti-entropy service will loop trying to copy an empty shard to a
data node missing that shard.  This fix is one of two changes that
correctly create an empty shard on a new node. This fix will set the
LastModified date of an empty shard directory to the modification time
of that directory, instead of to the Unix epoch.

Fixes: https://github.com/influxdata/influxdb/issues/21273
(cherry picked from commit 7f300dc248)
2021-04-26 10:13:49 -07:00
Sam Arnold 79168cf671
refactor: separate coarse and fine permission interfaces (#20996) (#21035)
(cherry picked from commit b7e7de24d6)
2021-03-23 08:29:52 -04:00
Sam Arnold e95a6a4a4b
feat(inspect): Add report-disk for disk usage by measurement (#20917)
* feat(inspect): Add report-disk for disk usage by measurement

(cherry picked from commit a6152e8ac1)

* fix(inspect): bad pattern matching

(cherry picked from commit 3a31e2370e)

* chore: fix goimports

* chore: update changelog
2021-03-11 14:25:55 -05:00
davidby-influx 283ea0e1ec
fix(tsdb): minimize lock contention when adding new fields or measure (#20912)
Frequent writes to fields.idx cause lock contention, and fields.idx is recreated
whenever a field or measurement is added in a WritePointsWithContext() call.
This eliminates locking during the actual file rewrite, and limits it to
the times when the MeasurementFieldSet is actually being read or written
in memory and when the new file is being renamed.

The test verifies correct behavior by checking that the fields.idx
file matches the in-memory copy after heavily parallel measurement addition.

Fixes https://github.com/influxdata/influxdb/issues/20500

(cherry picked from commit fe3af66c54)
2021-03-09 16:10:08 -08:00
Daniel Moran 55fefdd2e4
fix(tsm1): fix data race when accessing tombstone stats (#20909) 2021-03-09 18:07:18 -05:00
davidby-influx 9c6e401372
feat: Make meta queries respect QueryTimeout values (#20910)
Meta queries (SHOW TAG VALUES, SHOW TAG KEYS, SHOW SERIES CARDINALITY, etc.) do not respect
the QueryTimeout config parameter. Meta queries should check the query context when possible
to allow cancellation and timeout. This will not be as frequent as regular queries, which
use iterators, because meta queries return data in batches.

Add a context.Context to
(*Store).MeasurementNames()
(*Store).MeasurementsCardinality()
(*Store).SeriesCardinality()
(*Store).TagValues()
(*Store).TagKeys()
(*Store).SeriesSketches()
(*Store).MeasurementsSketches()
which is tested for timeout or cancellation
to allow limitation of time spent in meta queries

https://github.com/influxdata/influxdb/issues/20736
(cherry picked from commit 092c7a9976)

* chore: move context.Context to first argument in methods per convention

(cherry picked from commit a8b2129df5)
2021-03-09 14:40:50 -08:00
davidby-influx 54d8d0180d
chore: run goimports on 1.8 branch to bring it up to new check-in standards (#20907)
Also manually edit imports section to meet our more granular conventions within the strictures of goimports.
2021-03-09 12:08:26 -08:00
davidby-influx b2cb862484
fix(error): SELECT INTO doesn't return error with unsupported value (#20429) (#20432)
* fix(error): SELECT INTO doesn't return error with unsupported value (#20429)

When a SELECT INTO query generates an illegal value that cannot be inserted,
like +/- Inf, it should return an error, rather than failing silently.
This adds a boolean parameter to the [data] section of influxdb.conf:
* strict-error-handling
When false, the default, the old behavior is preserved.  When true,
unsupported values will return an error from SELECT INTO queries

Fixes https://github.com/influxdata/influxdb/issues/20426

(cherry picked from commit 9e33be2619)

Fixes https://github.com/influxdata/influxdb/issues/20427
2020-12-30 19:18:58 -08:00
davidby-influx 4406b97250 chore(tsm1): fix formatting
Failed to format code before commit.
2020-11-15 21:42:31 -08:00
davidby-influx dfa6aa8cea fix(tsm1): "snapshot in progress" error during backup
Loop with backoff in (*Engine).CreateSnapshot() to retry
(*Engine).WriteSnapshot() up to 3 times if
ErrSnapshotInProgress is returned.  Then continue
on no error or on SnapshotInProgress if skipCacheOk is
true.

https://github.com/influxdata/plutonium/issues/3227
2020-11-15 21:02:00 -08:00
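The retry loop described in this commit can be sketched as follows. This is an illustration under assumptions, not the code in `(*Engine).CreateSnapshot()`: `snapshotWithRetry`, the `write` callback, the doubling backoff, and the lowercase `errSnapshotInProgress` are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errSnapshotInProgress is a hypothetical stand-in for the engine's error.
var errSnapshotInProgress = errors.New("snapshot in progress")

// snapshotWithRetry calls write up to three times, doubling the wait between
// attempts, for as long as write reports errSnapshotInProgress. If the
// snapshotter is still busy afterwards, the error is ignored only when
// skipCacheOk is true.
func snapshotWithRetry(write func() error, skipCacheOk bool, wait time.Duration) error {
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if attempt > 0 {
			time.Sleep(wait)
			wait *= 2
		}
		if err = write(); !errors.Is(err, errSnapshotInProgress) {
			break // success, or an error that retrying will not fix
		}
	}
	if errors.Is(err, errSnapshotInProgress) && skipCacheOk {
		return nil
	}
	return err
}

func main() {
	busy := func() error { return errSnapshotInProgress }
	fmt.Println(snapshotWithRetry(busy, true, time.Millisecond))
	fmt.Println(snapshotWithRetry(busy, false, time.Millisecond))
}
```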
davidby-influx 07a9c0e240 fix(tsm1): "snapshot in progress" error during backup
Test the skipCacheOk flag to tsdb.Shard.CreateSnapshot() and
tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache
snapshot cannot be taken.

https://github.com/influxdata/plutonium/issues/3227
(cherry picked from commit 0dcff81f56)
2020-11-12 21:13:16 -08:00
davidby-influx 196f60046e fix(tsm1): "snapshot in progress" error during backup
This fix adds a skipCacheOk flag to
tsdb.Store.CreateShardSnapshot() and tsdb.Shard.CreateSnapshot()
to pass to tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
This flag is set to false in tsm1.Engine.Export()

https://github.com/influxdata/plutonium/issues/3227
(cherry picked from commit 6ec446f422)
2020-11-12 21:12:53 -08:00
davidby-influx 0b1ee04f9f fix(tsm1): "snapshot in progress" error during backup
When an InfluxDB database is very busy writing new points, the backup
process can fail because it cannot write a new snapshot.
The error is: operation timed out with error: create snapshot: snapshot in progress.
This happens because InfluxDB is almost "continuously" taking a snapshot
of the cache due to the high number of points ingested.
The fix for this was https://github.com/influxdata/influxdb/pull/16627
but it was for OSS only, and was not in the code path for backups
in clusters.
This fix adds a skipCacheOk flag to tsdb.Engine.CreateSnapshot().
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
and in tsdb.Shard.CreateSnapshot(), the cluster backup code path.
This flag is set to false in tsm1.Engine.Export()

https://github.com/influxdata/plutonium/issues/3227
(cherry picked from commit 23be20bf1b)
2020-11-12 21:12:32 -08:00
David Norton 86118d24f2 feat: allow disable compaction per shard
This feature allows compaction to be disabled on a per-shard basis by
creating a file named do_not_compact in a shard's directory. When
disabled, a message is logged every 15 minutes with the reason for
compaction being disabled (existence of the file). This makes it easy to
know if compaction has been disabled for any shards by searching the log
for "compaction disabled" or running "find path/to/data -type f -name
do_not_compact".
2020-10-19 15:36:13 -04:00
Ayan George 6b05f15d42
Merge pull request #19420 from influxdata/fix-unlocked-map-access (#19612)
fix: lock map before writes

Co-authored-by: David Norton <dgnorton@gmail.com>
2020-09-22 12:23:28 -04:00
Ayan George 7ff5b1c26a
chore: Quiet static analysis tools (#19509) (#19512)
* Remove redundant type in slice/array declarations.
* Call t.Fatal() from test-functions, not non-test go-routines.
* Remove unnecessary empty value operator from ranges.
* Call defer .Close() methods only after checking for error on Open().
2020-09-22 11:27:14 -04:00
Jonathan A. Sternberg ceead88bd5
fix(services/storage): multi measurement queries return all applicable series (#19592)
This fixes multi measurement queries that go through the storage service
to correctly pick up all series that apply with the filter. Previously,
negative queries such as `!=`, `!~`, and predicates attempting to match
empty tags did not work correctly with the storage service when multiple
measurements or `OR` conditions were included.

This was because these predicates would be categorized as "multiple
measurements" and then it would attempt to use the field keys iterator
to find the fields for each measurement. The meta queries for these did
not correctly account for negative equality operators or empty tags when
finding appropriate measurements and those could not be changed because
it would cause a breaking change to influxql too.

This modifies the storage service to use new methods that correctly
account for the above situations rather than the field keys iterator.

Some queries that appeared to be single measurement queries also get
considered as multiple measurement queries. Any query with an `OR`
condition will be considered a multiple measurement query.

This bug did not apply to single measurement queries where one
measurement was selected and all of the logical operators were `AND`
values. This is because it used a different code path that correctly
handled these situations.

Backport of #19566.
2020-09-21 14:09:07 -05:00
Ayan George 9d26f53d79
feat: Collect values written stats (#19187) (#19445)
* feat(engine/tsm1): Add WritePointsWithContext()

Add WritePointsWithContext() and make WritePoints() a thin wrapper for
it.

The purpose is to add statistics context values that we'll use to
propagate the number of fields and points written to calls up the call
chain.

* feat(tsdb): Add WriteToShardWithContext()

When applied, this patch adds WriteToShardWithContext() and wraps it
with WriteToShard() to preserve the API.

The purpose of this addition is to propagate a context.Context value
to Shard.WritePointsWithContext().

* feat(tsdb/shard): Add WritePointsWithContext()

The purpose of adding WritePointsWithContext() is to propagate context
values down to engine code and propagate statistics via the context.Value
up to callers.

This patch also adds values written statistics to the shard.

* feat(http): Gather values written stats

WritePointsWithContext() was added to propagate context values down to
the engine and communicate stats to the caller.

* refactor: Change MetricKey to ContextKey

This patch gives the type we're using for context keys a better name.
2020-08-26 13:37:45 -04:00
David Norton 6903a1bf02 fix(tsdb): Revert disable series id set cache size by default.
This reverts commit 9c41e12ee4.
2020-08-07 14:18:44 -04:00
Ayan George fe59e34940 fix: Handle snapshot related errors (#18710)
When applied this patch will:

* log snapshot directory removal errors

  Prior to this patch, errors when removing temporary snapshot
  directories happened silently.

  This patch ensures that errors are logged when os.RemoveAll() fails.

* refactor tsm1: Declare error value in condition

  Saves a line of code and limits the scope of an error value.

* refactor tsm1: Add MakeSnapshotLinks()

  This commit adds (*FileStore).MakeSnapshotLinks().  The code in this
  function was originally part of CreateSnapshot().

  That code was hoisted out and into MakeSnapshotLinks() because there
  are two points of failure that require cleanup -- we have to delete a
  temporary directory on failure.

  Placing the code in one function allows us to check its returned error
  value and perform cleanup in only one place.

  In short, we hoisted code out of CreateSnapshot() to simplify error
  handling.

  On error, we remove any directories we created.
2020-07-02 17:22:48 -04:00
Ben Johnson 601dc7346a
Merge pull request #18687 from influxdata/batch-write-tombstones-when-deleting-1.8
perf(tsi1): batch write tombstone entries when dropping/deleting
2020-06-25 08:15:20 -06:00
dengzhi.ldz fbd0161954 perf(tsi1): batch write tombstone entries when dropping/deleting 2020-06-24 10:12:37 -06:00
dengzhi.ldz 7a084b9fc8 fix(tsi1): wait deleting epoch before dropping shard 2020-06-24 09:34:37 -06:00
Ben Johnson 90b7ea9c88 fix(tsdb): Disable series id set cache size by default.
This commit changes `DefaultSeriesIDSetCacheSize` to zero so that the
tag value cache is disabled by default. There is a rare known bug where
the cache can cause a segfault which crashes the process. The cache
is being disabled instead of removed as some users may still need the
cache for performance reasons.
2020-05-29 08:28:59 -06:00
Ben Johnson d05fe9c218 fix(tsdb): Defer closing of underlying SeriesIDSetIterators
This commit changes the SeriesIDSet merge/union/intersect functions
to attach the underlying iterators as closers so that files can be
retained until the data is no longer in use. The roaring operations
can leave containers pointing at mmap data in the resulting bitmap
so we have to track underlying file usage until the data is finished
with.
2020-05-26 08:37:39 -06:00
Mustafa e49ff0a221
fix(tsdb): Replace panic with error while de/encoding corrupt data (#17570)
fixes #17440

While encoding or decoding corrupt data, the current behaviour is to `panic`.
This commit replaces the `panic` with `error` to be propagated up to the calling `iterator`.
To avoid overwriting another `error`, iterators now wrap a `TSMErrors`, which contains ALL the encountered errors.
TSMErrors itself implements `Error()`; the returned string contains all the error msgs, separated by a "," delimiter.
2020-04-03 01:57:00 +02:00
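An aggregate error of the kind this commit describes could look roughly like the sketch below. The lowercase `tsmErrors` type, the `decodeAll` function, and the exact separator (a comma followed by a space) are assumptions; the real `TSMErrors` may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// tsmErrors is a hypothetical aggregate that keeps every error encountered,
// so a later error never overwrites an earlier one.
type tsmErrors []error

// Error joins all messages with a comma separator.
func (e tsmErrors) Error() string {
	msgs := make([]string, 0, len(e))
	for _, err := range e {
		msgs = append(msgs, err.Error())
	}
	return strings.Join(msgs, ", ")
}

// decodeAll records an error for each corrupt (here: empty) block and keeps
// going, where a panic would otherwise have stopped the process.
func decodeAll(blocks [][]byte) ([]byte, error) {
	var out []byte
	var errs tsmErrors
	for i, b := range blocks {
		if len(b) == 0 {
			errs = append(errs, fmt.Errorf("block %d: empty", i))
			continue
		}
		out = append(out, b[0])
	}
	if len(errs) > 0 {
		return out, errs
	}
	// Return an untyped nil so callers can compare the error against nil.
	return out, nil
}

func main() {
	_, err := decodeAll([][]byte{{1}, {}, {2}, {}})
	fmt.Println(err)
}
```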
Ben Johnson 3f61a0d880 fix(tsdb): Revert "fix: remove some unsafe marshalling to reduce risk of segfault"
This reverts commit 30dab03310.
2020-03-31 15:31:33 -06:00
David Norton 25381f97c8
Merge pull request #15952 from influxdata/er-verify-tombstone
feat(inspect): add influx_inspect verify-tombstone tool
2020-03-11 15:56:37 -04:00
docmerlin (j. Emrys Landivar) 30dab03310 fix: remove some unsafe marshalling to reduce risk of segfault
We were seeing segfaults in Roaring bitmaps sometimes, under very
high load with networked drives.  This may reduce risk of segfault by
forcing marshalling to copy the data.
2020-03-11 13:47:29 -04:00
Ayan George 5f47c388df
chore(influxdb): Forward port 16999 (#17032)
* fix: access tsi active log file with READ lock

The activeLogFile pointer may be altered by another goroutine, so the READ
lock is needed.

* Merge pull request #16384 from foobar/tsi-partition-lock

fix: access tsi active log file with READ lock

Co-authored-by: Tristan Su <sooqing@gmail.com>
Co-authored-by: David Norton <dgnorton@gmail.com>
2020-03-04 16:20:58 -05:00
Edd Robinson c3f4382ed8
Merge pull request #16606 from influxdata/BP-1.8-er-tsm-block-fix
fix(storage): ensure all block data returned
2020-03-03 14:32:15 +00:00
Ben Johnson 7a9eb1420c fix(tsdb): Fix -compact-series-file flag 2020-02-06 13:40:19 -07:00
Gianluca Arbezzano a3e37f417d
Merge pull request #16627 from influxdata/feature/skip-wal-cache
chore(tsm1): skip WriteSnapshot during backup if snapshotter is busy
2020-02-04 21:44:48 +01:00
Gianluca Arbezzano 30621ca9ec chore(tsm1): skip WriteSnapshot during backup if snapshotter is busy
When an InfluxDB database is very busy writing new points, the backup
process can fail because it cannot write a new snapshot.

The error is: `operation timed out with error: create snapshot: snapshot in progress`.

This happens because InfluxDB is almost "continuously" taking a snapshot
of the cache due to the high number of points ingested.

This PR skips snapshots if the `snapshotter` does not become available
after three attempts when a backup is requested.

The backup won't contain the data in the cache or WAL.

Signed-off-by: Gianluca Arbezzano <gianarb92@gmail.com>
2020-02-04 20:09:50 +01:00
Edd Robinson 34c0fdafc0 feat(storage): Offline series file compaction 2020-02-03 13:57:31 -07:00
David Norton 962cf6f4e7
Merge pull request #16595 from influxdata/fix/show-series-cardinality
fix(tsm1): improve series cardinality limit
2020-01-21 18:08:13 -05:00
David Norton 903d2c2d28 fix(tsm1): improve series cardinality limit
Prior to this change, new series would be added to the series file
before checking the series cardinality limit. If the limit was exceeded,
the write was rejected even though the series had already been added to
the series file.
2020-01-21 16:45:13 -05:00
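The ordering problem described in this commit can be illustrated with a sketch that checks the limit before anything is stored, so a rejected write leaves the series file unchanged. The `seriesFile` type, `addSeries`, and the convention that a limit of 0 means unlimited are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errMaxSeriesExceeded is a hypothetical error for a rejected write.
var errMaxSeriesExceeded = errors.New("max series limit exceeded")

// seriesFile is a hypothetical, in-memory stand-in for a series file.
type seriesFile struct {
	limit  int // 0 means unlimited (an assumption of this sketch)
	series map[string]struct{}
}

// addSeries checks the cardinality limit first and only then stores the new
// keys, so a rejected write adds nothing.
func (f *seriesFile) addSeries(keys []string) error {
	fresh := make(map[string]struct{})
	for _, k := range keys {
		if _, ok := f.series[k]; !ok {
			fresh[k] = struct{}{}
		}
	}
	if f.limit > 0 && len(f.series)+len(fresh) > f.limit {
		return errMaxSeriesExceeded
	}
	for k := range fresh {
		f.series[k] = struct{}{}
	}
	return nil
}

func main() {
	f := &seriesFile{limit: 2, series: map[string]struct{}{}}
	fmt.Println(f.addSeries([]string{"cpu,host=a", "cpu,host=b"}))
	fmt.Println(f.addSeries([]string{"cpu,host=c"}), len(f.series))
}
```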
Edd Robinson 22798fa290 fix(storage): ensure all block data returned
This commit prevents multiple blocks for the same series key having
values truncated when they are being read into an empty buffer.

The current cursor reader code has an optimisation that incorrectly
assumes the incoming array will be limited to 1,000 values (the maximum
block size), but arrays can contain values from multiple matching
blocks.
2020-01-21 18:00:11 +00:00
Sean Brickley fe55d728f0 fix(tsm1): Compaction log error 2020-01-13 20:05:05 -05:00
tmgordeeva f1d26652e9
fix(storage): skip TSM files with block read errors (#15885)
* fix(storage): skip TSM files with block read errors

When we find a bad TSM file during compaction, propagate the error up and move
the bad file aside. The engine will disregard the file so the next compaction
will not hit the same error.
2019-12-13 15:05:39 -08:00
Edd Robinson f7f19c5904 refactor(storage): add tombstone extension 2019-11-17 14:43:38 +00:00
Edd Robinson 196e46982b
Merge pull request #15861 from influxdata/BP-1.8-er-tsi-not-equal
fix(tsi1): index defect with negated equality filters
2019-11-13 18:15:01 +00:00
David Norton 102fcd671b fix(tsm1): make Digest() safe for concurrent use
This change adds a lock around digest creation so that it is safe for
concurrent calls. Prior to this change, calls from multiple goroutines
resulted in "Digest aborted, problem renaming tmp digest" errors.
2019-11-12 18:02:41 -05:00
Edd Robinson cac4c8956c fix(tsi1): index defect with negated equality filters
Fixes #15859

This commit fixes a defect in the TSI index where a filter using the
negated equality operator would result in no matching series being
returned for series stored within the `IndexFile` portions of the index.

The root cause of this was due to missing legacy-handling code in the
index for this particular iterator.
2019-11-12 15:10:42 +00:00
Jonathan A. Sternberg 40a7a577fc
Update flux version to v0.50.2
This upgrades the flux version to v0.50.2.

The secret service, which is used for alerts, is not included. The
`to()` function is also still not included.
2019-10-29 16:19:14 -05:00
elbehery a4bb1083f2 fix(storage): Renaming corrupt data files fails
fixes #14107
2019-10-28 17:32:58 +01:00