* fix: prevent retention service from hanging
Fix an issue that can cause the retention service to hang waiting on a
`Shard.Close` call. When this occurs, no other shards will be deleted
by the retention service. This is usually noticed as an increase in
disk usage because old shards are not cleaned up.
The fix adds two new methods to `Store`: `SetShardNewReadersBlocked`
and `InUse`. `InUse` can be used to poll whether a shard has active readers,
which the retention service uses to skip over in-use shards and prevent
the service from hanging. `SetShardNewReadersBlocked` controls whether
new read access may be granted to a shard. This is required to prevent
race conditions between the `InUse` check and the deletion of shards.
If the retention service skips over a shard because it is in use, the
shard will be checked again the next time the retention service runs.
It can be deleted on a subsequent check if it is no longer in use. If
a shard is stuck in use, the retention service will not be able to
delete it, which can be observed in the logs for manual
intervention. Other shards can still be deleted by the retention service
even if one shard is stuck with readers.
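As a rough illustration, a retention pass might use the new methods as sketched below; the exact method signatures, the `DeleteShard` call, and the logging are assumptions for the sketch, not the actual retention service code:

```go
package retention // hypothetical package name for this sketch

import (
	"github.com/influxdata/influxdb/tsdb"
	"go.uber.org/zap"
)

// deleteShardIfIdle sketches the skip-in-use pattern described above.
// The Store method signatures here are assumed, not the exact API.
func deleteShardIfIdle(store *tsdb.Store, id uint64, log *zap.Logger) error {
	// Block new readers first so the InUse check cannot race with deletion.
	if err := store.SetShardNewReadersBlocked(id, true); err != nil {
		return err
	}

	inUse, err := store.InUse(id)
	if err != nil || inUse {
		if inUse {
			// Skip for now; the next retention pass will check again.
			log.Info("shard is in use, skipping deletion", zap.Uint64("shard_id", id))
		}
		// Re-allow readers before returning.
		if unblockErr := store.SetShardNewReadersBlocked(id, false); unblockErr != nil {
			return unblockErr
		}
		return err
	}

	return store.DeleteShard(id)
}
```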
closes: #25054
ArrayCursors were ignoring errors, which led to panics when nil
cursors were operated on. This fix passes errors back up the stack
and uses them to enforce healthy cursor creation.
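A generic sketch of the pattern with hypothetical types (not the actual ArrayCursor interfaces): the constructor's error is surfaced and a nil cursor is rejected up front instead of panicking later:

```go
package cursors // sketch only; hypothetical types

import "errors"

// cursor is a stand-in for the storage array cursor types.
type cursor interface {
	Close()
}

// newArrayCursor sketches the fix: the constructor's error is returned to
// the caller, and a nil cursor is rejected up front rather than being
// wrapped and operated on later.
func newArrayCursor(construct func() (cursor, error)) (cursor, error) {
	cur, err := construct()
	if err != nil {
		return nil, err
	}
	if cur == nil {
		return nil, errors.New("cursor constructor returned nil cursor")
	}
	return cur, nil
}
```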
Closes https://github.com/influxdata/influxdb/issues/24789
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
* chore: upgrade Go to 1.19.3
This re-runs ./generate.sh and ./checkfmt.sh to format and update
source code (this is primarily responsible for the huge diff).
* fix: update tests to reflect sorting algorithm change
Instead of writing out the complete fields.idx
file when it changes, write out incremental
changes that will be applied to the file on
close and startup.
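A rough sketch of the idea with a hypothetical record type (the real change records and fields.idx format differ): append a small record per change rather than rewriting the whole file:

```go
package tsdb // sketch only; hypothetical record type, not the real fields.idx format

import (
	"encoding/json"
	"os"
)

// fieldChange is a hypothetical incremental record written when a field is created.
type fieldChange struct {
	Measurement string `json:"measurement"`
	Field       string `json:"field"`
	Type        int    `json:"type"`
}

// appendFieldChange appends one change record to a side file instead of
// rewriting the whole fields.idx file. On startup and close the records are
// read back and folded into the full index (not shown).
func appendFieldChange(path string, c fieldChange) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o666)
	if err != nil {
		return err
	}
	defer f.Close()
	b, err := json.Marshal(c)
	if err != nil {
		return err
	}
	_, err = f.Write(append(b, '\n'))
	return err
}
```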
closes https://github.com/influxdata/influxdb/issues/23653
If os.Link fails with syscall.ENOTSUP, then the file system does not
support links, and we must make copies of files to snapshot them for
backup. We also automatically make copies instead of links on Windows
because, although Windows can create links, their semantics differ
from those on Linux.
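A sketch of the fallback logic; the helper names are hypothetical and the real snapshot code path differs:

```go
package tsm1 // sketch only

import (
	"errors"
	"io"
	"os"
	"runtime"
	"syscall"
)

// linkOrCopy hard-links src to dst, falling back to a full copy on Windows
// or when the file system reports that links are unsupported.
func linkOrCopy(src, dst string) error {
	if runtime.GOOS != "windows" {
		err := os.Link(src, dst)
		if err == nil || !errors.Is(err, syscall.ENOTSUP) {
			return err
		}
	}
	return copyFile(src, dst)
}

// copyFile is a hypothetical helper that copies src to dst.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, in)
	return err
}
```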
closes https://github.com/influxdata/influxdb/issues/16739
On Windows, make copies of files for snapshots, because Go does not
support the FILE_SHARE_DELETE flag, which allows files (and links) to
be deleted while open. Without that flag, temporary directories are
left behind after backups.
closes https://github.com/influxdata/influxdb/issues/16289
The tsmBatchKeyIterator discards excess errors to avoid
out-of-memory crashes when compacting very corrupt files.
Any error beyond DefaultMaxSavedErrors (100) will be
discarded instead of appended to the error slice.
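A simplified sketch of the bounded error collection (hypothetical helper; the real iterator tracks more state):

```go
package tsm1 // sketch only

// DefaultMaxSavedErrors mirrors the cap described above.
const DefaultMaxSavedErrors = 100

// saveErr appends err to errs unless the cap has been reached, in which case
// the error is counted as discarded instead of growing the slice without bound.
func saveErr(errs []error, discarded *int, err error) []error {
	if err == nil {
		return errs
	}
	if len(errs) >= DefaultMaxSavedErrors {
		*discarded++
		return errs
	}
	return append(errs, err)
}
```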
closes https://github.com/influxdata/influxdb/issues/22328
When the compaction planner runs, if it cannot acquire
a lock on the files it plans to compact, it returns a
nil list of compaction groups. This, in turn, sets the
engine statistics for compaction queues to zero,
which is incorrect. Instead, use the length of pending
files which would have been returned.
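A simplified sketch of the intent, with hypothetical types and names: the queue metric is fed from the pending file count even when no compaction groups are returned:

```go
package tsm1 // sketch only; hypothetical types, not the real planner

import "sync/atomic"

// CompactionGroup is a set of TSM file paths planned for one compaction.
type CompactionGroup []string

// publishQueueDepth records the compaction queue statistic. When the planner
// could not lock its files it returns a nil group list, but the queue metric
// should still reflect the number of pending files instead of zero.
func publishQueueDepth(stat *int64, groups []CompactionGroup, pendingFiles int64) {
	if len(groups) == 0 {
		atomic.StoreInt64(stat, pendingFiles)
		return
	}
	var n int64
	for _, g := range groups {
		n += int64(len(g))
	}
	atomic.StoreInt64(stat, n)
}
```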
closes https://github.com/influxdata/influxdb/issues/22138
Compaction logging will generate intermediate information on
volume of data written and output files created, as well as
improve some of the anti-entropy messages related to compaction.
This will also apply to `influx_tools compact`
Closes https://github.com/influxdata/influxdb/issues/21704
tsm1.DigestWithOptions closes its network connection twice. This may
cause broken pipe errors on concurrent invocations of the same
procedure by closing a reused I/O descriptor. This fix also captures
errors from TSM file closures, which were previously ignored.
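One way to guard against a double close, shown here as a generic sketch rather than the actual DigestWithOptions change:

```go
package tsm1 // sketch only

import (
	"io"
	"sync"
)

// onceCloser ensures a shared writer (e.g. a network connection) is closed
// exactly once, so a second Close cannot tear down a reused descriptor.
type onceCloser struct {
	c    io.Closer
	once sync.Once
	err  error
}

// Close performs the underlying Close on first use and returns the same
// error on every subsequent call.
func (o *onceCloser) Close() error {
	o.once.Do(func() { o.err = o.c.Close() })
	return o.err
}
```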
Closes https://github.com/influxdata/influxdb/issues/21656
Under heavy write load that creates new fields and measurements, the
rewrite of the fields.idx file is a bottleneck. This enhancement
combines multiple writes into a single write and shares any error
return value with all of the combined invocations. MeasurementFieldSet
and the new MeasurementFieldSetWriter must both now be explicitly
closed.
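A simplified sketch of the write-coalescing idea with hypothetical types (the real MeasurementFieldSetWriter manages its own change records): concurrent save requests are funneled to a single writer, and every request in a combined batch receives the same error:

```go
package tsdb // sketch only; a hypothetical coalescing writer

// saveRequest carries the channel on which the shared result is delivered.
type saveRequest struct {
	done chan error
}

// coalescingWriter funnels concurrent save requests to one writer goroutine.
type coalescingWriter struct {
	requests chan saveRequest
	write    func() error // performs one rewrite of the index file
}

// Save queues a request and blocks until a combined write finishes; every
// request combined into that write receives the same error value.
func (w *coalescingWriter) Save() error {
	req := saveRequest{done: make(chan error, 1)}
	w.requests <- req
	return <-req.done
}

// run is the writer goroutine: it drains all queued requests, performs a
// single write, and fans the resulting error back out to the whole batch.
func (w *coalescingWriter) run() {
	for req := range w.requests {
		batch := []saveRequest{req}
	drain:
		for {
			select {
			case r := <-w.requests:
				batch = append(batch, r)
			default:
				break drain
			}
		}
		err := w.write()
		for _, r := range batch {
			r.done <- err
		}
	}
}
```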
Closes #21577
tsdb.Engine.IsIdle and tsdb.Engine.Digest now return a reason string for why the engine & shard are not idle.
Callers can then use this string for logging, if desired. The returned reason does not allocate memory, so the
caller may want to add the shard ID and path for more information in the log. This is intended to be used in
calls from the anti-entropy service in Enterprise.
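A usage sketch, assuming IsIdle now returns (bool, reason string); the logger fields are illustrative:

```go
package antientropy // hypothetical package name for this usage sketch

import "go.uber.org/zap"

// logIfBusy shows the intended call pattern: the returned reason allocates
// nothing, so the caller supplies shard ID and path for context.
func logIfBusy(log *zap.Logger, shardID uint64, path string, isIdle func() (bool, string)) {
	if idle, reason := isIdle(); !idle {
		log.Info("shard not idle",
			zap.Uint64("shard_id", shardID),
			zap.String("path", path),
			zap.String("reason", reason))
	}
}
```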
(cherry picked from commit bf45841359)
fixes https://github.com/influxdata/influxdb/issues/21448
* fix: backport tsdb fix for window pushdowns
From https://github.com/influxdata/influxdb/pull/19855
* fix(storage): cursor requests are [start, stop] instead of [start, stop)
The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.
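For illustration, the conversion amounts to closing the interval by dropping one nanosecond from the exclusive stop (hypothetical helper):

```go
package storage // sketch only

// toClosedInterval converts a half-open [start, stop) request, as flux issues
// it, into the closed [start, stop] interval the underlying file store expects.
func toClosedInterval(start, stop int64) (int64, int64) {
	return start, stop - 1
}
```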
cherry-pick from #21318
Co-authored-by: Sam Arnold <sarnold@influxdata.com>
(cherry picked from commit 7766672797)
* chore: fix formatting
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
The anti-entropy service will loop trying to copy an empty shard to a
data node missing that shard. This fix is one of two changes that
correctly create an empty shard on a new node: it sets the
LastModified date of an empty shard directory to the modification time
of that directory, instead of to the Unix epoch.
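A sketch of the behavior for empty shard directories (hypothetical helper, not the actual code):

```go
package tsm1 // sketch only

import (
	"os"
	"time"
)

// lastModified returns the shard directory's own modification time when the
// shard contains no files, instead of the zero value (the Unix epoch).
func lastModified(shardDir string, newestFile time.Time) (time.Time, error) {
	if !newestFile.IsZero() {
		return newestFile, nil
	}
	fi, err := os.Stat(shardDir)
	if err != nil {
		return time.Time{}, err
	}
	return fi.ModTime(), nil
}
```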
Fixes: https://github.com/influxdata/influxdb/issues/21273
This is a backport of #14262 to the 1.x storage engine.
This also ports the table tests that existed with the pre-beta version of the
storage engine to the one that is now used in the production version.
A few of the tests are skipped. These are portions of the storage engine
that have not been ported over. They should be unskipped when that
functionality is ported over.
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
* feat(query): hyper log log counting in query engine
In addition to helping with normal queries, this can improve the `SHOW CARDINALITY`
meta-queries:
    time influx -database mydb -execute 'select count_hll(sum_hll(_seriesKey)) from big'
    name: big
    time  count_hll
    ----  ---------
    0     200767781
    influx -database mydb -execute  0.06s user 0.12s system 0% cpu 8:49.99 total
Extending the context instead of fixing the API breaks type safety.
For tracking the number of points / values written, it is much clearer
to pass an explicit tracker.
When using queries like `select count(_seriesKey) from bigmeasurement`, we
should iterate over the tsi structures to serve the query instead of loading
all the series into memory up front.
Closes #20543
Loop with backoff in (*Engine).CreateSnapshot() to retry
(*Engine).WriteSnapshot() up to 3 times if
ErrSnapshotInProgress is returned. Then continue on no error,
or on ErrSnapshotInProgress if skipCacheOk is
true.
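A simplified sketch of the retry loop; the wiring and names are assumptions, not the actual (*Engine).CreateSnapshot() code:

```go
package tsm1 // sketch only; simplified wiring

import (
	"errors"
	"time"
)

// errSnapshotInProgress stands in for the engine's ErrSnapshotInProgress sentinel.
var errSnapshotInProgress = errors.New("snapshot in progress")

// writeSnapshotWithRetry retries the snapshot write up to three times with a
// growing backoff. If skipCacheOk is true, a final "snapshot in progress"
// error is tolerated so the backup can proceed without a cache snapshot.
func writeSnapshotWithRetry(writeSnapshot func() error, skipCacheOk bool) error {
	backoff := 100 * time.Millisecond
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if attempt > 0 {
			time.Sleep(backoff)
			backoff *= 2
		}
		if err = writeSnapshot(); err == nil {
			return nil
		}
		if !errors.Is(err, errSnapshotInProgress) {
			return err
		}
	}
	if skipCacheOk {
		return nil // cache snapshot skipped; backup continues without it
	}
	return err
}
```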
https://github.com/influxdata/plutonium/issues/3227
(cherry picked from commit dfa6aa8cea)
Test the skipCacheOk flag passed to tsdb.Shard.CreateSnapshot() and
tsdb.Engine.CreateSnapshot().
A value of true allows the backup to proceed even if a cache
snapshot cannot be taken.
https://github.com/influxdata/plutonium/issues/3227
This fix adds a skipCacheOk flag to
tsdb.Store.CreateShardSnapshot() and tsdb.Shard.CreateSnapshot()
to pass to tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
This flag is set to false in tsm1.Engine.Export()
https://github.com/influxdata/plutonium/issues/3227
When an InfluxDB database is very busy writing new points, the backup
process can fail because it cannot write a new snapshot.
The error is: operation timed out with error: create snapshot: snapshot in progress.
This happens because InfluxDB takes a snapshot of the cache almost
continuously, caused by the high number of points being ingested.
The fix for this was https://github.com/influxdata/influxdb/pull/16627
but it was for OSS only, and was not in the code path for backups
in clusters.
This fix adds a skipCacheOk flag to tsdb.Engine.CreateSnapshot().
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
and in tsdb.Shard.CreateSnapshot(), the cluster backup code path.
This flag is set to false in tsm1.Engine.Export()
https://github.com/influxdata/plutonium/issues/3227
This feature allows compaction to be disabled on a per-shard basis by
creating a file named do_not_compact in a shard's directory. When
disabled, a message is logged every 15 minutes with the reason
compaction is disabled (the existence of the file). This makes it easy to
know if compaction has been disabled for any shards by searching the log
for "compaction disabled" or running "find path/to/data -type f -name
do_not_compact".
The original version of verifyVersion() reads into a byte slice,
manually ensures its byte order, then converts it to a type comparable
with Version and MagicNumber.
This patch hides those details by calling binary.Read() and reading
values into properly typed variables.
This adds a bit of overhead, but this code isn't in the hot path, and
the patch greatly simplifies the code.
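A sketch of the approach with simplified error messages; the magic number and version constants below are the standard TSM header values:

```go
package tsm1 // sketch only; simplified error handling

import (
	"encoding/binary"
	"fmt"
	"io"
)

// TSM files begin with a 4-byte magic number followed by a 1-byte version.
const (
	MagicNumber uint32 = 0x16D116D1
	Version     byte   = 1
)

// verifyVersion reads the header with binary.Read, so byte-order handling and
// manual conversions disappear from the calling code.
func verifyVersion(r io.Reader) error {
	var magic uint32
	if err := binary.Read(r, binary.BigEndian, &magic); err != nil {
		return fmt.Errorf("error reading magic number: %v", err)
	}
	if magic != MagicNumber {
		return fmt.Errorf("can only read from tsm file")
	}
	var version byte
	if err := binary.Read(r, binary.BigEndian, &version); err != nil {
		return fmt.Errorf("error reading version: %v", err)
	}
	if version != Version {
		return fmt.Errorf("unsupported file version %d", version)
	}
	return nil
}
```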
verifyVersion() originally accepted an io.ReadSeeker. It is only called
in one place, and that function immediately calls Seek() after
verifyVersion(), therefore it is probably safe to call Seek() BEFORE
verifyVersion().
The benefit is that verifyVersion() is easier to test since we can pass
it a bytes.Buffer.
This patch adds a test for verifyVersion() as well as a benchmark.
    benchmark                  old ns/op   new ns/op   delta
    BenchmarkVerifyVersion-8   73.5        123         +67.35%
Finally, this commit moves verifyVersion() from writer.go to reader.go
which is where it is actually used.
* feat(engine/tsm1): Add WritePointsWithContext()
Add WritePointsWithContext() and make WritePoints() a thin wrapper for
it.
The purpose is to add statistics context values that we'll use to
propagate the number of fields and points written up the call chain.
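A sketch of the statistics-propagation pattern with hypothetical key and type names (the real code defines its own ContextKey values):

```go
package tsm1 // sketch only; hypothetical types and key names

import "context"

// contextKey mirrors the private ContextKey type used for context values.
type contextKey int

const (
	statPointsWritten contextKey = iota
	statValuesWritten
)

// point is a stand-in for the point type carrying written fields.
type point interface{ FieldCount() int }

// writePointsWithContext sketches the pattern: WritePoints stays a thin
// wrapper, and the context-aware variant reports points/values written back
// to the caller through *int64 values placed in ctx.
func writePointsWithContext(ctx context.Context, points []point, write func([]point) error) error {
	if err := write(points); err != nil {
		return err
	}
	var npoints, nvalues int64
	for _, p := range points {
		npoints++
		nvalues += int64(p.FieldCount())
	}
	if c, ok := ctx.Value(statPointsWritten).(*int64); ok {
		*c = npoints
	}
	if c, ok := ctx.Value(statValuesWritten).(*int64); ok {
		*c = nvalues
	}
	return nil
}
```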
* feat(tsdb): Add WriteToShardWithContext()
When applied, this patch adds WriteToShardWithContext() and wraps it
with WriteToShard() to preserve the API.
The purpose of this addition is to propagate a context.Context value
to Shard.WritePointsWithContext().
* feat(tsdb/shard): Add WritePointsWithContext()
The purpose of adding WritePointsWithContext() is to propagate context
values down to engine code and propagate statistics via context.Value
up to callers.
This patch also adds values written statistics to the shard.
* feat(http): Gather values written stats
WritePointsWithContext() was added to propagate context values down to
the engine and communicate stats to the caller.
* refactor: Change MetricKey to ContextKey
This patch gives the type we're using for context keys a better name.
When applied this patch will:
* log snapshot directory removal errors
Prior to this patch, errors when removing temporary snapshot
directories happened silently.
This patch ensures that errors are logged when os.RemoveAll() fails.
* refactor tsm1: Declare error value in condition
Saves a line of code and limits the scope of an error value.
* refactor tsm1: Add MakeSnapshotLinks()
This commit adds (*FileStore).MakeSnapshotLinks(). The code in this
function was originally part of CreateSnapshot().
That code was hoisted out and into MakeSnapshotLinks() because there
are two points of failure that require cleanup -- we have to delete a
temporary directory on failure.
Placing the code in one function allows us to check its returned error
value and perform cleanup in only one place.
In short, we hoisted code out of CreateSnapshot() to simplify error
handling.
On error, we remove any directories we created.
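In outline, and with simplified names, the caller now has a single failure-handling path (sketch only, not the actual CreateSnapshot() code):

```go
package tsm1 // sketch only; simplified wiring

import (
	"log"
	"os"
)

// createSnapshot sketches the flow after the refactor: all link creation lives
// in one helper, so failure handling collapses to a single cleanup step.
func createSnapshot(tmpDir string, makeSnapshotLinks func(dir string) error) (string, error) {
	if err := makeSnapshotLinks(tmpDir); err != nil {
		// Single cleanup point: remove the temporary directory we created.
		if rmErr := os.RemoveAll(tmpDir); rmErr != nil {
			// Per the logging change above, removal failures are logged, not dropped.
			log.Printf("error removing temporary snapshot dir %s: %v", tmpDir, rmErr)
		}
		return "", err
	}
	return tmpDir, nil
}
```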