* chore: loadShards changes to more cleanly support 2.x feature (#25513)
* chore: move shardID parsing and shard filtering into walkShardsAndProcess
* chore: make it impossible to miss sending shardResponse or marking shard as complete
* chore: always count number of shards (preparation for 2.x related feature)
* chore: explicitly load series files and create indices serially
Explicitly load series files and create indices serially. Also
avoid passing them to work functions that don't need them.
* chore: rework loadShards for changes necessary to cancel loading process
* chore: comment improvements
* fix: fix race conditions in TestStore_StartupShardProgress and TestStore_BadShardLoading
* chore: avoid logging nil error
* chore: refactor shard loading and shard walking
Refactor loadShards and CreateShard to use a common shardLoader class that
makes thread-safety easier. Refactor walkShardsAndProcess into findShards.
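A rough sketch of the shape this refactor implies; `shardLoader`, its fields, and the response channel are illustrative, not the actual implementation:

```go
package tsdb

import "sync"

// shardLoader (hypothetical): one object owns the state needed to
// open a shard, so loadShards, CreateShard, and ReopenShard can share
// a single code path whose locking is easy to reason about.
type shardLoader struct {
	mu sync.Mutex
	// The series file, index options, and engine options would live
	// here instead of being threaded through every work function.
}

// load opens one shard. Sending the result in a defer makes it
// impossible to miss delivering a response or marking the shard
// complete, no matter which return path is taken.
func (l *shardLoader) load(id uint64, path string, res chan<- error) {
	var err error
	defer func() { res <- err }()

	l.mu.Lock()
	defer l.mu.Unlock()
	// ... open the shard's series file, index, and engine serially ...
}
```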
* chore: improve comment
* chore: rename OpenShard to ReopenShard and implement with shardLoader
Rename Store.OpenShard to Store.ReopenShard and implement using a
shardLoader object. Changes to tests as necessary.
* chore: avoid resetting shard options and locking on Reopen
Avoid resetting shard options when reopening a shard.
Use a proper mutex locker in Shard.ReopenShard.
* chore: fix formatting issue
* chore: warn on mixed index types in Store.CreateShard
* chore: change from info to warn when invalid shard IDs found in path
* chore: use coarser locking in Store.ReopenShard
* chore: fix typo in comment
* chore: code simplification
(cherry picked from commit 0bc167bbd7)
* chore: fix logging issues in Store.loadShards
Fix incorrectly reporting that shards failed to open when they actually did.
Fix race condition with logging in loadShards.
(cherry picked from commit 65683bf166)
* chore: remove unnecessary fmt.Sprintf calls
Remove unnecessary fmt.Sprintf calls for static code checks in main-2.x.
(cherry picked from commit 8497fbf0af)
* chore: remove unnecessary blank identifier
* chore: remove unnecessary blank identifier
* feat: add option to flush WAL on shutdown
Add `--storage-wal-flush-on-shutdown` to flush the WAL on database shutdown.
On successful shutdown, all WAL data will be committed to TSM files and the
WAL directories will not contain any .wal files.
Closes: #25422
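A minimal sketch of the shutdown path this flag implies, assuming illustrative names (`walFlushOnShutdown`, `writeSnapshot`) rather than the actual implementation:

```go
package storage

// Engine sketch: the flag value is carried on the engine and checked
// once at close time.
type Engine struct {
	walFlushOnShutdown bool // set from --storage-wal-flush-on-shutdown
}

// writeSnapshot stands in for committing the WAL-backed cache to TSM
// files; closeFiles stands in for the normal close path.
func (e *Engine) writeSnapshot() error { return nil }
func (e *Engine) closeFiles() error    { return nil }

func (e *Engine) Close() error {
	if e.walFlushOnShutdown {
		// On success, all WAL data is in TSM files and the WAL
		// directories contain no .wal files.
		if err := e.writeSnapshot(); err != nil {
			return err
		}
	}
	return e.closeFiles()
}
```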
* fix: prevent retention service from hanging (#25055)
Fix issue that can cause the retention service to hang waiting on a
`Shard.Close` call. When this occurs, no other shards will be deleted
by the retention service. This is usually noticed as an increase in
disk usage because old shards are not cleaned up.
The fix adds two new methods to `Store`, `SetShardNewReadersBlocked`
and `InUse`. `InUse` can be used to poll whether a shard has active readers,
which the retention service uses to skip over in-use shards to prevent
the service from hanging. `SetShardNewReadersBlocked` controls whether
new read access may be granted to a shard. This is required to prevent
race conditions around the use of `InUse` and the deletion of shards.
If the retention service skips over a shard because it is in-use, the
shard will be checked again the next time the retention service is run.
It can be deleted on subsequent checks if it is no longer in-use. If
the shard is stuck in-use, the retention service will not be able to
delete it, which can be observed in the logs for manual
intervention. Other shards can still be deleted by the retention service
even if a shard is stuck with readers.
This is a port of ad68ec8 from master-1.x to main-2.x.
closes: #25076
(cherry picked from commit b4bd607eef)
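A sketch of the polling pattern these methods enable; the interface here is a stand-in for `*tsdb.Store`, and the real retention service logic differs in detail:

```go
package retention

// shardStore is a minimal stand-in capturing the two new methods
// plus deletion.
type shardStore interface {
	SetShardNewReadersBlocked(shardID uint64, blocked bool) error
	InUse(shardID uint64) (bool, error)
	DeleteShard(shardID uint64) error
}

// deleteIfUnused blocks new readers first so that InUse cannot race
// with a reader acquiring the shard between the check and the delete.
func deleteIfUnused(s shardStore, id uint64) (bool, error) {
	if err := s.SetShardNewReadersBlocked(id, true); err != nil {
		return false, err
	}
	inUse, err := s.InUse(id)
	if err != nil || inUse {
		// Skip for now; the shard is re-checked on the next
		// retention pass. Unblock readers before returning.
		if uerr := s.SetShardNewReadersBlocked(id, false); err == nil {
			err = uerr
		}
		return false, err
	}
	return true, s.DeleteShard(id)
}
```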
* fix: improve delete speed when a measurement is part of the predicate
* test: add test for deleting measurement by predicate
* chore: improve error messaging and capturing
* chore: set GoLand to use the right formatting style
Clamp the value of Store.MeasurementsCardinality so that it cannot be less
than 0. This primarily shows up as a negative numMeasurements value in
/debug/vars under some circumstances.
refs #23285
(cherry picked from commit 160cf678d5)
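The clamp itself is small; a sketch of the idea (`clampCardinality` is an illustrative helper, not the actual code):

```go
package tsdb

// clampCardinality keeps an estimated cardinality from being
// reported below zero; applied to the value that
// Store.MeasurementsCardinality returns, it prevents the negative
// numMeasurements seen in /debug/vars.
func clampCardinality(n int64) int64 {
	if n < 0 {
		return 0
	}
	return n
}
```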
* fix: remove nats for scraper processing
Scrapers now use Go channels instead of NATS and interprocess communication.
This should fix #23085.
Additionally, found and fixed #23106.
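A sketch of the in-process replacement, with illustrative types and names; the producer side was previously a NATS publish and the consumer a NATS subscription:

```go
package scraper

type scrapeResult struct{ /* gathered points */ }

// pipe connects the gatherer and the writer with a buffered Go
// channel instead of a NATS subject.
func pipe(done <-chan struct{}, gather func() scrapeResult, write func(scrapeResult)) {
	results := make(chan scrapeResult, 100)

	go func() { // producer: formerly a NATS publish
		defer close(results)
		for {
			select {
			case <-done:
				return
			case results <- gather():
			}
		}
	}()

	for r := range results { // consumer: formerly a NATS subscription
		write(r)
	}
}
```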
* chore: fix formatting
* chore: fix static check and go.mod
* test: fix some flaky tests
* fix: mark NATS arguments as deprecated
* feat: Add WITH KEY to show tag keys
* fix: add tests for multi measurement tag value queries
* chore: fix linter problems
* chore: revert influxql changes to keep WITH KEY disabled
* chore: add TODO for moving flux tests to flux repo
Co-authored-by: Sam Arnold <sarnold@influxdata.com>
* feat: works with custom iterator
* feat: works with existing iterators
* chore: cleanup
* test: consistent assertions for tests
* fix: better log message if trying to filter on the value of a field key
* fix: comment for handling boolean literal; handle false boolean as well
* fix: make time range checking inclusive
tsdb.Engine.IsIdle and tsdb.Engine.Digest now return a reason string for why the engine & shard are not idle.
Callers can then use this string for logging, if desired. The returned reason does not allocate memory, so the
caller may want to add the shard ID and path for more information in the log. This is intended to be used in
calls from the anti-entropy service in Enterprise.
(cherry picked from commit bf45841359)
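A sketch of the intended caller-side pattern, assuming an `IsIdle() (bool, string)` shape for the changed method:

```go
package example

import "go.uber.org/zap"

// engineLike stands in for tsdb.Engine in this sketch.
type engineLike interface {
	IsIdle() (bool, string)
}

// logIfBusy adds the shard ID and path itself; the reason string
// returned by the engine allocates no memory.
func logIfBusy(e engineLike, log *zap.Logger, shardID uint64, path string) {
	if idle, reason := e.IsIdle(); !idle {
		log.Info("shard not idle",
			zap.Uint64("id", shardID),
			zap.String("path", path),
			zap.String("reason", reason))
	}
}
```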
fixes https://github.com/influxdata/influxdb/issues/21448
(cherry picked from commit c8da9bafbf)
closes https://github.com/influxdata/influxdb/issues/21894
There are some problematic races that occur when deletes happen
against writes to the same points at the same time. This change
introduces guards and an epoch-based system to coordinate these
modifications.
A guard matches a point based on the time, measurement name, and
some conditions loaded from an influxql expression. The intent
is to be as precise as possible without allowing any false
negatives: if a point would be deleted, the guard must match it.
We are allowed to match more points than necessary, at the cost
of slowing down writes.
The epoch-based system keeps track of outstanding writes
deletes and their associated guards. When a delete operation
is going to start, it waits until all current writes are
done, and installs its guard, blocking all future writes that
contain points that may conflict with the delete. This allows
writes to disjoint points to proceed uncontended, and the
implementation is optimized for the assumption that there are few
outstanding deletes. For example, in the case that there are no
deletes, a write just has to take a mutex, bump a counter, and
compare a value against zero. The epoch trackers are per shard,
so that different shards never have to contend with one another.
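A sketch of a per-shard tracker along these lines; names and details are illustrative:

```go
package tsdb

import "sync"

type guard struct {
	// Matches points by time range, measurement name, and an
	// optional influxql condition; may over-match, never under-match.
}

type epochTracker struct {
	mu        sync.Mutex
	writes    int64         // outstanding writes
	guards    []*guard      // installed by in-flight deletes
	writeDone chan struct{} // closed when outstanding writes drain
}

// startWrite is the hot path: with no deletes outstanding it is one
// mutex acquire, a counter bump, and a compare against zero.
func (t *epochTracker) startWrite() []*guard {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.writes++
	if len(t.guards) == 0 {
		return nil
	}
	return t.guards // caller must check its points against these
}

func (t *epochTracker) endWrite() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.writes--; t.writes == 0 && t.writeDone != nil {
		close(t.writeDone)
		t.writeDone = nil
	}
}

// startDelete installs a guard, then hands back a channel the delete
// waits on until writes already in flight have finished.
func (t *epochTracker) startDelete(g *guard) <-chan struct{} {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.guards = append(t.guards, g)
	done := make(chan struct{})
	if t.writes == 0 {
		close(done)
		return done
	}
	if t.writeDone == nil {
		t.writeDone = done
	}
	return t.writeDone
}
```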
TSI1 and inmem indexes have different properties during deletes.
Specifically, inmem shares a global index across all shards, whereas
every tsi1 index is scoped to a specific shard. When deleting
a series, it may cause the last reference to the series across all
shards to be dropped, necessitating a removal from the series file.
Since the inmem index shares the index across all shards, removing
the series when it's removed from the series file is sufficient.
However, in the case of a mixed index database, if the last shard
is a TSI1 shard, the other inmem indexes are not available when we
discover that it was the last reference to the series. This ends
up leaving the series in the inmem index without a series id in
the series file, causing all sorts of misbehavior.
Rather than continue curling ourselves into a ball to try to fix
this unsupported mode, give a helpful error message to the user
that they must run their database in a non-mixed index mode to
allow deletes.
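A sketch of the guard this implies, with an illustrative error and check (the actual detection and message differ):

```go
package tsdb

import "errors"

var errMixedIndexDelete = errors.New(
	"deletes are unsupported on a database with mixed inmem/tsi1 " +
		"shard indexes; convert all shards to one index type")

// checkIndexTypes refuses the delete up front when the database's
// shards do not all use the same index type.
func checkIndexTypes(indexTypes []string) error {
	seen := make(map[string]struct{})
	for _, it := range indexTypes {
		seen[it] = struct{}{}
	}
	if len(seen) > 1 {
		return errMixedIndexDelete
	}
	return nil
}
```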
This commit ensures that any orphaned series (series that are to be
removed and no longer are referenced anywhere in the database) are
removed from the `inmem` index when a shard is dropped.
PR #9204 introduced a maximum default concurrent compaction limit of 4.
The idea was to reduce IO utilisation on large systems with many cores,
and high write load. Often on these systems, disks were not scaled
appropriately to the write volume, and while the write path could
keep up, compactions would saturate disks.
In #9225 work was done to reduce IO saturation by limiting the
compaction throughput. To some extent, both #9204 and #9225 work towards
solving the same problem.
We have recently begun to notice larger clusters suffering from
situations where compactions are not keeping up because they have been
scaled up, but the limit of 4 has stayed in place. While users can
manually override the setting, it seems more user-friendly if we remove
the limit by default, and set it manually in cases where compactions are
causing too much IO on large boxes.
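A sketch of the resulting default, with illustrative names; an explicit config value still wins, and an unset value now scales with the machine instead of being clamped:

```go
package tsdb

import "runtime"

func maxConcurrentCompactions(configured int) int {
	if configured > 0 {
		return configured // manual override from the config
	}
	n := runtime.GOMAXPROCS(0) / 2 // previously also capped at 4
	if n < 1 {
		n = 1
	}
	return n
}
```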