By default this feature is disabled; the full compaction behaviour does
not change. When this feature is enabled, compactions can be limited
across multiple storage engines running in multiple processes.
The mechanism by which this happens is not part of the abstraction added
here.
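Purely as a hedged illustration of one mechanism a caller might supply (nothing below comes from this change itself), a process-local limiter that allows at most two concurrent full compactions could look like this; coordinating across processes would need an external mechanism such as a file or distributed lock.
```go
// Hypothetical limiter: at most two full compactions at once within
// this process. Cross-process coordination is out of scope here.
var compactionSlots = make(chan struct{}, 2)

// acquireCompactionSlot blocks until a slot is free and returns a
// function that releases it.
func acquireCompactionSlot() (release func()) {
	compactionSlots <- struct{}{}
	return func() { <-compactionSlots }
}
```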
* test(storage): ensure multiple engines can run concurrently
* feat(storage): expose control over retention run
Fixes #15134.
This commit adds the ability to inject a functional option into a
storage.Engine for controlling when the retention enforcer can run.
Previously, retention enforcers ran on an interval; if you ran multiple
storage engines (as we do in some environments), it was not possible
to coordinate when engines ran retention. Often they would synchronise
because they started at the same time.
This change will let you specify a blocking function to control when the
retention enforcer can run.
A simple function for serialising retention enforcement across multiple
storage engines could look like:
```go
var mu sync.Mutex

// f blocks until the mutex is acquired, then returns a function that
// releases it once the retention run has finished.
func f() (done func()) {
	mu.Lock()
	return func() { mu.Unlock() }
}
```
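Wiring it in might look roughly like the following; the option name and constructor signature are assumptions for illustration, not the actual API.
```go
// Hypothetical wiring only: the option name below is a placeholder,
// not the engine's real API.
engine := storage.NewEngine(enginePath, config, storage.WithRetentionLimiter(f))
```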
Adds the ability to set the current generation to use when compacting
the cache only. Previously, we used the current generation for all
files, but this causes issues; the current generation should only be
used for level 1 compaction.
I don't see anywhere obvious that an engine would be closed twice, but
if it were, the RLock would have been held permanently, so a later Lock
could never be taken.
Running `go test ./storage/...` did not trigger a double close.
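A sketch of the kind of guard that avoids the problem; the field and method names here are illustrative, not the engine's actual ones.
```go
type Engine struct {
	mu        sync.RWMutex
	closeOnce sync.Once
}

// Illustrative only: sync.Once guarantees the close path (and its
// RLock/RUnlock pairing) runs exactly once, so a second Close can
// never leave the read lock held and block a later Lock.
func (e *Engine) Close() error {
	e.closeOnce.Do(func() {
		e.mu.RLock()
		// signal background goroutines (retention, compactions) to stop
		e.mu.RUnlock()

		e.mu.Lock()
		defer e.mu.Unlock()
		// release the WAL, cache and file store
	})
	return nil
}
```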
The storage engine will now drop any points that contain invalid tag
data. Special tag keys for the measurement and field key will be
exempted from this validation.
We will want to validate that all tag key/value data is valid Unicode.
This commit changes the validation helper to only validate provided
tags, since measurements are currently very likely to contain invalid
UTF-8 characters.
There are two exceptions to the tag validation: the special tag keys
for the measurement and the field key.
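A rough sketch of this kind of check, assuming the measurement and field key travel as reserved tag keys; the key values and the Tag type below are placeholders for the real constants and types in the models package.
```go
package main

import (
	"bytes"
	"unicode/utf8"
)

// Tag mirrors the shape of a models.Tag for this sketch.
type Tag struct{ Key, Value []byte }

// Placeholder reserved keys; the real constants for the measurement
// and field-key tags live in the models package.
var (
	measurementKey = []byte("\x00")
	fieldKeyKey    = []byte("\xff")
)

// validTags reports whether every user-supplied tag key and value is
// valid UTF-8, skipping the two reserved keys described above.
func validTags(tags []Tag) bool {
	for _, t := range tags {
		if bytes.Equal(t.Key, measurementKey) || bytes.Equal(t.Key, fieldKeyKey) {
			continue
		}
		if !utf8.Valid(t.Key) || !utf8.Valid(t.Value) {
			return false
		}
	}
	return true
}
```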
When the WAL was moved up, the validation that previously happened at
the cache was skipped. This moves the field type validation for a batch
of points back ahead of the WAL.
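Purely as a sketch of the ordering (the types and names are placeholders, not the engine's methods):
```go
// Placeholder point type; only the ordering matters here.
type point struct{ /* measurement, tags, fields, time */ }

func writeBatch(validateFieldTypes, writeWAL, writeCache func([]point) error, batch []point) error {
	// 1. Reject the whole batch before the WAL sees it if any field
	//    conflicts with an existing field type.
	if err := validateFieldTypes(batch); err != nil {
		return err
	}
	// 2. Persist the batch to the WAL.
	if err := writeWAL(batch); err != nil {
		return err
	}
	// 3. Apply the batch to the in-memory cache.
	return writeCache(batch)
}
```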
It turns out that LastModified and DiskSize are unused, so it was easy
to change them to no longer consult the WAL.
This hooks up metrics and starts the WAL again.
At the cost of some nil checks, we avoid needing an interface, an empty
implementation, and defending against the subtle bugs caused by nil
values stored in non-nil interfaces. Also, the tsm1 engine is losing the
WAL anyway.
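The trap being avoided is the standard Go one, shown standalone:
```go
package main

import "fmt"

type WAL interface{ Close() error }

type fileWAL struct{}

func (*fileWAL) Close() error { return nil }

func main() {
	var w *fileWAL // nil concrete pointer
	var i WAL = w  // interface value is non-nil even though it wraps nil

	fmt.Println(w == nil) // true
	fmt.Println(i == nil) // false: a nil check on the interface lets the
	// nil *fileWAL through, which is the subtle bug referred to above
}
```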
Because the WAL relies on the tsm1.Value type, we move that type into
its own tsm1/value package and set up aliases forwarding the types back
into tsm1. This also required adding some methods and changing consumers
to avoid the unexported fields. I imagine this step will be useful one
day when we make the write path more efficient with respect to consuming
points.
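The forwarding aliases might look roughly like this; the import path and the exact set of names are assumptions here, not a record of the real file.
```go
package tsm1

// Assumed import path for the extracted package.
import "github.com/influxdata/influxdb/tsdb/tsm1/value"

// Forward the moved types under their old names so existing callers
// keep compiling unchanged.
type (
	Value        = value.Value
	FloatValue   = value.FloatValue
	IntegerValue = value.IntegerValue
	StringValue  = value.StringValue
	BooleanValue = value.BooleanValue
)

// Constructors forward the same way.
var NewValue = value.NewValue
```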
This commit additionally fixes some issues with code generation. The
iterator.tmpldata and the generation of array_cursor_* were accidentally
removed along with the iterators, making those generated files stale.
This restores them and regenerates the files.
No change in functionality.
This commit improves the performance of a mass delete on the TSI index
by deleting at the measurement level instead of deleting each series
individually.
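Schematically (the Index interface below is a stand-in, not the TSI API itself):
```go
// Stand-in for the relevant slice of the index API.
type Index interface {
	DropSeries(key []byte) error
	DropMeasurement(name []byte) error
}

// Before: one delete per series in the measurement.
func deleteSeriesOneByOne(idx Index, seriesKeys [][]byte) error {
	for _, key := range seriesKeys {
		if err := idx.DropSeries(key); err != nil {
			return err
		}
	}
	return nil
}

// After: a single measurement-level delete covers every series.
func deleteByMeasurement(idx Index, measurement []byte) error {
	return idx.DropMeasurement(measurement)
}
```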
I did this with a dumb editor macro, so some comments changed too.
Also rename root package from platform to influxdb.
In the interest of minimizing risk, every file importing the root
package now aliases it to "platform", so that no changes beyond the
imports were necessary in those files.
Lastly, the old platform module is replaced with the local path
/dev/null so that nobody can accidentally reintroduce a platform
dependency while migrating platform code to influxdb.
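As a sketch of what a migrated file and the go.mod change end up looking like (the package name and the identifier used below are illustrative):
```go
package tasks // any file that previously imported the platform root package

// The renamed root package is aliased back to "platform", so the file
// body is untouched.
import platform "github.com/influxdata/influxdb"

// In go.mod (quoted here as a comment), the old module path points at
// a dead location so it cannot be reintroduced by accident:
//
//	replace github.com/influxdata/platform => /dev/null

var _ platform.ID // illustrative use of the aliased import
```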