When splitting windows into their own individual tables, do not wait for
empty tables to be read. Empty tables may never be read so waiting for
them can result in a deadlock.
This was found while writing a test case for in-progress work, so a
test case for this condition will follow in a future commit.
This commit adds `mincore.Limiter`, which throttles page faults caused
by accessing mmap() data. It works by periodically calling `mincore()` to
determine which pages are not resident in memory and using `rate.Limiter`
to throttle access with a token bucket algorithm.
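As a rough illustration of the idea, the sketch below throttles only accesses to non-resident pages. It is not the `mincore.Limiter` code: `tokenBucket`, `touchPage`, and the boolean residency slice are hypothetical stand-ins for `rate.Limiter` and the bitmap reported by `mincore()`, and time is passed in as seconds to keep the example deterministic.

```go
package main

import "fmt"

// tokenBucket is a hypothetical, simplified stand-in for rate.Limiter.
// Time is passed in as seconds so the example stays deterministic.
type tokenBucket struct {
	rate   float64 // tokens added per second
	burst  float64 // maximum number of tokens held
	tokens float64 // tokens currently available
	last   float64 // time of the last refill, in seconds
}

func newTokenBucket(rate, burst, now float64) *tokenBucket {
	return &tokenBucket{rate: rate, burst: burst, tokens: burst, last: now}
}

// allow refills the bucket for the elapsed time, then tries to take one token.
func (b *tokenBucket) allow(now float64) bool {
	if elapsed := now - b.last; elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// touchPage models one page access. Resident pages are free; a page that is
// not resident would fault, so it must take a token first.
func touchPage(b *tokenBucket, resident bool, now float64) bool {
	return resident || b.allow(now)
}

func main() {
	b := newTokenBucket(2, 2, 0) // 2 faults per second, burst of 2
	// residency stands in for the bitmap that mincore() would report.
	residency := []bool{false, false, false, true}
	for i, resident := range residency {
		fmt.Printf("page %d allowed=%v\n", i, touchPage(b, resident, 0))
	}
}
```

A real limiter would block until a token is available rather than return false; the boolean result is used here only to make the behavior easy to observe.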
* feat(tsdb): SHOW TAG KEYS (no time) query using only TSI data.
* fix(tsdb): Allow for earlier return when scanning during show tag keys.
* fix(tsdb): Speed things up by using the key merger to reduce allocs.
* chore(tsm1): Fix golint.
* fix(tsdb): Remove sorting, because these keys should already be sorted.
* fix(tsdb): Remove dead code to placate the linter.
Previously windowed aggregate tables would construct the _start and
_stop arrays before checking if the underlying array cursor was empty.
The table would therefore report itself as non-empty even though
there were no points within the table's time range. Now we check if
the underlying array cursor is empty before we construct any output
arrays. If the cursor is empty, meaning the series is empty, the
table will be dropped.
Fixes https://github.com/influxdata/influxdb/issues/18704.
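The ordering change can be sketched as follows. This is a simplified model, not the storage code: `arrayCursor`, `table`, and `newWindowTable` are hypothetical names, and the cursor is backed by an in-memory slice.

```go
package main

import "fmt"

// arrayCursor is a hypothetical stand-in for a storage array cursor.
type arrayCursor struct {
	batches [][]float64
}

// next returns the next batch of values, or nil once the cursor is exhausted.
func (c *arrayCursor) next() []float64 {
	if len(c.batches) == 0 {
		return nil
	}
	b := c.batches[0]
	c.batches = c.batches[1:]
	return b
}

// table holds the output columns for one window.
type table struct {
	start, stop []int64
	values      []float64
}

// newWindowTable reads the first batch before allocating any output columns.
// It returns nil for an empty cursor so the caller can drop the table.
func newWindowTable(cur *arrayCursor, start, stop int64) *table {
	values := cur.next()
	if len(values) == 0 {
		return nil // the series is empty: no _start/_stop columns are built
	}
	tbl := &table{values: values}
	for range values {
		tbl.start = append(tbl.start, start)
		tbl.stop = append(tbl.stop, stop)
	}
	return tbl
}

func main() {
	empty := newWindowTable(&arrayCursor{}, 0, 10)
	full := newWindowTable(&arrayCursor{batches: [][]float64{{1, 2}}}, 0, 10)
	fmt.Println(empty == nil, len(full.start), len(full.values))
}
```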
* refactor: migrator and introduce Store.(Create|Delete)Bucket
feat: kvmigration internal utility to create and manage kv store migrations
fix: ensure migrations applied in all test cases
* chore: update kv and migration documentation
* fix(storage): Push-down a predicate to match tags for SHOW MEASUREMENTS calls.
* chore: Address feedback.
* fix(tsm1): Split behavior based on existence of predicate for show measurements.
* fix(tsm1): Allow parenthesis expression on the LHS of a predicate.
* fix(tsm1): Create a separate tag predicate verifier that rejects negative comparisons.
* fix(tsm1): Additional test cases for show measurements with predicate.
Storage should not have a dependency on libflux. This commit removes
some constants that were used from the flux universe package and
replaces them with local copies so that storage no longer has a
dependency on the universe package.
This commit adds the `WithWritePointsValidationEnabled()` option
to disable validation in `Engine.WritePoints()` as the same
validation is performed earlier in the call stack by cloud.
* feat(storage): first array cursor
* feat: add first and last to rpc messages
* test(launcher): push down group first and group last
* feat(storage): window first array cursor
* test(launcher): push down bare first and bare last
* feat(storage): add capabilities for group first and group last
* refactor: rename first to limit
* refactor: make zero value for every period meaningful
* refactor: standardize launcher pushdown tests
The tables produced by `storage/flux` didn't previously pass our table
tests. The `Empty()` call is supposed to return false if the table was
ever not empty, but reading the table or calling `Done()` would cause
the table implementations here to return that they were always empty.
This messes up the csv encoder which then believes that it just emitted
an empty table.
The table tests for valid table implementations state that this is an
error for the table implementation. This change introduces a simple test
for `ReadFilter` and also runs the table tests on the filter iterator.
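One way to satisfy this contract is to decide emptiness once, at construction, and never recompute it from the remaining rows. The sketch below assumes a hypothetical minimal `table` type; it is not the `flux.Table` interface or the `storage/flux` implementation.

```go
package main

import "fmt"

// table is a hypothetical, minimal table; it is not the flux.Table interface.
type table struct {
	rows  []int
	empty bool // decided once at construction and never recomputed
}

func newTable(rows []int) *table {
	return &table{rows: rows, empty: len(rows) == 0}
}

// Empty keeps reporting the original state, even after the rows are consumed.
func (t *table) Empty() bool { return t.empty }

// Do passes every row to f and then releases the rows.
func (t *table) Do(f func(row int)) {
	for _, r := range t.rows {
		f(r)
	}
	t.rows = nil
}

// Done releases the rows without reading them.
func (t *table) Done() { t.rows = nil }

func main() {
	tbl := newTable([]int{1, 2, 3})
	sum := 0
	tbl.Do(func(row int) { sum += row })
	fmt.Println(sum, tbl.Empty()) // prints "6 false"
}
```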
This enables a new rule that will push down the full `aggregateWindow`
query including the `duplicate` and `window(every: inf)` that recombines
the tables. When the full rule is used, the table is not split into
tables for each window and instead retains itself as a single table. The
start or stop column is renamed to `_time` and `_start` and `_stop` will
be the boundaries of the query.
* feat: flags for pushing down new aggregates
* refactor: grouped aggregate rewrite rules
The storage operation ReadGroup aggregates per series on the storage
side. The planner will rewrite grouped aggregate queries to call
ReadGroup, which will perform a partial aggregation, followed by
another operation that will perform the rest of the aggregation on
the compute side.
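For a sum, the two stages could look roughly like this. The types and function names (`series`, `partial`, `storageSum`, `computeSum`) are hypothetical and only model the split between the storage side and the compute side.

```go
package main

import "fmt"

// series is one stored series together with the group key it belongs to.
type series struct {
	group  string
	values []float64
}

// partial is the per-series result returned by the storage side.
type partial struct {
	group string
	sum   float64
}

// storageSum models the storage side: one partial sum per series.
func storageSum(all []series) []partial {
	out := make([]partial, 0, len(all))
	for _, s := range all {
		p := partial{group: s.group}
		for _, v := range s.values {
			p.sum += v
		}
		out = append(out, p)
	}
	return out
}

// computeSum models the compute side: it merges per-series partials by group.
func computeSum(partials []partial) map[string]float64 {
	out := make(map[string]float64)
	for _, p := range partials {
		out[p.group] += p.sum
	}
	return out
}

func main() {
	partials := storageSum([]series{
		{group: "a", values: []float64{1, 2}},
		{group: "a", values: []float64{3}},
		{group: "b", values: []float64{4}},
	})
	fmt.Println(len(partials), computeSum(partials))
}
```

The same split works for any aggregate whose partial results can be merged, such as count (merge by summing) or max (merge by taking the max).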
* feat: storage capabilities for grouped aggregates
* fix: changes from review
* feat: group read operation name should include aggregate
This implements create empty for the window table reader and allows this
table read function to be used when it is specified. It will pass down
the create empty flag from the original window call into the storage
read function.
This also fixes the window table reader so it properly creates
individual tables for each window. Previously, it was constructing one
table for an entire series instead of one table per window.
Tests have been added to verify three edge case behaviors. The first is
the normal read operation where all values are present. The second is
when create empty is specified so null values may be created. The third
is with truncated boundaries to ensure that storage is read from and the
start and stop timestamps get correctly truncated.
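The three cases can be illustrated with a small model of the splitting logic. This is a sketch under assumptions, not the window table reader: `point`, `windowTable`, and `splitWindows` are hypothetical, points are assumed sorted by time, and timestamps are assumed non-negative.

```go
package main

import "fmt"

type point struct {
	t int64
	v float64
}

// windowTable is one output table covering [start, stop).
type windowTable struct {
	start, stop int64
	points      []point
}

// splitWindows produces one table per window of width every within
// [start, stop). Window bounds are truncated to the query range. Windows
// with no points are emitted only when createEmpty is set.
func splitWindows(points []point, start, stop, every int64, createEmpty bool) []windowTable {
	var tables []windowTable
	i := 0
	for ws := start - start%every; ws < stop; ws += every {
		lo, hi := ws, ws+every
		if lo < start {
			lo = start // truncate the first window to the query start
		}
		if hi > stop {
			hi = stop // truncate the last window to the query stop
		}
		for i < len(points) && points[i].t < lo {
			i++ // skip points that fall before the query range
		}
		var pts []point
		for i < len(points) && points[i].t < hi {
			pts = append(pts, points[i])
			i++
		}
		if len(pts) == 0 && !createEmpty {
			continue
		}
		tables = append(tables, windowTable{start: lo, stop: hi, points: pts})
	}
	return tables
}

func main() {
	pts := []point{{t: 1, v: 1}, {t: 2, v: 2}, {t: 12, v: 3}}
	fmt.Println(len(splitWindows(pts, 0, 30, 10, false)))
	fmt.Println(len(splitWindows(pts, 0, 30, 10, true)))
	fmt.Println(splitWindows(pts, 5, 25, 10, true))
}
```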
The ResultSetToLineProtocol test class was not generating correct
line protocol for string output (it was appending `i`).
In addition, the PR improves the mock.NewResultSetFromSeriesGenerator
type with options. The one option added is `WithGeneratorMaxValues`,
to limit the total number of values produced by the SeriesGenerator.
The tags cache was not thread safe when called from multiple goroutines
at the same time. It was intended that it would be, but the locking was
done incorrectly and in too complicated a way. There was an assumption
that the LRU would only be updated from a single thread which wasn't
true at all.
The tags cache has now been updated to include some test cases that test
for race conditions and data validity. The tags cache itself has been
changed to follow a simpler algorithm.
1. Obtain a read lock.
2. Check if the cached array can be used.
3. Release the read lock.
4. If the cached array was unusable or did not exist, create an array for
the tag.
5. Obtain a write lock.
6. Check if the cached array should be replaced and replace if needed.
7. Move the entry to the front of the LRU.
8. Release the write lock.
This simpler algorithm should ensure that this code is correct and that
creating the array is still done outside of the lock since creating the
array is the most expensive operation of the ones above.
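The steps above can be sketched with `sync.RWMutex` and `container/list`. This is a minimal model and not the actual tags cache: `tagsCache`, `entry`, and `get` are hypothetical, a cached array is assumed usable when it is at least as long as requested, and returned slices are shared, so callers must treat them as read-only.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

type entry struct {
	key   string
	array []string
}

// tagsCache is a hypothetical LRU cache of arrays filled with one tag value.
type tagsCache struct {
	mu    sync.RWMutex
	lru   *list.List               // front is the most recently used entry
	items map[string]*list.Element // tag value -> element holding *entry
	max   int
}

func newTagsCache(max int) *tagsCache {
	return &tagsCache{lru: list.New(), items: make(map[string]*list.Element), max: max}
}

// get returns a read-only slice holding n copies of value.
func (c *tagsCache) get(value string, n int) []string {
	// Steps 1-3: look for a usable cached array under the read lock.
	c.mu.RLock()
	var arr []string
	if el, ok := c.items[value]; ok {
		if cached := el.Value.(*entry).array; len(cached) >= n {
			arr = cached
		}
	}
	c.mu.RUnlock()

	// Step 4: build the array outside of any lock; this is the costly part.
	if arr == nil {
		arr = make([]string, n)
		for i := range arr {
			arr[i] = value
		}
	}

	// Steps 5-8: replace the cached array if ours is larger and update the LRU.
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[value]; ok {
		if e := el.Value.(*entry); len(e.array) < len(arr) {
			e.array = arr
		}
		c.lru.MoveToFront(el)
	} else {
		c.items[value] = c.lru.PushFront(&entry{key: value, array: arr})
		if c.lru.Len() > c.max {
			oldest := c.lru.Back()
			c.lru.Remove(oldest)
			delete(c.items, oldest.Value.(*entry).key)
		}
	}
	return arr[:n]
}

func main() {
	c := newTagsCache(2)
	var wg sync.WaitGroup
	for i := 1; i <= 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			_ = c.get("host-a", n)
		}(i)
	}
	wg.Wait()
	fmt.Println(len(c.get("host-a", 3)), c.lru.Len())
}
```

Two goroutines may both build an array for the same tag in step 4. That duplicated work is the price of keeping allocation outside the lock, and step 6 ensures only the larger array is kept.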
The capabilities interface will now return a mapping of capabilities to
a capabilities object. The capabilities object will contain a list of
features supported by the capability.
This modifies the read window aggregate interfaces to future-proof them
if and when we add additional capabilities to the method. Previously,
the interface was all or nothing. If we modified the RPC call itself, we
would have to make a new interface to denote the change to the Go code.
This changes the interface so now a `WindowAggregateCapability` exists.
This way, we can modify the struct to include things like:
```
type WindowAggregateCapability struct {
	WindowPeriodCapability  bool
	MeanAggregateCapability bool
}
```
This way we can learn if the RPC call itself supports some specific
option. If the first iteration doesn't support a mean aggregate or the
mean aggregate is only supported by single server implementations, the
window aggregate can tell the caller that it won't be able to compute
the mean aggregate.
Since it fills in a struct with these capabilities, the struct can
safely introduce new values. If a downstream consumer wants to take
advantage of that functionality, then all interfaces in the chain have
to be updated to consume the upstream capabilities.
Added an interface for an additional storage capability. This interface
will allow for checking if the reader supports the window aggregate call
and another method for invoking the call if it does.
This is implemented using a single interface. If the reader implements
the interface, it indicates that the client is capable of reading the
response. The `HasXXX` method is intended to check if the store supports
the operation. This method also takes a context because it could require
making a remote call or waiting for one.
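The pattern can be sketched as an optional interface checked with a type assertion. The names below (`Reader`, `WindowAggregateReader`, `HasWindowAggregateCapability`, `ReadWindowAggregate`, `plan`) are illustrative assumptions and the method signatures are simplified; they are not copied from the actual interface.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Reader is a hypothetical base reader interface.
type Reader interface {
	ReadFilter(ctx context.Context) string
}

// WindowAggregateReader is the optional capability. Implementing it means the
// client can read the response; the Has method reports whether the store
// supports the operation, which may require a remote call.
type WindowAggregateReader interface {
	HasWindowAggregateCapability(ctx context.Context) bool
	ReadWindowAggregate(ctx context.Context) (string, error)
}

type basicReader struct{}

func (basicReader) ReadFilter(context.Context) string { return "filter" }

type windowReader struct {
	basicReader
	storeSupported bool
}

func (r windowReader) HasWindowAggregateCapability(context.Context) bool {
	return r.storeSupported
}

func (r windowReader) ReadWindowAggregate(context.Context) (string, error) {
	if !r.storeSupported {
		return "", errors.New("window aggregate not supported by store")
	}
	return "window aggregate", nil
}

// read uses the push down only when both the reader and the store support it,
// and falls back to a plain filter read otherwise.
func read(ctx context.Context, r Reader) string {
	if wr, ok := r.(WindowAggregateReader); ok && wr.HasWindowAggregateCapability(ctx) {
		if res, err := wr.ReadWindowAggregate(ctx); err == nil {
			return res
		}
	}
	return r.ReadFilter(ctx)
}

// plan is a convenience wrapper that supplies a background context.
func plan(r Reader) string { return read(context.Background(), r) }

func main() {
	fmt.Println(plan(basicReader{}))
	fmt.Println(plan(windowReader{storeSupported: false}))
	fmt.Println(plan(windowReader{storeSupported: true}))
}
```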
This commit:
* adds new request and response data types for schema gRPC calls
* adds fmt.Stringer implementation to cursors.FieldType
* adds APIs to sort a slice of MeasurementField values
* upgrades the gogo protobuf package to v1.3.1, which
includes improvements to serialization
The storage filters are modified to use the predicates directly so we do
not have to pass `semantic.FunctionExpression` around. Instead, since
simple expressions are all that are supported anyway, we transform
suitable function expressions into predicates as part of the push down
rule and this simplifies the influxdb reader code.
This also moves the storage predicate conversion code into the standard
library package as it is the only location that uses this code now that
the predicate conversion is done as part of the push down rule.
This refactor was prompted by another refactor of the
`semantic.FunctionExpression` that would cause it to always contain a
`semantic.Block`. Since the push down filter needs to extract the
expressions and combine them, this refactor means we do not have to
construct a combined filter inside of blocks, which gives us better type
safety.
Filter cursors buffer points in between calls to Next() if the number
of read points exceeds 1000. Previously, this buffer was being cleared
out before being iterated over, which caused queries to return a result
set whose number of rows was divisible by 1000.
This change defers the clearing of the buffer until after the points have
been read. This change affects any queries which read more than 1000 points
from a single series and have a filter that can be successfully applied to at
least one of those points.
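A simplified model of the corrected ordering is shown below. It is not the filter cursor implementation: `filterCursor` and `next` are hypothetical, points are plain integers, and the batch limit is 3 instead of 1000 to keep the example small.

```go
package main

import "fmt"

// maxBatch is the batch limit; it stands in for the 1000 points in the
// commit message.
const maxBatch = 3

// filterCursor is a hypothetical filtering cursor over in-memory batches.
type filterCursor struct {
	src  [][]int        // upstream batches still to be read
	keep func(int) bool // filter predicate
	buf  []int          // points carried over between calls to next
}

// next returns up to maxBatch filtered points, or an empty slice when done.
func (c *filterCursor) next() []int {
	var out []int

	// Emit the points carried over from the previous call first ...
	n := len(c.buf)
	if n > maxBatch {
		n = maxBatch
	}
	out = append(out, c.buf[:n]...)
	// ... and only then drop them from the buffer. Clearing the buffer
	// before this copy is the bug described above: the overflow is lost.
	c.buf = c.buf[n:]

	for len(out) < maxBatch && len(c.src) > 0 {
		batch := c.src[0]
		c.src = c.src[1:]
		for _, p := range batch {
			if !c.keep(p) {
				continue
			}
			if len(out) < maxBatch {
				out = append(out, p)
			} else {
				c.buf = append(c.buf, p) // overflow waits for the next call
			}
		}
	}
	return out
}

func main() {
	c := &filterCursor{
		src:  [][]int{{1, 2, 3, 4, 5}, {6, 7}},
		keep: func(int) bool { return true },
	}
	total := 0
	for b := c.next(); len(b) > 0; b = c.next() {
		total += len(b)
	}
	fmt.Println(total)
}
```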