When partition pruning is possible, it skips sending the data for
partitions that have no effect on the query outcome.
This commit does the same for the partition metadata - these frames can
form a significant portion of the query response when the row count is
low, and for pruned partitions have no bearing on the query result.
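The sketch below shows the idea of skipping both the metadata frame and the data frame for a pruned partition. It is a minimal illustration under stated assumptions. `Frame`, `Partition`, and `response_frames` are hypothetical names, not the ingester's real types. A simple time-range overlap check stands in for whatever pruning predicate is actually evaluated.

```rust
/// Hypothetical frame types a server might stream per partition.
#[derive(Debug, PartialEq)]
enum Frame {
    PartitionMetadata { partition_id: u64 },
    Data { partition_id: u64, rows: usize },
}

/// Hypothetical buffered partition with the min/max timestamps used
/// as pruning statistics in this sketch.
struct Partition {
    id: u64,
    rows: usize,
    min_time: i64,
    max_time: i64,
}

/// Build the response frames, emitting neither metadata nor data for
/// any partition whose time range cannot overlap [start, end].
fn response_frames(partitions: &[Partition], start: i64, end: i64) -> Vec<Frame> {
    let mut frames = Vec::new();
    for p in partitions {
        // Pruned: this partition cannot contain matching rows, so it
        // contributes no frames at all.
        if p.max_time < start || p.min_time > end {
            continue;
        }
        frames.push(Frame::PartitionMetadata { partition_id: p.id });
        frames.push(Frame::Data { partition_id: p.id, rows: p.rows });
    }
    frames
}
```

When few rows match, the metadata frames would otherwise be a noticeable share of the response. Dropping them alongside the data frames is what keeps the response small.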
Benchmark the performance of concurrent queries against a single
partition, varying the number of concurrent queries and size of buffered
data in the partition.
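Here is a rough, standard-library-only sketch of that benchmark shape: N concurrent readers against one shared partition, with a configurable amount of buffered data. Everything in it is an assumption for illustration. The `RwLock<Vec<u64>>` is a hypothetical stand-in for a partition buffer, and a "query" is modelled as a full scan under a read lock.

```rust
use std::sync::{Arc, Barrier, RwLock};
use std::thread;
use std::time::{Duration, Instant};

/// Run `concurrency` threads, each executing `queries_per_thread`
/// simulated queries against one shared partition holding
/// `buffered_rows` rows. Returns the total number of queries executed
/// and the wall time (including thread start-up).
fn bench_concurrent_queries(
    concurrency: usize,
    queries_per_thread: usize,
    buffered_rows: usize,
) -> (usize, Duration) {
    // Hypothetical stand-in for a single partition's buffered data.
    let partition = Arc::new(RwLock::new(vec![1u64; buffered_rows]));
    // Release all threads at once so the queries really overlap.
    let barrier = Arc::new(Barrier::new(concurrency));
    let start = Instant::now();
    let handles: Vec<_> = (0..concurrency)
        .map(|_| {
            let partition = Arc::clone(&partition);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                barrier.wait();
                let mut executed = 0usize;
                for _ in 0..queries_per_thread {
                    // "Query": take a read lock and scan every buffered row.
                    let data = partition.read().unwrap();
                    let sum: u64 = data.iter().sum();
                    std::hint::black_box(sum);
                    executed += 1;
                }
                executed
            })
        })
        .collect();
    let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (total, start.elapsed())
}
```

Sweeping `concurrency` (e.g. 1, 2, 4, 8) against several `buffered_rows` sizes gives the two axes described above. A real harness would more likely use a benchmarking crate such as criterion for statistically sound timings.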
Instead of refreshing every metric in the System every 10 seconds,
refresh only the disk statistics for the disk we're interested in.
Additionally, resolve the parent disk for the directory path once,
instead of on every loop iteration.
* the metric attributes are hardcoded to the path
* the duration (frequency) of the background task is hardcoded
* the tick.await now occurs after the first metric recording, such that the test doesn't have to wait 15 seconds.
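The "resolve the parent disk once" step amounts to picking the disk whose mount point is the longest prefix of the directory path. Below is a minimal sketch using only `std::path`. `Disk` and `resolve_parent_disk` are hypothetical names, and the sketch does not reproduce any particular system-info crate's API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical description of a mounted disk.
#[derive(Debug, Clone, PartialEq)]
struct Disk {
    name: String,
    mount_point: PathBuf,
}

/// Return the disk whose mount point is the longest (component-wise)
/// prefix of `dir`, i.e. the disk that actually holds the directory.
fn resolve_parent_disk<'a>(disks: &'a [Disk], dir: &Path) -> Option<&'a Disk> {
    disks
        .iter()
        // `Path::starts_with` compares whole components, so "/data"
        // does not match "/database".
        .filter(|d| dir.starts_with(&d.mount_point))
        // The deepest matching mount point wins over e.g. "/".
        .max_by_key(|d| d.mount_point.components().count())
}
```

The result would be computed once before the background loop starts. Each tick then refreshes and records statistics for that one disk only, rather than refreshing every metric the system exposes.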
This adds extra test coverage for the ingester's WAL replay & RPC write
paths, as well as the WAL E2E tests, to ensure that all sequence numbers
present in a WriteOperation/WalOperation are encoded and present when
decoded.
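The property these tests check is a round trip: every per-table sequence number written must come back on decode. The sketch below illustrates that property with a deliberately simple, hypothetical byte encoding, a flat run of little-endian `(table_id, sequence_number)` pairs. It is not the protobuf/WAL format the real code uses.

```rust
use std::collections::BTreeMap;

/// Hypothetical encoding: consecutive little-endian u64 pairs of
/// (table_id, sequence_number).
fn encode(seqs: &BTreeMap<u64, u64>) -> Vec<u8> {
    let mut buf = Vec::with_capacity(seqs.len() * 16);
    for (table, seq) in seqs {
        buf.extend_from_slice(&table.to_le_bytes());
        buf.extend_from_slice(&seq.to_le_bytes());
    }
    buf
}

/// Decode the pairs back into a map; `None` if the buffer is truncated.
fn decode(buf: &[u8]) -> Option<BTreeMap<u64, u64>> {
    if buf.len() % 16 != 0 {
        return None;
    }
    let mut seqs = BTreeMap::new();
    for chunk in buf.chunks_exact(16) {
        let mut table = [0u8; 8];
        let mut seq = [0u8; 8];
        table.copy_from_slice(&chunk[..8]);
        seq.copy_from_slice(&chunk[8..]);
        seqs.insert(u64::from_le_bytes(table), u64::from_le_bytes(seq));
    }
    Some(seqs)
}
```

A test in this style asserts that `decode(&encode(&m)) == Some(m)`, so a sequence number silently dropped during encoding shows up as a map mismatch.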
This commit asks the oracle for a new sequence number for each table
batch of a write operation (and thus each partition of a write) when
handling an RPC write operation before appending the operation to the
WAL. The ingester now honours the sequence numbers per-partition when
WAL replay is performed.
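A minimal sketch of per-table assignment, assuming the oracle is a monotonic counter. `Oracle` and `assign_sequence_numbers` are hypothetical names, and the real oracle's interface may differ. Each table batch in one write receives its own number, and the resulting map is what would be stored with the operation in the WAL.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical sequence number oracle: a shared monotonic counter.
struct Oracle {
    next: AtomicU64,
}

impl Oracle {
    fn new(start: u64) -> Self {
        Self { next: AtomicU64::new(start) }
    }

    /// Hand out the next sequence number; `fetch_add` guarantees each
    /// caller gets a distinct value even under concurrency.
    fn next(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

/// Ask the oracle for a fresh sequence number for each table batch of
/// one write, returning the per-table map to append to the WAL.
fn assign_sequence_numbers(oracle: &Oracle, table_ids: &[u64]) -> BTreeMap<u64, u64> {
    table_ids.iter().map(|&t| (t, oracle.next())).collect()
}
```

On replay, each partition can then be compared against its own table's sequence number instead of a single op-wide value.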
This commit removes the op-level sequence number from the proto
definition, now reading and writing solely to the per table (and thus
per partition) sequence number map. Tables/partitions within the same
write op are still assigned the same number for now, so there should be
no semantic difference.
Prior to this change, projection pushdown was implemented as a filter,
which meant a query using it would take the following steps:
* Query arrives
* Find necessary partition data
* Copy all the partition data into a RecordBatch
* Filter that RecordBatch to apply the projection
* Return results to caller
This is far from ideal, as the underlying partition data is copied in
its entirety and then the unneeded columns discarded - a pure waste!
After this PR, the projection is pushed down to the point of RecordBatch
generation:
* Query arrives
* Find necessary partition data
* Copy only the projected columns to a RecordBatch
* Return results to the caller
This minimises the amount of data copying, which for large amounts of
data should lead to a meaningful performance improvement when querying
for a subset of columns. It also uses a slightly more efficient
projection implementation by using a single pass over the columns (still
O(n) but less constant overhead).
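The difference between "copy everything, then filter" and "copy only the projected columns" can be sketched with plain vectors. `PartitionData` and `Batch` are hypothetical stand-ins for the buffered partition and for Arrow's `RecordBatch`. This sketch makes one pass over the projection indices and clones only the requested columns. It is not necessarily how the actual single-pass implementation is structured.

```rust
/// Hypothetical column-oriented partition buffer.
struct PartitionData {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

/// Hypothetical stand-in for a RecordBatch.
#[derive(Debug, PartialEq)]
struct Batch {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

impl PartitionData {
    /// Build a batch containing only the projected columns (by index),
    /// in projection order. Unrequested columns are never copied.
    fn to_batch(&self, projection: &[usize]) -> Batch {
        let (names, columns): (Vec<String>, Vec<Vec<i64>>) = projection
            .iter()
            .map(|&i| (self.names[i].clone(), self.columns[i].clone()))
            .unzip();
        Batch { names, columns }
    }
}
```

Copy cost now scales with the number and size of the projected columns instead of with the full width of the partition.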
Configure the partition pruning test to use a partition template that
partitions on the "region" field. This will allow it to be used for
pruning at query time.
Similar to #8109.
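To show why a "region"-based template enables pruning, here is a small sketch. It assumes, hypothetically, a template that uses only the region value as the partition key. `partition_key` and `prune` are illustrative names, not IOx's template API. Given a predicate such as `region = 'eu-west'`, any partition whose key differs can be skipped without reading its data.

```rust
use std::collections::BTreeMap;

/// Hypothetical: derive a partition key from a row's "region" value,
/// as a template partitioning solely on that field would. Rows without
/// the field have no key in this sketch.
fn partition_key(row: &BTreeMap<&str, &str>) -> Option<String> {
    row.get("region").map(|r| r.to_string())
}

/// Keep only the partitions whose key can satisfy `region = <value>`;
/// everything else is pruned before any data is read.
fn prune<'a>(partition_keys: &'a [String], region: &str) -> Vec<&'a String> {
    partition_keys.iter().filter(|k| k.as_str() == region).collect()
}
```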
This was once implemented by the RUB, but as it stands, no
chunk implements this anymore.
If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).
Closes #8096.
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier, which just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.
If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).
First half of #8096.