Adds a QueryExec decorator that transparently injects instrumentation
into an Ingester query response stream.
This captures the wall-clock duration a query stream takes to be read
to completion (or aborted) by the caller, faceted by
stream completed / dropped and batch error / all OK.
Also records the distribution of row, record batch, and partition count
per query to quantify the amount of data being read per query.
This lets us use a SystemProvider without wrapping it in an Arc to
satisfy a Clone bound.
There's no reason to be wrapping this in an Arc and maintaining
refcounts for a stateless trait impl struct that doesn't have any data
to reference count or drop.
Remove an extraneous heap allocation / dynamic dispatch for each query -
the result type never changes, so there's no benefit to boxing the
returned stream.
* chore: Move to inline snapshots
* chore: Container for the DataFusion and IOx schema
* chore: Simplify using logical expression helper functions
* feat: Rewrite conditional expressions using InfluxQL rules
* feat: Add tests to validate conditional expression rewriting
* feat: Rewrite column expressions
* chore: Rewrite expression to use false when possible
This allows the planner to optimise away the entire logical plan to an
empty plan in many cases.
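The idea can be sketched with a toy expression rewriter (a hypothetical simplification, not the actual planner code): once a predicate folds to the constant `false`, the planner knows the scan can produce no rows and can replace the whole plan with an empty one.

```rust
// Toy expression tree standing in for the real logical expressions.
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Literal(bool),
    Column(&'static str),
    And(Box<Expr>, Box<Expr>),
}

/// Fold AND expressions: anything AND false is false.
fn rewrite(e: Expr) -> Expr {
    match e {
        Expr::And(l, r) => {
            let (l, r) = (rewrite(*l), rewrite(*r));
            if l == Expr::Literal(false) || r == Expr::Literal(false) {
                // Short-circuit: the whole conjunction is false, so a
                // planner can prune the plan to an empty relation.
                Expr::Literal(false)
            } else {
                Expr::And(Box::new(l), Box::new(r))
            }
        }
        other => other,
    }
}

fn main() {
    let e = Expr::And(
        Box::new(Expr::Column("host")),
        Box::new(Expr::Literal(false)),
    );
    assert_eq!(rewrite(e), Expr::Literal(false));
}
```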
* feat: Complete cast postfix operator support
Added `unsigned` postfix operator, as the feature was mostly complete.
Closes #6895
* chore: Remove redundant attribute
Mocking out query responses requires constructing a PartitionResponse
containing the set of PartitionStream, itself a stream of RecordBatch.
This nested stream of structures is required to enable a pull-based /
streaming query response, but makes testing difficult because the types
are hard to initialise.
This commit adds a helper macro `make_partition_stream!` which, when
combined with `make_batch!` to initialise the inner RecordBatch
instances, reduces the developer burden when writing test code that
interacts with query responses:
    let stream = make_partition_stream!(
        PartitionId::new(1) => [
            make_batch!(
                Int64Array("a" => vec![1, 2, 3, 4, 5]),
                Float32Array("b" => vec![4.1, 4.2, 4.3, 4.4, 5.0]),
            ),
            make_batch!(
                Int64Array("c" => vec![1, 2, 3, 4, 5]),
            ),
        ],
        PartitionId::new(2) => [
            make_batch!(
                Float32Array("d" => vec![1.1, 2.2, 3.3, 4.4, 5.5]),
            ),
        ],
    );
The above yields a PartitionStream containing two partitions, with their
respective RecordBatch instances.
I always find it tedious initialising a RecordBatch (including its
schema) with a given set of rows/columns - this macro simplifies it to:
    let (batch, schema) = make_batch!(
        Int64Array("a" => vec![1, 2, 3, 4]),
        Float32Array("b" => vec![4.1, 4.2, 4.3, 4.4]),
    );
Resulting in a 4 row, 2 column ("a" and "b") RecordBatch & Schema.
* feat: initial implementation of the split
* feat: split many L0 files in groups and compact them into new and fewer L0 files
* test: remove inappropriate AllAtOnce test
* refactor: move file classification for initial target to its own function
* fix: pop the branch from start to end
* chore: address review comments
* feat: support splitting to many L1 files
* feat: only add an extra round compacting level-n files into same-level files when those files plus their overlapping level-n-plus-1 files exceed the limit
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: final cleanup and address comments
* chore: run fmt
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion
* chore: update the plans
* fix: update some plans
* chore: Update plans and port some explain plans to use insta snapshots
* fix: another plan
* chore: Run cargo hakari tasks
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>