* feat(influxql): support TOP and BOTTOM functions
Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.
* fix: clippy
* refactor(influxql): use window aggregates for selectors
Change the implentation of ProjectionType::Selector to use a window
aggregate, rather than an aggregate with a custom selector function.
This is in preparation for implementing PERCENTILE.
* feat(influxql): PERCENTILE selector
Add a selector for the row containing the nth percentile of a
partition. This is the behaviour used when a single selector function
is used in an influxql query.
* feat(influxql): PERCENTILE aggregator
Add the PERCENTILE aggregation function for when the PERCENTILE
function is used in an aggregating projection. This implementation
buffers all non-null field values in memory in order to perform the
operation and therefore could be an expensive operation. This is
necessary for compatibility with earlier influxdb versions.
* refactor(influxql): move PERCENTILE implementation out of plan
The plan module is getting rather full of user-defined function
implementations. This breaks the new functions used to implement
percentile into some new top-level modules for aggregate and window
UDFs.
* fix: doc-lint
* chore: refactor `find_enumerated`
* chore: use `s` in format string
* chore: include the unexpected selector function in the error
* chore(influxql): review suggestions
Added some addition comments to help understanding.
Changed the handling os slector functions such that FIRST, LAST,
MAX & MIN behave the same as they did before PERCENTILE was added.
* chore(influxql): make percent_row_number a window UDF
Now that user-defined window functions are available make the
percent_row_number function be one of those. this allows the values
to be calculated for the entire window partition in one go.
For some reason the user-defined window function cannot return NULL
values. This function uses 0 where it would otherwise use NULL, as
row numbering starts at 1.
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: add `influxdb_iox debug build-catalog` command
* fix: tests
* fix: Use info! logs instead of println for status
* fix: Set partition_hash_id as well
* fix: remove leftover code
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
`println!` is:
- confusing: like why is everything colored but this few lines are
special?!
- annoying: you cannot filter them which esp. for debugging is a pain
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor(iox_query_influxql): expand select projection
Change the SELECT projection in the planner to make it clearer how
each projection type works.
* feat(influxql): support TOP and BOTTOM functions
Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.
* fix: clippy
* chore: Use array / slice destructuring
* chore: review suggestion in iox_query_influxql/src/plan/planner.rs
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit removes the op-level sequence number from the proto
definition, now reading and writing solely to the per table (and thus
per partition) sequence number map. Tables/partitions within the same
write op are still assigned the same number for now, so there should be
no semantic different
This makes using the tokio_unstable feature optional - the build
configuration defines what is enabled, rather than the other way around.
By default, and for prod builds and users who have not set a RUSTFLAGS,
the tokio_unstable is enabled and this is leveraged to provide the
metrics exposed by the flag. For builds where tokio_unstable is not
enabled, these metrics are not included, without causing a compilation
error.
CI will against the flag-enabled/prod code by virtue of being the
default.
* feat: metrics for main tokio runtime
* feat: instrument executor tokio runtime
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.
This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent time predicates that the user providers (e.g. via
`WHERE time > x`).
It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.
Note that the lowering of the retention policy changed slightly from
```text
(time > (now() - retention)) AND (time < MAX)
```
to
```text
time > (now() - retention)
```
Since the `MAX` cut is just an artifact of the lowering and was unnecessary.
Closes#7409.
Closes#7410.
Add the DERIVATIVE and NON_NEGATIVE_DERIVATIVE functions to influxql.
These are used to calculate derivatives over arbitrary time units.
The implementation is modeled after the DIFFERENCE and
NON_NEGATIVE_DIFFERENCE functions, with a difference that the unit
parameters is a configuration of the user-defined aggregator function
and therefore there cannot be a single shared definition of the
function.
The NON_NEGATIVE_DIFFERENCE function implementation has been
refactored to be an arbitrary NON_NEGATIVE wrapper for any Accumulator
function.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: provide convenience methods to create Scheduler, and keep the scheduler implementations crate private. External crates can only create a Scheduler based upon configs.
* feat: provide Scheduler as a component to compactor. Specifically, the scheduler configs are present within the compactor run config, and the scheduler in created within the compactor hardcoded components.
* feat: within the compactor ScheduledPartitionsSource, utilize the dyn Scheduler and Scheduler.get_jobs()
* feat: CompactionJob should be per partition, and have a uniqueness characteristic independent of the partition
* feat: keep compactor_scheduler separate from clap_blocks. Only interface is within ioxd_compactor where the CLI configs are transformed into ShardConfig and PartitionsSourceConfig.
* chore: make IdOnlyPartitionFilter into only pub(crate)
* chore: update scheduler display to include any report information (a.k.a. shard_config, if present)
We already have a method that adds some default metrics / instruments to
the metric registry. Use that for jemalloc as well. This makes it easier
to follow how metrics are setup up for our prod binary.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: ensure that selectors check arg count
* feat: basic non-aggregates w/ InfluxQL selector functions
See #7533.
* refactor: clean up code
* feat: get more advanced cases to work
* docs: remove stale comments
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit adds an `inspect` command to read through the sequenced
operations in a WAL file and debug pretty print their contents to
stdout, optionally filtering by a sequence number range.
* feat: Allow passing service protection limits in create db gRPC call
* fix: Move the impl into the catalog namespace trait
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update cargo
* fix: update for API changes
* fix: Update plans
* chore: Update for new api
* fix: Update plans
* chore: Update for API changes more
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This changes the e2e test to delete the WAL segment file, restart the
ingester and ensure the results returned by an ingester query after
feeding the regenerated line proto in are the same as those before.
This commit adds support for the CLI to query the namespace and schema
APIs to retrieve database and table names from the IDs found in WAL
entries being regenerated.
Instead, the type responsible for initialising it handles namespaced
`Write` initialisation and management, as well as the failure paths that
may need handling. This commit introduces a `NamespaceDemultiplexer`
type with a generic implementation allowing fallible `async` lazy init
of any type from a given `NamespaceId`. This paves the way for catalog-aware
initialisation of `LineProtoWriter`s.
* test: integration test for tracing of queries to the ingester
* chore: add FlightFrameEncodeRecorder to record spans per each polling result
* refactor(trace): impl TraceCollector for Arc
Allow any Arc-wrapped TraceCollector implementation to be used as a
TraceCollector. This avoids needing to as_any() and downcast later.
* test: assert FlightFrameEncodeRecorder trace spans
This test exercises the FlightDataEncoder wrapped with the trace
decorator (FlightFrameEncodeRecorder) when executing against a data
source that yields data after varying numbers of Stream polls.
This test passing will validate the FlightFrameEncodeRecorder correctly
instruments the amount of time a client spends waiting on the
FlightDataEncoder to acquire or encode a protocol frame, but also
ensures the decorator correctly accounts for varying behaviours allowed
through the Stream abstraction. It does this by simulating a data source
that is not always immediately ready to provide data, such as a buffer
wrapped in a contended async mutex.
* refactor: move tracing decorator into separate mod
* fix: record spans
* refactor(test): update test
The frame encoder is not one-to-one - it emits two frames for the first
data payload, a schema and a payload. This commit updates the test to
account for it!
* refactor: remove unneeded mut ref, and use enum state method which panics when in a (should be unreachable) state
* chore: add more docs to FlightFrameEncodeRecorder and related
---------
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: consolidate pruning code
Let's have a single chunk pruning implementation in our code, not two.
Also removes a bit of crust from `QueryChunk` since it is technically no
longer responsible for pruning (this part has been pushed into the
querier for early pruning and bits for the `iox_query_influxrpc` for
some RPC shenanigans).
* test: regression test for incident
* fix: chunk pruning
* docs: add some test notes
This commit fixes loads of crates (47!) had unused dependencies, or
mis-configured dependencies (test deps as normal deps).
I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
* refactor: remove line number from test
* refactor: clean up code
* fix: flaky `service::tests::test_log_on_panic`
This seems to be a really rare issue. So provoking it requires a lot of
tests. This can be done (using `fish`):
```console
$ while cargo with "parallel -n0 {bin} {args} ::: $(seq 10)" -- test -p service_grpc_influxrpc --lib -- --test-threads 20; echo; end
```
Fixes#7838.
* fix: panic handler construction
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>