621 Commits (729851be580e8a7813c0f3fe3e574f7cb20a6593)

Author SHA1 Message Date
Martin Hilton 9111cd517f
feat(influxql): PERCENTILE function (#8187)
* feat(influxql): support TOP and BOTTOM functions

Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.

* fix: clippy

* refactor(influxql): use window aggregates for selectors

Change the implementation of ProjectionType::Selector to use a window
aggregate, rather than an aggregate with a custom selector function.
This is in preparation for implementing PERCENTILE.

* feat(influxql): PERCENTILE selector

Add a selector for the row containing the nth percentile of a
partition. This is the behaviour used when a single selector function
is used in an influxql query.

* feat(influxql): PERCENTILE aggregator

Add the PERCENTILE aggregation function for when the PERCENTILE
function is used in an aggregating projection. This implementation
buffers all non-null field values in memory in order to perform the
operation and therefore could be an expensive operation. This is
necessary for compatibility with earlier influxdb versions.

* refactor(influxql): move PERCENTILE implementation out of plan

The plan module is getting rather full of user-defined function
implementations. This breaks the new functions used to implement
percentile into some new top-level modules for aggregate and window
UDFs.

* fix: doc-lint

* chore: refactor `find_enumerated`

* chore: use `s` in format string

* chore: include the unexpected selector function in the error

* chore(influxql): review suggestions

Added some additional comments to aid understanding.

Changed the handling of selector functions such that FIRST, LAST,
MAX & MIN behave the same as they did before PERCENTILE was added.

* chore(influxql): make percent_row_number a window UDF

Now that user-defined window functions are available make the
percent_row_number function be one of those. This allows the values
to be calculated for the entire window partition in one go.

For some reason the user-defined window function cannot return NULL
values. This function uses 0 where it would otherwise use NULL, as
row numbering starts at 1.
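
A minimal Python sketch of the idea described above, not the code from this PR: a helper that maps a percentile to a 1-based row number in a sorted partition and returns 0 where NULL would otherwise be used, plus an aggregate that buffers all non-null values before picking that row. The rounding rule (`floor(len * p / 100 + 0.5)`) and both function signatures are assumptions made for illustration.

```python
import math


def percent_row_number(partition_len, percentile):
    """Return the 1-based row number for `percentile`, or 0 if there is none.

    The nearest-rank rounding used here is an assumption, not taken from
    the commit. 0 stands in for NULL because row numbering starts at 1.
    """
    if partition_len <= 0 or not (0 <= percentile <= 100):
        return 0
    row = math.floor(partition_len * percentile / 100.0 + 0.5)
    return row if 1 <= row <= partition_len else 0


def percentile_aggregate(values, percentile):
    """Buffer every non-null value, sort, and pick the percentile row."""
    buffered = sorted(v for v in values if v is not None)
    row = percent_row_number(len(buffered), percentile)
    return buffered[row - 1] if row else None
```

Buffering the whole partition is what makes the aggregate potentially expensive: memory grows with the number of non-null field values.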

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-11 05:33:16 +00:00
Fraser Savage dec0244bff
refactor(e2e): Wait 100ms between queries in debug::build_catalog test 2023-07-10 15:27:30 +01:00
Fraser Savage 0978aa0551
fix(e2e): Add small busy-loop to debug::build_catalog test to assert only on non-empty results 2023-07-10 15:13:37 +01:00
Andrew Lamb 3ce11d8d66
chore: Update DataFusion (#8190)
* chore: Update DataFusion

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: use display format

* chore: Update explain plan output

* fix: update plans

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-10 09:54:50 +00:00
Andrew Lamb 048fc32bd5
feat: add `influxdb_iox debug build-catalog` command (#8067)
* feat: add `influxdb_iox debug build-catalog` command

* fix: tests

* fix: Use info! logs instead of println for status

* fix: Set partition_hash_id as well

* fix: remove leftover code

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-07 18:32:27 +00:00
Stuart Carnie 1ca547b313
fix: Teach planner to rewrite binary expressions for div operator
Specifically when the operands are integers, to match InfluxQL OG
2023-07-07 11:22:03 +10:00
Martin Hilton dfffdc1d90
feat(influxql): support TOP and BOTTOM functions (#8143)
* refactor(iox_query_influxql): expand select projection

Change the SELECT projection in the planner to make it clearer how
each projection type works.

* feat(influxql): support TOP and BOTTOM functions

Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.

* fix: clippy

* chore: Use array / slice destructuring

* chore: review suggestion in iox_query_influxql/src/plan/planner.rs

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
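
As a rough illustration of "the first n rows in some ordered data set", the Python sketch below orders `(time, value)` rows by value and keeps the first `n`. The row shape, the tie-breaking (original order is kept for equal values), and the output order are assumptions for this example and are not taken from the planner code.

```python
def top(rows, n):
    """Return the n rows with the largest values, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


def bottom(rows, n):
    """Return the n rows with the smallest values, smallest first."""
    return sorted(rows, key=lambda row: row[1])[:n]
```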

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 07:08:45 +00:00
Marco Neumann 70b44f78ee
test: correctly decode ingester responses in end2end tests 2023-07-03 17:25:01 +02:00
Marco Neumann b1a4e3955e
test: `ingester_partition_pruning` must perform type coercion 2023-07-03 17:25:00 +02:00
Carol (Nichols || Goulding) cd28bf0337
test: Query an ingester with a predicate that should prune partitions 2023-07-03 17:24:58 +02:00
Dom Dwyer e5a9e1534a
test: assert 1 file persisted
There should be a single file persisted during graceful shutdown.
2023-07-03 15:51:02 +02:00
Dom Dwyer 5d0c172e61
test(e2e): query shutdown-persisted files
Ensure buffered ingester data is persisted and remains queryable after a
graceful ingester shutdown.
2023-07-03 15:51:02 +02:00
Marco Neumann 4638b89d93
refactor: migrate retention to proper predicates (#8092)
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.

This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent with time predicates that the user provides (e.g. via
`WHERE time > x`).

It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.

Note that the lowering of the retention policy changed slightly from

```text
(time > (now() - retention)) AND (time < MAX)
```

to

```text
time > (now() - retention)
```

Since the `MAX` cut is just an artifact of the lowering and was unnecessary.
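
A small Python sketch of the two lowerings, for illustration only. It assumes nanosecond integer timestamps and uses the largest signed 64-bit value as a stand-in for `MAX`; neither detail is stated in this commit message, and all function names are hypothetical.

```python
I64_MAX = 2**63 - 1  # assumed stand-in for MAX


def old_retention_filter(time_ns, now_ns, retention_ns):
    """Old lowering: (time > (now() - retention)) AND (time < MAX)."""
    return time_ns > (now_ns - retention_ns) and time_ns < I64_MAX


def new_retention_filter(time_ns, now_ns, retention_ns):
    """New lowering: time > (now() - retention)."""
    return time_ns > (now_ns - retention_ns)


def apply_retention(rows, now_ns, retention_ns):
    """Apply the per-table predicate to (time, value) rows."""
    return [r for r in rows if new_retention_filter(r[0], now_ns, retention_ns)]
```

For any timestamp below the assumed `MAX`, both forms keep exactly the same rows.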

Closes #7409.
Closes #7410.
2023-06-29 08:36:37 +00:00
Martin Hilton 511a0bae78
feat(influxql): add derivative and non_negative_derivative (#8103)
Add the DERIVATIVE and NON_NEGATIVE_DERIVATIVE functions to influxql.
These are used to calculate derivatives over arbitrary time units.
The implementation is modeled after the DIFFERENCE and
NON_NEGATIVE_DIFFERENCE functions, with the difference that the unit
parameter is a configuration of the user-defined aggregator function
and therefore there cannot be a single shared definition of the
function.

The NON_NEGATIVE_DIFFERENCE function implementation has been
refactored to be an arbitrary NON_NEGATIVE wrapper for any Accumulator
function.
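
The shape of this design can be sketched in Python as follows. This is an illustrative analogue rather than the Rust implementation: `derivative` takes the unit as a parameter, and `non_negative` wraps any function that returns `(time, value)` pairs. The names, the row shape, and the handling of nulls and non-increasing timestamps are assumptions.

```python
def derivative(points, unit):
    """Rate of change per `unit` between consecutive non-null points.

    `points` is a list of (time, value) pairs with time in the same
    units as `unit`.
    """
    out = []
    prev = None
    for t, v in points:
        if v is None:
            continue  # assumed: nulls are skipped
        if prev is not None:
            prev_t, prev_v = prev
            elapsed = t - prev_t
            if elapsed > 0:
                out.append((t, (v - prev_v) / (elapsed / unit)))
        prev = (t, v)
    return out


def non_negative(fn):
    """Wrap any (time, value)-producing function, dropping negative results."""
    def wrapped(*args, **kwargs):
        return [(t, v) for t, v in fn(*args, **kwargs) if v >= 0]
    return wrapped


non_negative_derivative = non_negative(derivative)
```

Because the wrapper knows nothing about the wrapped function, the same `non_negative` could be applied to a difference function as well.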

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-29 05:53:18 +00:00
Marco Neumann 178483c1a0
feat: basic non-aggregates w/ InfluxQL selector functions (#8016)
* test: ensure that selectors check arg count

* feat: basic non-aggregates w/ InfluxQL selector functions

See #7533.

* refactor: clean up code

* feat: get more advanced cases to work

* docs: remove stale comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 08:05:50 +00:00
Stuart Carnie 7b4a1a0660
chore: PR feedback
Add tests for fewer rows than N for `moving_average`

See: https://github.com/influxdata/influxdb_iox/pull/8023#discussion_r1237298376
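
A short Python sketch of the case these tests cover, assuming a simple trailing window: no output is produced until `N` values have been seen, so an input with fewer than `N` rows yields nothing. Skipping nulls is also an assumption here; the function name mirrors `moving_average` but the code is not the UDAF from this PR.

```python
from collections import deque


def moving_average(values, n):
    """Average of the last n non-null values, emitted once n have been seen."""
    window = deque(maxlen=n)
    out = []
    for v in values:
        if v is None:
            continue  # assumed: null values are ignored
        window.append(v)
        if len(window) == n:
            out.append(sum(window) / n)
    return out
```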
2023-06-22 12:15:47 +10:00
Stuart Carnie 13726c2a76
Merge branch 'main' into sgc/issue/7600_moving_average 2023-06-22 10:10:22 +10:00
Marco Neumann 83a5037e61
feat: query support for custom partitioning (#8025)
* feat: querier-specific stat creation routine

* feat: prune querier chunks using partition col ranges

* feat: add table client

* test: custom partitioning

* fix: correctly set up stats for chunks with col subsets

* fix: flaky test

* refactor: remove obsolete dead_code markers

* feat: add partition template to `create_namespace`

* test: extend custom partitioning end2end tests

* fix: explain shuffling, make it actually deterministic
2023-06-21 09:03:19 +00:00
Stuart Carnie 2cbaf9cffa
chore: more tests, renamed avg_n → moving_average 2023-06-21 15:05:08 +10:00
Stuart Carnie edaac28498
Merge branch 'main' into sgc/issue/7600_moving_average 2023-06-21 11:39:06 +10:00
wiedld 34b5fadde0
refactor: move scheduler related configs to compactor_scheduler (#8013) 2023-06-20 09:55:35 -07:00
Stuart Carnie a2521bbf35
feat: moving_average, difference and non_negative_difference
There is a `todo` regarding `update_batch` to be discussed with @alamb
2023-06-20 16:37:28 +10:00
Stuart Carnie 8670b28445
Merge branch 'main' into sgc/issue/7600_moving_average 2023-06-18 09:41:19 +10:00
Andrew Lamb 5889c96501
chore: Update `datafusion` and other dependencies (#7981)
* chore: Update DataFusion pin

* chore: Update other dependencies

* chore: Update hakari

* fix: Update for API changes

* fix: Update explain plan

* fix: Update influxql plans

* fix: rustdoc links

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-16 09:48:55 +00:00
Stuart Carnie 2407be8062
feat: trialed retractable UDAF
Unfortunately, this is not suitable when the source data has nulls,
as InfluxQL OG ignores these values.
2023-06-16 13:10:47 +10:00
Fraser Savage 73c0c28bd0
feat(cli): Add `influxdb_iox debug wal inspect` command
This commit adds an `inspect` command to read through the sequenced
operations in a WAL file and debug pretty print their contents to
stdout, optionally filtering by a sequence number range.
2023-06-09 18:16:57 +01:00
Marko Mikulicic d26ad8e079
feat: Allow passing service protection limits in create db gRPC call (#7941)
* feat: Allow passing service protection limits in create db gRPC call

* fix: Move the impl into the catalog namespace trait

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:28:32 +00:00
Andrew Lamb 17c0d837b3
chore: Update DataFusion, arrow, object_store pins (#7942)
* chore: Update DataFusion, arrow, object_store pins

* chore: Update for hakari

* chore: Update for new APIs

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-07 17:08:31 +00:00
Stuart Carnie c18902b05e
Merge branch 'main' into sgc/issue/7829_time_bounds_3 2023-06-07 08:51:38 +10:00
Nga Tran a2f5f37b2e
test: turn interval 0 test on after upgrading DF with the fix (#7938)
* test: turn interval 0 test on after upgrading DF with the fix

* chore: remove obsolete comments
2023-06-06 15:50:54 +00:00
Stuart Carnie f114842711
feat: Push outer query time-range to subqueries
Added additional end-to-end tests to validate time-range behaviour
2023-06-06 16:33:01 +10:00
Stuart Carnie 9e2550c933
Merge branch 'main' into sgc/issue/7829_time_bounds_3
# Conflicts:
#	iox_query_influxql/src/plan/planner.rs
2023-06-06 12:55:43 +10:00
Andrew Lamb f571aeb445
chore: Update DataFusion pin (#7916)
* chore: Update DataFusion pin

* chore: Update cargo

* fix: update for API changes

* fix: Update plans

* chore: Update for new api

* fix: Update plans

* chore: Update for API changes more

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-05 18:38:59 +00:00
Stuart Carnie d8c2f2c679
refactor: Simplify `TimeRange` to match InfluxQL OG behaviour explicitly 2023-06-05 15:14:13 +10:00
Stuart Carnie 28166006a8
chore: clippy 2023-06-04 06:56:19 +10:00
kodiakhq[bot] 1d6fd83a9a
Merge branch 'main' into savage/wal-regenerate-lp-catalog-support 2023-06-02 14:23:55 +00:00
Fraser Savage e9b5708c70
refactor(cli): Perform `regenerate-lp` using a sorted output comparison
Query the ingester directly through the test cluster to allow for less
brittle assertion of results.
2023-06-02 13:43:44 +01:00
Fraser Savage 50797b6967
test(cli): Assert writing `regenerate-lp` output produces same query results
This changes the e2e test to delete the WAL segment file, restart the
ingester and ensure the results returned by an ingester query after
feeding the regenerated line proto in are the same as those before.
2023-06-02 12:45:52 +01:00
Marco Neumann efbaf455a0
feat: `selector_first` with additional args (#7898)
* feat: `selector_first` with additional args

Foundation for #7533.

* test: `selector_first` malformed args

* docs: explain type handling
2023-06-02 10:08:21 +00:00
Nga Tran 21752cfb69
test: reproducer for panic bug attempt to calculate the remainder with a divisor of zero (#7903)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-01 15:43:24 +00:00
Stuart Carnie 600ed6652c
refactor: rewrite time-range expressions to a single range
Fixes gap filling, which was confused by multiple lower or upper
time bounds.
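
One way to picture the rewrite, as a hedged Python sketch: several lower and upper time bounds collapse into the tightest single range, which is what a consumer such as gap filling can then rely on. The function name is hypothetical, and inclusive versus exclusive bounds are deliberately ignored.

```python
def merge_time_bounds(lower_bounds, upper_bounds):
    """Collapse many bounds into one (lower, upper) pair.

    The tightest lower bound is the largest one; the tightest upper
    bound is the smallest one. None means unbounded on that side.
    """
    lower = max(lower_bounds) if lower_bounds else None
    upper = min(upper_bounds) if upper_bounds else None
    return lower, upper
```

For example, `time > 10 AND time > 30 AND time < 100 AND time < 80` reduces to the single range from 30 to 80.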
2023-05-30 15:46:45 +10:00
Christopher M. Wolff 2a07b53879
feat: add more tag predicate rewrite logic for InfluxQL (#7869)
* feat: add more tag predicate rewrite logic for InfluxQL

* chore: cargo fmt

* chore: fmt

* test: add more tests

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-26 21:53:52 +00:00
Fraser Savage bf031641c5
feat(cli): Add measurement name lookup to `wal regenerate-lp` command
This commit adds support for the CLI to query the namespace and schema
APIs to retrieve database and table names from the IDs found in WAL
entries being regenerated.
2023-05-26 17:31:19 +01:00
wiedld 7bcde3c544
chore(7618): trace ingester response encoding v2 (#7820)
* test: integration test for tracing of queries to the ingester

* chore: add FlightFrameEncodeRecorder to record spans per each polling result

* refactor(trace): impl TraceCollector for Arc

Allow any Arc-wrapped TraceCollector implementation to be used as a
TraceCollector. This avoids needing to as_any() and downcast later.

* test: assert FlightFrameEncodeRecorder trace spans

This test exercises the FlightDataEncoder wrapped with the trace
decorator (FlightFrameEncodeRecorder) when executing against a data
source that yields data after varying numbers of Stream polls.

This test passing will validate the FlightFrameEncodeRecorder correctly
instruments the amount of time a client spends waiting on the
FlightDataEncoder to acquire or encode a protocol frame, but also
ensures the decorator correctly accounts for varying behaviours allowed
through the Stream abstraction. It does this by simulating a data source
that is not always immediately ready to provide data, such as a buffer
wrapped in a contended async mutex.

* refactor: move tracing decorator into separate mod

* fix: record spans

* refactor(test): update test

The frame encoder is not one-to-one - it emits two frames for the first
data payload, a schema and a payload. This commit updates the test to
account for it!

* refactor: remove unneeded mut ref, and use enum state method which panics when in a (should be unreachable) state

* chore: add more docs to FlightFrameEncodeRecorder and related

---------

Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-26 09:40:16 +00:00
Carol (Nichols || Goulding) c3117e7eb8
fix: Return 'already exists' errors from namespace and table gRPC APIs
When appropriate, rather than internal errors.
2023-05-25 13:19:33 -04:00
Marco Neumann bc18c6dc5f
refactor: re-land #7815. (#7852)
* refactor: consolidate pruning code

Let's have a single chunk pruning implementation in our code, not two.

Also removes a bit of cruft from `QueryChunk` since it is technically no
longer responsible for pruning (this part has been pushed into the
querier for early pruning and bits for the `iox_query_influxrpc` for
some RPC shenanigans).

* test: regression test for incident

* fix: chunk pruning

* docs: add some test notes
2023-05-24 09:46:49 +00:00
Dom Dwyer e61fb3a78c
test: remove line numbers from asserts
I don't think the tests are that specific that they need to assert the
line.
2023-05-23 14:55:43 +02:00
Stuart Carnie d9feed3374
Merge branch 'main' into sgc/issue/7794_subquery_inconsistency 2023-05-23 09:52:28 +10:00
kodiakhq[bot] b9bcaf1aa0
Merge branch 'main' into savage/wal-regenerate-lp-cli-command 2023-05-22 16:18:44 +00:00
Marco Neumann 31b8813760
feat: hide `system.queries` table from prod by default (#7810)
Introduce a new header called `iox-debug` which when set enables certain
debug features. The first one will be the `system.queries` table which
is a process-local, namespace-scoped query log. In most prod setups this
is only useful for debugging and will confuse the user a lot because
when multiple queriers are deployed then the K8s routing decides which
pod/process the user hits. This leads to an inconsistent view. However
the log is still useful for debugging.

This also wires the "debug header set" flag through the Flight ticket,
because JDBC proved (integration tests FTW!) that headers are only
passed to `GetFlightInfo` but not to `DoGet` and the ticket must encode
all the relevant information.

Closes #7119.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-22 12:29:24 +00:00