53 Commits (4fa6ead27d3397b8c08467394efbe78d0e317d4f)

Author SHA1 Message Date
Martin Hilton b1c695d5a2
fix(influxql): fill count aggregates with 0 by default (#8284)
* chore: update expected output for `COUNT` aggregates with `FILL(null)`

See #8232

* fix(influxql): fill count aggregates with 0 by default

When gap-filling a COUNT aggregate, any missing rows should be filled
with 0, unless otherwise directed by a FILL clause. To do this, the
projection on the aggregate plan is modified to coalesce any COUNT
fields with 0 unless a FILL value has been specified in the query.

* chore: add more tests

* chore: add explanation of COUNT gap filling with multiple measurements

* fix: update test introduced with merge
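
As a rough illustration of the fill rule described above, the following sketch applies it to already gap-filled COUNT output. The `Fill` enum and `fill_count` function are hypothetical names invented for this example; the real change is made in the planner's projection, not in a helper like this, and only two FILL variants are modelled.

```rust
/// Simplified stand-in for the InfluxQL FILL clause (hypothetical type,
/// only two variants are modelled here).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Fill {
    /// FILL(null): leave gaps as NULL.
    Null,
    /// FILL(<value>): replace gaps with a constant.
    Value(i64),
}

/// Apply the fill rule to gap-filled COUNT output, where `None` marks a
/// time window that had no input rows.
pub fn fill_count(counts: &[Option<i64>], fill: Option<Fill>) -> Vec<Option<i64>> {
    counts
        .iter()
        .map(|c| match (*c, fill) {
            // A real count is never touched.
            (Some(v), _) => Some(v),
            // No FILL clause: behaves like coalesce(count, 0).
            (None, None) => Some(0),
            // An explicit FILL clause takes precedence over the default.
            (None, Some(Fill::Null)) => None,
            (None, Some(Fill::Value(v))) => Some(v),
        })
        .collect()
}
```

Calling `fill_count(&[Some(2), None, Some(1)], None)` fills the gap with `Some(0)`, while passing `Some(Fill::Null)` leaves it as `None`.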

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-21 16:31:10 +00:00
Martin Hilton 5731e012bf
fix(influxql): advanced syntax window functions with selector aggregates (#8303)
Ensure that advanced syntax window functions that contain a selector,
rather than an aggregate, function are considered valid and generate
a correct plan.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-21 14:48:15 +00:00
Christopher M. Wolff 668a1c3d8e
fix: aggregate fns called on tags should return null (#8274)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 14:55:16 +00:00
Martin Hilton d1640bb926
feat(influxql): CUMULATIVE_SUM window function (#8248)
* feat(influxql): CUMULATIVE_SUM window function

Implement the InfluxQL CUMULATIVE_SUM window function. This is
implemented as described in
https://docs.influxdata.com/influxdb/v1.8/query_language/functions/#cumulative_sum.

* chore: Add a test demonstrating NULL handling of CUMULATIVE_SUM
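
For readers unfamiliar with the function, a minimal sketch of a cumulative sum follows. The function name and signature are illustrative only, and the NULL handling shown (NULL inputs are skipped and emit no row) is an assumption; the commit only says that a test demonstrates the NULL behaviour, not what it is.

```rust
/// Running total over a column of values.
///
/// Assumption: NULL inputs (`None`) are skipped and produce no output row.
pub fn cumulative_sum(values: &[Option<f64>]) -> Vec<f64> {
    let mut total = 0.0;
    values
        .iter()
        .flatten() // drop the NULLs
        .map(|v| {
            total += v;
            total
        })
        .collect()
}
```

Under that assumption, the input `[1.0, NULL, 2.0, 3.0]` yields three output rows.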

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-18 06:13:58 +00:00
Christopher M. Wolff 33e41fc5cb
fix: improve error for malformed gap fill query (#8252)
* fix: improve error for malformed gap fill query

* fix: code review feedback
2023-07-17 21:20:34 +00:00
Christopher M. Wolff b916a89159
fix: recurse through SubqueryAlias when finding gap fill time range (#8249) 2023-07-17 19:39:30 +00:00
Christopher M. Wolff 85f03acbdf
fix: correctly catch field/tag discrepancy (#8234)
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-14 18:21:56 +00:00
Martin Hilton 9111cd517f
feat(influxql): PERCENTILE function (#8187)
* feat(influxql): support TOP and BOTTOM functions

Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.

* fix: clippy

* refactor(influxql): use window aggregates for selectors

Change the implementation of ProjectionType::Selector to use a window
aggregate, rather than an aggregate with a custom selector function.
This is in preparation for implementing PERCENTILE.

* feat(influxql): PERCENTILE selector

Add a selector for the row containing the nth percentile of a
partition. This is the behaviour used when a single selector function
is used in an influxql query.

* feat(influxql): PERCENTILE aggregator

Add the PERCENTILE aggregation function for when the PERCENTILE
function is used in an aggregating projection. This implementation
buffers all non-null field values in memory in order to perform the
operation and therefore could be an expensive operation. This is
necessary for compatibility with earlier influxdb versions.

* refactor(influxql): move PERCENTILE implementation out of plan

The plan module is getting rather full of user-defined function
implementations. This breaks the new functions used to implement
percentile into some new top-level modules for aggregate and window
UDFs.

* fix: doc-lint

* chore: refactor `find_enumerated`

* chore: use `s` in format string

* chore: include the unexpected selector function in the error

* chore(influxql): review suggestions

Added some additional comments to help understanding.

Changed the handling of selector functions such that FIRST, LAST,
MAX & MIN behave the same as they did before PERCENTILE was added.

* chore(influxql): make percent_row_number a window UDF

Now that user-defined window functions are available, make the
percent_row_number function one of those. This allows the values
to be calculated for the entire window partition in one go.

For some reason the user-defined window function cannot return NULL
values. This function uses 0 where it would otherwise use NULL, as
row numbering starts at 1.
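
The "0 instead of NULL" convention described above can be sketched as follows. This is not the code from the commit: the signature is invented for illustration, and the rounding rule (nearest rank, rounding half up) is an assumption, since the commit message does not say how the row is chosen.

```rust
/// Return the 1-based row number, within a partition sorted by value, of the
/// row holding the requested percentile.
///
/// Returns 0 when no row qualifies. Row numbering starts at 1, so 0 can
/// stand in for NULL without ambiguity.
///
/// Assumption: nearest-rank selection with half-up rounding.
pub fn percent_row_number(partition_len: usize, percentile: f64) -> usize {
    let n = (partition_len as f64 * percentile / 100.0 + 0.5).floor();
    if n >= 1.0 && n <= partition_len as f64 {
        n as usize
    } else {
        // Out of range (or NaN): signal "no row" with 0.
        0
    }
}
```

A caller can then keep only the row whose own row number equals this value; a result of 0 matches no row at all.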

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-11 05:33:16 +00:00
Andrew Lamb 3ce11d8d66
chore: Update DataFusion (#8190)
* chore: Update DataFusion

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: use display format

* chore: Update explain plan output

* fix: update plans

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-10 09:54:50 +00:00
Stuart Carnie 1ca547b313
fix: Teach planner to rewrite binary expressions for div operator
Specifically when the operands are integers, to match InfluxQL OG
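
The commit message does not spell out the target semantics. The sketch below assumes the rewrite makes division of two integers produce a float result rather than a truncated integer; the `Value` enum and `div` function are hypothetical and operate on values, whereas the actual change rewrites plan expressions.

```rust
/// Hypothetical scalar value with the two numeric types relevant here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Value {
    Integer(i64),
    Float(f64),
}

/// Division that always casts its operands to float first.
///
/// Assumption: this mirrors the intended behaviour for integer operands.
pub fn div(lhs: Value, rhs: Value) -> Value {
    let as_f64 = |v: Value| match v {
        Value::Integer(i) => i as f64,
        Value::Float(f) => f,
    };
    Value::Float(as_f64(lhs) / as_f64(rhs))
}
```

With this rule, `div(Value::Integer(7), Value::Integer(2))` is `Value::Float(3.5)`, whereas plain integer division would truncate to 3.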
2023-07-07 11:22:03 +10:00
Martin Hilton dfffdc1d90
feat(influxql): support TOP and BOTTOM functions (#8143)
* refactor(iox_query_influxql): expand select projection

Change the SELECT projection in the planner to make it clearer how
each projection type works.

* feat(influxql): support TOP and BOTTOM functions

Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.

* fix: clippy

* chore: Use array / slice destructuring

* chore: review suggestion in iox_query_influxql/src/plan/planner.rs

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 07:08:45 +00:00
Marco Neumann 4638b89d93
refactor: migrate retention to proper predicates (#8092)
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.

This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent with the time predicates that the user provides (e.g. via
`WHERE time > x`).

It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.

Note that the lowering of the retention policy changed slightly from

```text
(time > (now() - retention)) AND (time < MAX)
```

to

```text
time > (now() - retention)
```

Since the `MAX` cut is just an artifact of the lowering and was unnecessary.
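
A minimal sketch of the simplified predicate, applied to a list of row timestamps. The function names and the use of nanosecond `i64` timestamps are assumptions made for the example; in the actual code the predicate is expressed as a table-level filter expression, not as a loop over rows.

```rust
/// The lowered retention predicate: `time > (now() - retention)`.
pub fn within_retention(time_ns: i64, now_ns: i64, retention_ns: i64) -> bool {
    // saturating_sub avoids overflow for extreme retention periods.
    time_ns > now_ns.saturating_sub(retention_ns)
}

/// Keep only the timestamps that are still inside the retention period.
pub fn apply_retention(times: &[i64], now_ns: i64, retention_ns: i64) -> Vec<i64> {
    times
        .iter()
        .copied()
        .filter(|t| within_retention(*t, now_ns, retention_ns))
        .collect()
}
```

With `now_ns = 200` and `retention_ns = 100`, only timestamps strictly greater than 100 survive.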

Closes #7409.
Closes #7410.
2023-06-29 08:36:37 +00:00
Martin Hilton 511a0bae78
feat(influxql): add derivative and non_negative_derivative (#8103)
Add the DERIVATIVE and NON_NEGATIVE_DERIVATIVE functions to influxql.
These are used to calculate derivatives over arbitrary time units.
The implementation is modeled after the DIFFERENCE and
NON_NEGATIVE_DIFFERENCE functions, with the difference that the unit
parameter is a configuration of the user-defined aggregator function
and therefore there cannot be a single shared definition of the
function.

The NON_NEGATIVE_DIFFERENCE function implementation has been
refactored to be an arbitrary NON_NEGATIVE wrapper for any Accumulator
function.
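
The two ideas above, a derivative parameterised by a time unit and a generic non-negative wrapper, can be sketched as below. All names are hypothetical, and plain functions stand in for the Accumulator-based implementation in the commit; skipping points with identical timestamps is also an assumption.

```rust
/// (timestamp in nanoseconds, value)
pub type Point = (i64, f64);

/// Rate of change between consecutive points, scaled to `unit_ns`.
/// Assumption: pairs with identical timestamps are skipped.
pub fn derivative(points: &[Point], unit_ns: i64) -> Vec<Point> {
    points
        .windows(2)
        .filter(|w| w[1].0 != w[0].0)
        .map(|w| {
            let (t0, v0) = w[0];
            let (t1, v1) = w[1];
            let elapsed_units = (t1 - t0) as f64 / unit_ns as f64;
            (t1, (v1 - v0) / elapsed_units)
        })
        .collect()
}

/// Wrap any point-producing function so that negative results are dropped.
pub fn non_negative(
    f: impl Fn(&[Point]) -> Vec<Point>,
) -> impl Fn(&[Point]) -> Vec<Point> {
    move |points| f(points).into_iter().filter(|(_, v)| *v >= 0.0).collect()
}
```

Because the unit is captured when the function is built, for example `non_negative(|p| derivative(p, 1_000_000_000))` for a per-second rate, each distinct unit yields a distinct function, which matches the remark that there cannot be a single shared definition.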

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-29 05:53:18 +00:00
Marco Neumann 178483c1a0
feat: basic non-aggregates w/ InfluxQL selector functions (#8016)
* test: ensure that selectors check arg count

* feat: basic non-aggregates w/ InfluxQL selector functions

See #7533.

* refactor: clean up code

* feat: get more advanced cases to work

* docs: remove stale comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 08:05:50 +00:00
Stuart Carnie 7b4a1a0660
chore: PR feedback
Add tests for fewer rows than N for `moving_average`

See: https://github.com/influxdata/influxdb_iox/pull/8023#discussion_r1237298376
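
For context, a simple moving average over a window of N rows might look like the sketch below. The function is illustrative only, and the behaviour for fewer than N rows (no output) is an assumption; the commit adds tests for that case but the message does not state the expected result.

```rust
/// Simple moving average over a sliding window of `n` values.
///
/// Assumption: when fewer than `n` values are available, no output row
/// is produced.
pub fn moving_average(values: &[f64], n: usize) -> Vec<f64> {
    if n == 0 {
        // `windows(0)` would panic, so treat a zero-sized window as empty.
        return Vec::new();
    }
    values
        .windows(n)
        .map(|w| w.iter().sum::<f64>() / n as f64)
        .collect()
}
```

Here `moving_average(&[1.0, 2.0], 3)` returns an empty vector because no full window exists.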
2023-06-22 12:15:47 +10:00
Stuart Carnie 13726c2a76
Merge branch 'main' into sgc/issue/7600_moving_average 2023-06-22 10:10:22 +10:00
Marco Neumann 83a5037e61
feat: query support for custom partitioning (#8025)
* feat: querier-specific stat creation routine

* feat: prune querier chunks using partition col ranges

* feat: add table client

* test: custom partitioning

* fix: correctly set up stats for chunks with col subsets

* fix: flaky test

* refactor: remove obsolete dead_code markers

* feat: add partition template to `create_namespace`

* test: extend custom partitioning end2end tests

* fix: explain shuffling, make it actually deterministic
2023-06-21 09:03:19 +00:00
Stuart Carnie 2cbaf9cffa
chore: more tests, renamed avg_n → moving_average 2023-06-21 15:05:08 +10:00
Stuart Carnie a2521bbf35
feat: moving_average, difference and non_negative_difference
There is a `todo` regarding `update_batch` to be discussed with @alamb
2023-06-20 16:37:28 +10:00
Stuart Carnie 8670b28445
Merge branch 'main' into sgc/issue/7600_moving_average 2023-06-18 09:41:19 +10:00
Andrew Lamb 5889c96501
chore: Update `datafusion` and other dependencies (#7981)
* chore: Update DataFusion pin

* chore: Update other dependencies

* chore: Update hakari

* fix: Update for API changes

* fix: Update explain plan

* fix: Update influxql plans

* fix: rustdoc links

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-16 09:48:55 +00:00
Stuart Carnie 2407be8062
feat: trialed retractable UDAF
Unfortunately, this is not suitable when the source data has nulls,
as InfluxQL OG ignores these values.
2023-06-16 13:10:47 +10:00
Andrew Lamb 17c0d837b3
chore: Update DataFusion, arrow, object_store pins (#7942)
* chore: Update DataFusion, arrow, object_store pins

* chore: Update for hakari

* chore: Update for new APIs

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-07 17:08:31 +00:00
Stuart Carnie c18902b05e
Merge branch 'main' into sgc/issue/7829_time_bounds_3 2023-06-07 08:51:38 +10:00
Nga Tran a2f5f37b2e
test: turn interval 0 test on after upgrading DF with the fix (#7938)
* test: turn interval 0 test on after upgrading DF with the fix

* chore: remove obsolete comments
2023-06-06 15:50:54 +00:00
Stuart Carnie f114842711
feat: Push outer query time-range to subqueries
Added additional end-to-end tests to validate time-range behaviour
2023-06-06 16:33:01 +10:00
Stuart Carnie 9e2550c933
Merge branch 'main' into sgc/issue/7829_time_bounds_3
# Conflicts:
#	iox_query_influxql/src/plan/planner.rs
2023-06-06 12:55:43 +10:00
Andrew Lamb f571aeb445
chore: Update DataFusion pin (#7916)
* chore: Update DataFusion pin

* chore: Update cargo

* fix: update for API changes

* fix: Update plans

* chore: Update for new api

* fix: Update plans

* chore: Update for API changes more

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-05 18:38:59 +00:00
Stuart Carnie d8c2f2c679
refactor: Simplify `TimeRange` to match InfluxQL OG behaviour explicitly 2023-06-05 15:14:13 +10:00
Marco Neumann efbaf455a0
feat: `selector_first` with additional args (#7898)
* feat: `selector_first` with additional args

Foundation for #7533.

* test: `selector_first` malformed args

* docs: explain type handling
2023-06-02 10:08:21 +00:00
Nga Tran 21752cfb69
test: reproducer for panic bug "attempt to calculate the remainder with a divisor of zero" (#7903)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-01 15:43:24 +00:00
Stuart Carnie 600ed6652c
refactor: rewrite time-range expressions to a single range
Fixes gap filling, which was confused by multiple lower or upper
time bounds.
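
One way to picture the rewrite is as a fold over all time bounds in the WHERE clause that keeps only the tightest lower and upper bound. The `Bound` and `TimeRange` types here are hypothetical stand-ins written for this example (not the `TimeRange` type in the repository), and the sketch assumes the bounds are inclusive and combined with AND.

```rust
/// A single time bound taken from a condition, in nanoseconds.
pub enum Bound {
    /// `time >= t`
    Lower(i64),
    /// `time <= t`
    Upper(i64),
}

/// The single range that replaces all individual bounds.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct TimeRange {
    pub lower: Option<i64>,
    pub upper: Option<i64>,
}

/// Collapse many AND-ed bounds into one range: the greatest lower bound and
/// the least upper bound are the only ones that matter.
pub fn merge_bounds(bounds: &[Bound]) -> TimeRange {
    bounds.iter().fold(TimeRange::default(), |mut r, b| {
        match *b {
            Bound::Lower(t) => r.lower = Some(r.lower.map_or(t, |l| l.max(t))),
            Bound::Upper(t) => r.upper = Some(r.upper.map_or(t, |u| u.min(t))),
        }
        r
    })
}
```

A consumer such as gap filling then sees exactly one lower and one upper bound, regardless of how many the query contained.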
2023-05-30 15:46:45 +10:00
Christopher M. Wolff 2a07b53879
feat: add more tag predicate rewrite logic for InfluxQL (#7869)
* feat: add more tag predicate rewrite logic for InfluxQL

* chore: cargo fmt

* chore: fmt

* test: add more tests

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-26 21:53:52 +00:00
Stuart Carnie ed9a16c4ad
chore: Add test to validate compatibility 2023-05-22 16:23:21 +10:00
Christopher M. Wolff 90a25a3ff0
chore: update DataFusion (#7825)
* chore: update DataFusion

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-05-18 17:51:16 +00:00
Carol (Nichols || Goulding) 9cc2169ce2
fix: Rename Step::WaitForPersisted2 to Step::WaitForPersisted 2023-05-17 17:02:59 -04:00
Carol (Nichols || Goulding) 6785dcfd37
fix: Correct invalid test setups that the detector now detects 2023-05-17 17:00:17 -04:00
Carol (Nichols || Goulding) 45e47af974
test: Add an invalid test configuration checker
If the test setup calls `Step::Persist` to persist on-demand, that
means it shouldn't be used with `ChunkStage::Parquet`, which tries to
persist as fast as possible. The checker fails such a test with a
hopefully helpful message to prevent this combination.
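
A checker of this kind could be sketched as below. Only `Step::Persist` and `ChunkStage::Parquet` are named in the commit message; the other variants, the function name, and the error text are invented for illustration and do not reflect the real test framework.

```rust
/// Hypothetical subset of chunk stages; only `Parquet` comes from the commit.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ChunkStage {
    Parquet,
    Ingester,
}

/// Hypothetical subset of test steps; only `Persist` comes from the commit.
#[derive(Clone, Debug, PartialEq)]
pub enum Step {
    WriteLineProtocol(String),
    Persist,
    Query(String),
}

/// Reject setups that request on-demand persistence while the stage already
/// persists as fast as possible.
pub fn check_setup(stage: ChunkStage, steps: &[Step]) -> Result<(), String> {
    let persists_on_demand = steps.iter().any(|s| matches!(s, Step::Persist));
    if stage == ChunkStage::Parquet && persists_on_demand {
        return Err(
            "invalid test setup: Step::Persist cannot be combined with ChunkStage::Parquet"
                .to_string(),
        );
    }
    Ok(())
}
```

Running the check before the test body means a misconfigured test fails immediately with an explanation rather than behaving unpredictably.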
2023-05-17 16:58:50 -04:00
Marco Neumann 62fed73bcd
refactor: upgrade DataFusion to `19b03240920ad63cac916b42951754c0337bdac8#19b03240920ad63cac916b42951754c0337bdac8` (#7813)
I need:

- https://github.com/apache/arrow-datafusion/pull/6226.

Changes in code due to:

- https://github.com/apache/arrow-datafusion/pull/6332

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-17 13:57:12 +00:00
Marco Neumann 7e64264eef
refactor: remove `RedudantSort` optimizer pass (#7809)
* test: add dedup test for multiple partitions and ranges

* refactor: remove `RedudantSort` optimizer pass

Similar to #7807 this is now covered by DataFusion, as demonstrated by
the fact that all query tests (incl. explain tests) still pass.

The good thing is: passes that are no longer required don't require any
upstreaming, so this also closes #7411.
2023-05-17 09:30:04 +00:00
Nga Tran ca12f1c03d
fix: correctly recurse in `ParquetSortness` (#7778)
* test: reproducer for idpe_17556

* fix: `ParquetSortness` and partial opt

1. correctly handle cases where `ParquetSortness` would optimize one
   child branch but not the other
2. handle cases where `ParquetSortness` recursion should stop a bit
   more clearly (using `TreeNodeRewriter`)
3. rename query tests to be a bit clearer
4. add test case with many (but not too many) duplicate files and an
   ingester (basically a prod use case where the compactor is slightly
   behind)

---------

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-17 06:09:23 +00:00
Stuart Carnie d7ab96c879
Merge branch 'main' into sgc/issue/6879_subquery_01 2023-05-17 07:20:08 +10:00
wiedld a4ad4fe69e
fix(4895): handle measurement missing, null bytes, and `=` in measurement names (#7759)
* test: add tests for the desired contract for parsing measurements from line protocol
* fix: restrict null chars in measurement
* chore: make an explicit Measurement type
* refactor: have iox lp parser match influxdb contract, for acceptance of eq in measurements
* test: create end_to_end test to confirm that write-then-read behavior with `=` in measurements is the same as influxdb
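
The contract listed above can be illustrated with a small validator. This is a sketch, not the parser from the commit: the error type and function name are hypothetical, and it assumes that "restrict null chars" means rejecting any measurement name that contains a null byte.

```rust
/// Hypothetical error type for the cases named in the commit title.
#[derive(Debug, PartialEq)]
pub enum MeasurementError {
    /// The line has no measurement name.
    Missing,
    /// The measurement name contains a null byte.
    NullByte,
}

/// Validate an already-unescaped measurement name.
pub fn validate_measurement(name: &str) -> Result<&str, MeasurementError> {
    if name.is_empty() {
        return Err(MeasurementError::Missing);
    }
    if name.contains('\0') {
        return Err(MeasurementError::NullByte);
    }
    // `=` is deliberately accepted, to match the influxdb contract.
    Ok(name)
}
```

Under these assumptions `validate_measurement("cpu=total")` succeeds, while an empty name or a name containing `\0` is rejected.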
2023-05-16 10:48:39 -07:00
Stuart Carnie 5a813fb61f
chore: Simplify first queries 2023-05-16 10:18:34 +10:00
Stuart Carnie d2fe92f71e
chore: Add additional queries to be fixed by #7794 2023-05-16 10:05:04 +10:00
Stuart Carnie 7ba619a32b
feat: outer GROUP BY pushed down to subqueries; more Cloud 2 examples 2023-05-15 15:31:20 +10:00
Stuart Carnie 62a4c02836
feat: Handle default FILL behaviour for subqueries 2023-05-15 11:22:26 +10:00
Stuart Carnie c77c4b3d23
feat: support nested aggregate subqueries 2023-05-15 09:31:06 +10:00
Stuart Carnie 4e96f814db
chore: Improve docs 2023-05-15 07:21:36 +10:00
Stuart Carnie f4a19fc6c1
fix: Aggregate subqueries with push-down `GROUP BY tags` 2023-05-12 16:53:16 +10:00