* chore: update expected output for `COUNT` aggregates with `FILL(null)`
See #8232
* fix(influxql): fill count aggregates with 0 by default
When gap-filling a COUNT aggregate any missing rows should be filled
with 0, unless otherwise directed by a FILL clause. To do this the
projection on the aggregate plan is modified to coalesce any COUNT
fields with 0 unless a FILL value has been specified in the query.
* chore: add more tests
* chore: add explanation of COUNT gap filling with multiple measurements
* fix: update test introduced with merge
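As an illustration of the COUNT gap-filling rule described above, here is a minimal sketch. It is not the planner code: the `Fill` enum and `project_count` function are hypothetical names, `None` stands in for SQL NULL, and only three FILL cases are modelled.

```rust
/// One gap-filled COUNT cell. `None` models SQL NULL, i.e. a time bucket
/// that received no input rows.
type Cell = Option<i64>;

/// Hypothetical, reduced model of the FILL clause (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Fill {
    /// The query has no FILL clause.
    Unspecified,
    /// FILL(null)
    Null,
    /// FILL(<value>)
    Value(i64),
}

/// Project a single COUNT cell after gap filling.
fn project_count(cell: Cell, fill: Fill) -> Cell {
    match (cell, fill) {
        // A real count is never altered.
        (Some(v), _) => Some(v),
        // No FILL clause: behaves like `coalesce(count, 0)`.
        (None, Fill::Unspecified) => Some(0),
        // An explicit FILL clause takes precedence over the default.
        (None, Fill::Null) => None,
        (None, Fill::Value(v)) => Some(v),
    }
}
```

With no FILL clause an empty bucket yields `Some(0)`; with `FILL(null)` it stays `None`.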
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Ensure that advanced syntax window functions that contain a selector
function, rather than an aggregate function, are considered valid and
generate a correct plan.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat(influxql): CUMULATIVE_SUM window function
Implement the InfluxQL CUMULATIVE_SUM window function. This is
implemented as described in
https://docs.influxdata.com/influxdb/v1.8/query_language/functions/#cumulative_sum.
* chore: Add a test demonstrating NULL handling of CUMULATIVE_SUM
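A rough sketch of a running total with NULL inputs follows. The function below is illustrative only. It assumes that a NULL input does not contribute to the total and yields a NULL output; whether such rows are dropped or emitted is not stated here.

```rust
/// Running total over a column; `None` models NULL.
/// Assumption: a NULL input leaves the total unchanged and produces `None`.
fn cumulative_sum(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut total = 0.0;
    values
        .iter()
        .map(|v| {
            v.map(|x| {
                total += x;
                total
            })
        })
        .collect()
}
```

For the input `[1, NULL, 2, 3]` this sketch returns `[1, NULL, 3, 6]`.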
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat(influxql): support TOP and BOTTOM functions
Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.
* fix: clippy
* refactor(influxql): use window aggregates for selectors
Change the implementation of ProjectionType::Selector to use a window
aggregate, rather than an aggregate with a custom selector function.
This is in preparation for implementing PERCENTILE.
* feat(influxql): PERCENTILE selector
Add a selector for the row containing the nth percentile of a
partition. This is the behaviour used when a single selector function
is used in an influxql query.
* feat(influxql): PERCENTILE aggregator
Add the PERCENTILE aggregation function for when the PERCENTILE
function is used in an aggregating projection. This implementation
buffers all non-null field values in memory in order to perform the
operation and therefore could be an expensive operation. This is
necessary for compatibility with earlier influxdb versions.
* refactor(influxql): move PERCENTILE implementation out of plan
The plan module is getting rather full of user-defined function
implementations. This breaks the new functions used to implement
percentile into some new top-level modules for aggregate and window
UDFs.
* fix: doc-lint
* chore: refactor `find_enumerated`
* chore: use `s` in format string
* chore: include the unexpected selector function in the error
* chore(influxql): review suggestions
Added some additional comments to aid understanding.
Changed the handling of selector functions such that FIRST, LAST,
MAX & MIN behave the same as they did before PERCENTILE was added.
* chore(influxql): make percent_row_number a window UDF
Now that user-defined window functions are available make the
percent_row_number function be one of those. This allows the values
to be calculated for the entire window partition in one go.
For some reason the user-defined window function cannot return NULL
values. This function uses 0 where it would otherwise use NULL, as
row numbering starts at 1.
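The percentile commits above can be sketched as follows. This is not the UDF code: the nearest-rank rounding rule is an assumption modelled on InfluxDB 1.x behaviour, and the function signatures are hypothetical. The sketch shows two things: 0 used as a "no row" marker because row numbers start at 1, and the aggregator buffering every non-null value before sorting.

```rust
/// 1-based row number of the row holding the `percentile`-th percentile in a
/// sorted partition of `len` rows. Returns 0 (instead of NULL) when no row
/// qualifies. The rounding rule is an assumption, not taken from the source.
fn percent_row_number(len: usize, percentile: f64) -> u64 {
    let n = (len as f64 * percentile / 100.0 + 0.5).floor();
    if n < 1.0 || n > len as f64 {
        0
    } else {
        n as u64
    }
}

/// Aggregating form: buffers all non-null values in memory, sorts them, and
/// picks the value at the computed row number.
fn percentile(values: &[Option<f64>], p: f64) -> Option<f64> {
    let mut buf: Vec<f64> = values.iter().flatten().copied().collect();
    buf.sort_by(|a, b| a.total_cmp(b));
    match percent_row_number(buf.len(), p) {
        0 => None,
        n => Some(buf[n as usize - 1]),
    }
}
```

The buffer grows with the number of non-null values in the partition, which is why the aggregator can be expensive.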
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor(iox_query_influxql): expand select projection
Change the SELECT projection in the planner to make it clearer how
each projection type works.
* feat(influxql): support TOP and BOTTOM functions
Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.
* fix: clippy
* chore: Use array / slice destructuring
* chore: review suggestion in iox_query_influxql/src/plan/planner.rs
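The TOP and BOTTOM behaviour mentioned above (the first n rows of an ordered data set) can be sketched like this. The `Row` type, the tie-breaking by earliest time, and the final re-ordering by time are assumptions made for illustration; the planner implements this through window functions rather than in-memory sorting.

```rust
/// A (time, value) pair; a hypothetical stand-in for a result row.
type Row = (i64, f64);

/// Keep the `n` rows that come first under `by_value`, then restore time order.
fn take_ordered(rows: &[Row], n: usize, by_value: fn(&Row, &Row) -> std::cmp::Ordering) -> Vec<Row> {
    let mut sorted = rows.to_vec();
    // Order by value; break ties by earliest time (assumption).
    sorted.sort_by(|a, b| by_value(a, b).then(a.0.cmp(&b.0)));
    sorted.truncate(n);
    // Emit the selected rows in time order (assumption).
    sorted.sort_by_key(|r| r.0);
    sorted
}

/// The `n` rows with the largest values.
fn top(rows: &[Row], n: usize) -> Vec<Row> {
    take_ordered(rows, n, |a, b| b.1.total_cmp(&a.1))
}

/// The `n` rows with the smallest values.
fn bottom(rows: &[Row], n: usize) -> Vec<Row> {
    take_ordered(rows, n, |a, b| a.1.total_cmp(&b.1))
}
```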
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.
This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent with time predicates that the user provides (e.g. via
`WHERE time > x`).
It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.
Note that the lowering of the retention policy changed slightly from
```text
(time > (now() - retention)) AND (time < MAX)
```
to
```text
time > (now() - retention)
```
Since the `MAX` cut is just an artifact of the lowering and was unnecessary.
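The lowered form can be pictured as a plain row filter. The sketch below is illustrative only: the function names are hypothetical and timestamps are assumed to be nanoseconds stored as `i64`.

```rust
/// True if a row at `time_ns` passes `time > (now() - retention)`.
/// Nanosecond `i64` timestamps are an assumption of this sketch.
fn within_retention(time_ns: i64, now_ns: i64, retention_ns: i64) -> bool {
    time_ns > now_ns.saturating_sub(retention_ns)
}

/// Apply the table-wide retention predicate to a set of row timestamps.
fn apply_retention(times_ns: &[i64], now_ns: i64, retention_ns: i64) -> Vec<i64> {
    times_ns
        .iter()
        .copied()
        .filter(|t| within_retention(*t, now_ns, retention_ns))
        .collect()
}
```

The comparison is strict, so a row exactly at the cutoff is dropped.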
Closes #7409.
Closes #7410.
Add the DERIVATIVE and NON_NEGATIVE_DERIVATIVE functions to influxql.
These are used to calculate derivatives over arbitrary time units.
The implementation is modeled after the DIFFERENCE and
NON_NEGATIVE_DIFFERENCE functions, with the difference that the unit
parameter is a configuration of the user-defined aggregator function
and therefore there cannot be a single shared definition of the
function.
The NON_NEGATIVE_DIFFERENCE function implementation has been
refactored to be an arbitrary NON_NEGATIVE wrapper for any Accumulator
function.
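A simplified sketch of the two ideas above: the unit stored as per-instance configuration, and a generic NON_NEGATIVE wrapper. The `PointFn` trait is a hypothetical stand-in and is not DataFusion's `Accumulator` trait; dropping negative results (rather than clamping them) is also an assumption.

```rust
/// Hypothetical, simplified stand-in for an accumulator: consumes one point
/// at a time and optionally produces an output value.
trait PointFn {
    fn update(&mut self, time_ns: i64, value: f64) -> Option<f64>;
}

/// Rate of change per `unit_ns`. The unit lives on the instance, which is
/// why every distinct unit needs its own function definition.
struct Derivative {
    unit_ns: i64,
    prev: Option<(i64, f64)>,
}

impl Derivative {
    fn new(unit_ns: i64) -> Self {
        Self { unit_ns, prev: None }
    }
}

impl PointFn for Derivative {
    fn update(&mut self, time_ns: i64, value: f64) -> Option<f64> {
        let unit = self.unit_ns as f64;
        let out = self.prev.and_then(|(pt, pv)| {
            let elapsed = time_ns - pt;
            // No output for the first point or for a zero time delta.
            (elapsed != 0).then(|| (value - pv) / (elapsed as f64 / unit))
        });
        self.prev = Some((time_ns, value));
        out
    }
}

/// Wraps any `PointFn` and suppresses negative results (assumption).
struct NonNegative<F>(F);

impl<F: PointFn> PointFn for NonNegative<F> {
    fn update(&mut self, time_ns: i64, value: f64) -> Option<f64> {
        self.0.update(time_ns, value).filter(|d| *d >= 0.0)
    }
}
```

Because the wrapper is generic over the inner function, the same `NonNegative` type could wrap a difference-style function as well.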
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: ensure that selectors check arg count
* feat: basic non-aggregates w/ InfluxQL selector functions
See #7533.
* refactor: clean up code
* feat: get more advanced cases to work
* docs: remove stale comments
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update cargo
* fix: update for API changes
* fix: Update plans
* chore: Update for new api
* fix: Update plans
* chore: Update for API changes more
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
If the test setup calls `Step::Persist` to persist on-demand, that
means it shouldn't be used with `ChunkStage::Parquet`, which tries to
persist as fast as possible. This will fail the test with a hopefully
helpful message to prevent this.
* test: add dedup test for multiple partitions and ranges
* refactor: remove `RedundantSort` optimizer pass
Similar to #7807 this is now covered by DataFusion, as demonstrated by
the fact that all query tests (incl. explain tests) still pass.
The good thing is: passes that are no longer required don't require any
upstreaming, so this also closes #7411.
* test: reproducer for idpe_17556
* fix: `ParquetSortness` and partial opt
1. correctly handle cases where `ParquetSortness` would optimize one
child branch but not the other
2. handle cases where `ParquetSortness` recursion should stop more
clearly (using `TreeNodeRewriter`)
3. rename query tests to be a bit clearer
4. add test case with many (but not too many) duplicate files and an
ingester (basically a prod use case where the compactor is slightly
behind)
---------
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: add tests for the desired contract for parsing measurements from line protocol
* fix: restrict null chars in measurement
* chore: make an explicit Measurement type
* refactor: have iox lp parser match influxdb contract for acceptance of eq in measurements
* test: create end_to_end test to confirm that write-then-read behavior with `=` in measurements is the same as in influxdb