* chore(cli): add `--partition-template` to namespace create
* chore: fix typo in doc for `PartitionTemplateConfig`
chore: add max limit 8 for partition template in doc
* chore: add e2e tests
* chore: fmt
* chore: add more e2e tests for namespace create with partition template
* chore: show doc comments in cli help interface
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion pin
* fix: Update for change in API
* chore: Update plan
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both that are used now.
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:
Partition {
id: 3,
hash_id: abcdefg,
}
and a Parquet file that has:
ParquetFile {
partition_hash_id: abcdefg,
}
calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.
* fix: Use and set new partition ID fields everywhere they want to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>
* chore: update expected output for `COUNT` aggregates with `FILL(null)`
See #8232
* fix(influxql): fill count aggregates with 0 by default
When gap-filling a COUNT aggregate any missing rows should be filled
with 0, unless otherwise directed by a FILL clause. To do this the
projection on the aggregate plan is modiefied to coalesce any COUNT
fields with 0 unless a FILL value has been specified in the query.
* chore: add more tests
* chore: add explanation of COUNT gap filling with multiple measurements
* fix: update test introduced with merge
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Ensure that advanced syntax window functions that contain a selector,
rather than an aggregate, function are considered valid and generate
a correct plan.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion to get new grouping
* chore: Update for new API
* chore: update tests
* fix: new API
* fix: state type
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: add retry loop to avoid CI flake in build-catalog test
* fix: Update influxdb_iox/tests/end_to_end_cases/debug.rs
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
While I cannot reproduce the CI flakyness locally (probably because the
local system is fast enough), looking at the test convinced me that the
ingester should not persist.
Closes#8245.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat(influxql): CUMULATIVE_SUM window function
Implement the InfluxQL CUMULATIVE_SUM window function. This is
implemented as described in
https://docs.influxdata.com/influxdb/v1.8/query_language/functions/#cumulative_sum.
* chore: Add a test demonstrating NULL handling of CUMULATIVE_SUM
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat(influxql): support TOP and BOTTOM functions
Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.
* fix: clippy
* refactor(influxql): use window aggregates for selectors
Change the implentation of ProjectionType::Selector to use a window
aggregate, rather than an aggregate with a custom selector function.
This is in preparation for implementing PERCENTILE.
* feat(influxql): PERCENTILE selector
Add a selector for the row containing the nth percentile of a
partition. This is the behaviour used when a single selector function
is used in an influxql query.
* feat(influxql): PERCENTILE aggregator
Add the PERCENTILE aggregation function for when the PERCENTILE
function is used in an aggregating projection. This implementation
buffers all non-null field values in memory in order to perform the
operation and therefore could be an expensive operation. This is
necessary for compatibility with earlier influxdb versions.
* refactor(influxql): move PERCENTILE implementation out of plan
The plan module is getting rather full of user-defined function
implementations. This breaks the new functions used to implement
percentile into some new top-level modules for aggregate and window
UDFs.
* fix: doc-lint
* chore: refactor `find_enumerated`
* chore: use `s` in format string
* chore: include the unexpected selector function in the error
* chore(influxql): review suggestions
Added some addition comments to help understanding.
Changed the handling os slector functions such that FIRST, LAST,
MAX & MIN behave the same as they did before PERCENTILE was added.
* chore(influxql): make percent_row_number a window UDF
Now that user-defined window functions are available make the
percent_row_number function be one of those. this allows the values
to be calculated for the entire window partition in one go.
For some reason the user-defined window function cannot return NULL
values. This function uses 0 where it would otherwise use NULL, as
row numbering starts at 1.
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: add `influxdb_iox debug build-catalog` command
* fix: tests
* fix: Use info! logs instead of println for status
* fix: Set partition_hash_id as well
* fix: remove leftover code
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor(iox_query_influxql): expand select projection
Change the SELECT projection in the planner to make it clearer how
each projection type works.
* feat(influxql): support TOP and BOTTOM functions
Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.
* fix: clippy
* chore: Use array / slice destructuring
* chore: review suggestion in iox_query_influxql/src/plan/planner.rs
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.
This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent time predicates that the user providers (e.g. via
`WHERE time > x`).
It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.
Note that the lowering of the retention policy changed slightly from
```text
(time > (now() - retention)) AND (time < MAX)
```
to
```text
time > (now() - retention)
```
Since the `MAX` cut is just an artifact of the lowering and was unnecessary.
Closes#7409.
Closes#7410.
Add the DERIVATIVE and NON_NEGATIVE_DERIVATIVE functions to influxql.
These are used to calculate derivatives over arbitrary time units.
The implementation is modeled after the DIFFERENCE and
NON_NEGATIVE_DIFFERENCE functions, with a difference that the unit
parameters is a configuration of the user-defined aggregator function
and therefore there cannot be a single shared definition of the
function.
The NON_NEGATIVE_DIFFERENCE function implementation has been
refactored to be an arbitrary NON_NEGATIVE wrapper for any Accumulator
function.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: ensure that selectors check arg count
* feat: basic non-aggregates w/ InfluxQL selector functions
See #7533.
* refactor: clean up code
* feat: get more advanced cases to work
* docs: remove stale comments
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>