Commit Graph

647 Commits (dac0db21960c871c298924269d198a8b01849724)

Author SHA1 Message Date
Chunchun Ye c8242c7469
chore(cli): add `--partition-template` to `namespace create` (#8365)
* chore(cli): add `--partition-template` to namespace create

* chore: fix typo in doc for `PartitionTemplateConfig`

chore: add max limit 8 for partition template in doc

* chore: add e2e tests

* chore: fmt

* chore: add more e2e tests for namespace create with partition template

* chore: show doc comments in cli help interface

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-01 14:37:00 +00:00
Andrew Lamb de79619e71
chore: Update datafusion (#8355)
* chore: Update datafusion pin

* fix: Update for change in API

* chore: Update plan

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-31 15:41:00 +00:00
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both that are used now.

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
NGA-TRAN afd5b12324 chore: address review comments and fix tests due to previous commit 2023-07-26 16:40:54 -04:00
NGA-TRAN 1ddc64d68d test: modify and add tests check valid and invalid strftime 2023-07-26 11:44:13 -04:00
NGA-TRAN e6cf9c9d61 fix: rename namespaces of differnet tests to avoid test failures 2023-07-25 17:07:28 -04:00
NGA-TRAN 44e1c1abdb feat: implement partition templpate as json and more tests as well as verify the partition key after inserting data 2023-07-25 16:51:57 -04:00
NGA-TRAN 144778430e Merge branch 'main' into ntran/table_cli 2023-07-21 14:49:02 -04:00
NGA-TRAN 2aff6a7495 chore: remove unused comments 2023-07-21 14:34:19 -04:00
NGA-TRAN a340fd5a6a feat: create table CLI 2023-07-21 14:17:42 -04:00
Martin Hilton b1c695d5a2
fix(influxql): fill count aggregates with 0 by default (#8284)
* chore: update expected output for `COUNT` aggregates with `FILL(null)`

See #8232

* fix(influxql): fill count aggregates with 0 by default

When gap-filling a COUNT aggregate any missing rows should be filled
with 0, unless otherwise directed by a FILL clause. To do this the
projection on the aggregate plan is modiefied to coalesce any COUNT
fields with 0 unless a FILL value has been specified in the query.

* chore: add more tests

* chore: add explanation of COUNT gap filling with multiple measurements

* fix: update test introduced with merge

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-21 16:31:10 +00:00
Martin Hilton 5731e012bf
fix(influxql): advanced syntax window functions with selector aggregates (#8303)
Ensure that advanced syntax window functions that contain a selector,
rather than an aggregate, function are considered valid and generate
a correct plan.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-21 14:48:15 +00:00
Andrew Lamb 3eb48ef210
chore: Update datafusion again (#8247)
* chore: Update datafusion to get new grouping

* chore: Update for new API

* chore: update tests

* fix: new API

* fix: state type

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-21 11:20:36 +00:00
Andrew Lamb ac9d1946e9
fix: add retry loop to avoid CI flake in build-catalog test (#8271)
* fix: add retry loop to avoid CI flake in build-catalog test

* fix: Update influxdb_iox/tests/end_to_end_cases/debug.rs

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-21 10:14:06 +00:00
Joe-Blount 1bed99567c
chore: add DF metrics to compaction spans (#8270)
* chore: add DF metrics to compaction spans

* chore: update string for test verification

* chore: update comment

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 15:00:22 +00:00
Christopher M. Wolff 668a1c3d8e
fix: aggregate fns called on tags should return null (#8274)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 14:55:16 +00:00
Marco Neumann 6ae9143742
fix: `end_to_end_cases::cli::query_ingester` flakyness (#8281)
While I cannot reproduce the CI flakyness locally (probably because the
local system is fast enough), looking at the test convinced me that the
ingester should not persist.

Closes #8245.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 11:48:59 +00:00
Martin Hilton d1640bb926
feat(influxql): CUMULATIVE_SUM window function (#8248)
* feat(influxql): CUMULATIVE_SUM window function

Implement the InfluxQL CUMULATIVE_SUM window function. This is
implemented as described in
https://docs.influxdata.com/influxdb/v1.8/query_language/functions/#cumulative_sum.

* chore: Add a test demonstrating NULL handling of CUMULATIVE_SUM

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-18 06:13:58 +00:00
Christopher M. Wolff 33e41fc5cb
fix: improve error for malformed gap fill query (#8252)
* fix: improve error for malformed gap fill query

* fix: code review feedback
2023-07-17 21:20:34 +00:00
Christopher M. Wolff b916a89159
fix: recurse through SubqueryAlias when finding gap fill time range (#8249) 2023-07-17 19:39:30 +00:00
Joe-Blount 85a9e13262
Merge branch 'main' into jrb_63_compactor_spans 2023-07-17 09:52:27 -05:00
Christopher M. Wolff 85f03acbdf
fix: correctly catch field/tag discrepancy (#8234)
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-14 18:21:56 +00:00
Joe-Blount 803122e3b4 Merge remote-tracking branch 'origin/main' into jrb_63_compactor_spans
# Conflicts:
#	compactor/src/driver.rs
2023-07-13 08:54:22 -05:00
Andrew Lamb 9bfec2f77c
fix: ignore flaky test while it is debugged (#8227)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-13 10:27:13 +00:00
Andrew Lamb 48a5c3e966
chore: Add longer sleep in `end_to_end_cases::debug::build_catalog` and extra logging (#8224)
* fix: Add longer sleep in end_to_end_cases::debug::build_catalog

* chore: add debug logging when test fails

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-13 10:14:59 +00:00
Joe-Blount c5a4912399 chore: add compactor tracing test case 2023-07-11 10:43:09 -05:00
Martin Hilton 9111cd517f
feat(influxql): PERCENTILE function (#8187)
* feat(influxql): support TOP and BOTTOM  functions

Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.

* fix: clippy

* refactor(influxql): use window aggregates for selectors

Change the implentation of ProjectionType::Selector to use a window
aggregate, rather than an aggregate with a custom selector function.
This is in preparation for implementing PERCENTILE.

* feat(influxql): PERCENTILE selector

Add a selector for the row containing the nth percentile of a
partition. This is the behaviour used when a single selector function
is used in an influxql query.

* feat(influxql): PERCENTILE aggregator

Add the PERCENTILE aggregation function for when the PERCENTILE
function is used in an aggregating projection. This implementation
buffers all non-null field values in memory in order to perform the
operation and therefore could be an expensive operation. This is
necessary for compatibility with earlier influxdb versions.

* refactor(influxql): move PERCENTILE implementation out of plan

The plan module is getting rather full of user-defined function
implementations. This breaks the new functions used to implement
percentile into some new top-level modules for aggregate and window
UDFs.

* fix: doc-lint

* chore: refactor `find_enumerated`

* chore: use `s` in format string

* chore: include the unexpected selector function in the error

* chore(influxql): review suggestions

Added some addition comments to help understanding.

Changed the handling os slector functions such that FIRST, LAST,
MAX & MIN behave the same as they did before PERCENTILE was added.

* chore(influxql): make percent_row_number a window UDF

Now that user-defined window functions are available make the
percent_row_number function be one of those. this allows the values
to be calculated for the entire window partition in one go.

For some reason the user-defined window function cannot return NULL
values. This function uses 0 where it would otherwise use NULL, as
row numbering starts at 1.

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-11 05:33:16 +00:00
Fraser Savage dec0244bff
refactor(e2e): Wait 100ms between queries in debug::build_catalog test 2023-07-10 15:27:30 +01:00
Fraser Savage 0978aa0551
fix(e2e): Add small busy-loop to debug::build_catalog test to assert only on non-empty results 2023-07-10 15:13:37 +01:00
Andrew Lamb 3ce11d8d66
chore: Update DataFusion (#8190)
* chore: Update DataFusion

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: use display format

* chore: Update explain plan output

* fix: update plans

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-10 09:54:50 +00:00
Andrew Lamb 048fc32bd5
feat: add `influxdb_iox debug build-catalog` command (#8067)
* feat: add `influxdb_iox debug build-catalog` command

* fix: tests

* fix: Use info! logs instead of println for status

* fix: Set partition_hash_id as well

* fix: remove leftover code

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-07 18:32:27 +00:00
Stuart Carnie 1ca547b313
fix: Teach planner to rewrite binary expressions for div operator
Specifically when the operands are integers, to match InfluxQL OG
2023-07-07 11:22:03 +10:00
Martin Hilton dfffdc1d90
feat(influxql): support TOP and BOTTOM functions (#8143)
* refactor(iox_query_influxql): expand select projection

Change the SELECT projection in the planner to make it clearer how
each projection type works.

* feat(influxql): support TOP and BOTTOM  functions

Add support for the TOP and BOTTOM functions which return the first
n rows in some ordered data set.

* fix: clippy

* chore: Use array / slice destructuring

* chore: review suggestion in iox_query_influxql/src/plan/planner.rs

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 07:08:45 +00:00
Marco Neumann 70b44f78ee
test: correctly decode ingester reponses in end2end tests 2023-07-03 17:25:01 +02:00
Marco Neumann b1a4e3955e
test: `ingester_partition_pruning` must perform type coercion 2023-07-03 17:25:00 +02:00
Carol (Nichols || Goulding) cd28bf0337
test: Query an ingester with a predicate that should prune partitions 2023-07-03 17:24:58 +02:00
Dom Dwyer e5a9e1534a
test: assert 1 file persisted
There should be a single file persisted during graceful shutdown.
2023-07-03 15:51:02 +02:00
Dom Dwyer 5d0c172e61
test(e2e): query shutdown-persisted files
Ensure buffered ingester data is persisted and remains queryable after a
graceful ingester shutdown.
2023-07-03 15:51:02 +02:00
Marco Neumann 4638b89d93
refactor: migrate retention to proper predicates (#8092)
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.

This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent time predicates that the user providers (e.g. via
`WHERE time > x`).

It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.

Note that the lowering of the retention policy changed slightly from

```text
(time > (now() - retention)) AND (time < MAX)
```

to

```text
time > (now() - retention)
```

Since the `MAX` cut is just an artifact of the lowering and was unnecessary.

Closes #7409.
Closes #7410.
2023-06-29 08:36:37 +00:00
Martin Hilton 511a0bae78
feat(influxql): add derivative and non_negative_derivative (#8103)
Add the DERIVATIVE and NON_NEGATIVE_DERIVATIVE functions to influxql.
These are used to calculate derivatives over arbitrary time units.
The implementation is modeled after the DIFFERENCE and
NON_NEGATIVE_DIFFERENCE functions, with a difference that the unit
parameters is a configuration of the user-defined aggregator function
and therefore there cannot be a single shared definition of the
function.

The NON_NEGATIVE_DIFFERENCE function implementation has been
refactored to be an arbitrary NON_NEGATIVE wrapper for any Accumulator
function.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-29 05:53:18 +00:00
Marco Neumann 178483c1a0
feat: basic non-aggregates w/ InfluxQL selector functions (#8016)
* test: ensure that selectors check arg count

* feat: basic non-aggregates w/ InfluxQL selector functions

See #7533.

* refactor: clean up code

* feat: get more advanced cases to work

* docs: remove stale comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 08:05:50 +00:00
Stuart Carnie 7b4a1a0660
chore: PR feedback
Add tests for fewer rows than N for `moving_average`

See: https://github.com/influxdata/influxdb_iox/pull/8023#discussion_r1237298376
2023-06-22 12:15:47 +10:00
Stuart Carnie 13726c2a76
Merge branch 'main' into sgc/issue/7600_moving_average 2023-06-22 10:10:22 +10:00
Marco Neumann 83a5037e61
feat: query support for custom partitioning (#8025)
* feat: querier-specific stat creation routine

* feat: prune querier chunks using partition col ranges

* feat: add table client

* test: custom partitioning

* fix: correctly set up stats for chunks with col subsets

* fix: flaky test

* refactor: remove obsolete dead_code markers

* feat: add partition template to `create_namespace`

* test: extend custom partitioning end2end tests

* fix: explain shuffling, make it actual deterministic
2023-06-21 09:03:19 +00:00
Stuart Carnie 2cbaf9cffa
chore: more tests, renamed avg_n → moving_average 2023-06-21 15:05:08 +10:00
Stuart Carnie edaac28498
Merge branch 'main' into sgc/issue/7600_moving_average 2023-06-21 11:39:06 +10:00
wiedld 34b5fadde0
refactor: move scheduler related configs to compactor_scheduler (#8013) 2023-06-20 09:55:35 -07:00
Stuart Carnie a2521bbf35
feat: moving_average, difference and non_negative_difference
There is a `todo` regarding `update_batch` to be discussed with @alamb
2023-06-20 16:37:28 +10:00
Stuart Carnie 8670b28445
Merge branch 'main' into sgc/issue/7600_moving_average 2023-06-18 09:41:19 +10:00
Andrew Lamb 5889c96501
chore: Update `datafusion` and other dependencies (#7981)
* chore: Update DatFaFusion pin

* chore: Update other dependencies

* chore: Update hakari

* fix: Update for API changes

* fix: Update explain plan

* fix: Update influxql plans

* fix: rustdoc links

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-16 09:48:55 +00:00