Commit Graph

366 Commits (65230040083eeca3d8a91905567e7341bd671103)

Author SHA1 Message Date
Andrew Lamb 43e236e040
chore: Update datafusion again (#7353)
* chore: Update DataFusion

* refactor: Update predicate crate for new transform API

* refactor: Update iox_query crate for new APIs

* refactor: Update influxql for new API

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-28 16:21:49 +00:00
Christopher M. Wolff dbf6493312
feat: add scalar function LOCF (#7347)
* feat: add scalar function LOCF

* chore: cargo update spin@0.9.6

Apparently this version was yanked
2023-03-28 14:35:27 +00:00
Marco Neumann 71b88b22b9
fix: ensure we don't loose predicates in chunk roundtrips (#7340)
`extract_chunks` never runs after predicate pushdown. However IF this
should ever happen, we would potentially forget the predicates attached
to `ParquetExec`. So let's make sure we refuse chunk extraction in this
case. This is similar to the existing behavior, i.e. we don't support
chunk extraction after filter pushdown (i.e. if there is a filter around
an `RecordBatchesExec`).

For #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-27 11:18:56 +00:00
Christopher M. Wolff f73187ff7e
feat: add interpolation fill strategy to GapFillExec (#7317)
* feat: add interpolation fill strategy to GapFillExec

* chore: clippy

* chore: code review feedback

* chore: fix doc comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-24 18:53:14 +00:00
Andrew Lamb 5dd71998a1
chore: Update datafusion (#7318)
* chore: Update datafusion

* chore: Update for API change

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-24 15:07:23 +00:00
Andrew Lamb 184565b552
feat(flightsql): Implement FlightSQL `GetSqlInfo` endpoint (#7198)
* feat(flightsql): Implement GetSqlInfo endpoint

* chore: Add some comments to clarify the tests intent
2023-03-20 19:34:18 +00:00
Christopher M. Wolff 866f9cefa1
feat: add null-as-missing gap filling (#7245)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-17 20:34:45 +00:00
Andrew Lamb 96c2094302
refactor(iox_query): extract influxrpc planner to its own crate (#7241)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-17 18:48:55 +00:00
Marco Neumann 20ec47b00b
feat: virtual chunk order col (#7240)
* feat: introduce `CHUNK_ORDER_COLUMN_NAME`

* feat: impl `ChunkOrder` everywhere

* feat: `ChunkOrder::get`

* feat: emit chunk order column for `RecordBatchesExec`

* feat: `chunk_order_field`

* feat: chunk order col for parquet chunks

* feat: optional chunk order col handling for dedup

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-17 09:39:21 +00:00
Marco Neumann 3e40de3cd4
feat: recover desired output sort in in `extract_chunks` (#7233)
This is helpful so that optimizer passes to forget the sort key, esp.
when the run after `DedupNullColumns` and `DedupSortOrder`.

For #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-17 09:19:10 +00:00
Andrew Lamb 3fb4fad784
refactor(iox_query): Rename `prepare_sql` to `sql_to_physical_plan` (#7226)
* refactor(iox_query): Rename `prepare_sql` to `sql_to_physical_plan`

* fix: logical conflict
2023-03-16 19:12:15 +00:00
Andrew Lamb 7dfaa05e8a
chore: Update datafusion again (#7208)
* chore: update datafusion again

* fix: update test

* fix: use table_reference

* fix: clean up import

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-16 14:34:40 +00:00
Marco Neumann 45d23f7652
refactor: `extract_chunks` return arrow schema (#7231)
Similar to #7217 there is no need to convert the arrow schema to an IOx
schema. This also makes it easier to handle the chunk order column in #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-16 14:19:52 +00:00
Marco Neumann f128539f98
feat: more projection pushdown (#7218)
* feat: proj->proj pushdown

For #6098.

* feat: proj->SortPreservingMergeExec pushdown

For #6098.

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-16 08:59:48 +00:00
Marco Neumann 3a31f41c2c
refactor: use arrow schema in `chunks_to_physical_nodes` (#7217)
We don't need a validated IOx schema in this method. This will simplify
some work on #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-16 08:45:14 +00:00
Andrew Lamb 6d6fd8f663
feat(flightsql): implement basic `CommandGetCatalogs` support (#7212)
* refactor: reduce redundancy in test

* chore: implement basic get_catalog support

* fix: clippy
2023-03-15 21:52:59 +00:00
Marco Neumann 393de6980e
feat: debug-log errors during chunk extraction (#7223)
Helps debugging while working on #6098 .
2023-03-15 18:55:33 +00:00
Christopher M. Wolff afb571a502
feat: implement gap fill with previous value (#7182)
* feat: implement gap fill with previous value

* test: update fill prev test to include null value

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-15 15:54:59 +00:00
Christopher M. Wolff 570c61f9a7
refactor: formalize abstraction for building gap filled columns (#7179)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-14 15:14:02 +00:00
Andrew Lamb 0eb858c70d
chore: Update datafusion (#7167)
* chore: Update datafusion

* chore: Update datafusion

* refactor: use UserDefinedLogicalNodeCore

* fix: remove stray comment

* fix: clippy

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-13 16:41:32 +00:00
Christopher M. Wolff ffab683ead
refactor: move trailing_gaps bit into cursor (#7178)
* refactor: push trailing_gap bit into cursor

* chore: clippy

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-13 15:40:55 +00:00
Marco Neumann 737ea15d07
feat: projection pushdown phys. optimizer (#7161)
* feat: projection pushdown phys. optimizer

The is by far the largest pass (at least test-wise), because projections
are added last in the naive plan and you have to push them through
everything else. The actual code however isn't that complicated mostly
because we can reuse some DataFusion functionality and the different
variants for the different "child nodes" are very similar.

For #6098.

* feat: projection pushdown for `RecordBatchesExec`

* test: `test_ignore_when_partial_impure_projection_rename`

* test: more dedup projection tests

* test: integration
2023-03-13 12:59:45 +00:00
Marco Neumann 41802b7b5b
feat: `SchemaAdapterStream` may create virtual columns (#7173)
* feat: `SchemaAdapterStream` may create virtual columns

For chunk order handling in #6098.

* fix: improve `SchemaAdapterStream` docs and error handling
2023-03-13 10:02:13 +00:00
Carol (Nichols || Goulding) cc7c44f76a
chore: Upgrade to Rust 1.68 (#7175)
* chore: Upgrade to Rust 1.68

* fix: Remove unnecessary into_iter, thanks Clippy!

* fix: Use the size of the type, not a reference to the type... oops.

Thanks clippy!

* fix: Return block directly instead of creating a variable

Thanks clippy!

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-12 13:22:20 +00:00
Stuart Carnie fe48a685ec
refactor: Move InfluxQL behaviour from iox_query to new crate (#7156)
* refactor: Break unnecessary dependencies from `iox_query` crate

In the process, the test code has been simplified.

* refactor: Move InfluxQL plan module to iox_query_influxql crate

* refactor: Move remaining behaviour from iox_query to iox_query_influxql

* chore: rustfmt 🙄

I was under the impression `clippy` would catch formatting
2023-03-08 22:29:20 +00:00
Marco Neumann 309177b750
feat: phys. pred. pushdown to parquet (#7159)
For #6098.
2023-03-08 16:36:27 +00:00
Marco Neumann 3828d2a50e
chore: update DataFusion to `deeaa5632ed99a58b91767261570756db736d158` (#7158)
* chore: update DataFusion to `deeaa5632ed99a58b91767261570756db736d158`

I want to get pull:

- https://github.com/apache/arrow-datafusion/pull/5495

Changes in the IOx code base due to:

- https://github.com/apache/arrow-datafusion/pull/5423
- https://github.com/apache/arrow-datafusion/pull/5421
- https://github.com/apache/arrow-datafusion/pull/5450

* refactor: simplify expression simplifcation

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: remove upstreamed code

* test: update snapshots

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2023-03-08 13:05:31 +00:00
Marco Neumann 58dad4cb01
feat: remove all-NULL columns from dedup (#7146)
For #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-08 10:04:48 +00:00
Marco Neumann 81388e7ff2
feat: determine cheap de-dup sort order (#7147)
* feat: determine cheap de-dup sort order

For #6098.

* test: `test_three_chunks_different_subsets`

* fix: ensure that columns can be drawn early

* docs: improve algo explaination

* refactor: make code clearer
2023-03-08 09:50:07 +00:00
Christopher M. Wolff ff11fe465d
refactor: convert gap fill exec tests to use insta snapshots (#7154)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-08 01:53:15 +00:00
Stuart Carnie 2b74f07fe5
feat: Support `GROUP BY` with tags in raw `SELECT` queries (#7109)
* chore: Normalise name of Call expression to lowercase

Simplifies matching functions in planner, as they are guaranteed to be
lowercase.

This also ensures compatibility with InfluxQL when generating column
alias names, which are reflected in updated tests.

* chore: Ensure aggregate functions fail gracefully.

* feat: GROUP BY tag support

* feat: Ensure schema-level metadata is propagated

Requires: https://github.com/apache/arrow-rs/issues/3779

* chore: Add some tests to validate GROUP BY output

* chore: Add clarifying comment

* chore: Declare message in flight.proto

The metadata is public API, so best practice is to encode this in a way
that is most compatible for clients in other languages, and will also
document the history of schema changes.

Added tests to validate the metadata is encoded correctly.

* chore: Placate linters

* chore: Use correct column in test cases

* chore: Add `is_projected` to the TagKeyColumn message

`is_projected` is necessary to inform a client whether it should include
the tag key is used exclusively for the group key (false) or also
projected in the `SELECT` column list.

* refactor: Move constants to `schema` crate per PR feedback

* chore: rustfmt 🙄

* chore: Update docs for InfluxQlMetadata

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-03-07 22:40:23 +00:00
Christopher M. Wolff 3f3a47eae9
feat: add a type to characterize fill strategy (#7150)
* feat: add a type to characterize fill strategy

* chore: clippy and fix comment
2023-03-07 17:11:31 +00:00
Marco Neumann 91471fe568
fix: check schema when calculating sorting for `ParquetExec` (#7136)
When combining sort keys, we have to check the schema of the chunk to
differentiate between "column does not exist within this chunk" and
"column exists but is not sorted".

This is unlikely an issue in prod at the moment (if there is not bug in
the ingester or compactor), but this was found while working on tests
for #6098. Overall this should improve robustness.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-07 09:20:31 +00:00
Andrew Lamb ed0704ac8d
chore: Update datafusion (#7100)
* chore: Update datafusion

* chore: iox_query to compile for API changes + update tests

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-03-06 17:59:24 +00:00
Christopher M. Wolff c15d789613
fix: account for memory in GapFill operator (#7115)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-03 16:21:13 +00:00
dependabot[bot] 3256fcc72e
chore(deps): Bump object_store from 0.5.4 to 0.5.5
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.4 to 0.5.5.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.4...object_store_0.5.5)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-03 02:00:51 +00:00
Marco Neumann 999a5dae03
refactor: sort key cleanups (#7113)
* refactor: remove unused `ColumnSort`

* refactor: remove invalid assertion

It is true that time SHOULD be the last sort key, but we absoletely
don't require that, esp. not in the query tier. The ingester will
currently always produce sort keys where time is last, but if we ever
going to deal w/ external data sources like bulk loaded parquet files,
this may not always be the case.

Found while constructing some edge case tests.

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 16:08:21 +00:00
Marco Neumann 2c4da24f73
feat: sort-related phys. optimizers (#7095)
* feat: `SortPushdown` optimizer

* feat: `RedundantSort` optimizer

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 14:20:13 +00:00
Andrew Lamb eb488cf55a
chore(compactor2): Improve documentation on split times (#7105)
* chore(compactor2): Improve documentation on split times

* fix: Apply suggestions from code review

Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>

* fix: fmt

* fix: typo

---------

Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 12:00:55 +00:00
dependabot[bot] c538cac4ef
chore(deps): Bump tokio from 1.25.0 to 1.26.0 (#7107)
* chore(deps): Bump tokio from 1.25.0 to 1.26.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.25.0 to 1.26.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.25.0...tokio-1.26.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 09:50:39 +00:00
Marco Neumann c95d078e46
feat: add `NestedUnion` opt (#7092)
* docs: typo

* feat: add `NestedUnion` opt

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 09:09:05 +00:00
Christopher M. Wolff d1a54cf0d4
feat: allow no lower bound gap fill implementation (#7104)
* feat: allow no lower bound gap fill implementation

* chore: clippy

* refactor: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 23:32:57 +00:00
Marco Neumann 8f11372eac
feat: predicate pushdown phys. optimizer rule (#7083)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 09:44:57 +00:00
Marco Neumann 7fb562cb01
feat: "collect chunks" phys. optimizer rule (#7086)
* feat: "collect chunks" phys. optimizer rule

Required to clean up the plan a bit after all the dedup split and
removal passes.

For #6098.

* refactor: `collect` -> `combine`

* fix: submodule vis
2023-03-01 09:38:11 +00:00
Marco Neumann b85869778d
fix: `extract_chunks` schema handling (#7085)
I forgot that both `RecordBatchExec` and `ParquetExec` can have schemas
with more columns than the chunks they contain, i.e. both provide null
column creation. When extracting the schema for the chunks within a
plan, the full schemas should be preserved, otherwise the physical
optimizer rules will create invalid plan nodes (i.e. with missing
columns).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 09:17:31 +00:00
Christopher M. Wolff 24b9bfacb5
refactor: rewrite gap filling code to be more intuitive and extendable (#7076)
* refactor: rewrite gap filling code to be more intuitive and extendable

* chore: address clippy issue
2023-02-28 22:18:52 +00:00
Marco Neumann 6d8fd37e26
feat: add "split dedup by time" optimizer rule (#7041)
* feat: add "split dedup by time" optimizer rule

For #6098.

* docs: fix typo

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* feat: add log messages for skipped optimizations

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-02-28 11:29:42 +00:00
Marco Neumann 04f3296d7b
feat: add "remove de-duplication" optimizer pass (#7042)
For #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-28 07:57:19 +00:00
Carol (Nichols || Goulding) faae5eb438 chore: Rerun cargo hakari manage-deps 2023-02-27 11:56:15 +01:00
Marco Neumann 8002d34fa2
feat: add "split dedup by partition" optimizer rule (#7020)
* feat: add "split dedup by partition" optimizer rule

- some additional testing infra
- includes config infra for optimizer passes
- not wired up yet since we still use the old plan generation

For #6098.

* refactor: change default and improve docs
2023-02-27 10:27:48 +00:00
Stuart Carnie 2ed5758ddb
feat: InfluxQL planner learns how to project multiple measurements (#7063)
* feat: Planner learns how to project multiple measurements

Closes #6896

* chore: Update test

* chore: PR feedback
2023-02-27 00:50:06 +00:00
Marco Neumann b76b75e911
fix: do not panic for unsupported expressions (#7052)
We see this in one of our prod clusters ATM.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-23 17:03:43 +00:00
Marco Neumann 08578cded5
refactor: n_threads and n_target_partitions are non-zero (#7047)
* refactor: n_threads and n_target_partitions are non-zero

Zero values will just panic. Prevent that earlier.

* fix: typo

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

---------

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2023-02-23 16:57:00 +00:00
Christopher M. Wolff 0282eb4750
feat: streaming implementation of gap filling (#7037)
* feat: streaming implementation of gap filling

* chore: cargo fmt

* refactor: when gapfilling, concatenate input batches less frequently

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-23 15:42:41 +00:00
Andrew Lamb 7e31b2638d
fix: Understandable compactor2 config report (#7028)
* fix: Understandable compactor2 config report

* fix: do not log postgres dsn

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-22 23:43:31 +00:00
Andrew Lamb f93baf7693
chore: Update DataFusion and `arrow` / `arrow-flight` / `parquet` to `33.0.0` (#7045)
* chore: Update DataFusion and arrow/arrow-flight/parquet to 33.0.0

* fix: Update test output

* fix: update more test output

* fix: Update querier test output

* chore: Run cargo hakari tasks

* test: fix formatting

Fix formatting of batch pretty printing.

* test: fix formatting

Fix formatting of batch pretty printing.

* test: fix formatting for selector tests

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: Christopher Wolff <chris.wolff@influxdata.com>
2023-02-22 21:24:20 +00:00
Stuart Carnie 6fb93a7679
refactor: Make InfluxQL planning sync (#7038)
* refactor: Move statement parsing to separate fn

* refactor: Remove async from `InfluxQLToLogicalPlan`

Closes #6607

* chore: Remove async functions and tokio::test

* chore: Remove redundant attribute

* chore: Feedback, switch to dynamic dispatch vs generic implementation
2023-02-22 19:33:49 +00:00
Marco Neumann e9ec213b72
refactor: remove `TaskConfig` param from `chunks_to_physical_nodes` (#7019)
This makes it easier to use it from optimizer passes.

Ref #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-22 09:19:59 +00:00
Stuart Carnie 929ac9081e
feat: Rewrite logical expression to match InfluxQL behaviour (#7031)
* chore: Move to inline snapshots

* chore: Container for the DataFusion and IOx schema

* chore: Simplify using logical expression helper functions

* feat: Rewrite conditional expressions using InfluxQL rules

* feat: Add tests to validation conditional expression rewriting

* feat: Rewrite column expressions

* chore: Rewrite expression to use false when possible

This allows the planner to optimise away the entire logical plan to an
empty plan in many cases.

* feat: Complete cast postfix operator support

Added `unsigned` postfix operator, as the feature was mostly complete.

Closes #6895

* chore: Remove redundant attribute
2023-02-21 20:01:31 +00:00
Marco Neumann bda2310ca1
feat: extract chunks from phys. plan (#7018)
* feat: extract chunks from phys. plan

For #6098.

* test: ensure that `extract_chunks` does NOT scan through other nodes
2023-02-17 11:41:39 +00:00
Marco Neumann a8feed120c
test: `chunks_to_physical_nodes` (#7013)
No new actual code but sets up some test infra that I need for #6098.
2023-02-17 09:37:43 +00:00
Andrew Lamb 27890b313f
chore: Update datafusion (#6997)
* chore: Update datafusion

* chore: update the plans

* fix: update some plans

* chore: Update plans and port some explain plans to use insta snapshots

* fix: another plan

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 17:03:25 +00:00
Christopher M. Wolff fea5245148
refactor: move GapFillParams to its own module (#7014)
* refactor: move params to own module

* chore: cargo fmt
2023-02-16 16:52:52 +00:00
Stuart Carnie b840ed0ad9
fix: Use `as_expr` vs `col` to avoid splitting identifiers with periods (#7011)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 11:03:06 +00:00
Marco Neumann 822063b7f2
feat: remember `QueryChunk` for every parquet file (#7000)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 08:02:13 +00:00
Marco Neumann e41cf080b4
feat: `RecordBatchesExec` remembers chunks (#6999)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 07:55:35 +00:00
Marco Neumann 67794bccdb
refactor: `group_potential_duplicates` cannot fail (#6998)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-15 19:28:02 +00:00
Christopher M. Wolff 7fb052208f
feat: allow gap filling to produce multiple batches (#6986)
* feat: allow gap filling to produce multiple batches

* chore: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-14 22:48:29 +00:00
Marco Neumann ed007cb71f
refactor: replace IF-statement w/ optimizer rule (#6982)
* refactor: replace IF-statement w/ optimizer rule

This replaces a single IF-statement within the physical plan
construction with a physical optimizer rule. While on its own this seems
kinda pointless, it sets the foundation for #6098. W/o the optimizer
some EXPLAIN query tests would fail.

* test: use insta snapshots

* fix: update test snapshots

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-14 17:10:41 +00:00
Stuart Carnie 969319dfd3
fix: Allow all valid characters following a keyword (#6959)
* fix: Allow all valid characters following a keyword

Closes #6382

* chore: Identified additional test cases
2023-02-13 22:21:11 +00:00
Christopher M. Wolff a2510c8343
feat: partial implementation of gap filling operator (#6911)
* feat: partial implementation of gap filling operator

* chore: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-10 19:03:34 +00:00
Andrew Lamb 2f4d901fbe
chore: Update datafusion (#6893)
* chore: Update datafusion

* chore: Run cargo hakari tasks

* fix: update for api change

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-08 18:27:18 +00:00
dependabot[bot] 0ecde75af5
chore(deps): Bump object_store from 0.5.3 to 0.5.4 (#6900)
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.3 to 0.5.4.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.3...object_store_0.5.4)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-08 09:40:11 +00:00
Stuart Carnie a2945a77a4
fix: Implement EXPLAIN, ORDER BY and default ordering (#6864)
* chore: Add more tests

* chore: Fix default ordering; implement ORDER BY

* feat: Add EXPLAIN support

* chore: Add additional tests to validate GROUP BY expansion

* chore: More test cases for TZ, and failing log scalar function
2023-02-07 22:18:52 +00:00
Christopher M. Wolff a79b4ec899
refactor: better validation of gap filling queries (#6875)
* refactor: propagate origin argument to gap fill operator

* refactor: add param expressions to from_template

* chore: add more validation for gap fill queries

* feat: extract stride, first and last from gap fill params

* chore: clippy

* refactor: code review feedback

* chore: update for changed result type

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-06 23:36:26 +00:00
Raphael Taylor-Davies d3601a59f8
chore: update DataFusion, upgrade `arrow` `arrow-flight` and `parquet` to `32.0.0` (#6756)
* chore: update DataFusion

* fix: test

* chore: format

* chore: clippy

* chore: update arrow

* chore: arrow upgrade fallout

* chore: Run cargo hakari tasks

* chore: remove failing warm compaction test

* fix: flight error propagation

* chore: update parquet size

* fix: Update error message

* chore: Update parquet metadata test

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-06 11:35:39 +00:00
Carol (Nichols || Goulding) b0226e9baf
test: Update expected spans in query tracing 2023-02-03 13:06:20 -05:00
Carol (Nichols || Goulding) 38b204c604
fix: Update test expectation, need to investigate 2023-02-03 13:06:20 -05:00
Carol (Nichols || Goulding) 30fea67701
fix: Move variables within format strings. Thanks clippy!
Changes made automatically using `cargo clippy --fix`.
2023-02-03 13:06:17 -05:00
Christopher M. Wolff c9d40d6b80
feat: find time range in WHERE clause for gap-filling (#6805)
* feat: add analysis to find time predicates

* refactor: propagate time range to gap fill logical node

* refactor: propagate time range to GapFillExec

* refactor: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-02 23:48:36 +00:00
Stuart Carnie 8e931514fe
feat: Regular expression operator support (#6792)
* feat: Regular expression operator support

* chore: Added additional comments
2023-02-01 23:17:40 +00:00
Stuart Carnie 57f55e14c8
feat: IOx InfluxQL planner learns how to process time range expressions (#6772)
* feat: IOx learns InfluxQL time-range expression → DF logical Expr

IOx now understand the how to evaluate an InfluxQL time-range filter
expression and transform that to a DataFusion logical expression.

* chore: move time range expression to independent functions

There is no need for these to be part of the `InfluxQLToLogicalPlan`
struct and makes them easier to test.

* chore: support scalar now on either side of binary expression

* chore: improve error messages

* chore: address clippy concerns

* chore: add tests for time ranges

* chore: add a test where time appears on the right-hand side

Ensure time is correctly identified on the right-hand side of a
conditional expression.

* chore: add tests that specify a timezone

* chore: Run cargo hakari tasks

* chore: fix linting issues

* chore: Remove unnecessary line

* chore: Feedback: Add API to parse a conditional expression

Based on feedback from @alamb, we don't want to hide the error from
parsing a `ConditionalExpression`. To do this, we use the
public API, `parse_statements` as a model and provide a new API,
`parse_conditional_expression`, which returns a `Result` with the error
being a `ParseError`. Additionally, `ConditionalExpression` implements
the `FromStr` API using the `parse_conditional_expression` API.

* chore: PR feedback reverting this change

I believe my intention was to update all instances in the match, but
never completed the change. Will leave for another day.

* chore: PR feedback add additional comments

* chore: rustfmt

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-02-01 00:27:17 +00:00
Andrew Lamb 80f0125940
feat: Add number of rows to explain of RecordBatchesExec (#6781)
* feat: Add number of rows to explain of RecordBatchesExec

* fix: Update test output
2023-01-31 14:26:20 +00:00
Andrew Lamb 5b14caa780
chore: Update DataFusion (#6753)
* chore: Update datafusion

* fix: Update for changes

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-30 14:48:52 +00:00
dependabot[bot] ed7d02a225
chore(deps): Bump tokio from 1.24.2 to 1.25.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.2 to 1.25.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits/tokio-1.25.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-30 01:57:27 +00:00
Andrew Lamb 0d32662eea
chore: Update datafusion again (#6722)
* chore: Update datafusion

* fix: Update for API

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-27 18:59:27 +00:00
Christopher M. Wolff 9a942ceff5
refactor: propagate gapfill stride to exec (#6690)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-24 20:49:29 +00:00
Andrew Lamb c3bc61f10e
refactor: Move `flightsql` code into its own module, add docs and tests (#6640)
* refactor: Move `flightsql`  code into its own module

* fix: get schema from LogicalPlan

* refactor: use arrow_flight::sql::Any instead of prost_types::any

* fix: cleanup docs and avoid as_ref

* fix: Use Bytes

* fix: use Any::pack

* fix: doclink
2023-01-24 18:24:32 +00:00
Marco Neumann cb02262b9d
refactor: extract "exec DF plan" and "store stream to file" components (#6663)
* refactor: extract `PartitionInfo`

* refactor: extract DF exec component

* feat: add some error conversions

* refactor: make fn public

* refactor: extract file sink component

* fix: clippy
2023-01-23 14:40:35 +00:00
Christopher M. Wolff 6f39ae342e
feat: create a GapFillExec type (#6641)
* refactor: make gap fill rule avoid aliasing

* feat: create a GapFillExec type

* refactor: remove unneeded sort node from GapFill rule

* chore: code review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-20 17:44:00 +00:00
Christopher M. Wolff 413e4e4088
feat: create a logical plan node and rule for gap-filling (#6602)
* feat: create a GapFill logical plan node

* feat: create a GapFill optimizer rule

* chore: code review feedback

* chore: fix issue found after merging main

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-18 17:01:55 +00:00
Andrew Lamb 8410998408
chore: Update datafusion to Jan 17, 2023 (2 / 2) and arrow/parquet `30.0.1` (#6604)
* chore: Update datafusion to Jan 9, 2023 (2 / 2) and arrow/parquet `30.0.1`

* chore: Update for changes in arrow ipc

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-01-18 15:51:24 +00:00
Andrew Lamb 57f08dbccd
chore: Update datafusion to Jan 9, 2023 (1 / 2) (#6603)
* refactor: Update DataFusion pin to early Jan 2023

* fix: Update tests now that planning is async

* fix: Updates for API changes

* chore: Run cargo hakari tasks

* fix: Update comment

* refactor: nicer config setup

* fix: gapfill async

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-01-18 12:19:32 +00:00
Stuart Carnie 15a9b4f1e5
refactor: Drop Expr::UnaryOp to simplify tree traversal (#6600)
* refactor: Drop Expr::UnaryOp to simplify tree traversal

The UnaryOp doesn't provide and additional value and complicates
walking the AST, as literal values wrapped in a UnaryOp(Minus, ...)
require extra handling when reducing time range expressions, etc.

This change also is true to the InfluxQL Go implementation,
which represents whole number literals as signed integers unless
they exceed i64::MAX.

* chore: Refactor all usages of format!("{}", ?) to ?.to_string()

Per https://github.com/influxdata/influxdb_iox/pull/6600#discussion_r1072028895
2023-01-18 02:27:38 +00:00
Marco Neumann 56c38ba8e1
feat: safely stream data from one tokio runtime to another (#6586)
* refactor: remove unused code

* refactor: make fn private

* feat: safely stream data from one tokio runtime to another

Closes #6577.

* refactor: review comments

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

* test: explain

* test: make tests more tricky

* refactor: improve error message

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-17 10:32:46 +00:00
Stuart Carnie 3f6bb3e330
feat: Parse IANA timezones in an InfluxQL TZ clause (#6585)
* feat: Parse IANA timezone strings to chrono_tz::Tz

* feat: Visitors can customise the return error type

This avoids having to remap errors from `&'static str` to the caller's
error type, and will be used in a future PR for time range expressions.

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-01-15 22:00:41 +00:00
Marco Neumann bc030150f5
refactor: improve executor panic/error handling (#6582)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-13 07:10:53 +00:00
Stuart Carnie 81ffb3edb5
chore: move walk and the mutable variant to the parser crate (#6575)
It is generally a useful API to be core to the InfluxQL parser crate.
2023-01-12 21:06:06 +00:00
Marco Neumann e2da573dcf
refactor: improve thread naming (#6579)
- name exec driver thread (instead of using the default that `thread::spawn`
  gives us)
- provide number to every worker thread (both for the dedicatd executor
  and for the main runtime)
- shorten thread names (current naming too long for most debug tools)
2023-01-12 14:22:49 +00:00
Stuart Carnie 66047f4372
feat: InfluxQL learns how to plan some InfluxQL queries (#6520)
* feat: InfluxQL learns how to plan some queries

Also added a means to test the planner and execution

* chore: Update module docs

* chore: Document the planner functions

* chore: Update end_to_end_cases crate

* chore: Clarify why `SLIMIT` and `SOFFSET` return `NotImplemented`

* chore: Address lint issues

* chore: Fix rustdoc link issue

* chore: Remove InfluxQL tests from query_tests crate

Will follow conventions established by @carols10cents when
new query_tests crate is merged.

* chore: `now` field

`now` is a DataFusion built-in scalar function

* chore: remove unused code

* chore: Add additional arithmetic expression tests

* chore: Establish pattern for identifying and tracking InfluxQL issues

* chore: Add tests for case sensitivity issues

* chore: group tests into modules and functions

This avoids mass rewriting of insta snapshots as new
tests are added to each function. When tests are added in the middle,
existing snapshots are renamed (-N+1, -N+2, etc) resulting in
having to review numerous additional snapshots.
2023-01-11 02:50:49 +00:00
dependabot[bot] b49cc2e35e
chore(deps): Bump tokio from 1.24.0 to 1.24.1 (#6545)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.0 to 1.24.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.24.0...tokio-1.24.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 09:48:44 +00:00
Andrew Lamb 29df5d7fcb
fix: create statistics for nulled columns in RecordBatchExec (#6527) 2023-01-09 07:37:10 +00:00
Raphael Taylor-Davies e1036a0c63
refactor: cleanup schema boxing (#6511)
* refactor: cleanup Schema boxing

* chore: clippy
2023-01-06 10:57:39 +00:00
Raphael Taylor-Davies 2037db7f7b
refactor: decouple influxql from SchemaProvider (#6507)
* refactor: decouple influxql from SchemaProvider

* refactor: reorder arguments

* refactor: use QueryNamespaceMeta

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-06 07:36:29 +00:00
Marco Neumann 25f275f1b0
refactor: improve influxRPC warning logging (#6493)
The current version is barely readable because the logged schema w/ all
it's metadata is soooo long.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-04 16:52:32 +00:00
Stuart Carnie aacd91db94
feat: Teach InfluxQL rewriter how to name columns (#6481)
* feat: Add timestamp data type

* feat: Add with_quiet API to suppress output to STDOUT

* fix: Field name resolution to match InfluxQL

* refactor: Allow TestChunks to be directly accessed

This will be useful when testing the InfluxQL planner.

* fix: Add Timestamp case to var_ref module

* feat: Add InfluxQL compatible column naming

* chore: Add doc comment.

* fix: keywords may be followed by a `!` such as `!=`

* fix: field_name improvements

* No longer clones expressions
* Explicitly handle all Expr enumerated items
* more tests

* fix: collision with explicitly aliased column

Fixes case where column is explicitly aliased to an auto-named variant.
Test case added to validate.
2023-01-04 00:55:18 +00:00
Andrew Lamb dbe52f1ca1
chore: Upgrade datafusion (#6467)
* chore: Update datafusion

* fix: Update for new apis

* chore: Update expected plan

* fix: Update for new config construction

* chore: update clippy

* fix: Fix error codes

* fix: update another test

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-03 15:29:11 +00:00
Stuart Carnie 4add55d39e
feat: InfluxQL planner learns how to normalise InfluxQL AST (#6236)
* chore: Move logic to context, in line with DataFusion SQL

* chore: Add ordering for InfluxQL data types

Ordering is used to determine automatic casting operations. If two
field columns are present in an expression, one float and one integer,
the integer should be cast to a float, such that the final expression
will be a float.

* chore: Add DerefMut trait to collection types

Will allow these collections to be mutated when traversing the InfluxQL
AST.

* chore: Add influxql module with initial AST normalisation implementation

* chore: Add more unit tests and docs

* chore: Run cargo hakari tasks

* chore: Fix link

* chore: Support regular expression expansion and Call expressions

* chore: Add tests for walk_expr functions

* chore: Add insta snapshot files

* chore: Add docs and make API accessible to the crate

* chore: Move to Arc<dyn SchemaProvider> for use in influxql planner

* chore: Move code back; it is better encapsulated here

* chore: Remove redundant attribute

* chore: Improve regex compatibility with InfluxQL / Go

* chore: Style improvement.

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-01-02 23:48:21 +00:00
Carol (Nichols || Goulding) 46ff8854ec
fix: Use code backticks around invalid HTML tags in doc strings 2022-12-21 16:36:17 -05:00
Carol (Nichols || Goulding) bfc74db94c
fix: Use into_values function. Thanks clippy! 2022-12-21 14:32:35 -05:00
Andrew Lamb d0d5906476
chore: Update datafusion pin (#6442)
* chore: Update datafusion pin

* refactor: Update iox_query for new apis

* chore: Update some more apis

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-12-19 20:02:42 +00:00
Dom Dwyer 933ab1f8c7
feat(ingester2): optimal persist parallelism
This commit changes the behaviour of the persist system to enable
optimal parallelism of persist operations, and improve the accuracy of
the outstanding job bound / back-pressure.

Previously all persist operations for a given partition were
consistently hashed to a single worker task. This serialised persistence
per partition, ensuring all updates to the partition sort key were
serialised. However, this also unnecessarily serialises persist
operations that do not need to update the sort key, reducing the
potential throughput of the system; in the worst case of a single
partition receiving all the writes, only one worker would be persisting,
and the other N-1 workers would be idle.

After this change, the sort key is inspected when enqueuing the persist
operation and if it can be determined that no sort key update is
necessary (the typical case), then the persist task is placed into a
global work queue from which all workers consume. This allows for
maximal parallelisation of these jobs, and the removes the per-worker
head-of-line blocking.

In the case that the sort key does need updating, these jobs continue to
be consistently hashed to a single worker, ensuring serialised sort key
updates only where necessary.

To support these changes, the back-pressure system has been changed to
account for all outstanding persist jobs in the system, regardless of
type or assigned worker - a logical, bounded queue is composed together
of a semaphore limiting the number of persist tasks overall, and a
series of physical, unbounded queues - one to each worker & the global
queue. The overall system remains bounded by the
INFLUXDB_IOX_PERSIST_QUEUE_DEPTH value, and is now simpler to reason
about (it is independent of the number of workers, etc).
2022-12-15 18:30:51 +01:00
Marco Neumann a5d693eba2
feat: lower Influx regex expressions to DF regex expressions (#6394)
* feat: lower Influx regex experessions to DF regex expressions

For #6388.

* refactor: address review comments
2022-12-15 09:33:28 +00:00
Andrew Lamb 8729977851
chore: Upgrade datafusion / arrow to 29.0.0 to get flightsql client (#6396)
* chore: Update datafusion pin

* chore: Update for API change

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-12-13 20:16:09 +00:00
Marco Neumann 65687bf0fa
test: regex baseline test (#6389)
For #6388.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-13 17:42:31 +00:00
Andrew Lamb 9175f4a0b5
chore: Upgrade datafusion to get correct support for multi-part identifiers (#6349)
* test: add tests for periods in measurement names

* chore: Update Datafusion

* chore: Update for changed APIs

* chore: Update expected plan output

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 11:27:13 +00:00
Marco Neumann c25afda6cc
fix: `GroupGenerator`/`Converter` panic (#6351)
Do not poll a ready future.
2022-12-08 11:08:21 +00:00
Marco Neumann 080aff8f71
fix: account for memory allocations in InfluxRPC group outputs (#6345)
* fix: account for memory allocations in InfluxRPC group outputs

This should prevent the querier from OOMing.

See https://github.com/influxdata/idpe/issues/16614 .

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* refactor: pull out constant

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-12-08 09:55:31 +00:00
dependabot[bot] 1d38d400f0
chore(deps): Bump object_store from 0.5.1 to 0.5.2 (#6339)
* chore(deps): Bump object_store from 0.5.1 to 0.5.2

Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.1 to 0.5.2.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.1...object_store_0.5.2)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-06 07:53:54 +00:00
Marco Neumann f62b270852
fix: gRPC errors regarding group cols (#6314)
* fix: gRPC errors regarding group cols

- missing group col prev. produced an "internal error" but should be
  "invalid argument"
- duplicate group cols produced a panic but should also be "invalid
  argument"

* docs: clarify
2022-12-06 07:36:32 +00:00
Marco Neumann cd6a8a1a82
refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics (#6313)
* refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics

Closes #6310.

* refactor: rename and tune default exec mem limits

* fix: ingester2 bits after rebase
2022-12-05 12:38:28 +00:00
Marco Neumann 942a6100b5
fix: check schemas in `pretty_print_batches` (#6309)
* fix: check schemas in `pretty_print_batches`

I think most users of this function (and `assert_batches_eq`) assume
that all batches have the same schema. If not, `pretty_print_batches`
may either fail producing an actual table (some rows may have more or
less columns) or silently produce a table that looks "alright".

* fix: equalize schemas where it is required/desired

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-02 12:14:16 +00:00
Marco Neumann ec2e72d223
test: simplify test executors (#6312)
Have a single global test executor w/ reasonable defaults. Also don't
require tests to join/await executor shutdowns (most tests forget this
anyways and will get a runtime warning).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-02 11:38:18 +00:00
Marco Neumann ab4f910111
refactor: improve DF error handling (#6311)
This is required to extract "resource exhausted" errors in more cases.
2022-12-02 11:25:30 +00:00
Marco Neumann e2168ae859
refactor: stream-based series-set conversion (#6285)
* refactor: stream-based series-set conversion

Closes #6216.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* refactor: improve algo docs and tests

* test: fix after rebase

* fix: broken `Series` conversion when slices are present

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 17:24:36 +00:00
Andrew Lamb d0f1f6a4fd
chore: Upgrade datafusion to get memory limits (#6297)
* chore: Update datafusion

* fix: use correctly qualified column names

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 16:40:26 +00:00
Marco Neumann 01315bc063
refactor: bring back "stream-based `SeriesSetConvert::convert` interface (#6282)" (#6301)
This reverts commit 4a8bb871dc.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 14:27:43 +00:00
Marco Neumann 6cecc439d4
refactor: revert "simplify `SeriesSet` (#6277)" (#6298)
This reverts commit c41200536e.
2022-12-01 13:30:19 +00:00
Marco Neumann 4a8bb871dc refactor: revert stream-based `SeriesSetConvert::convert` interface (#6282)
This reverts commit dad6dee924.
2022-12-01 12:51:56 +01:00
Marco Neumann dad6dee924
refactor: stream-based `SeriesSetConvert::convert` interface (#6282)
Change the interface of `SeriesSetConvert::convert` to be stream-based.
This is the final interface-prep step before actually implementing #6216.
2022-11-30 17:12:54 +00:00
Marco Neumann c41200536e
refactor: simplify `SeriesSet` (#6277)
`RecordBatch` offers zero-copy slicing, so there is no need to store the
row range manually. This makes #6216 simpler.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-30 16:48:31 +00:00
Marco Neumann fa6f7ee926
refactor: stream-based(TM) `to_series_and_groups`, part 3 (#6275)
* refactor: stream-based(TM) `to_series_and_groups`, part 3

* refactor: remove dead code

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-30 13:21:39 +00:00
Marco Neumann 6eb13712c4
refactor: stream-based(TM) `to_series_and_groups`, part 2 (#6265)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-29 15:00:21 +00:00
Marco Neumann 297ea8be55
refactor: make `IOxSessionContext::exec` non-optional (#6266)
`None` was only used for testing and even than we should probably have a
proper executor instead of panicking for some methods.

Found while working on #6216.
2022-11-29 14:52:32 +00:00
Marco Neumann 514aa60f91
refactor: stream-based(TM) `to_series_and_groups`, part 1 (#6261)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-29 14:16:22 +00:00
Andrew Lamb fc5697b8e7
chore: Update datafusion again (N of N) (#6218)
* chore: Update datafusion again (4 of N)

* fix: Update plans

* fix: Update for renamed API

* fix: Update more plans

* chore: Update to datafusion @ d355f69aae2cc951cfd021e5c0b690861ba0c4ac

* fix: update explain plan tests

* fix: update test after schema error

* chore: Update datafusion again

* fix: Add size() calculation to selectors

* chore: Run cargo hakari tasks

* fix: Update newly added test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-28 17:09:40 +00:00
Christopher M. Wolff aa7a3a7721
fix: ignore fields when considering tag predicates (#6212)
* fix: ignore fields when considering tag predicates

* chore: update test to not use time column in predicate

* chore: update with review feedback

* chore: update tests to avoid fields refs in RPC preds

This is more like what would be coming off the wire from
Influx RPC.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-28 15:16:55 +00:00
Stuart Carnie 2306c383f3
feat: Introduce InfluxQL to Flight (#6166)
* feat: Introduce InfluxQL to Flight

All InfluxQL queries will fail with an error

* chore: Temper protobuf lint

* chore: Finalize flight.proto changes; fix tests

* chore: Add tests for InfluxQL planner

* chore: Update docs

* chore: Update docs

* chore: Rename back to original

* chore: Use .into() rather than cast

* chore: Use function rather than field

* chore: Improved InfluxQL planner name

* chore: Restore `impl Into<String>` argument

* chore: Add a comment that Go clients are unable to execute InfluxQL

* chore: Add a test for the `--lang` argument and InfluxQL
2022-11-23 00:33:49 +00:00
Andrew Lamb 1a1ea74cb7
chore: Upgrade datafusion again (#6160)
* Revert "Revert "chore: Update datafusion again (#6108)""

This reverts commit 766b3bbeb440618cfe332f6ee7d4f8a8217acc48.

* fix: Respect the partition sort key

* chore: update plans

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-22 19:28:26 +00:00
dependabot[bot] a9db7581cd
chore(deps): Bump tokio from 1.21.2 to 1.22.0 (#6183)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.2 to 1.22.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.21.2...tokio-1.22.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-21 10:21:24 +00:00
Andrew Lamb 4630bbb956
feat: push down all predicates (#6042)
* feat: push down all predicates

* fix: fmt

* fix: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 16:22:01 +00:00
Marco Neumann 71ffc92559
fix: only push safe select expression through de-dup (#6156)
* fix: only push safe select expression through de-dup

Fixes #6066.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* fix: rebase

* test: ensure we do not split ORs

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-11-18 09:56:11 +00:00
Andrew Lamb 67712b595c
Revert "chore: Update datafusion again (#6108)" (#6159)
This reverts commit fbe9f27f10.
2022-11-16 21:14:55 +00:00
Andrew Lamb fbe9f27f10
chore: Update datafusion again (#6108)
* chore: Update datafusion pin + api code

* chore: Run cargo hakari tasks

* refactor: combine_sort_key is more idomatic and add rationale comments

* refactor: satisfy borrow checker and updated comments

* fix: Add test case for combine_sort_key

* fix:  Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: Add back test for deeply nested expression

* fix: Update output ordering

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-16 14:41:52 +00:00
Andrew Lamb 20f1ae1c8f
test: tests in the reorg planner and query tests for merging parquet files (#6137)
* test: tests in the reorg planner and query tests for merging parquet files

* fix: use 20 files

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-15 20:29:44 +00:00
dependabot[bot] a969754819
chore(deps): Bump chrono from 0.4.22 to 0.4.23 (#6129)
* chore(deps): Bump chrono from 0.4.22 to 0.4.23

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.22 to 0.4.23.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.22...v0.4.23)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* refactor: chrono future compat

Integer->timstamp conversions should not silently panic.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-14 13:34:09 +00:00
kodiakhq[bot] 05d7d1495e
Merge branch 'main' into dependabot/cargo/hashbrown-0.13.1 2022-11-11 21:26:40 +00:00
Carol (Nichols || Goulding) 0657ad9600
fix: Rename QueryDatabase to QueryNamespace 2022-11-11 16:14:12 -05:00
Carol (Nichols || Goulding) 621560a0dc
fix: Rename QueryDatabaseMeta to QueryNamespaceMeta 2022-11-11 16:14:12 -05:00
Carol (Nichols || Goulding) bdff4e8848
fix: Consistently use 'namespace' instead of 'database' in comments and other internal text 2022-11-11 15:46:04 -05:00