207 Commits (88d288235023831fdbb4f5c4c1d0685b7df22502)

Author SHA1 Message Date
Marco Neumann bda2310ca1
feat: extract chunks from phys. plan (#7018)
* feat: extract chunks from phys. plan

For #6098.

* test: ensure that `extract_chunks` does NOT scan through other nodes
2023-02-17 11:41:39 +00:00
Marco Neumann a8feed120c
test: `chunks_to_physical_nodes` (#7013)
No new actual code but sets up some test infra that I need for #6098.
2023-02-17 09:37:43 +00:00
Andrew Lamb 27890b313f
chore: Update datafusion (#6997)
* chore: Update datafusion

* chore: update the plans

* fix: update some plans

* chore: Update plans and port some explain plans to use insta snapshots

* fix: another plan

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 17:03:25 +00:00
Christopher M. Wolff fea5245148
refactor: move GapFillParams to its own module (#7014)
* refactor: move params to own module

* chore: cargo fmt
2023-02-16 16:52:52 +00:00
Stuart Carnie b840ed0ad9
fix: Use `as_expr` vs `col` to avoid splitting identifiers with periods (#7011)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 11:03:06 +00:00
Marco Neumann 822063b7f2
feat: remember `QueryChunk` for every parquet file (#7000)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 08:02:13 +00:00
Marco Neumann e41cf080b4
feat: `RecordBatchesExec` remembers chunks (#6999)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 07:55:35 +00:00
Marco Neumann 67794bccdb
refactor: `group_potential_duplicates` cannot fail (#6998)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-15 19:28:02 +00:00
Christopher M. Wolff 7fb052208f
feat: allow gap filling to produce multiple batches (#6986)
* feat: allow gap filling to produce multiple batches

* chore: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-14 22:48:29 +00:00
Marco Neumann ed007cb71f
refactor: replace IF-statement w/ optimizer rule (#6982)
* refactor: replace IF-statement w/ optimizer rule

This replaces a single IF-statement within the physical plan
construction with a physical optimizer rule. While on its own this seems
kinda pointless, it sets the foundation for #6098. W/o the optimizer
some EXPLAIN query tests would fail.

* test: use insta snapshots

* fix: update test snapshots

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-14 17:10:41 +00:00
Stuart Carnie 969319dfd3
fix: Allow all valid characters following a keyword (#6959)
* fix: Allow all valid characters following a keyword

Closes #6382

* chore: Identified additional test cases
2023-02-13 22:21:11 +00:00
Christopher M. Wolff a2510c8343
feat: partial implementation of gap filling operator (#6911)
* feat: partial implementation of gap filling operator

* chore: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-10 19:03:34 +00:00
Andrew Lamb 2f4d901fbe
chore: Update datafusion (#6893)
* chore: Update datafusion

* chore: Run cargo hakari tasks

* fix: update for api change

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-08 18:27:18 +00:00
dependabot[bot] 0ecde75af5
chore(deps): Bump object_store from 0.5.3 to 0.5.4 (#6900)
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.3 to 0.5.4.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.3...object_store_0.5.4)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-08 09:40:11 +00:00
Stuart Carnie a2945a77a4
fix: Implement EXPLAIN, ORDER BY and default ordering (#6864)
* chore: Add more tests

* chore: Fix default ordering; implement ORDER BY

* feat: Add EXPLAIN support

* chore: Add additional tests to validate GROUP BY expansion

* chore: More test cases for TZ, and failing log scalar function
2023-02-07 22:18:52 +00:00
Christopher M. Wolff a79b4ec899
refactor: better validation of gap filling queries (#6875)
* refactor: propagate origin argument to gap fill operator

* refactor: add param expressions to from_template

* chore: add more validation for gap fill queries

* feat: extract stride, first and last from gap fill params

* chore: clippy

* refactor: code review feedback

* chore: update for changed result type

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-06 23:36:26 +00:00
Raphael Taylor-Davies d3601a59f8
chore: update DataFusion, upgrade `arrow` `arrow-flight` and `parquet` to `32.0.0` (#6756)
* chore: update DataFusion

* fix: test

* chore: format

* chore: clippy

* chore: update arrow

* chore: arrow upgrade fallout

* chore: Run cargo hakari tasks

* chore: remove failing warm compaction test

* fix: flight error propagation

* chore: update parquet size

* fix: Update error message

* chore: Update parquet metadata test

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-06 11:35:39 +00:00
Carol (Nichols || Goulding) b0226e9baf
test: Update expected spans in query tracing
2023-02-03 13:06:20 -05:00
Carol (Nichols || Goulding) 38b204c604
fix: Update test expectation, need to investigate
2023-02-03 13:06:20 -05:00
Carol (Nichols || Goulding) 30fea67701
fix: Move variables within format strings. Thanks clippy!
Changes made automatically using `cargo clippy --fix`.
2023-02-03 13:06:17 -05:00
Christopher M. Wolff c9d40d6b80
feat: find time range in WHERE clause for gap-filling (#6805)
* feat: add analysis to find time predicates

* refactor: propagate time range to gap fill logical node

* refactor: propagate time range to GapFillExec

* refactor: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-02 23:48:36 +00:00
Stuart Carnie 8e931514fe
feat: Regular expression operator support (#6792)
* feat: Regular expression operator support

* chore: Added additional comments
2023-02-01 23:17:40 +00:00
Stuart Carnie 57f55e14c8
feat: IOx InfluxQL planner learns how to process time range expressions (#6772)
* feat: IOx learns InfluxQL time-range expression → DF logical Expr

IOx now understands how to evaluate an InfluxQL time-range filter
expression and transform that to a DataFusion logical expression.

* chore: move time range expression to independent functions

There is no need for these to be part of the `InfluxQLToLogicalPlan`
struct, and keeping them separate makes them easier to test.

* chore: support scalar now on either side of binary expression

* chore: improve error messages

* chore: address clippy concerns

* chore: add tests for time ranges

* chore: add a test where time appears on the right-hand side

Ensure time is correctly identified on the right-hand side of a
conditional expression.

* chore: add tests that specify a timezone

* chore: Run cargo hakari tasks

* chore: fix linting issues

* chore: Remove unnecessary line

* chore: Feedback: Add API to parse a conditional expression

Based on feedback from @alamb, we don't want to hide the error from
parsing a `ConditionalExpression`. To do this, we use the
public API, `parse_statements` as a model and provide a new API,
`parse_conditional_expression`, which returns a `Result` with the error
being a `ParseError`. Additionally, `ConditionalExpression` implements
the `FromStr` API using the `parse_conditional_expression` API.

* chore: PR feedback reverting this change

I believe my intention was to update all instances in the match, but
never completed the change. Will leave for another day.

* chore: PR feedback add additional comments

* chore: rustfmt

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-02-01 00:27:17 +00:00
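The `FromStr` arrangement described in commit 57f55e14c8 can be sketched roughly as below. This is only an illustration under stated assumptions: the `ParseError` and `ConditionalExpression` types here are simplified stand-ins that reuse the names from the commit message, and the toy parser merely rejects empty input; none of it is the actual parser crate code.

```rust
use std::fmt;
use std::str::FromStr;

/// Stand-in for the parser's error type (assumed shape, not the real one).
#[derive(Debug, PartialEq)]
struct ParseError {
    message: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error: {}", self.message)
    }
}

/// Stand-in for the parsed AST node; it only stores the trimmed input text.
#[derive(Debug, PartialEq)]
struct ConditionalExpression(String);

/// Toy parser: surfaces a `ParseError` instead of hiding it from the caller.
fn parse_conditional_expression(input: &str) -> Result<ConditionalExpression, ParseError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(ParseError {
            message: "empty expression".to_string(),
        });
    }
    Ok(ConditionalExpression(trimmed.to_string()))
}

/// `FromStr` delegates to the free function, so `"...".parse()` works too.
impl FromStr for ConditionalExpression {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        parse_conditional_expression(s)
    }
}
```

With this shape a caller can write `"time > now() - 1h".parse::<ConditionalExpression>()` and handle the error explicitly.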
Andrew Lamb 80f0125940
feat: Add number of rows to explain of RecordBatchesExec (#6781)
* feat: Add number of rows to explain of RecordBatchesExec

* fix: Update test output
2023-01-31 14:26:20 +00:00
Andrew Lamb 5b14caa780
chore: Update DataFusion (#6753)
* chore: Update datafusion

* fix: Update for changes

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-30 14:48:52 +00:00
dependabot[bot] ed7d02a225
chore(deps): Bump tokio from 1.24.2 to 1.25.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.2 to 1.25.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits/tokio-1.25.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-30 01:57:27 +00:00
Andrew Lamb 0d32662eea
chore: Update datafusion again (#6722)
* chore: Update datafusion

* fix: Update for API

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-27 18:59:27 +00:00
Christopher M. Wolff 9a942ceff5
refactor: propagate gapfill stride to exec (#6690)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-24 20:49:29 +00:00
Andrew Lamb c3bc61f10e
refactor: Move `flightsql` code into its own module, add docs and tests (#6640)
* refactor: Move `flightsql` code into its own module

* fix: get schema from LogicalPlan

* refactor: use arrow_flight::sql::Any instead of prost_types::any

* fix: cleanup docs and avoid as_ref

* fix: Use Bytes

* fix: use Any::pack

* fix: doclink
2023-01-24 18:24:32 +00:00
Marco Neumann cb02262b9d
refactor: extract "exec DF plan" and "store stream to file" components (#6663)
* refactor: extract `PartitionInfo`

* refactor: extract DF exec component

* feat: add some error conversions

* refactor: make fn public

* refactor: extract file sink component

* fix: clippy
2023-01-23 14:40:35 +00:00
Christopher M. Wolff 6f39ae342e
feat: create a GapFillExec type (#6641)
* refactor: make gap fill rule avoid aliasing

* feat: create a GapFillExec type

* refactor: remove unneeded sort node from GapFill rule

* chore: code review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-20 17:44:00 +00:00
Christopher M. Wolff 413e4e4088
feat: create a logical plan node and rule for gap-filling (#6602)
* feat: create a GapFill logical plan node

* feat: create a GapFill optimizer rule

* chore: code review feedback

* chore: fix issue found after merging main

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-18 17:01:55 +00:00
Andrew Lamb 8410998408
chore: Update datafusion to Jan 17, 2023 (2 / 2) and arrow/parquet `30.0.1` (#6604)
* chore: Update datafusion to Jan 9, 2023 (2 / 2) and arrow/parquet `30.0.1`

* chore: Update for changes in arrow ipc

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-01-18 15:51:24 +00:00
Andrew Lamb 57f08dbccd
chore: Update datafusion to Jan 9, 2023 (1 / 2) (#6603)
* refactor: Update DataFusion pin to early Jan 2023

* fix: Update tests now that planning is async

* fix: Updates for API changes

* chore: Run cargo hakari tasks

* fix: Update comment

* refactor: nicer config setup

* fix: gapfill async

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-01-18 12:19:32 +00:00
Stuart Carnie 15a9b4f1e5
refactor: Drop Expr::UnaryOp to simplify tree traversal (#6600)
* refactor: Drop Expr::UnaryOp to simplify tree traversal

The UnaryOp doesn't provide any additional value and complicates
walking the AST, as literal values wrapped in a UnaryOp(Minus, ...)
require extra handling when reducing time range expressions, etc.

This change also is true to the InfluxQL Go implementation,
which represents whole number literals as signed integers unless
they exceed i64::MAX.

* chore: Refactor all usages of format!("{}", ?) to ?.to_string()

Per https://github.com/influxdata/influxdb_iox/pull/6600#discussion_r1072028895
2023-01-18 02:27:38 +00:00
Marco Neumann 56c38ba8e1
feat: safely stream data from one tokio runtime to another (#6586)
* refactor: remove unused code

* refactor: make fn private

* feat: safely stream data from one tokio runtime to another

Closes #6577.

* refactor: review comments

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

* test: explain

* test: make tests more tricky

* refactor: improve error message

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-17 10:32:46 +00:00
Stuart Carnie 3f6bb3e330
feat: Parse IANA timezones in an InfluxQL TZ clause (#6585)
* feat: Parse IANA timezone strings to chrono_tz::Tz

* feat: Visitors can customise the return error type

This avoids having to remap errors from `&'static str` to the caller's
error type, and will be used in a future PR for time range expressions.

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-01-15 22:00:41 +00:00
Marco Neumann bc030150f5
refactor: improve executor panic/error handling (#6582)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-13 07:10:53 +00:00
Stuart Carnie 81ffb3edb5
chore: move walk and the mutable variant to the parser crate (#6575)
It is generally a useful API to be core to the InfluxQL parser crate.
2023-01-12 21:06:06 +00:00
Marco Neumann e2da573dcf
refactor: improve thread naming (#6579)
- name exec driver thread (instead of using the default that `thread::spawn`
  gives us)
- provide number to every worker thread (both for the dedicated executor
  and for the main runtime)
- shorten thread names (current naming too long for most debug tools)
2023-01-12 14:22:49 +00:00
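The thread-naming idea from commit e2da573dcf can be illustrated with the standard library alone. This is a minimal sketch, assuming a hypothetical helper `spawn_named_workers` and a hypothetical `prefix-N` naming scheme; the names IOx actually uses are not shown in this log.

```rust
use std::thread;

/// Spawns `count` worker threads with short, numbered names and returns the
/// name each thread observed for itself (hypothetical helper for illustration).
fn spawn_named_workers(prefix: &str, count: usize) -> Vec<String> {
    let handles: Vec<_> = (0..count)
        .map(|i| {
            thread::Builder::new()
                // Explicit name instead of the unnamed default of `thread::spawn`.
                .name(format!("{prefix}-{i}"))
                .spawn(|| {
                    let current = thread::current();
                    let name = current.name().unwrap_or("unnamed").to_string();
                    name
                })
                .expect("failed to spawn thread")
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect()
}
```

Short names matter because some debugging tools truncate thread names, so a compact prefix plus an index stays readable.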
Stuart Carnie 66047f4372
feat: InfluxQL learns how to plan some InfluxQL queries (#6520)
* feat: InfluxQL learns how to plan some queries

Also added a means to test the planner and execution

* chore: Update module docs

* chore: Document the planner functions

* chore: Update end_to_end_cases crate

* chore: Clarify why `SLIMIT` and `SOFFSET` return `NotImplemented`

* chore: Address lint issues

* chore: Fix rustdoc link issue

* chore: Remove InfluxQL tests from query_tests crate

Will follow conventions established by @carols10cents when
new query_tests crate is merged.

* chore: `now` field

`now` is a DataFusion built-in scalar function

* chore: remove unused code

* chore: Add additional arithmetic expression tests

* chore: Establish pattern for identifying and tracking InfluxQL issues

* chore: Add tests for case sensitivity issues

* chore: group tests into modules and functions

This avoids mass rewriting of insta snapshots as new
tests are added to each function. When tests are added in the middle,
existing snapshots are renamed (-N+1, -N+2, etc) resulting in
having to review numerous additional snapshots.
2023-01-11 02:50:49 +00:00
dependabot[bot] b49cc2e35e
chore(deps): Bump tokio from 1.24.0 to 1.24.1 (#6545)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.0 to 1.24.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.24.0...tokio-1.24.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-10 09:48:44 +00:00
Andrew Lamb 29df5d7fcb
fix: create statistics for nulled columns in RecordBatchExec (#6527)
2023-01-09 07:37:10 +00:00
Raphael Taylor-Davies e1036a0c63
refactor: cleanup schema boxing (#6511)
* refactor: cleanup Schema boxing

* chore: clippy
2023-01-06 10:57:39 +00:00
Raphael Taylor-Davies 2037db7f7b
refactor: decouple influxql from SchemaProvider (#6507)
* refactor: decouple influxql from SchemaProvider

* refactor: reorder arguments

* refactor: use QueryNamespaceMeta

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-06 07:36:29 +00:00
Marco Neumann 25f275f1b0
refactor: improve influxRPC warning logging (#6493)
The current version is barely readable because the logged schema w/ all
its metadata is soooo long.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-04 16:52:32 +00:00
Stuart Carnie aacd91db94
feat: Teach InfluxQL rewriter how to name columns (#6481)
* feat: Add timestamp data type

* feat: Add with_quiet API to suppress output to STDOUT

* fix: Field name resolution to match InfluxQL

* refactor: Allow TestChunks to be directly accessed

This will be useful when testing the InfluxQL planner.

* fix: Add Timestamp case to var_ref module

* feat: Add InfluxQL compatible column naming

* chore: Add doc comment.

* fix: keywords may be followed by a `!` such as `!=`

* fix: field_name improvements

* No longer clones expressions
* Explicitly handle all Expr enumerated items
* more tests

* fix: collision with explicitly aliased column

Fixes case where column is explicitly aliased to an auto-named variant.
Test case added to validate.
2023-01-04 00:55:18 +00:00
Andrew Lamb dbe52f1ca1
chore: Upgrade datafusion (#6467)
* chore: Update datafusion

* fix: Update for new apis

* chore: Update expected plan

* fix: Update for new config construction

* chore: update clippy

* fix: Fix error codes

* fix: update another test

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-01-03 15:29:11 +00:00
Stuart Carnie 4add55d39e
feat: InfluxQL planner learns how to normalise InfluxQL AST (#6236)
* chore: Move logic to context, in line with DataFusion SQL

* chore: Add ordering for InfluxQL data types

Ordering is used to determine automatic casting operations. If two
field columns are present in an expression, one float and one integer,
the integer should be cast to a float, such that the final expression
will be a float.

* chore: Add DerefMut trait to collection types

Will allow these collections to be mutated when traversing the InfluxQL
AST.

* chore: Add influxql module with initial AST normalisation implementation

* chore: Add more unit tests and docs

* chore: Run cargo hakari tasks

* chore: Fix link

* chore: Support regular expression expansion and Call expressions

* chore: Add tests for walk_expr functions

* chore: Add insta snapshot files

* chore: Add docs and make API accessible to the crate

* chore: Move to Arc<dyn SchemaProvider> for use in influxql planner

* chore: Move code back; it is better encapsulated here

* chore: Remove redundant attribute

* chore: Improve regex compatibility with InfluxQL / Go

* chore: Style improvement.

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-01-02 23:48:21 +00:00
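The data-type ordering described in commit 4add55d39e (an integer combined with a float yields a float) can be sketched as follows. This is a minimal illustration, not the IOx implementation: the `DataType` enum, its two variants, and `result_type` are hypothetical names chosen for the example.

```rust
/// Hypothetical field types, declared from lowest to highest precedence so
/// that the derived `Ord` reflects the automatic casting order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DataType {
    Integer,
    Float,
}

/// The type of a binary expression is the higher-ordered operand type;
/// the other operand would be cast up to it.
fn result_type(lhs: DataType, rhs: DataType) -> DataType {
    lhs.max(rhs)
}
```

Deriving `Ord` on the enum keeps the casting rule in one place: adding a type means choosing its position in the declaration.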
Carol (Nichols || Goulding) 46ff8854ec
fix: Use code backticks around invalid HTML tags in doc strings
2022-12-21 16:36:17 -05:00