Commit Graph

348 Commits (4fa6ead27d3397b8c08467394efbe78d0e317d4f)

Author SHA1 Message Date
Andrew Lamb 530ee94558
fix: use correct sort key in projection_pushdown (#7718)
* fix: use correct sort key in projection_pushdown

* fix: tabs in docs

* refactor: Use Serde to format test results
2023-05-02 16:50:04 +00:00
Christopher M. Wolff 493b26831d
fix: make influx RPC interface break up series into multiple frames (#7691)
* fix: make influx RPC interface break up series into multiple frames

* refactor: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-01 20:18:05 +00:00
Marco Neumann 0556fdae53
refactor: remove `QueryChunk::partition_sort_key` (#7680)
As of #7250 / #7449 the partition sort key is no longer required for
query planning. Instead we use a combination of
`QueryChunk::partition_id` and `QueryChunk::sort_key` which is more
robust and easier to reason about.

Removing it simplifies the querier code a lot since we no longer need to
have a sort key for the ingester chunks and also don't need to "sync"
the sort key between chunks for consistency.
2023-04-27 10:54:41 +00:00
dependabot[bot] bdf7f316d7
chore(deps): Bump tokio from 1.27.0 to 1.28.0 (#7667)
* chore(deps): Bump tokio from 1.27.0 to 1.28.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.27.0 to 1.28.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.27.0...tokio-1.28.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-04-26 12:53:26 +00:00
Christopher M. Wolff 7a6862ee3a
refactor: let date_bin_gapfill allow omitted origin (#7595)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-19 14:56:52 +00:00
Marco Neumann d7dc305972
feat: allow overwriting DataFusion's default config (#7586)
This is helpful to test changes in our defaults but also for testing.

Required for https://github.com/influxdata/idpe/issues/17474 .

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-18 11:28:45 +00:00
Andrew Lamb f46d06d56f
chore: Update DataFusion + arrow ecosystem to 37 (#7544)
* chore: Update datafusion and arrow/parquet to 37, tonic to 0.9.1

* refactor: Update for FieldRef and other API changes

* fix: Update field size calculation

* fix: Use `NullBuffer` directly

* fix: remove outdated comment

* chore: Update test for tonic

* chore: Run cargo hakari tasks

* chore: cargo update

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-14 12:43:01 +00:00
Andrew Lamb 134ff2ef83
chore: update DataFusion pin (right before arrow 37 update) (#7540)
* chore: update DataFusion pin

* refactor: Update for deprecated API

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-13 17:25:24 +00:00
Andrew Lamb 3ebd07358b
chore: Update DataFusion pin, upgrade `date_bin` and `InfluxQL` to use `Interval(MonthDayNano)` (#7516)
* chore: Update datafusion

* chore: Update for change in PhysicalSortExpr

* refactor: Update date_bin_gapfill to take IntervalMonthDayNano, fix FlightSQL

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-13 10:43:32 +00:00
Christopher M. Wolff cbd747db44
feat: update gap fill planner rule to use `interpolate` (#7494)
* feat: add INTERPOLATE fn and update gap-fill planner rule

* test: add an end-to-end test for interpolate()
2023-04-12 21:51:44 +00:00
Christopher M. Wolff 0937615dba
fix: make interpolate() fill null values in input (#7490)
* fix: make interpolate() fill null values in input

* chore: cargo doc
2023-04-12 21:41:11 +00:00
Christopher M. Wolff 3e60369eff
refactor: input buffering for gap filling interpolate null-as-missing (#7478)
* refactor: move logic for knowing how much to buffer into GapFiller

* chore: clippy

* chore: add some clarifying comments

* refactor: clean up relationships between gap filling types

* refactor: remove use of RefCell from BufferedInput

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-12 21:08:51 +00:00
Andrew Lamb 8c42fedf33
chore: Remove dead code (#7475)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-11 10:44:49 +00:00
Andrew Lamb 1a80b8073c
fix: Improve span names for query access (#7476)
* fix: Improve span names for query access

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-11 10:34:09 +00:00
Marco Neumann 5f43f2a719
refactor: remove old query planning code (#7449)
Closes #7406.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-06 16:05:08 +00:00
Marco Neumann 30b1878171
test: `ChunkTableProvider::scan` + fix "not dedup" (#7448)
1. Add loads of tests for `ChunkTableProvider::scan` (= the naive phys.
   plan before running any phys. optimizers)
2. Fix interaction of "no de-dup" and predicate pushdown. This might
   be used by the ingester at some point and I would like to have this
   correct before someone silently introduces a bug by pushing field
   predicates into the ingester.

This is mostly prep-work for #7406 so I know that test coverage is
sufficient.
2023-04-06 08:39:53 +00:00
Andrew Lamb e8b7d69b0f
chore: Update datafusion again (#7442)
* chore: Update datafusion

* chore: Fix up plans for datafusion API change

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-05 18:21:53 +00:00
Andrew Lamb 94d390f31e
test: Add additional tests for reorg plans (#7444)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-05 11:15:23 +00:00
Christopher M. Wolff d57a4f8947
refactor: make null-as-missing default behavior for LOCF (#7443)
* refactor: make null-as-missing default behavior for LOCF

* test: update InfluxQL test

---------

Co-authored-by: Christopher Wolff <cwolff@athena.tail244ec.ts.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-04 18:03:09 +00:00
Andrew Lamb badc8865ef
chore: Update datafusion again (#7440)
* chore: Update DataFusion

* chore: Update for new API

* chore: Run cargo hakari tasks

* fix: cargo doc

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-04-04 15:45:46 +00:00
dependabot[bot] 66982f988b
chore(deps): Bump object_store from 0.5.5 to 0.5.6 (#7433)
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.5 to 0.5.6.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/commits)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-04 08:43:34 +00:00
Marco Neumann e9bdf96457
refactor: remove DF-clean-DF phys. optimizer pass hack (#7428)
As discussed in https://github.com/influxdata/influxdb_iox/pull/7250#discussion_r1155684471

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-04 08:09:35 +00:00
Marco Neumann f04962d630
feat: new query planning (#7250)
Closes #6098.
2023-04-03 10:31:03 +00:00
Marco Neumann e3b802cd25
feat: "parquet sortness" optimizer pass (#7383)
* feat: "parquet sortness" optimizer pass

Trade wider fan-out for not having to fully sort parquet files.

For #6098.

* test: rename

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-31 08:01:33 +00:00
Marco Neumann 2d7bff91b5
feat: allow gap-fill logical opt. to handle inline filters (#7384)
With #6098 our `TableProvider` will declare `supports_filter_pushdown`
as "exact" since we handle the predicate pushdown ourselves. This has
two effects:

1. The phys. plan no longer contains an additional `FilterExec` node
   even if we already do all the correct filtering. This will improve
   performance.
2. The logical plan no longer contains a `Filter` node but instead the
   predicate is part of the `TableScan`. This simplifies the logical
   plan.

For (2) we need to adjust the gap fill logical optimizer to find the
time range again. Otherwise the optimizer pass will fail (which is
currently somewhat swallowed by DataFusion even though it is logged) and
the physical plan will contain our placeholder UDFs that are not
executable.
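
The adjustment described above can be pictured with a toy plan model. This is only a sketch under stated assumptions: `Plan`, `Bound`, and `time_range` are hypothetical names invented for illustration and are not DataFusion or IOx types. It shows that a time range can be recovered whether the bounds sit in a separate filter node or inline on the scan.

```rust
/// Hypothetical time predicate (not a DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Bound {
    /// time >= value
    TimeGe(i64),
    /// time < value
    TimeLt(i64),
}

/// Hypothetical, heavily simplified logical plan.
enum Plan {
    /// Old shape: predicates live in a dedicated filter node.
    Filter { predicates: Vec<Bound>, input: Box<Plan> },
    /// New shape: predicates are carried inline by the scan.
    TableScan { filters: Vec<Bound> },
}

/// Walk the plan and collect the tightest (start, end) time bounds,
/// looking at both filter nodes and inline scan filters.
fn time_range(plan: &Plan) -> (Option<i64>, Option<i64>) {
    let mut start: Option<i64> = None;
    let mut end: Option<i64> = None;
    let mut node = plan;
    loop {
        let (preds, next) = match node {
            Plan::Filter { predicates, input } => (predicates, Some(&**input)),
            Plan::TableScan { filters } => (filters, None),
        };
        for p in preds {
            match *p {
                Bound::TimeGe(t) => start = Some(start.map_or(t, |s: i64| s.max(t))),
                Bound::TimeLt(t) => end = Some(end.map_or(t, |e: i64| e.min(t))),
            }
        }
        match next {
            Some(n) => node = n,
            None => break,
        }
    }
    (start, end)
}
```

With this model, a `TableScan` carrying both bounds inline and a `Filter` wrapped around a `TableScan` yield the same range, which is the property the optimizer pass needs.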

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-31 06:09:51 +00:00
Marco Neumann d2f3f279f3
fix: projection pushdown w/ resorting (#7381)
We should resort properly when performing projection pushdown. Extended
test utils to actually catch this by checking the plan schemas.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-30 10:24:23 +00:00
dependabot[bot] 9cbcdc7672
chore(deps): Bump tokio from 1.26.0 to 1.27.0 (#7373)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.26.0 to 1.27.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.26.0...tokio-1.27.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-30 09:36:04 +00:00
Marco Neumann 066c3280eb
fix: phys. optimizers must respect sort partitioning (#7362)
* fix: sort pushdown must preserve partitioning

* fix: projection pushdown must preserve sort partitioning

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-30 08:10:21 +00:00
Stuart Carnie 19a0c7fe9c
feat: Teach InfluxQL how to process `FILL(null|previous|<value>)` (#7359)
* chore: Publicise gap-filling APIs

Helps #6916

* feat: IOx learns `FILL(null|previous|<value>)`

Helps #6916

* chore: More test cases

* chore: Revert change to TreeNodeVisitor

* chore: Update snapshot with expected gap-filling changes
2023-03-29 23:11:20 +00:00
Christopher M. Wolff f41c1a7945
feat: update gap fill planner rule to use LOCF (#7358)
* feat: update gap fill planner rule to use LOCF

* chore: cargo fmt

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-29 15:45:34 +00:00
Marco Neumann 39856ad432
fix: projection pushdown should project `ParquetExec` ordering (#7356)
* fix: projection pushdown should project `ParquetExec` ordering

Bug found while working on the final steps for #6098.

* fix: Update expected output

* test: make test even harder

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2023-03-29 09:05:19 +00:00
Marco Neumann 52e54e0f8d
feat: more aggressive `CombineChunks` (#7355)
Try to combine chunks even when not all Union-arms/inputs are
combinable. This will later help to transform

```yaml
---
union:
  - parquet:
      files: [f1]
  - parquet:
      files: [f2]
  - dedup:
      parquet:
        files: [f3]
```

into

```yaml
---
union:
  - parquet:
      files: [f1, f2]
  - dedup:
      parquet:
        files: [f3]

```

Helps #6098.
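
The transformation above can be sketched on a toy model of union arms. This is an illustration only: `Arm` and `combine_arms` are hypothetical names, and the real pass works on DataFusion physical plan nodes rather than on this enum.

```rust
/// Toy stand-in for one input ("arm") of a union node.
#[derive(Debug, Clone, PartialEq)]
enum Arm {
    /// Plain parquet scan over a list of files; these can be merged.
    Parquet(Vec<String>),
    /// Anything that cannot be merged, e.g. a dedup over its own scan.
    Other(String),
}

/// Merge all combinable arms into a single scan and keep the
/// remaining arms untouched, even if not every arm is combinable.
fn combine_arms(arms: Vec<Arm>) -> Vec<Arm> {
    let mut files = Vec::new();
    let mut rest = Vec::new();
    for arm in arms {
        match arm {
            Arm::Parquet(mut f) => files.append(&mut f),
            other => rest.push(other),
        }
    }
    let mut out = Vec::with_capacity(rest.len() + 1);
    if !files.is_empty() {
        out.push(Arm::Parquet(files));
    }
    out.extend(rest);
    out
}
```

Feeding in the three arms from the first YAML snippet (`f1`, `f2`, and a dedup over `f3`) produces the two arms of the second one.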

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-29 06:24:17 +00:00
Andrew Lamb 43e236e040
chore: Update datafusion again (#7353)
* chore: Update DataFusion

* refactor: Update predicate crate for new transform API

* refactor: Update iox_query crate for new APIs

* refactor: Update influxql for new API

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-28 16:21:49 +00:00
Christopher M. Wolff dbf6493312
feat: add scalar function LOCF (#7347)
* feat: add scalar function LOCF

* chore: cargo update spin@0.9.6

Apparently this version was yanked
2023-03-28 14:35:27 +00:00
Marco Neumann 71b88b22b9
fix: ensure we don't lose predicates in chunk roundtrips (#7340)
`extract_chunks` never runs after predicate pushdown. However IF this
should ever happen, we would potentially forget the predicates attached
to `ParquetExec`. So let's make sure we refuse chunk extraction in this
case. This is similar to the existing behavior, i.e. we don't support
chunk extraction after filter pushdown (i.e. if there is a filter around
a `RecordBatchesExec`).

For #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-27 11:18:56 +00:00
Christopher M. Wolff f73187ff7e
feat: add interpolation fill strategy to GapFillExec (#7317)
* feat: add interpolation fill strategy to GapFillExec

* chore: clippy

* chore: code review feedback

* chore: fix doc comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-24 18:53:14 +00:00
Andrew Lamb 5dd71998a1
chore: Update datafusion (#7318)
* chore: Update datafusion

* chore: Update for API change

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-24 15:07:23 +00:00
Andrew Lamb 184565b552
feat(flightsql): Implement FlightSQL `GetSqlInfo` endpoint (#7198)
* feat(flightsql): Implement GetSqlInfo endpoint

* chore: Add some comments to clarify the tests intent
2023-03-20 19:34:18 +00:00
Christopher M. Wolff 866f9cefa1
feat: add null-as-missing gap filling (#7245)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-17 20:34:45 +00:00
Andrew Lamb 96c2094302
refactor(iox_query): extract influxrpc planner to its own crate (#7241)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-17 18:48:55 +00:00
Marco Neumann 20ec47b00b
feat: virtual chunk order col (#7240)
* feat: introduce `CHUNK_ORDER_COLUMN_NAME`

* feat: impl `ChunkOrder` everywhere

* feat: `ChunkOrder::get`

* feat: emit chunk order column for `RecordBatchesExec`

* feat: `chunk_order_field`

* feat: chunk order col for parquet chunks

* feat: optional chunk order col handling for dedup

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-17 09:39:21 +00:00
Marco Neumann 3e40de3cd4
feat: recover desired output sort in `extract_chunks` (#7233)
This is helpful so that optimizer passes do not forget the sort key, esp.
when they run after `DedupNullColumns` and `DedupSortOrder`.

For #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-17 09:19:10 +00:00
Andrew Lamb 3fb4fad784
refactor(iox_query): Rename `prepare_sql` to `sql_to_physical_plan` (#7226)
* refactor(iox_query): Rename `prepare_sql` to `sql_to_physical_plan`

* fix: logical conflict
2023-03-16 19:12:15 +00:00
Andrew Lamb 7dfaa05e8a
chore: Update datafusion again (#7208)
* chore: update datafusion again

* fix: update test

* fix: use table_reference

* fix: clean up import

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-16 14:34:40 +00:00
Marco Neumann 45d23f7652
refactor: `extract_chunks` return arrow schema (#7231)
Similar to #7217 there is no need to convert the arrow schema to an IOx
schema. This also makes it easier to handle the chunk order column in #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-16 14:19:52 +00:00
Marco Neumann f128539f98
feat: more projection pushdown (#7218)
* feat: proj->proj pushdown

For #6098.

* feat: proj->SortPreservingMergeExec pushdown

For #6098.

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-16 08:59:48 +00:00
Marco Neumann 3a31f41c2c
refactor: use arrow schema in `chunks_to_physical_nodes` (#7217)
We don't need a validated IOx schema in this method. This will simplify
some work on #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-16 08:45:14 +00:00
Andrew Lamb 6d6fd8f663
feat(flightsql): implement basic `CommandGetCatalogs` support (#7212)
* refactor: reduce redundancy in test

* chore: implement basic get_catalog support

* fix: clippy
2023-03-15 21:52:59 +00:00
Marco Neumann 393de6980e
feat: debug-log errors during chunk extraction (#7223)
Helps debugging while working on #6098 .
2023-03-15 18:55:33 +00:00
Christopher M. Wolff afb571a502
feat: implement gap fill with previous value (#7182)
* feat: implement gap fill with previous value

* test: update fill prev test to include null value

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-15 15:54:59 +00:00
Christopher M. Wolff 570c61f9a7
refactor: formalize abstraction for building gap filled columns (#7179)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-14 15:14:02 +00:00
Andrew Lamb 0eb858c70d
chore: Update datafusion (#7167)
* chore: Update datafusion

* chore: Update datafusion

* refactor: use UserDefinedLogicalNodeCore

* fix: remove stray comment

* fix: clippy

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-13 16:41:32 +00:00
Christopher M. Wolff ffab683ead
refactor: move trailing_gaps bit into cursor (#7178)
* refactor: push trailing_gap bit into cursor

* chore: clippy

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-13 15:40:55 +00:00
Marco Neumann 737ea15d07
feat: projection pushdown phys. optimizer (#7161)
* feat: projection pushdown phys. optimizer

This is by far the largest pass (at least test-wise), because projections
are added last in the naive plan and you have to push them through
everything else. The actual code however isn't that complicated mostly
because we can reuse some DataFusion functionality and the different
variants for the different "child nodes" are very similar.

For #6098.

* feat: projection pushdown for `RecordBatchesExec`

* test: `test_ignore_when_partial_impure_projection_rename`

* test: more dedup projection tests

* test: integration
2023-03-13 12:59:45 +00:00
Marco Neumann 41802b7b5b
feat: `SchemaAdapterStream` may create virtual columns (#7173)
* feat: `SchemaAdapterStream` may create virtual columns

For chunk order handling in #6098.

* fix: improve `SchemaAdapterStream` docs and error handling
2023-03-13 10:02:13 +00:00
Carol (Nichols || Goulding) cc7c44f76a
chore: Upgrade to Rust 1.68 (#7175)
* chore: Upgrade to Rust 1.68

* fix: Remove unnecessary into_iter, thanks Clippy!

* fix: Use the size of the type, not a reference to the type... oops.

Thanks clippy!

* fix: Return block directly instead of creating a variable

Thanks clippy!
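
The "size of the type, not a reference to the type" fix above names a general pitfall. The flagged code itself is not shown here, so the following is only an illustration with a hypothetical `Row` type that does not come from the IOx code base.

```rust
use std::mem::size_of;

/// Hypothetical payload type: eight 64-bit values, 64 bytes in total.
#[allow(dead_code)]
struct Row {
    values: [u64; 8],
}

/// Size of the data itself; this is what memory accounting usually wants.
fn size_of_row_type() -> usize {
    size_of::<Row>()
}

/// Size of a reference to the data; this is only the width of a pointer.
fn size_of_row_ref() -> usize {
    size_of::<&Row>()
}
```

Here `size_of_row_type()` reports the full 64 bytes, while `size_of_row_ref()` reports only the width of a pointer, so using the reference under-counts memory.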

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-12 13:22:20 +00:00
Stuart Carnie fe48a685ec
refactor: Move InfluxQL behaviour from iox_query to new crate (#7156)
* refactor: Break unnecessary dependencies from `iox_query` crate

In the process, the test code has been simplified.

* refactor: Move InfluxQL plan module to iox_query_influxql crate

* refactor: Move remaining behaviour from iox_query to iox_query_influxql

* chore: rustfmt 🙄

I was under the impression `clippy` would catch formatting
2023-03-08 22:29:20 +00:00
Marco Neumann 309177b750
feat: phys. pred. pushdown to parquet (#7159)
For #6098.
2023-03-08 16:36:27 +00:00
Marco Neumann 3828d2a50e
chore: update DataFusion to `deeaa5632ed99a58b91767261570756db736d158` (#7158)
* chore: update DataFusion to `deeaa5632ed99a58b91767261570756db736d158`

I want to get pull:

- https://github.com/apache/arrow-datafusion/pull/5495

Changes in the IOx code base due to:

- https://github.com/apache/arrow-datafusion/pull/5423
- https://github.com/apache/arrow-datafusion/pull/5421
- https://github.com/apache/arrow-datafusion/pull/5450

* refactor: simplify expression simplification

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: remove upstreamed code

* test: update snapshots

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2023-03-08 13:05:31 +00:00
Marco Neumann 58dad4cb01
feat: remove all-NULL columns from dedup (#7146)
For #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-08 10:04:48 +00:00
Marco Neumann 81388e7ff2
feat: determine cheap de-dup sort order (#7147)
* feat: determine cheap de-dup sort order

For #6098.

* test: `test_three_chunks_different_subsets`

* fix: ensure that columns can be drawn early

* docs: improve algo explanation

* refactor: make code clearer
2023-03-08 09:50:07 +00:00
Christopher M. Wolff ff11fe465d
refactor: convert gap fill exec tests to use insta snapshots (#7154)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-08 01:53:15 +00:00
Stuart Carnie 2b74f07fe5
feat: Support `GROUP BY` with tags in raw `SELECT` queries (#7109)
* chore: Normalise name of Call expression to lowercase

Simplifies matching functions in planner, as they are guaranteed to be
lowercase.

This also ensures compatibility with InfluxQL when generating column
alias names, which are reflected in updated tests.

* chore: Ensure aggregate functions fail gracefully.

* feat: GROUP BY tag support

* feat: Ensure schema-level metadata is propagated

Requires: https://github.com/apache/arrow-rs/issues/3779

* chore: Add some tests to validate GROUP BY output

* chore: Add clarifying comment

* chore: Declare message in flight.proto

The metadata is public API, so best practice is to encode this in a way
that is most compatible for clients in other languages, and will also
document the history of schema changes.

Added tests to validate the metadata is encoded correctly.

* chore: Placate linters

* chore: Use correct column in test cases

* chore: Add `is_projected` to the TagKeyColumn message

`is_projected` is necessary to inform a client whether the tag key is
used exclusively for the group key (false) or is also projected in the
`SELECT` column list.

* refactor: Move constants to `schema` crate per PR feedback

* chore: rustfmt 🙄

* chore: Update docs for InfluxQlMetadata

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-03-07 22:40:23 +00:00
Christopher M. Wolff 3f3a47eae9
feat: add a type to characterize fill strategy (#7150)
* feat: add a type to characterize fill strategy

* chore: clippy and fix comment
2023-03-07 17:11:31 +00:00
Marco Neumann 91471fe568
fix: check schema when calculating sorting for `ParquetExec` (#7136)
When combining sort keys, we have to check the schema of the chunk to
differentiate between "column does not exist within this chunk" and
"column exists but is not sorted".

This is unlikely to be an issue in prod at the moment (if there is no bug in
the ingester or compactor), but this was found while working on tests
for #6098. Overall this should improve robustness.
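
The distinction drawn above can be sketched as a three-way classification. This is a minimal illustration, assuming schemas and sort keys are plain lists of column names; `ColumnState` and `column_state` are hypothetical names, not the IOx API.

```rust
/// How a column relates to one chunk (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum ColumnState {
    /// The chunk's schema does not contain the column at all.
    Missing,
    /// The column exists in the chunk but is not part of its sort key.
    Unsorted,
    /// The column exists and sits at this position in the sort key.
    Sorted(usize),
}

/// Consult the schema first; only then look at the sort key.
/// Checking the sort key alone would conflate `Missing` with `Unsorted`.
fn column_state(schema: &[&str], sort_key: &[&str], column: &str) -> ColumnState {
    if !schema.contains(&column) {
        return ColumnState::Missing;
    }
    match sort_key.iter().position(|c| *c == column) {
        Some(idx) => ColumnState::Sorted(idx),
        None => ColumnState::Unsorted,
    }
}
```

A column that is absent from the chunk does not disturb its ordering, whereas a present but unsorted column does, so the two cases must be handled differently when combining sort keys.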

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-07 09:20:31 +00:00
Andrew Lamb ed0704ac8d
chore: Update datafusion (#7100)
* chore: Update datafusion

* chore: iox_query to compile for API changes + update tests

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-03-06 17:59:24 +00:00
Christopher M. Wolff c15d789613
fix: account for memory in GapFill operator (#7115)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-03 16:21:13 +00:00
dependabot[bot] 3256fcc72e
chore(deps): Bump object_store from 0.5.4 to 0.5.5
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.4 to 0.5.5.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.4...object_store_0.5.5)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-03 02:00:51 +00:00
Marco Neumann 999a5dae03
refactor: sort key cleanups (#7113)
* refactor: remove unused `ColumnSort`

* refactor: remove invalid assertion

It is true that time SHOULD be the last sort key, but we absolutely
don't require that, esp. not in the query tier. The ingester will
currently always produce sort keys where time is last, but if we are ever
going to deal w/ external data sources like bulk loaded parquet files,
this may not always be the case.

Found while constructing some edge case tests.

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 16:08:21 +00:00
Marco Neumann 2c4da24f73
feat: sort-related phys. optimizers (#7095)
* feat: `SortPushdown` optimizer

* feat: `RedundantSort` optimizer

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 14:20:13 +00:00
Andrew Lamb eb488cf55a
chore(compactor2): Improve documentation on split times (#7105)
* chore(compactor2): Improve documentation on split times

* fix: Apply suggestions from code review

Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>

* fix: fmt

* fix: typo

---------

Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 12:00:55 +00:00
dependabot[bot] c538cac4ef
chore(deps): Bump tokio from 1.25.0 to 1.26.0 (#7107)
* chore(deps): Bump tokio from 1.25.0 to 1.26.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.25.0 to 1.26.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.25.0...tokio-1.26.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 09:50:39 +00:00
Marco Neumann c95d078e46
feat: add `NestedUnion` opt (#7092)
* docs: typo

* feat: add `NestedUnion` opt

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-02 09:09:05 +00:00
Christopher M. Wolff d1a54cf0d4
feat: allow no lower bound gap fill implementation (#7104)
* feat: allow no lower bound gap fill implementation

* chore: clippy

* refactor: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 23:32:57 +00:00
Marco Neumann 8f11372eac
feat: predicate pushdown phys. optimizer rule (#7083)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 09:44:57 +00:00
Marco Neumann 7fb562cb01
feat: "collect chunks" phys. optimizer rule (#7086)
* feat: "collect chunks" phys. optimizer rule

Required to clean up the plan a bit after all the dedup split and
removal passes.

For #6098.

* refactor: `collect` -> `combine`

* fix: submodule vis
2023-03-01 09:38:11 +00:00
Marco Neumann b85869778d
fix: `extract_chunks` schema handling (#7085)
I forgot that both `RecordBatchesExec` and `ParquetExec` can have schemas
with more columns than the chunks they contain, i.e. both provide null
column creation. When extracting the schema for the chunks within a
plan, the full schemas should be preserved, otherwise the physical
optimizer rules will create invalid plan nodes (i.e. with missing
columns).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-03-01 09:17:31 +00:00
Christopher M. Wolff 24b9bfacb5
refactor: rewrite gap filling code to be more intuitive and extendable (#7076)
* refactor: rewrite gap filling code to be more intuitive and extendable

* chore: address clippy issue
2023-02-28 22:18:52 +00:00
Marco Neumann 6d8fd37e26
feat: add "split dedup by time" optimizer rule (#7041)
* feat: add "split dedup by time" optimizer rule

For #6098.

* docs: fix typo

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* feat: add log messages for skipped optimizations

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-02-28 11:29:42 +00:00
Marco Neumann 04f3296d7b
feat: add "remove de-duplication" optimizer pass (#7042)
For #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-28 07:57:19 +00:00
Carol (Nichols || Goulding) faae5eb438
chore: Rerun cargo hakari manage-deps
2023-02-27 11:56:15 +01:00
Marco Neumann 8002d34fa2
feat: add "split dedup by partition" optimizer rule (#7020)
* feat: add "split dedup by partition" optimizer rule

- some additional testing infra
- includes config infra for optimizer passes
- not wired up yet since we still use the old plan generation

For #6098.

* refactor: change default and improve docs
2023-02-27 10:27:48 +00:00
Stuart Carnie 2ed5758ddb
feat: InfluxQL planner learns how to project multiple measurements (#7063)
* feat: Planner learns how to project multiple measurements

Closes #6896

* chore: Update test

* chore: PR feedback
2023-02-27 00:50:06 +00:00
Marco Neumann b76b75e911
fix: do not panic for unsupported expressions (#7052)
We see this in one of our prod clusters ATM.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-23 17:03:43 +00:00
Marco Neumann 08578cded5
refactor: n_threads and n_target_partitions are non-zero (#7047)
* refactor: n_threads and n_target_partitions are non-zero

Zero values will just panic. Prevent that earlier.

* fix: typo

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
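
One way to rule out zero values up front, as the refactor above describes, is to store the settings as `std::num::NonZeroUsize`. The sketch below is illustrative only: `ExecConfig` and `parse_config` are hypothetical names and not the actual IOx configuration types.

```rust
use std::num::NonZeroUsize;

/// Hypothetical executor settings; zero is unrepresentable by construction.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ExecConfig {
    n_threads: NonZeroUsize,
    n_target_partitions: NonZeroUsize,
}

/// Validate raw values once at the boundary so that later code
/// never has to panic on a zero thread or partition count.
fn parse_config(n_threads: usize, n_target_partitions: usize) -> Result<ExecConfig, String> {
    let n_threads = NonZeroUsize::new(n_threads)
        .ok_or_else(|| "n_threads must be non-zero".to_string())?;
    let n_target_partitions = NonZeroUsize::new(n_target_partitions)
        .ok_or_else(|| "n_target_partitions must be non-zero".to_string())?;
    Ok(ExecConfig {
        n_threads,
        n_target_partitions,
    })
}
```

Calling `parse_config(0, 4)` returns an error at configuration time, while `parse_config(2, 4)` yields a value whose fields can be read back with `.get()`.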

---------

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2023-02-23 16:57:00 +00:00
Christopher M. Wolff 0282eb4750
feat: streaming implementation of gap filling (#7037)
* feat: streaming implementation of gap filling

* chore: cargo fmt

* refactor: when gapfilling, concatenate input batches less frequently

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-23 15:42:41 +00:00
Andrew Lamb 7e31b2638d
fix: Understandable compactor2 config report (#7028)
* fix: Understandable compactor2 config report

* fix: do not log postgres dsn

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-22 23:43:31 +00:00
Andrew Lamb f93baf7693
chore: Update DataFusion and `arrow` / `arrow-flight` / `parquet` to `33.0.0` (#7045)
* chore: Update DataFusion and arrow/arrow-flight/parquet to 33.0.0

* fix: Update test output

* fix: update more test output

* fix: Update querier test output

* chore: Run cargo hakari tasks

* test: fix formatting

Fix formatting of batch pretty printing.

* test: fix formatting for selector tests

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: Christopher Wolff <chris.wolff@influxdata.com>
2023-02-22 21:24:20 +00:00
Stuart Carnie 6fb93a7679
refactor: Make InfluxQL planning sync (#7038)
* refactor: Move statement parsing to separate fn

* refactor: Remove async from `InfluxQLToLogicalPlan`

Closes #6607

* chore: Remove async functions and tokio::test

* chore: Remove redundant attribute

* chore: Feedback, switch to dynamic dispatch vs generic implementation
2023-02-22 19:33:49 +00:00
Marco Neumann e9ec213b72
refactor: remove `TaskConfig` param from `chunks_to_physical_nodes` (#7019)
This makes it easier to use it from optimizer passes.

Ref #6098.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-22 09:19:59 +00:00
Stuart Carnie 929ac9081e
feat: Rewrite logical expression to match InfluxQL behaviour (#7031)
* chore: Move to inline snapshots

* chore: Container for the DataFusion and IOx schema

* chore: Simplify using logical expression helper functions

* feat: Rewrite conditional expressions using InfluxQL rules

* feat: Add tests to validate conditional expression rewriting

* feat: Rewrite column expressions

* chore: Rewrite expression to use false when possible

This allows the planner to optimise away the entire logical plan to an
empty plan in many cases.

* feat: Complete cast postfix operator support

Added `unsigned` postfix operator, as the feature was mostly complete.

Closes #6895

* chore: Remove redundant attribute
2023-02-21 20:01:31 +00:00
Marco Neumann bda2310ca1
feat: extract chunks from phys. plan (#7018)
* feat: extract chunks from phys. plan

For #6098.

* test: ensure that `extract_chunks` does NOT scan through other nodes
2023-02-17 11:41:39 +00:00
Marco Neumann a8feed120c
test: `chunks_to_physical_nodes` (#7013)
No new production code, but sets up some test infrastructure that I need for #6098.
2023-02-17 09:37:43 +00:00
Andrew Lamb 27890b313f
chore: Update datafusion (#6997)
* chore: Update datafusion

* chore: update the plans

* fix: update some plans

* chore: Update plans and port some explain plans to use insta snapshots

* fix: another plan

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 17:03:25 +00:00
Christopher M. Wolff fea5245148
refactor: move GapFillParams to its own module (#7014)
* refactor: move params to own module

* chore: cargo fmt
2023-02-16 16:52:52 +00:00
Stuart Carnie b840ed0ad9
fix: Use `as_expr` vs `col` to avoid splitting identifiers with periods (#7011)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 11:03:06 +00:00
Marco Neumann 822063b7f2
feat: remember `QueryChunk` for every parquet file (#7000)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 08:02:13 +00:00
Marco Neumann e41cf080b4
feat: `RecordBatchesExec` remembers chunks (#6999)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-16 07:55:35 +00:00
Marco Neumann 67794bccdb
refactor: `group_potential_duplicates` cannot fail (#6998)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-15 19:28:02 +00:00
Christopher M. Wolff 7fb052208f
feat: allow gap filling to produce multiple batches (#6986)
* feat: allow gap filling to produce multiple batches

* chore: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-02-14 22:48:29 +00:00