Commit Graph

366 Commits (65230040083eeca3d8a91905567e7341bd671103)

Author SHA1 Message Date
Andrew Lamb 967aef0e9d
chore: Update datafusion (#8515)
* chore: Update datafusion

* fix: update for API

* fix: Verify unsupported statements, with tests

* fix: update tests

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-21 17:49:21 +00:00
Andrew Lamb 0a0ef66a05
chore: Make it clear there is only a single DataFusion memory pool (#8501)
* chore: Make it clear there is only a single DataFusion memory pool

* fix: assert there is a single bridge

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-17 12:35:31 +00:00
Marco Neumann 535ff5f0c8
refactor: extract InfluxRPC-specific code to `iox_query_influxrpc` (part 1) (#8508)
* refactor: replace test usage of `Predicate`

* refactor: remove dead code

* refactor: decouple recorg planning from InfluxRPC planning

* refactor: move InfluxRPC-specific scan plan construction

* refactor: move InfluxRPC-specific "missing columns" handling to `iox_query_influxrpc`

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-17 11:59:04 +00:00
Marco Neumann 3612b1c482
refactor: use DF `Expr` instead of `Predicate` for chunk pruning (#8500)
`Predicate` is InfluxRPC specific and contains way more than just
filter expression.

Ref #8097.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-17 08:18:45 +00:00
dependabot[bot] 7094189004
chore(deps): Bump tokio from 1.31.0 to 1.32.0 (#8507)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.31.0 to 1.32.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.31.0...tokio-1.32.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-17 08:06:29 +00:00
Marco Neumann 39a08fab69
feat: expose DataFusion mem pool metrics (#8492)
Closes #8466.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-15 15:19:11 +00:00
Marco Neumann ad4068bbea
refactor: decouple `QueryNamespace` from synchronous schema interface (`QueryNamespaceMeta`) (#8472)
* refactor: remove unused impl

* refactor: inline `ExecutionContextProvider` into `QueryNamespace`

* refactor: use global `DEFAULT_SCHEMA`

* refactor: decouple `QueryNamespace` from `QueryNamespaceMeta`

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-14 08:09:08 +00:00
dependabot[bot] 34b8585931
chore(deps): Bump tokio from 1.30.0 to 1.31.0 (#8482)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.30.0 to 1.31.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.30.0...tokio-1.31.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-14 06:32:34 +00:00
Marco Neumann 9358ec74db
refactor: remove `Predicate` usage from `QueryNamespace` (#8468)
For #8097.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-10 16:32:55 +00:00
Andrew Lamb 232eee059f
chore: Update DataFusion (#8460)
* chore: Update DataFusion

* chore: update for API changes

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-10 14:54:52 +00:00
Marco Neumann 71e6b66476
refactor: replace `Predicate` w/ `&[Expr]` in querier internals (#8465)
* refactor: replace `Predicate` w/ `&[Expr]` in querier internals

First step towards #8097. This replaces most internal usages of
`Predicate` with the more appropriate `&[Expr]` within the querier code.

This is also triggered by #8443 because the new ingester protocol shall
not use `Predicate` anymore.

Note that the querier still uses `Predicate` for a few interfaces. These
will be fixed later:

- the current ingester RPC version
- chunk pruning
- `QuerierNamespace::chunks`

* fix: docs
2023-08-10 13:00:43 +00:00
dependabot[bot] 3675043585
chore(deps): Bump tokio from 1.29.1 to 1.30.0 (#8464)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.29.1 to 1.30.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.29.1...tokio-1.30.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-10 07:50:18 +00:00
Christopher M. Wolff 3d972561d5
feat: push sort through union (#8423)
* feat: push sort through union

* test: add a few more tests based on review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-08 15:00:34 +00:00
Nga Tran a79cddf942
chore: remove dead code compute_sort_key_for_chunks (#8399) 2023-08-02 20:45:38 +00:00
Carol (Nichols || Goulding) 92ae8e4084
refactor: Extract a convenience constructor for Deterministic transition ids 2023-08-02 10:17:23 -04:00
Carol (Nichols || Goulding) fd147f871b
fix: Have QueryChunk return a TransitionPartitionId
Thus using the PartitionHashId if one is available. This does not
compile yet because of all the uses of QueryChunk, but iox_query
compiles and passes its tests.
2023-08-02 10:17:22 -04:00
Carol (Nichols || Goulding) e4b9455344
feat: Have QueryChunk return a reference from partition_id() 2023-08-02 10:17:22 -04:00
Andrew Lamb de79619e71
chore: Update datafusion (#8355)
* chore: Update datafusion pin

* fix: Update for change in API

* chore: Update plan

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-31 15:41:00 +00:00
Joe-Blount 1bed99567c
chore: add DF metrics to compaction spans (#8270)
* chore: add DF metrics to compaction spans

* chore: update string for test verification

* chore: update comment

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 15:00:22 +00:00
Marco Neumann 0173c50ba1
fix: use correct error code when querier is shutting down (#8282)
When a long running query is in process and the querier is shutting
down, it might happen that the executor (= thread pool and tokio
executor responsible for the CPU-bound DataFusion execution) is shut
down while the query is running. From a "systems interaction" PoV I
think this is totally fine and I would like to avoid some weird
ref-counting. Or in other words: if the system is shutting down, shut it
down.

However the error was treated as "internal" which is not useful. The
client should rather be informed that its server was gone and that it is
OK (and desired) to retry. So as per
<https://grpc.github.io/grpc/core/md_doc_statuscodes.html> I think this
should signal "unavailable".

This change wires the error code in such a way that the gRPC service
layer can properly inspect it and then changes the error mapping.

Ref https://github.com/influxdata/idpe/issues/17917 .

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 12:08:22 +00:00
Christopher M. Wolff 33e41fc5cb
fix: improve error for malformed gap fill query (#8252)
* fix: improve error for malformed gap fill query

* fix: code review feedback
2023-07-17 21:20:34 +00:00
Christopher M. Wolff b916a89159
fix: recurse through SubqueryAlias when finding gap fill time range (#8249) 2023-07-17 19:39:30 +00:00
Carol (Nichols || Goulding) a9b788b58f
feat: Collate chunks based on their partition hash id if they have it 2023-07-17 10:34:01 -04:00
Carol (Nichols || Goulding) 313baca8b6
fix: Use sort_by rather than sort_by_key to use references
These places are sorting by `PartitionId` currently, which implements
`Copy`, but are about to be changed to be sorted on `PartitionHashId`,
which does not implement `Copy`.
2023-07-17 09:56:55 -04:00
Carol (Nichols || Goulding) 10a0f8e3bf
fix: Remove ::default() when constructing unit structs
As recommended by https://rust-lang.github.io/rust-clippy/master/index.html#default_constructed_unit_structs
2023-07-14 10:50:55 -04:00
Dom Dwyer 7f7d1f2ee7
fix(ingester): projection without time column
The ingester can project arbitrary columns at query time, and has no
special requirement that the "time" column be part of that projection.

Because the timestamp summary generation explicitly requires the time
column to exist, it panics when there's no "time" column in the
projection - this is a bit of a modelling mismatch more than anything.
2023-07-13 14:22:48 +02:00
Andrew Lamb b24f9c81ba
chore: Update DataFusion pin, updates for API changed (#8199) 2023-07-11 13:36:38 +00:00
Andrew Lamb 3ce11d8d66
chore: Update DataFusion (#8190)
* chore: Update DataFusion

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: use display format

* chore: Update explain plan output

* fix: update plans

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-10 09:54:50 +00:00
Marco Neumann 0bcf85d48c
refactor: de-dup code 2023-07-03 17:24:59 +02:00
Carol (Nichols || Goulding) 8ebf390d9c
feat: Try to prune ingester partitions by partition key
This is hacktastic.
2023-07-03 17:24:58 +02:00
Carol (Nichols || Goulding) b76fdab1a4
refactor: Move querier::df_stats to iox_query::chunk_statistics so it can be shared with ingester 2023-07-03 17:24:55 +02:00
Marco Neumann ce6a2fb613
refactor: remove `QueryChunk::column_values` (#8111)
Similar to #8109.

This was once implemented by the RUB but as it stands right now, no
chunk implements this anymore.

If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).

Closes #8096.
2023-07-03 09:03:21 +00:00
Andrew Lamb 4a1f8db254
chore: Update datafusion + arrow/arrow-flight/parquet to patched version `42.0.0` (#8113)
* Revert "Revert "chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` (#8036)" (#8049)"

This reverts commit fb0674fc01.

* chore: Update Cargo and hakari

* chore: Update to patched version

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-30 12:59:31 +00:00
Marco Neumann b982ee180e
refactor: remove `QueryChunk::column_names` (#8109)
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier that just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.

If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).

First half of #8096.
2023-06-29 13:43:10 +00:00
Marco Neumann ca31c1eade
feat: hook up tokio metrics (#8050)
* feat: metrics for main tokio runtime

* feat: instrument executor tokio runtime

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-29 11:11:44 +00:00
Marco Neumann dcb4a9bb5c
refactor: fuse `QueryChunk` and `QueryChunkMeta` (#8107)
Closes #8095.
2023-06-29 11:02:48 +00:00
Marco Neumann 4638b89d93
refactor: migrate retention to proper predicates (#8092)
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.

This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent time predicates that the user providers (e.g. via
`WHERE time > x`).

It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.

Note that the lowering of the retention policy changed slightly from

```text
(time > (now() - retention)) AND (time < MAX)
```

to

```text
time > (now() - retention)
```

Since the `MAX` cut is just an artifact of the lowering and was unnecessary.

Closes #7409.
Closes #7410.
2023-06-29 08:36:37 +00:00
dependabot[bot] b15c6062a9
chore(deps): Bump tokio from 1.28.2 to 1.29.0 (#8100)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.2 to 1.29.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.2...tokio-1.29.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-28 13:18:08 +00:00
dependabot[bot] 990044dcb2
chore(deps): Bump indexmap from 1.9.3 to 2.0.0 (#8073)
* chore(deps): Bump indexmap from 1.9.3 to 2.0.0

Bumps [indexmap](https://github.com/bluss/indexmap) from 1.9.3 to 2.0.0.
- [Changelog](https://github.com/bluss/indexmap/blob/master/RELEASES.md)
- [Commits](https://github.com/bluss/indexmap/compare/1.9.3...2.0.0)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-26 08:52:51 +00:00
Marco Neumann 7322f238fb
docs: query processing (#8033)
* docs: query processing

Closes https://github.com/influxdata/idpe/issues/17770 .

* docs: apply recommendations

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve description of the flight protocol

* docs: link `LogicalPlan`

* docs: link `ExecutionPlan`

* docs: improve wording

* docs: improve query planning docs

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-06-23 09:13:14 +00:00
dependabot[bot] 74a48a8f63
chore(deps): Bump itertools from 0.10.5 to 0.11.0 (#8060)
* chore(deps): Bump itertools from 0.10.5 to 0.11.0

Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.5 to 0.11.0.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.5...v0.11.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 08:11:56 +00:00
Andrew Lamb fb0674fc01
Revert "chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` (#8036)" (#8049)
This reverts commit 70ffedadc7.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-22 11:03:25 +00:00
Andrew Lamb 70ffedadc7
chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` (#8036)
* chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0`

* chore: Update for new APIs

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-21 16:11:36 +00:00
Stuart Carnie e10b8c93c8
chore: Update DataFusion and other dependencies (#8014)
* chore: Update DataFusion pin

* chore: Update API changes

* chore: Don't use deprecated API

* chore: Run cargo hakari tasks

* chore: Update tests due to changes in logical plan nodes from DF update

* chore: Fix broken links in docs

* chore: Adjust changes to expected output

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2023-06-16 10:39:36 +00:00
Andrew Lamb 5889c96501
chore: Update `datafusion` and other dependencies (#7981)
* chore: Update DatFaFusion pin

* chore: Update other dependencies

* chore: Update hakari

* fix: Update for API changes

* fix: Update explain plan

* fix: Update influxql plans

* fix: rustdoc links

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-16 09:48:55 +00:00
Marco Neumann 8ef1b64f6a
fix: remove de-dup even if we have many partitions (#8004)
See optimizer pipeline here:

5d0bb68c5b/iox_query/src/physical_optimizer/mod.rs (L33-L35)

After generating the naive initial plan w/ many partitions, we must
consider more than 100 partitions to split the key space.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-15 15:25:25 +00:00
Andrew Lamb 17c0d837b3
chore: Update DataFusion, arrow, object_store pins (#7942)
* chore: Update DataFusion, arrow, object_store pins

* chore: Update for hakari

* chore: Update for new APIs

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-07 17:08:31 +00:00
Andrew Lamb f571aeb445
chore: Update DataFusion pin (#7916)
* chore: Update DataFusion pin

* chore: Update cargo

* fix: update for API changes

* fix: Update plans

* chore: Update for new api

* fix: Update plans

* chore: Update for API changes more

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-05 18:38:59 +00:00
Stuart Carnie da682d8c53
chore: clippy 🧠 2023-06-04 07:23:11 +10:00
Marco Neumann fa5011197c
refactor: migrate `iox_query` to use DataFusion statistics (#7908)
This is the major part of #7470. Additional clean ups (e.g. to remove
the actual types from `data_types`) will follow.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-02 09:18:59 +00:00