151 Commits (fdbf9e112e3b03701c7fb89f89399a8fb6e2e763)

Author SHA1 Message Date
Andrew Lamb 9175f4a0b5
chore: Upgrade datafusion to get correct support for multi-part identifiers (#6349)
* test: add tests for periods in measurement names

* chore: Update Datafusion

* chore: Update for changed APIs

* chore: Update expected plan output

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-08 11:27:13 +00:00
Marco Neumann c25afda6cc
fix: `GroupGenerator`/`Converter` panic (#6351)
Do not poll a ready future.
2022-12-08 11:08:21 +00:00
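
The commit above does not show its fix, so the following is only a minimal sketch of the general rule it refers to: once a Rust future has returned `Poll::Ready`, it must not be polled again (many futures, including `std::future::Ready`, panic if it is). The names `PollOnce`, `NoopWaker`, and `poll_twice` are hypothetical and are not taken from the IOx code base.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical guard (not from IOx): drops the inner future once it has
/// completed so that it can never be polled again.
struct PollOnce<F> {
    inner: Option<F>,
}

impl<F: Future + Unpin> PollOnce<F> {
    fn new(inner: F) -> Self {
        Self { inner: Some(inner) }
    }

    /// Yields `Some(output)` when the inner future completes and `None` on
    /// every later call, instead of polling the finished future again.
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<F::Output>> {
        let Some(fut) = self.inner.as_mut() else {
            return Poll::Ready(None);
        };
        let polled = Pin::new(fut).poll(cx);
        match polled {
            Poll::Ready(out) => {
                // Forget the finished future so it is never touched again.
                self.inner = None;
                Poll::Ready(Some(out))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Waker that does nothing; enough to drive an already-ready future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a ready future twice through the guard and returns both results.
fn poll_twice() -> (Option<u32>, Option<u32>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut guarded = PollOnce::new(std::future::ready(7u32));

    let first = match guarded.poll_next(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => None,
    };
    let second = match guarded.poll_next(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => None,
    };
    (first, second)
}
```

Calling `poll_twice()` polls the guard two times; the second call is answered from the guard's own state rather than by polling the finished future.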
Marco Neumann 080aff8f71
fix: account for memory allocations in InfluxRPC group outputs (#6345)
* fix: account for memory allocations in InfluxRPC group outputs

This should prevent the querier from OOMing.

See https://github.com/influxdata/idpe/issues/16614 .

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* refactor: pull out constant

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-12-08 09:55:31 +00:00
dependabot[bot] 1d38d400f0
chore(deps): Bump object_store from 0.5.1 to 0.5.2 (#6339)
* chore(deps): Bump object_store from 0.5.1 to 0.5.2

Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.1 to 0.5.2.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.1...object_store_0.5.2)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-06 07:53:54 +00:00
Marco Neumann f62b270852
fix: gRPC errors regarding group cols (#6314)
* fix: gRPC errors regarding group cols

- a missing group col previously produced an "internal error" but should be
  "invalid argument"
- duplicate group cols produced a panic but should also be "invalid
  argument"

* docs: clarify
2022-12-06 07:36:32 +00:00
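
The two cases in the commit above can be sketched as an up-front validation step. This is an illustration under stated assumptions, not the IOx implementation: `GroupColError` and `validate_group_cols` are hypothetical names, and the enum merely stands in for the gRPC "invalid argument" status.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the gRPC "invalid argument" status.
#[derive(Debug, PartialEq)]
enum GroupColError {
    InvalidArgument(String),
}

/// Rejects unknown and duplicate group columns with an "invalid argument"
/// error instead of an internal error or a panic.
fn validate_group_cols(group_cols: &[&str], known_cols: &[&str]) -> Result<(), GroupColError> {
    let known: HashSet<&str> = known_cols.iter().copied().collect();
    let mut seen: HashSet<&str> = HashSet::new();

    for col in group_cols {
        if !known.contains(col) {
            return Err(GroupColError::InvalidArgument(format!(
                "unknown group column: {col}"
            )));
        }
        // `insert` returns false when the value was already present.
        if !seen.insert(*col) {
            return Err(GroupColError::InvalidArgument(format!(
                "duplicate group column: {col}"
            )));
        }
    }
    Ok(())
}
```

Both failure modes map to the same error kind, so the caller can translate them into a single client-facing status.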
Marco Neumann cd6a8a1a82
refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics (#6313)
* refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics

Closes #6310.

* refactor: rename and tune default exec mem limits

* fix: ingester2 bits after rebase
2022-12-05 12:38:28 +00:00
Marco Neumann 942a6100b5
fix: check schemas in `pretty_print_batches` (#6309)
* fix: check schemas in `pretty_print_batches`

I think most users of this function (and `assert_batches_eq`) assume
that all batches have the same schema. If they do not, `pretty_print_batches`
may either fail to produce an actual table (some rows may have more or
fewer columns) or silently produce a table that looks "alright".

* fix: equalize schemas where it is required/desired

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-02 12:14:16 +00:00
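
A schema check of the kind described in the commit above might look roughly like this. The sketch uses simplified, hypothetical `Schema` and `Batch` types rather than the Arrow `RecordBatch` type, and `check_same_schema` is an assumed name, not the function added by the commit.

```rust
/// Hypothetical, simplified schema: just an ordered list of column names.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    columns: Vec<String>,
}

/// Hypothetical, simplified batch that only carries its schema.
struct Batch {
    schema: Schema,
}

/// Convenience constructor for the sketch.
fn batch(cols: &[&str]) -> Batch {
    Batch {
        schema: Schema {
            columns: cols.iter().map(|c| c.to_string()).collect(),
        },
    }
}

/// Returns an error naming the first batch whose schema differs from the
/// schema of the first batch. An empty input is trivially consistent.
fn check_same_schema(batches: &[Batch]) -> Result<(), String> {
    let Some(first) = batches.first() else {
        return Ok(());
    };
    for (idx, other) in batches.iter().enumerate().skip(1) {
        if other.schema != first.schema {
            return Err(format!(
                "batch {idx} has columns {:?}, expected {:?}",
                other.schema.columns, first.schema.columns
            ));
        }
    }
    Ok(())
}
```

Running the check before formatting turns a silently misaligned table into an explicit error.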
Marco Neumann ec2e72d223
test: simplify test executors (#6312)
Have a single global test executor w/ reasonable defaults. Also don't
require tests to join/await executor shutdowns (most tests forget this
anyways and will get a runtime warning).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-02 11:38:18 +00:00
Marco Neumann ab4f910111
refactor: improve DF error handling (#6311)
This is required to extract "resource exhausted" errors in more cases.
2022-12-02 11:25:30 +00:00
Marco Neumann e2168ae859
refactor: stream-based series-set conversion (#6285)
* refactor: stream-based series-set conversion

Closes #6216.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* refactor: improve algo docs and tests

* test: fix after rebase

* fix: broken `Series` conversion when slices are present

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 17:24:36 +00:00
Andrew Lamb d0f1f6a4fd
chore: Upgrade datafusion to get memory limits (#6297)
* chore: Update datafusion

* fix: use correctly qualified column names

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 16:40:26 +00:00
Marco Neumann 01315bc063
refactor: bring back "stream-based `SeriesSetConvert::convert` interface (#6282)" (#6301)
This reverts commit 4a8bb871dc.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 14:27:43 +00:00
Marco Neumann 6cecc439d4
refactor: revert "simplify `SeriesSet` (#6277)" (#6298)
This reverts commit c41200536e.
2022-12-01 13:30:19 +00:00
Marco Neumann 4a8bb871dc
refactor: revert stream-based `SeriesSetConvert::convert` interface (#6282)
This reverts commit dad6dee924.
2022-12-01 12:51:56 +01:00
Marco Neumann dad6dee924
refactor: stream-based `SeriesSetConvert::convert` interface (#6282)
Change the interface of `SeriesSetConvert::convert` to be stream-based.
This is the final interface-prep step before actually implementing #6216.
2022-11-30 17:12:54 +00:00
Marco Neumann c41200536e
refactor: simplify `SeriesSet` (#6277)
`RecordBatch` offers zero-copy slicing, so there is no need to store the
row range manually. This makes #6216 simpler.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-30 16:48:31 +00:00
Marco Neumann fa6f7ee926
refactor: stream-based(TM) `to_series_and_groups`, part 3 (#6275)
* refactor: stream-based(TM) `to_series_and_groups`, part 3

* refactor: remove dead code

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-30 13:21:39 +00:00
Marco Neumann 6eb13712c4
refactor: stream-based(TM) `to_series_and_groups`, part 2 (#6265)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-29 15:00:21 +00:00
Marco Neumann 297ea8be55
refactor: make `IOxSessionContext::exec` non-optional (#6266)
`None` was only used for testing, and even then we should probably have a
proper executor instead of panicking for some methods.

Found while working on #6216.
2022-11-29 14:52:32 +00:00
Marco Neumann 514aa60f91
refactor: stream-based(TM) `to_series_and_groups`, part 1 (#6261)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-29 14:16:22 +00:00
Andrew Lamb fc5697b8e7
chore: Update datafusion again (N of N) (#6218)
* chore: Update datafusion again (4 of N)

* fix: Update plans

* fix: Update for renamed API

* fix: Update more plans

* chore: Update to datafusion @ d355f69aae2cc951cfd021e5c0b690861ba0c4ac

* fix: update explain plan tests

* fix: update test after schema error

* chore: Update datafusion again

* fix: Add size() calculation to selectors

* chore: Run cargo hakari tasks

* fix: Update newly added test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-28 17:09:40 +00:00
Christopher M. Wolff aa7a3a7721
fix: ignore fields when considering tag predicates (#6212)
* fix: ignore fields when considering tag predicates

* chore: update test to not use time column in predicate

* chore: update with review feedback

* chore: update tests to avoid fields refs in RPC preds

This is more like what would be coming off the wire from
Influx RPC.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-28 15:16:55 +00:00
Stuart Carnie 2306c383f3
feat: Introduce InfluxQL to Flight (#6166)
* feat: Introduce InfluxQL to Flight

All InfluxQL queries will fail with an error

* chore: Temper protobuf lint

* chore: Finalize flight.proto changes; fix tests

* chore: Add tests for InfluxQL planner

* chore: Update docs

* chore: Update docs

* chore: Rename back to original

* chore: Use .into() rather than cast

* chore: Use function rather than field

* chore: Improved InfluxQL planner name

* chore: Restore `impl Into<String>` argument

* chore: Add a comment that Go clients are unable to execute InfluxQL

* chore: Add a test for the `--lang` argument and InfluxQL
2022-11-23 00:33:49 +00:00
Andrew Lamb 1a1ea74cb7
chore: Upgrade datafusion again (#6160)
* Revert "Revert "chore: Update datafusion again (#6108)""

This reverts commit 766b3bbeb440618cfe332f6ee7d4f8a8217acc48.

* fix: Respect the partition sort key

* chore: update plans

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-22 19:28:26 +00:00
dependabot[bot] a9db7581cd
chore(deps): Bump tokio from 1.21.2 to 1.22.0 (#6183)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.2 to 1.22.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.21.2...tokio-1.22.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-21 10:21:24 +00:00
Andrew Lamb 4630bbb956
feat: push down all predicates (#6042)
* feat: push down all predicates

* fix: fmt

* fix: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 16:22:01 +00:00
Marco Neumann 71ffc92559
fix: only push safe select expression through de-dup (#6156)
* fix: only push safe select expression through de-dup

Fixes #6066.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* fix: rebase

* test: ensure we do not split ORs

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-11-18 09:56:11 +00:00
Andrew Lamb 67712b595c
Revert "chore: Update datafusion again (#6108)" (#6159)
This reverts commit fbe9f27f10.
2022-11-16 21:14:55 +00:00
Andrew Lamb fbe9f27f10
chore: Update datafusion again (#6108)
* chore: Update datafusion pin + api code

* chore: Run cargo hakari tasks

* refactor: combine_sort_key is more idiomatic and add rationale comments

* refactor: satisfy borrow checker and updated comments

* fix: Add test case for combine_sort_key

* fix: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: Add back test for deeply nested expression

* fix: Update output ordering

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-16 14:41:52 +00:00
Andrew Lamb 20f1ae1c8f
test: tests in the reorg planner and query tests for merging parquet files (#6137)
* test: tests in the reorg planner and query tests for merging parquet files

* fix: use 20 files

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-15 20:29:44 +00:00
dependabot[bot] a969754819
chore(deps): Bump chrono from 0.4.22 to 0.4.23 (#6129)
* chore(deps): Bump chrono from 0.4.22 to 0.4.23

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.22 to 0.4.23.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.22...v0.4.23)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* refactor: chrono future compat

Integer->timestamp conversions should not silently panic.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-14 13:34:09 +00:00
kodiakhq[bot] 05d7d1495e
Merge branch 'main' into dependabot/cargo/hashbrown-0.13.1
2022-11-11 21:26:40 +00:00
Carol (Nichols || Goulding) 0657ad9600
fix: Rename QueryDatabase to QueryNamespace
2022-11-11 16:14:12 -05:00
Carol (Nichols || Goulding) 621560a0dc
fix: Rename QueryDatabaseMeta to QueryNamespaceMeta
2022-11-11 16:14:12 -05:00
Carol (Nichols || Goulding) bdff4e8848
fix: Consistently use 'namespace' instead of 'database' in comments and other internal text
2022-11-11 15:46:04 -05:00
Jake Goulding cc17e5a54b
refactor: use a workspace dependency for hashbrown
2022-11-11 13:25:39 -05:00
dependabot[bot] 5024523f00
chore(deps): Bump hashbrown from 0.12.3 to 0.13.1
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.12.3 to 0.13.1.
- [Release notes](https://github.com/rust-lang/hashbrown/releases)
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.12.3...v0.13.1)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-11 13:24:56 -05:00
Carol (Nichols || Goulding) abcbe19966
fix: Box some Error type fields to make the Error types small
As found by this new Clippy lint:

<https://rust-lang.github.io/rust-clippy/master/index.html#result_large_err>
2022-11-09 10:54:18 -05:00
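
The effect of boxing a large error field can be sketched with `std::mem::size_of`. The types below (`Details`, `LargeError`, `SmallError`) are hypothetical and only illustrate the idea behind the `result_large_err` lint; they are not the IOx error types.

```rust
use std::mem::size_of;

/// Hypothetical large payload carried by an error variant.
#[allow(dead_code)]
struct Details {
    context: [u8; 256],
}

/// Error that stores the payload inline: every `Result` using it is large,
/// even on the success path.
#[allow(dead_code)]
enum LargeError {
    Failed(Details),
}

/// Error that stores the payload behind a pointer: the `Result` stays small
/// and the payload is only allocated when an error actually occurs.
#[allow(dead_code)]
enum SmallError {
    Failed(Box<Details>),
}

/// Returns the sizes of `Result<(), LargeError>` and `Result<(), SmallError>`.
fn result_sizes() -> (usize, usize) {
    (
        size_of::<Result<(), LargeError>>(),
        size_of::<Result<(), SmallError>>(),
    )
}
```

The trade-off is one heap allocation on the error path in exchange for a smaller `Result` everywhere it is returned or moved.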
Carol (Nichols || Goulding) f33e367904
fix: Use is_none instead of == None, thanks Clippy!
Again seems kinda niche and not a huge deal, but apparently this doesn't
rely on `T: PartialEq` so probably good?

<https://rust-lang.github.io/rust-clippy/master/index.html#partialeq_to_none>
2022-11-09 10:54:18 -05:00
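
The point about `T: PartialEq` can be shown with a small sketch. `Handle` and `is_missing` are hypothetical names used only for illustration.

```rust
/// Hypothetical type that deliberately does not implement `PartialEq`.
#[allow(dead_code)]
struct Handle {
    id: u32,
}

/// Works for any `Option<T>`, whether or not `T` can be compared.
fn is_missing(handle: &Option<Handle>) -> bool {
    // `*handle == None` would not compile here, because comparing two
    // `Option<Handle>` values requires `Handle: PartialEq`.
    handle.is_none()
}
```

`Option::is_none` only inspects the variant, so it places no trait bound on the contained type.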
Marco Neumann 1a5fc3d772
test: use `EXPLAIN ANALYZE` for SQL metric tests (#6084)
* test: use `EXPLAIN ANALYZE` for SQL metric tests

Needs a bit more infra (due to normalization), but this seems to be
worth it so we can easily hook up more metrics in the future.

* docs: explain regexes
2022-11-09 09:00:27 +00:00
Marco Neumann 903f7bafa7
refactor: expose `ParquetExec` directly to DataFusion phys. plan (#6072)
* refactor: expose `ParquetExec` directly to DataFusion phys. plan

Closes #5897.

* fix: update tracing tests

* refactor: use `EmptyExec`

* refactor: use `target_partitions`

* refactor: improve UUID normalization in query tests

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-11-08 12:19:28 +00:00
Andrew Lamb 034d9b371d
chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0` (#6061)
* chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0`

* fix: Update query_functions

* fix: update for TimestampNanosecondArray API changes

* fix: update for TimestampNanosecondArray API changes

* chore: Update flatbuffers and remove rustsec warning

* chore: Update text

* fix: update more tests

* fix: Lock ahash to exactly 0.8.0

* fix: Update datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 11:01:58 +00:00
Marco Neumann f511db380c
refactor: remove table name from chunks (#6063)
It should be always clear from the context to which table a chunk
belongs.

I think having a table name bound to a chunk goes back to a time when
chunks had multiple tables.

Helps with #6049.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:42:57 +00:00
Andrew Lamb b149dc541a
chore: Export metrics for parquet access to `EXPLAIN ANALYZE` (#6043)
* chore: Export metrics for parquet access

* fix: make parquet_execs non pub
2022-11-03 11:39:16 +00:00
Andrew Lamb 4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037)
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`

* fix: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Andrew Lamb 58838e214e
feat: enable parquet predicate pushdown in IOx (#5930)
2022-11-02 18:00:47 +00:00
Marco Neumann 45b3984aa3
refactor: simplify `QueryChunk` data access (#6015)
* refactor: simplify `QueryChunk` data access

We have only two types for chunks (now that the RUB is gone):

1. In-memory RecordBatches
2. Parquet files

Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 08:18:33 +00:00
Andrew Lamb 9c1f0a3644
refactor: move SessionConfig creation into datafusion_utils (#6011)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 20:04:49 +00:00
Marco Neumann 072439e428
refactor: mandatory `QueryChunkMeta::summary` (#5997)
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈-style `.expect(...)` calls.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:38:02 +00:00
Andrew Lamb ace3c11f12
chore: Update datafusion (#6004)
* chore: Update datafusion

* chore: change path

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:16:28 +00:00