Commit Graph

147 Commits (f62b27085275ea9713f3787c8b031bab7c3cbdb5)

Author SHA1 Message Date
Marco Neumann f62b270852
fix: gRPC errors regarding group cols (#6314)
* fix: gRPC errors regarding group cols

- missing group col prev. produced an "internal error" but should be
  "invalid argument"
- duplicate group cols produced a panic but should also be "invalid
  argument"

* docs: clarify
2022-12-06 07:36:32 +00:00
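A minimal sketch of the validation described in the commit above, assuming "missing" means a requested group column that is not among the known columns. The `Code` enum and `validate_group_columns` function are hypothetical stand-ins, not the IOx or tonic API:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a gRPC status code (not the real `tonic::Code`).
#[derive(Debug, PartialEq)]
enum Code {
    InvalidArgument,
}

/// Validates the requested group columns against the known columns.
/// Both an unknown and a duplicate column are reported as
/// `InvalidArgument` instead of panicking or surfacing as an internal error.
fn validate_group_columns(known: &[&str], requested: &[&str]) -> Result<(), (Code, String)> {
    let known: HashSet<&str> = known.iter().copied().collect();
    let mut seen: HashSet<&str> = HashSet::new();

    for col in requested {
        if !known.contains(col) {
            return Err((Code::InvalidArgument, format!("unknown group column: {}", col)));
        }
        // `insert` returns false when the value was already present.
        if !seen.insert(*col) {
            return Err((Code::InvalidArgument, format!("duplicate group column: {}", col)));
        }
    }
    Ok(())
}
```

Returning an error value for both cases keeps the caller in control of how the failure is mapped to a response status.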
Marco Neumann cd6a8a1a82
refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics (#6313)
* refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics

Closes #6310.

* refactor: rename and tune default exec mem limits

* fix: ingester2 bits after rebase
2022-12-05 12:38:28 +00:00
Marco Neumann 942a6100b5
fix: check schemas in `pretty_print_batches` (#6309)
* fix: check schemas in `pretty_print_batches`

I think most users of this function (and `assert_batches_eq`) assume
that all batches have the same schema. If not, `pretty_print_batches`
may either fail to produce an actual table (some rows may have more or
fewer columns) or silently produce a table that looks "alright".

* fix: equalize schemas where it is required/desired

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-02 12:14:16 +00:00
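The schema check described in the commit above could look roughly like the following sketch. `Batch` is a hypothetical, simplified type that only carries column names; it is not arrow's `RecordBatch`, and `ensure_same_schema` is not the actual IOx helper:

```rust
/// Hypothetical, simplified batch (not arrow's `RecordBatch`):
/// only the column names are modelled here.
struct Batch {
    columns: Vec<String>,
}

impl Batch {
    fn new(columns: &[&str]) -> Self {
        Batch {
            columns: columns.iter().map(|c| c.to_string()).collect(),
        }
    }
}

/// Returns an error if any batch has a different column list than the first one.
/// An empty input is trivially consistent.
fn ensure_same_schema(batches: &[Batch]) -> Result<(), String> {
    let first = match batches.first() {
        Some(first) => first,
        None => return Ok(()),
    };

    for (idx, batch) in batches.iter().enumerate().skip(1) {
        if batch.columns != first.columns {
            return Err(format!(
                "batch {} has columns {:?}, expected {:?}",
                idx, batch.columns, first.columns
            ));
        }
    }
    Ok(())
}
```

Failing early with the offending batch index avoids silently printing a table whose rows do not line up.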
Marco Neumann ec2e72d223
test: simplify test executors (#6312)
Have a single global test executor w/ reasonable defaults. Also don't
require tests to join/await executor shutdowns (most tests forget this
anyways and will get a runtime warning).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-02 11:38:18 +00:00
Marco Neumann ab4f910111
refactor: improve DF error handling (#6311)
This is required to extract "resource exhausted" errors in more cases.
2022-12-02 11:25:30 +00:00
Marco Neumann e2168ae859
refactor: stream-based series-set conversion (#6285)
* refactor: stream-based series-set conversion

Closes #6216.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* refactor: improve algo docs and tests

* test: fix after rebase

* fix: broken `Series` conversion when slices are present

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 17:24:36 +00:00
Andrew Lamb d0f1f6a4fd
chore: Upgrade datafusion to get memory limits (#6297)
* chore: Update datafusion

* fix: use correctly qualified column names

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 16:40:26 +00:00
Marco Neumann 01315bc063
refactor: bring back "stream-based `SeriesSetConvert::convert` interface (#6282)" (#6301)
This reverts commit 4a8bb871dc.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 14:27:43 +00:00
Marco Neumann 6cecc439d4
refactor: revert "simplify `SeriesSet` (#6277)" (#6298)
This reverts commit c41200536e.
2022-12-01 13:30:19 +00:00
Marco Neumann 4a8bb871dc
refactor: revert stream-based `SeriesSetConvert::convert` interface (#6282)
This reverts commit dad6dee924.
2022-12-01 12:51:56 +01:00
Marco Neumann dad6dee924
refactor: stream-based `SeriesSetConvert::convert` interface (#6282)
Change the interface of `SeriesSetConvert::convert` to be stream-based.
This is the final interface-prep step before actually implementing #6216.
2022-11-30 17:12:54 +00:00
Marco Neumann c41200536e
refactor: simplify `SeriesSet` (#6277)
`RecordBatch` offers zero-copy slicing, so there is no need to store the
row range manually. This makes #6216 simpler.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-30 16:48:31 +00:00
Marco Neumann fa6f7ee926
refactor: stream-based(TM) `to_series_and_groups`, part 3 (#6275)
* refactor: stream-based(TM) `to_series_and_groups`, part 3

* refactor: remove dead code

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-30 13:21:39 +00:00
Marco Neumann 6eb13712c4
refactor: stream-based(TM) `to_series_and_groups`, part 2 (#6265)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-29 15:00:21 +00:00
Marco Neumann 297ea8be55
refactor: make `IOxSessionContext::exec` non-optional (#6266)
`None` was only used for testing and even then we should probably have a
proper executor instead of panicking for some methods.

Found while working on #6216.
2022-11-29 14:52:32 +00:00
Marco Neumann 514aa60f91
refactor: stream-based(TM) `to_series_and_groups`, part 1 (#6261)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-29 14:16:22 +00:00
Andrew Lamb fc5697b8e7
chore: Update datafusion again (N of N) (#6218)
* chore: Update datafusion again (4 of N)

* fix: Update plans

* fix: Update for renamed API

* fix: Update more plans

* chore: Update to datafusion @ d355f69aae2cc951cfd021e5c0b690861ba0c4ac

* fix: update explain plan tests

* fix: update test after schema error

* chore: Update datafusion again

* fix: Add size() calculation to selectors

* chore: Run cargo hakari tasks

* fix: Update newly added test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-28 17:09:40 +00:00
Christopher M. Wolff aa7a3a7721
fix: ignore fields when considering tag predicates (#6212)
* fix: ignore fields when considering tag predicates

* chore: update test to not use time column in predicate

* chore: update with review feedback

* chore: update tests to avoid fields refs in RPC preds

This is more like what would be coming off the wire from
Influx RPC.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-28 15:16:55 +00:00
Stuart Carnie 2306c383f3
feat: Introduce InfluxQL to Flight (#6166)
* feat: Introduce InfluxQL to Flight

All InfluxQL queries will fail with an error

* chore: Temper protobuf lint

* chore: Finalize flight.proto changes; fix tests

* chore: Add tests for InfluxQL planner

* chore: Update docs

* chore: Update docs

* chore: Rename back to original

* chore: Use .into() rather than cast

* chore: Use function rather than field

* chore: Improved InfluxQL planner name

* chore: Restore `impl Into<String>` argument

* chore: Add a comment that Go clients are unable to execute InfluxQL

* chore: Add a test for the `--lang` argument and InfluxQL
2022-11-23 00:33:49 +00:00
Andrew Lamb 1a1ea74cb7
chore: Upgrade datafusion again (#6160)
* Revert "Revert "chore: Update datafusion again (#6108)""

This reverts commit 766b3bbeb440618cfe332f6ee7d4f8a8217acc48.

* fix: Respect the partition sort key

* chore: update plans

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-22 19:28:26 +00:00
dependabot[bot] a9db7581cd
chore(deps): Bump tokio from 1.21.2 to 1.22.0 (#6183)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.2 to 1.22.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.21.2...tokio-1.22.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-21 10:21:24 +00:00
Andrew Lamb 4630bbb956
feat: push down all predicates (#6042)
* feat: push down all predicates

* fix: fmt

* fix: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 16:22:01 +00:00
Marco Neumann 71ffc92559
fix: only push safe select expression through de-dup (#6156)
* fix: only push safe select expression through de-dup

Fixes #6066.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* fix: rebase

* test: ensure we do not split ORs

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-11-18 09:56:11 +00:00
Andrew Lamb 67712b595c
Revert "chore: Update datafusion again (#6108)" (#6159)
This reverts commit fbe9f27f10.
2022-11-16 21:14:55 +00:00
Andrew Lamb fbe9f27f10
chore: Update datafusion again (#6108)
* chore: Update datafusion pin + api code

* chore: Run cargo hakari tasks

* refactor: combine_sort_key is more idiomatic and add rationale comments

* refactor: satisfy borrow checker and updated comments

* fix: Add test case for combine_sort_key

* fix: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: Add back test for deeply nested expression

* fix: Update output ordering

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-16 14:41:52 +00:00
Andrew Lamb 20f1ae1c8f
test: tests in the reorg planner and query tests for merging parquet files (#6137)
* test: tests in the reorg planner and query tests for merging parquet files

* fix: use 20 files

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-15 20:29:44 +00:00
dependabot[bot] a969754819
chore(deps): Bump chrono from 0.4.22 to 0.4.23 (#6129)
* chore(deps): Bump chrono from 0.4.22 to 0.4.23

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.22 to 0.4.23.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.22...v0.4.23)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* refactor: chrono future compat

Integer->timestamp conversions should not silently panic.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-14 13:34:09 +00:00
kodiakhq[bot] 05d7d1495e
Merge branch 'main' into dependabot/cargo/hashbrown-0.13.1
2022-11-11 21:26:40 +00:00
Carol (Nichols || Goulding) 0657ad9600
fix: Rename QueryDatabase to QueryNamespace
2022-11-11 16:14:12 -05:00
Carol (Nichols || Goulding) 621560a0dc
fix: Rename QueryDatabaseMeta to QueryNamespaceMeta
2022-11-11 16:14:12 -05:00
Carol (Nichols || Goulding) bdff4e8848
fix: Consistently use 'namespace' instead of 'database' in comments and other internal text
2022-11-11 15:46:04 -05:00
Jake Goulding cc17e5a54b
refactor: use a workspace dependency for hashbrown
2022-11-11 13:25:39 -05:00
dependabot[bot] 5024523f00
chore(deps): Bump hashbrown from 0.12.3 to 0.13.1
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.12.3 to 0.13.1.
- [Release notes](https://github.com/rust-lang/hashbrown/releases)
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.12.3...v0.13.1)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-11 13:24:56 -05:00
Carol (Nichols || Goulding) abcbe19966
fix: Box some Error type fields to make the Error types small
As found by this new Clippy lint:

<https://rust-lang.github.io/rust-clippy/master/index.html#result_large_err>
2022-11-09 10:54:18 -05:00
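To illustrate why boxing a field shrinks an error type, as the commit above describes, here is a small sketch. The `Details`, `LargeError`, and `SmallError` types are hypothetical and do not come from IOx:

```rust
use std::mem::size_of;

/// Hypothetical large payload (not from IOx) carried by an error.
#[allow(dead_code)]
struct Details {
    context: [u8; 256],
}

/// Stores the payload inline, so every `Result<_, LargeError>` is at least
/// as large as the payload, even on the success path.
#[allow(dead_code)]
enum LargeError {
    Failed(Details),
}

/// Stores the payload behind a `Box`, so only a pointer lives inline.
#[allow(dead_code)]
enum SmallError {
    Failed(Box<Details>),
}

/// Returns the in-memory sizes of the inline and the boxed error type.
fn error_sizes() -> (usize, usize) {
    (size_of::<LargeError>(), size_of::<SmallError>())
}
```

The trade-off is one heap allocation on the error path in exchange for smaller return values on every call.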
Carol (Nichols || Goulding) f33e367904
fix: Use is_none instead of == None, thanks Clippy!
Again seems kinda niche and not a huge deal, but apparently this doesn't
rely on `T: PartialEq` so probably good?

<https://rust-lang.github.io/rust-clippy/master/index.html#partialeq_to_none>
2022-11-09 10:54:18 -05:00
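The commit above notes that `is_none` does not rely on `T: PartialEq`. A minimal sketch of that point, using a hypothetical `Opaque` type that is not part of IOx:

```rust
/// Hypothetical type (not from IOx) that deliberately does not implement `PartialEq`.
#[allow(dead_code)]
struct Opaque(u32);

/// Counts the unset slots. Writing `*slot == None` would not compile here,
/// because comparing `Option<Opaque>` with `==` requires `Opaque: PartialEq`;
/// `is_none` has no such requirement.
fn count_unset(slots: &[Option<Opaque>]) -> usize {
    slots.iter().filter(|slot| slot.is_none()).count()
}
```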
Marco Neumann 1a5fc3d772
test: use `EXPLAIN ANALYZE` for SQL metric tests (#6084)
* test: use `EXPLAIN ANALYZE` for SQL metric tests

Needs a bit more infra (due to normalization), but this seems to be
worth it so we can easily hook up more metrics in the future.

* docs: explain regexes
2022-11-09 09:00:27 +00:00
Marco Neumann 903f7bafa7
refactor: expose `ParquetExec` directly to DataFusion phys. plan (#6072)
* refactor: expose `ParquetExec` directly to DataFusion phys. plan

Closes #5897.

* fix: update tracing tests

* refactor: use `EmptyExec`

* refactor: use `target_partitions`

* refactor: improve UUID normalization in query tests

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-11-08 12:19:28 +00:00
Andrew Lamb 034d9b371d
chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0` (#6061)
* chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0`

* fix: Update query_functions

* fix: update for TimestampNanosecondArray API changes

* fix: update for TimestampNanosecondArray API changes

* chore: Update flatbuffers and remove rustsec warning

* chore: Update text

* fix: update more test

* fix: Lock ahash to exactly 0.8.0

* fix: Update datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 11:01:58 +00:00
Marco Neumann f511db380c
refactor: remove table name from chunks (#6063)
It should be always clear from the context to which table a chunk
belongs.

I think having a table name bound to a chunk goes back to a time where
chunks had multiple tables.

Helps with #6049.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:42:57 +00:00
Andrew Lamb b149dc541a
chore: Export metrics for parquet access to `EXPLAIN ANALYZE` (#6043)
* chore: Export metrics for parquet access

* fix: make parquet_execs non pub
2022-11-03 11:39:16 +00:00
Andrew Lamb 4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037)
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`

* fix: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Andrew Lamb 58838e214e
feat: enable parquet predicate pushdown in IOx (#5930)
2022-11-02 18:00:47 +00:00
Marco Neumann 45b3984aa3
refactor: simplify `QueryChunk` data access (#6015)
* refactor: simplify `QueryChunk` data access

We have only two types for chunks (now that the RUB is gone):

1. In-memory RecordBatches
2. Parquet files

Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 08:18:33 +00:00
Andrew Lamb 9c1f0a3644
refactor: move SessionConfig creation into datafusion_utils (#6011)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 20:04:49 +00:00
Marco Neumann 072439e428
refactor: mandatory `QueryChunkMeta::summary` (#5997)
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈-style `.expect(...)` calls.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:38:02 +00:00
Andrew Lamb ace3c11f12
chore: Update datafusion (#6004)
* chore: Update datafusion

* chore: change path

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:16:28 +00:00
Marco Neumann 8447d46093
refactor: remove `QueryChunkMeta::timestamp_min_max` (#5963)
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses to `iox_query` and let
the compactor and ingester use the same mechanism.

While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 10:29:16 +00:00
Andrew Lamb a0c0ae91ec
refactor: Simplify manipulations of BooleanArray (#5992)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 09:59:18 +00:00
Dom Dwyer 678fb81892
refactor(ingester): use partition buffer FSM
This commit makes use of the partition buffer state machine introduced
in https://github.com/influxdata/influxdb_iox/pull/5943.

This commit significantly changes the buffering, and querying, of data
from a partition, swapping out the existing "DataBuffer" for the new
state machine implementation (itself simplified due to temporary lack of
incremental snapshot generation, see #5944).

This commit simplifies the query path, removing multiple types that
wrapped one-another to pass around various state necessary to perform a
query, with various query functions needing different types or
combinations of types. The query path now operates using a single type
(named "QueryAdaptor") that provides a queryable interface over the set
of RecordBatch returned from a partition.

There is significantly increased testing of the PartitionData itself,
covering data in various states and the ordering of returned RecordBatch
(to ensure correct materialisation of updates). There are also
invariants upheld by the type system / compiler to minimise the
complexities of working with empty batches & states, and many asserts
that ensure (mostly existing!) invariants are upheld.
2022-10-27 10:15:15 +02:00
Carol (Nichols || Goulding) 3145e2c05b
feat: Use workspace dep inheritance for the arrow crate
2022-10-26 10:34:29 -04:00