Commit Graph

129 Commits (eb6abb5d670799e9e1d9a1a4784305aadf5a6914)

Author SHA1 Message Date
Stuart Carnie 2306c383f3
feat: Introduce InfluxQL to Flight (#6166)
* feat: Introduce InfluxQL to Flight

All InfluxQL queries will fail with an error

* chore: Temper protobuf lint

* chore: Finalize flight.proto changes; fix tests

* chore: Add tests for InfluxQL planner

* chore: Update docs

* chore: Update docs

* chore: Rename back to original

* chore: Use .into() rather than cast

* chore: Use function rather than field

* chore: Improved InfluxQL planner name

* chore: Restore `impl Into<String>` argument

* chore: Add a comment that Go clients are unable to execute InfluxQL

* chore: Add a test for the `--lang` argument and InfluxQL
2022-11-23 00:33:49 +00:00
Andrew Lamb 1a1ea74cb7
chore: Upgrade datafusion again (#6160)
* Revert "Revert "chore: Update datafusion again (#6108)""

This reverts commit 766b3bbeb440618cfe332f6ee7d4f8a8217acc48.

* fix: Respect the partition sort key

* chore: update plans

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-22 19:28:26 +00:00
dependabot[bot] a9db7581cd
chore(deps): Bump tokio from 1.21.2 to 1.22.0 (#6183)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.2 to 1.22.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.21.2...tokio-1.22.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-21 10:21:24 +00:00
Andrew Lamb 4630bbb956
feat: push down all predicates (#6042)
* feat: push down all predicates

* fix: fmt

* fix: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 16:22:01 +00:00
Marco Neumann 71ffc92559
fix: only push safe select expression through de-dup (#6156)
* fix: only push safe select expression through de-dup

Fixes #6066.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* fix: rebase

* test: ensure we do not split ORs

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-11-18 09:56:11 +00:00
Andrew Lamb 67712b595c
Revert "chore: Update datafusion again (#6108)" (#6159)
This reverts commit fbe9f27f10.
2022-11-16 21:14:55 +00:00
Andrew Lamb fbe9f27f10
chore: Update datafusion again (#6108)
* chore: Update datafusion pin + api code

* chore: Run cargo hakari tasks

* refactor: combine_sort_key is more idomatic and add rationale comments

* refactor: satisfy borrow checker and updated comments

* fix: Add test case for combine_sort_key

* fix:  Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: Add back test for deeply nested expression

* fix: Update output ordering

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-16 14:41:52 +00:00
Andrew Lamb 20f1ae1c8f
test: tests in the reorg planner and query tests for merging parquet files (#6137)
* test: tests in the reorg planner and query tests for merging parquet files

* fix: use 20 files

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-15 20:29:44 +00:00
dependabot[bot] a969754819
chore(deps): Bump chrono from 0.4.22 to 0.4.23 (#6129)
* chore(deps): Bump chrono from 0.4.22 to 0.4.23

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.22 to 0.4.23.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.22...v0.4.23)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* refactor: chrono future compat

Integer->timstamp conversions should not silently panic.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-14 13:34:09 +00:00
kodiakhq[bot] 05d7d1495e
Merge branch 'main' into dependabot/cargo/hashbrown-0.13.1 2022-11-11 21:26:40 +00:00
Carol (Nichols || Goulding) 0657ad9600
fix: Rename QueryDatabase to QueryNamespace 2022-11-11 16:14:12 -05:00
Carol (Nichols || Goulding) 621560a0dc
fix: Rename QueryDatabaseMeta to QueryNamespaceMeta 2022-11-11 16:14:12 -05:00
Carol (Nichols || Goulding) bdff4e8848
fix: Consistently use 'namespace' instead of 'database' in comments and other internal text 2022-11-11 15:46:04 -05:00
Jake Goulding cc17e5a54b refactor: use a workspace dependency for hashbrown 2022-11-11 13:25:39 -05:00
dependabot[bot] 5024523f00 chore(deps): Bump hashbrown from 0.12.3 to 0.13.1
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.12.3 to 0.13.1.
- [Release notes](https://github.com/rust-lang/hashbrown/releases)
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.12.3...v0.13.1)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-11 13:24:56 -05:00
Carol (Nichols || Goulding) abcbe19966
fix: Box some Error type fields to make the Error types small
As found by this new Clippy lint:

<https://rust-lang.github.io/rust-clippy/master/index.html#result_large_err>
2022-11-09 10:54:18 -05:00
Carol (Nichols || Goulding) f33e367904
fix: Use is_none instead of == None, thanks Clippy!
Again seems kinda niche and not a huge deal, but apparently this doesn't
rely on `T: PartialEq` so probably good?

<https://rust-lang.github.io/rust-clippy/master/index.html#partialeq_to_none>
2022-11-09 10:54:18 -05:00
Marco Neumann 1a5fc3d772
test: use `EXPLAIN ANALYZE` for SQL metric tests (#6084)
* test: use `EXPLAIN ANALYZE` for SQL metric tests

Needs a bit more infra (due to normalization), but this seems to be
worth it so we can easily hook up more metrics in the future.

* docs: explain regexes
2022-11-09 09:00:27 +00:00
Marco Neumann 903f7bafa7
refactor: expose `ParquetExec` directly to DataFusion phys. plan (#6072)
* refactor: expose `ParquetExec` directly to DataFusion phys. plan

Closes #5897.

* fix: update tracing tests

* refactor: use `EmptyExec`

* refactor: use `target_partitions`

* refactor: improve UUID normalization in query tests

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-11-08 12:19:28 +00:00
Andrew Lamb 034d9b371d
chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0` (#6061)
* chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0`

* fix: Update query_functions

* fix: update for TimestampNanosecondArray API changes

* fix: update for TimestampNanosecondArray API changes

* chore: Update flatbuffers and remove rustsec warning

* chore: Update text

* fix: update more test

* fix: Lock ahash to exactly 0.8.0

* fix: Update datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 11:01:58 +00:00
Marco Neumann f511db380c
refactor: remove table name from chunks (#6063)
It should be always clear from the context to which table a chunk
belongs.

I think having a table name bound to a chunk goes back to a time where
chunks had multiple tables.

Helps with #6049.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:42:57 +00:00
Andrew Lamb b149dc541a
chore: Export metrics for parquet access to `EXPLAIN ANALYZE` (#6043)
* chore: Export metrics for parquet access

* fix: make parquet_execs non pub
2022-11-03 11:39:16 +00:00
Andrew Lamb 4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037)
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`

* fix: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Andrew Lamb 58838e214e
feat: enable parquet predicate pushdown in IOx (#5930) 2022-11-02 18:00:47 +00:00
Marco Neumann 45b3984aa3
refactor: simplify `QueryChunk` data access (#6015)
* refactor: simplify `QueryChunk` data access

We have only two types for chunks (now that the RUB is gone):

1. In-memory RecordBatches
2. Parquet files

Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 08:18:33 +00:00
Andrew Lamb 9c1f0a3644
refactor: move SessionConfig creation into datafusion_utils (#6011)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 20:04:49 +00:00
Marco Neumann 072439e428
refactor: mandatory `QueryChunkMeta::summary` (#5997)
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈-style `.except(...)` calls.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:38:02 +00:00
Andrew Lamb ace3c11f12
chore: Update datafusion (#6004)
* chore: Update datafusion

* chore: change path

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:16:28 +00:00
Marco Neumann 8447d46093
refactor: remove `QueryChunkMeta::timestamp_min_max` (#5963)
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses to `iox_query` and let
the compactor and ingester use the same mechanism.

While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 10:29:16 +00:00
Andrew Lamb a0c0ae91ec
refactor: Simplify manipulations of BooleanArray (#5992)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 09:59:18 +00:00
Dom Dwyer 678fb81892 refactor(ingester): use partition buffer FSM
This commit makes use of the partition buffer state machine introduced
in https://github.com/influxdata/influxdb_iox/pull/5943.

This commit significantly changes the buffering, and querying, of data
from a partition, swapping out the existing "DataBuffer" for the new
state machine implementation (itself simplified due to temporary lack of
incremental snapshot generation, see #5944).

This commit simplifies the query path, removing multiple types that
wrapped one-another to pass around various state necessary to perform a
query, with various query functions needing different types or
combinations of types. The query path now operates using a single type
(named "QueryAdaptor") that provides a queryable interface over the set
of RecordBatch returned from a partition.

There is significantly increased testing of the PartitionData itself,
covering data in various states and the ordering of returned RecordBatch
(to ensure correct materialisation of updates). There are also
invariants upheld by the type system / compiler to minimise the
complexities of working with empty batches & states, and many asserts
that ensure (mostly existing!) invariants are upheld.
2022-10-27 10:15:15 +02:00
Carol (Nichols || Goulding) 3145e2c05b
feat: Use workspace dep inheritance for the arrow crate 2022-10-26 10:34:29 -04:00
Carol (Nichols || Goulding) 44936f661a
feat: Use workspace dep inheritance for datafusion instead of shim crate 2022-10-26 10:33:56 -04:00
Marco Neumann 9b48437711
refactor: make influx column type mandatory (#5978)
We basically assume everywhere that a column falls into one of the three
known categories (time, tag, field), so lets encode this in our type
system instead of defining "unknown" as "undefined behavior, may or may
not crash".

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-26 11:20:29 +00:00
Nga Tran 9b5327a79c
feat: build a query plan without deduplication (#5810)
* feat: have a logical plan that is aware of no-deduplication

* feat: build physical scan plan that does not do deduplication

* chore: cleaup

* test: logical plans for scan with and without deduplication

* chore: clean up and a small refactor

* refactor: remove asserts on plan and rename make enable_deduplication default

* refactor: rename disable_deduplication to enable_deduplication
2022-10-25 17:56:51 +00:00
Carol (Nichols || Goulding) 2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition 2022-10-24 13:04:09 -04:00
kodiakhq[bot] 57519ec1ba
Merge branch 'main' into crepererum/issue5897i 2022-10-24 16:36:22 +00:00
Marco Neumann a227366432
refactor: do not project chunks in `TestDatabase::chunks` (#5960)
Databases are NOT required to project chunks (in practice this is only
done by the querier for ingester-based chunks). Instead `iox_query`
should (and already does) add the right stream adapters to project
chunks or to create NULL-columns. Removing the special handling from the
test setup makes it easier to understand and also less likely that
`iox_query` starts to rely on this behavior.

Helps with #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 16:36:08 +00:00
Marco Neumann 3e4db81bc6 refactor: make `SchemaBuilder::field` fallible
It would be nice if the IOx data type would not be optional and this is
a prep clean-up to achieve that.
2022-10-24 18:12:42 +02:00
Marco Neumann c9b1066b89
refactor: simplify `iox_query::provider::overlap` (#5961)
- remove generic that is basically unused (`group_potential_duplicates`
  is always called w/ `Arc<dyn QueryChunk>`)
- remove half-baked `impl` that is unused

Helps w/ #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 15:45:41 +00:00
Marco Neumann 1d440ddb2d
refactor: `IOxReadFilterNode` can always accumulate statistics (#5954)
* refactor: `IOxReadFilterNode` can always accumulate statistics

`IOxReadFilterNode` used to not emit statistics if one chunk has
duplicates or delete predicates. This is wrong (or at least overly
conservative), because the node itself (or the chunks themselves) do NOT
perform dedup or delete predicate filtering. Instead this is done is
done by parent nodes (`DeduplicateExec` and `FilterExec`) and its their
job to propagate statistics correctly.

Helps w/ #5897.

* test: explain setup

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-10-24 13:34:22 +00:00
Marco Neumann 284f253846
refactor: remove unused constant (#5956)
Now that we read throw `ParquetExec`, `ROW_GROUP_READ_SIZE` is no longer
used.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 11:08:44 +00:00
Marco Neumann e0062f2d40
refactor: do NOT use fake DF context for parquet reading (#5942)
Use the proper top-level DataFusion context and register the object
store there.

Note that we still hide the `ParquetExec` behind an opaque record batch
stream. Fixing that is next on my list.

Helps with #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 08:20:26 +00:00
Andrew Lamb e1d37b52b2
refactor: arrow API usage cleanup (#5927)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-20 12:50:16 +00:00
Marco Neumann 04320aced1
refactor: replace `croaring` with `arrow` (#5910)
* refactor: replace `croaring` with `roaring`

With the read buffer gone, roaring bitmaps are only used to calculate
series sets and these calculations are pretty much possible with the
pure-Rust version. Also I don't deem that that performance-critical
(compared to the roaring bitmaps in the read buffer core).

This removes a bunch of dependencies, mostly because `bindgen` is gone.
This also removes our "croaring architecture detection" hack.

* refactor: replace manual roaring sets with arrow

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-20 10:45:41 +00:00
Andrew Lamb 82d6fc3bda
feat: support queries via influxrpc with periods in field names (#5919)
* feat: support queries via influxrpc with periods in field names

* fix: update comments

* fix: more tests

* fix: more tests
2022-10-19 20:09:55 +00:00
Andrew Lamb d706f8221d
chore: Update datafusion and arrow / parquet / arrow-flight 25.0.0 (#5900)
* chore: Update datafusion and  `arrow` / `parquet` / `arrow-flight` 25.0.0

* chore: Update for structure changes

* chore: Update for new projection pushdown

* chore: Run cargo hakari tasks

* fix: fmt

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 20:58:47 +00:00
Carol (Nichols || Goulding) c28ac4a3c3
fix: Return an error for unsupported SQL queries (#5876)
* test: Failing tests for unsupported queries

* fix: Catch unsupported SQL operations and error rather than return nothing

* test: Document a few more error messages that come through DataFusion

* refactor: Extract a Step to make query error tests nicer to read and write

* fix: update tests for new error codes

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 19:27:29 +00:00
Andrew Lamb 9134ccd6c3
chore: Update datafusion again (#5855)
* chore: Update datafusion

* chore: Updates for changes in datafusion

* chore: more updates

* fix: update doc example

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-13 19:18:57 +00:00
Andrew Lamb d57c99638c
chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 (#5792)
* chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0

* fix: Update for coercion, fix explain plans for change in column name display

* chore: Update datafusion lock

* fix: Update for other API changes

* chore: Update to latest datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:19:14 +00:00