influxdb

Commit Graph

Author	SHA1	Message	Date
Marco Neumann	45b3984aa3	refactor: simplify `QueryChunk` data access (#6015 ) * refactor: simplify `QueryChunk` data access We have only two types for chunks (now that the RUB is gone): 1. In-memory RecordBatches 2. Parquet files Loads of logic is duplicated in the different `read_filter` implementations. Also `read_filter` hides a solid amount of logic from DataFusion, which will prevent certain (future) optimizations. To enable #5897 and to simplify the interface, let the chunks return the data (batches or metadata for parquet files) directly and let `iox_query` perform the actual heavy-lifting. * docs: improve Co-authored-by: Andrew Lamb <alamb@influxdata.com> * docs: improve Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-11-02 08:18:33 +00:00
Andrew Lamb	9c1f0a3644	refactor: move SessionConfig creation into datafusion_utils (#6011 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-31 20:04:49 +00:00
Marco Neumann	072439e428	refactor: mandatory `QueryChunkMeta::summary` (#5997 ) With #5963 merged, all chunks now provide a summary (even though it may not contain data for all columns). So let's make it mandatory, which also removes a few 🙈-style `.except(...)` calls. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-31 16:38:02 +00:00
Andrew Lamb	ace3c11f12	chore: Update datafusion (#6004 ) * chore: Update datafusion * chore: change path * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-31 16:16:28 +00:00
Marco Neumann	8447d46093	refactor: remove `QueryChunkMeta::timestamp_min_max` (#5963 ) Use the table summary instead. This allows us to have a single mechanism that both IOx and DataFusion understand. This basically lifts the "basic table summary" mechanism that the querier uses to `iox_query` and let the compactor and ingester use the same mechanism. While not strictly necessary, simplifying the `QueryChunk[Meta]` interface helps with #5897. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-28 10:29:16 +00:00
Andrew Lamb	a0c0ae91ec	refactor: Simplify manipulations of BooleanArray (#5992 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-28 09:59:18 +00:00
Dom Dwyer	678fb81892	refactor(ingester): use partition buffer FSM This commit makes use of the partition buffer state machine introduced in https://github.com/influxdata/influxdb_iox/pull/5943. This commit significantly changes the buffering, and querying, of data from a partition, swapping out the existing "DataBuffer" for the new state machine implementation (itself simplified due to temporary lack of incremental snapshot generation, see #5944). This commit simplifies the query path, removing multiple types that wrapped one-another to pass around various state necessary to perform a query, with various query functions needing different types or combinations of types. The query path now operates using a single type (named "QueryAdaptor") that provides a queryable interface over the set of RecordBatch returned from a partition. There is significantly increased testing of the PartitionData itself, covering data in various states and the ordering of returned RecordBatch (to ensure correct materialisation of updates). There are also invariants upheld by the type system / compiler to minimise the complexities of working with empty batches & states, and many asserts that ensure (mostly existing!) invariants are upheld.	2022-10-27 10:15:15 +02:00
Carol (Nichols || Goulding)	3145e2c05b	feat: Use workspace dep inheritance for the arrow crate	2022-10-26 10:34:29 -04:00
Carol (Nichols || Goulding)	44936f661a	feat: Use workspace dep inheritance for datafusion instead of shim crate	2022-10-26 10:33:56 -04:00
Marco Neumann	9b48437711	refactor: make influx column type mandatory (#5978 ) We basically assume everywhere that a column falls into one of the three known categories (time, tag, field), so lets encode this in our type system instead of defining "unknown" as "undefined behavior, may or may not crash". Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-26 11:20:29 +00:00
Nga Tran	9b5327a79c	feat: build a query plan without deduplication (#5810 ) * feat: have a logical plan that is aware of no-deduplication * feat: build physical scan plan that does not do deduplication * chore: cleaup * test: logical plans for scan with and without deduplication * chore: clean up and a small refactor * refactor: remove asserts on plan and rename make enable_deduplication default * refactor: rename disable_deduplication to enable_deduplication	2022-10-25 17:56:51 +00:00
Carol (Nichols || Goulding)	2e83e04eab	feat: Use workspace package metadata to reduce differences and repetition	2022-10-24 13:04:09 -04:00
kodiakhq[bot]	57519ec1ba	Merge branch 'main' into crepererum/issue5897i	2022-10-24 16:36:22 +00:00
Marco Neumann	a227366432	refactor: do not project chunks in `TestDatabase::chunks` (#5960 ) Databases are NOT required to project chunks (in practice this is only done by the querier for ingester-based chunks). Instead `iox_query` should (and already does) add the right stream adapters to project chunks or to create NULL-columns. Removing the special handling from the test setup makes it easier to understand and also less likely that `iox_query` starts to rely on this behavior. Helps with #5897. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-24 16:36:08 +00:00
Marco Neumann	3e4db81bc6	refactor: make `SchemaBuilder::field` fallible It would be nice if the IOx data type would not be optional and this is a prep clean-up to achieve that.	2022-10-24 18:12:42 +02:00
Marco Neumann	c9b1066b89	refactor: simplify `iox_query::provider::overlap` (#5961 ) - remove generic that is basically unused (`group_potential_duplicates` is always called w/ `Arc<dyn QueryChunk>`) - remove half-baked `impl` that is unused Helps w/ #5897. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-24 15:45:41 +00:00
Marco Neumann	1d440ddb2d	refactor: `IOxReadFilterNode` can always accumulate statistics (#5954 ) * refactor: `IOxReadFilterNode` can always accumulate statistics `IOxReadFilterNode` used to not emit statistics if one chunk has duplicates or delete predicates. This is wrong (or at least overly conservative), because the node itself (or the chunks themselves) do NOT perform dedup or delete predicate filtering. Instead this is done is done by parent nodes (`DeduplicateExec` and `FilterExec`) and its their job to propagate statistics correctly. Helps w/ #5897. * test: explain setup Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2022-10-24 13:34:22 +00:00
Marco Neumann	284f253846	refactor: remove unused constant (#5956 ) Now that we read throw `ParquetExec`, `ROW_GROUP_READ_SIZE` is no longer used. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-24 11:08:44 +00:00
Marco Neumann	e0062f2d40	refactor: do NOT use fake DF context for parquet reading (#5942 ) Use the proper top-level DataFusion context and register the object store there. Note that we still hide the `ParquetExec` behind an opaque record batch stream. Fixing that is next on my list. Helps with #5897. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-24 08:20:26 +00:00
Andrew Lamb	e1d37b52b2	refactor: arrow API usage cleanup (#5927 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-20 12:50:16 +00:00
Marco Neumann	04320aced1	refactor: replace `croaring` with `arrow` (#5910 ) * refactor: replace `croaring` with `roaring` With the read buffer gone, roaring bitmaps are only used to calculate series sets and these calculations are pretty much possible with the pure-Rust version. Also I don't deem that that performance-critical (compared to the roaring bitmaps in the read buffer core). This removes a bunch of dependencies, mostly because `bindgen` is gone. This also removes our "croaring architecture detection" hack. * refactor: replace manual roaring sets with arrow Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-20 10:45:41 +00:00
Andrew Lamb	82d6fc3bda	feat: support queries via influxrpc with periods in field names (#5919 ) * feat: support queries via influxrpc with periods in field names * fix: update comments * fix: more tests * fix: more tests	2022-10-19 20:09:55 +00:00
Andrew Lamb	d706f8221d	chore: Update datafusion and arrow / parquet / arrow-flight 25.0.0 (#5900 ) * chore: Update datafusion and `arrow` / `parquet` / `arrow-flight` 25.0.0 * chore: Update for structure changes * chore: Update for new projection pushdown * chore: Run cargo hakari tasks * fix: fmt Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-18 20:58:47 +00:00
Carol (Nichols || Goulding)	c28ac4a3c3	fix: Return an error for unsupported SQL queries (#5876 ) * test: Failing tests for unsupported queries * fix: Catch unsupported SQL operations and error rather than return nothing * test: Document a few more error messages that come through DataFusion * refactor: Extract a Step to make query error tests nicer to read and write * fix: update tests for new error codes Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-18 19:27:29 +00:00
Andrew Lamb	9134ccd6c3	chore: Update datafusion again (#5855 ) * chore: Update datafusion * chore: Updates for changes in datafusion * chore: more updates * fix: update doc example Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-13 19:18:57 +00:00
Andrew Lamb	d57c99638c	chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 (#5792 ) * chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 * fix: Update for coercion, fix explain plans for change in column name display * chore: Update datafusion lock * fix: Update for other API changes * chore: Update to latest datafusion pin * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-12 16:19:14 +00:00
Nga Tran	95ed41f140	feat: Projection pushdown for querier -> ingester for rpc queries (#5782 ) * feat: initial step to identify where the projection should be provided * feat: start getting columns of all expressions * chore: format * test: test for the table_chunk_stream * fix: fix a compile error. Thanks @alamb * test: full tests for table_chunk_stream * chore: cleanup * fix: do not cut any columns in case all fields are needed * test: add one more test case of reading all columns * refactor: move code that identify columbs ot push down to a function. Add the use of field_columns * chore: cleanup * refactor: make sream_from_batch support empty batches * chore: cleanup * chore: fix clippy after auto merge Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-06 17:21:23 +00:00
Marco Neumann	c4c83e0840	fix: query error propagation (#5801 ) - treat OOM protection as "resource exhausted" - use `DataFusionError` in more places instead of opaque `Box<dyn Error>` - improve conversion from/into `DataFusionError` to preserve more semantics Overall, this improves our error handling. DF can now return errors like "resource exhausted" and gRPC should now automatically generate a sensible status code for it. Fixes #5799.	2022-10-06 08:54:01 +00:00
Dom Dwyer	cd4087e00d	style: add no todo!() or dbg!() lints Some crates had theme, some not - lets be consistent and have the compiler spot dbg!() and todo!() macro calls - they should never be in prod code!	2022-09-29 13:10:07 +02:00
Andrew Lamb	66dbb9541f	chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 (#5694 ) * chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0 * chore: Update thrift / remove parquet_format * fix: Update APIs * chore: Update lock + Run cargo hakari tasks * fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779 * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-27 12:50:54 +00:00
Carol (Nichols || Goulding)	c8108f01e7	chore: Upgrade to Rust 1.64 (#5727 ) * chore: Upgrade to Rust 1.64 * fix: Use iter find instead of a for loop, thanks clippy * fix: Remove some needless borrows, thanks clippy * fix: Use then_some rather than then with a closure, thanks clippy * fix: Use iter retain rather than filter collect, thanks clippy Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-22 18:04:00 +00:00
dependabot[bot]	ea1e822e3b	chore(deps): Bump itertools from 0.10.4 to 0.10.5 (#5707 ) Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.4 to 0.10.5. - [Release notes](https://github.com/rust-itertools/itertools/releases) - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-itertools/itertools/commits) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-09-21 08:15:59 +00:00
Marco Neumann	7e00426d49	refactor: concurrent table scan for "tag values" (#5671 ) Ref #5668.	2022-09-19 14:11:51 +00:00
Marco Neumann	274bd80ecd	refactor: concurrent table scan for "tag keys" (#5670 ) * refactor: concurrent table scan for "tag keys" Ref #5668. * feat: add table name to context metadata	2022-09-19 13:27:18 +00:00
Marco Neumann	ef09573255	refactor: concurrent table scan in "field columns" (#5651 ) * refactor: concurrent table scan in "field columns" Similar to #5647 and #5649. * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-19 10:50:25 +00:00
Marco Neumann	e346433914	refactor: concurrent table scan for "table names" (#5649 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-15 15:39:00 +00:00
Marco Neumann	159250e776	refactor: concurrent table planning in InfluxRPC (#5647 ) * refactor: concurrent table planning in InfluxRPC Some InfluxRPC can scan multiple tables. Prior to this PR we were always scanning the tables in sequence, adding up potential latencies (catalog, ingester, object store). There is no reason we need to do this, "ordinary" SQL queries would not serialize this way either. So let's scan tables concurrently. This add concurrency to: - read filter - read group - read window aggregate There are other query types that could benefit from a similar treatment. They will be changed in a follow-up. * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * test: explain `Send` assertion * refactor: change `CONCURRENT_TABLE_JOBS` to 10 Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-09-15 13:55:22 +00:00
dependabot[bot]	7e1f013346	chore(deps): Bump itertools from 0.10.3 to 0.10.4 (#5631 ) Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.3 to 0.10.4. - [Release notes](https://github.com/rust-itertools/itertools/releases) - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.3...v0.10.4) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-14 14:02:14 +00:00
Andrew Lamb	45d795055a	feat: Support calling influxql/flux selector aggregates from IOx SQL (#5628 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-14 10:37:17 +00:00
Andrew Lamb	1fd31ee3bf	chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591 ) * chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 * fix: enable dynamic comparison flag * chore: derive Eq for clippy * chore: update explain plans * chore: Update sizes for ReadBuffer encoding * chore: update more tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-12 17:45:03 +00:00
Marco Neumann	762c2af91e	refactor: do not store chunks in `Deduplicator` (#5617 ) Only store context, settings (if any) and the schema interner within the de-duplicator. Extract a new `Chunks` type that handles the chunk classification and can passed around in a somewhat clean fashion. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-12 15:29:27 +00:00
Marco Neumann	8933f47ec1	refactor: make `QueryChunk::partition_id` non-optional (#5614 ) In our data model, a chunk always belongs to a partition[^1], so let's not make this attribute optional. The optional value only leads to -- mostly surprising -- conditional behavior, ranging from "do not equalize the partition sort key" (querier) to "always consider the chunk overlapping" (iox_query when dealing with ingester chunks). [^1]: This is even true when the chunk belongs to a parquet file that is not yet added to the catalog, contrary to what a comment in the ingester stated. The catalog and data model used by the querier are two totally different things.	2022-09-12 13:52:51 +00:00
Marco Neumann	b676049358	fix: apply selection in `TestChunk::read_filter` (#5613 ) * fix: apply selection in `TestChunk::read_filter` TBH I have no idea how this worked so well before, but the chunks are expected to apply the given selection. This is because `IOxReadFilterNode::execute` will wrap the `QueryChunk::read_filter` output into a `SchemaAdapterStream` and this one expects that there are no input columns that are absent in the output schema (i.e. it will only add null columns, it won't remove any). Funnily the `SchemaAdapterStream` error will blame DataFusion for the mess. * test: make `test_storage_rpc_tag_values_grouped_by_measurement_and_tag_key` a bit harder Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-12 13:10:37 +00:00
Marco Neumann	caa0dfd1e0	refactor: query code clean ups (#5612 ) * refactor: remove dead code * refactor: `Deduplicator::build_scan_plan` consumes `self` There is no good reason to use the same `Deduplicator` twice. In contrast I'm quite sure that this would lead to nasty bugs, because `split_overlapped_chunks` exists early in some cases so the 2nd plan would have old and new chunks mixed together.	2022-09-12 13:00:56 +00:00
YIXIAO SHI	fa6c26b38d	chore: fix comment typo (#5550 ) Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-07 08:57:34 +00:00
YIXIAO SHI	52ae60bf2e	chore: fix comment typo (#5551 ) Co-authored-by: Dom <dom@itsallbroken.com>	2022-09-07 08:49:29 +00:00
Marco Neumann	adeacf416c	ci: fix (#5569 ) * ci: use same feature set in `build_dev` and `build_release` * ci: also enable unstable tokio for `build_dev` * chore: update tokio to 1.21 (to fix console-subscriber 0.1.8 * fix: "must use"	2022-09-06 14:13:28 +00:00
Andrew Lamb	6669d85fb4	chore: Update datafusion + arrow/parquet to `21.0.0` (#5519 ) * chore: Update arrow/arrow-flight/parquet to 21.0.0 * chore: Update datafusion pin * chore: Fix arrow update script * chore: Update Cargo.lock * chore: Update for new API	2022-08-31 13:30:47 +00:00
Sam Arnold	05657ea068	fix: optimizations for metadata fetch and chunk pruning (#5467 ) * fix: hoist repeated computation out of chunk creation We have hundreds of chunks per table, so it is beneficial to only do common work once. * chore: remove TableCache as it is no longer used * fix: prune chunks both before and after metadata fetch Fetching the metadata for all the chunks in a table is expensive, especially when we have a narrow time range query that only needs a few chunks. * chore: fix clippy * fix: fix up some last tests * fix: review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-29 14:59:05 +00:00
Andrew Lamb	9aac78d30b	fix: Correctly lexigraphically sort `_field` and `_measurement` with upper case tag keys (#5436 ) Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-29 13:45:03 +00:00

105 Commits (e49f2ca5c7e5a427190f15cda99c003480d0f113)