influxdb

Commit Graph

Author	SHA1	Message	Date
Nga Tran	95ed41f140	feat: Projection pushdown for querier -> ingester for rpc queries (#5782 ) * feat: initial step to identify where the projection should be provided * feat: start getting columns of all expressions * chore: format * test: test for the table_chunk_stream * fix: fix a compile error. Thanks @alamb * test: full tests for table_chunk_stream * chore: cleanup * fix: do not cut any columns in case all fields are needed * test: add one more test case of reading all columns * refactor: move code that identifies columns to push down to a function. Add the use of field_columns * chore: cleanup * refactor: make stream_from_batch support empty batches * chore: cleanup * chore: fix clippy after auto merge Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-06 17:21:23 +00:00
Marco Neumann	c4c83e0840	fix: query error propagation (#5801 ) - treat OOM protection as "resource exhausted" - use `DataFusionError` in more places instead of opaque `Box<dyn Error>` - improve conversion from/into `DataFusionError` to preserve more semantics Overall, this improves our error handling. DF can now return errors like "resource exhausted" and gRPC should now automatically generate a sensible status code for it. Fixes #5799.	2022-10-06 08:54:01 +00:00
Dom Dwyer	cd4087e00d	style: add no todo!() or dbg!() lints Some crates had them, some not - let's be consistent and have the compiler spot dbg!() and todo!() macro calls - they should never be in prod code!	2022-09-29 13:10:07 +02:00
Andrew Lamb	66dbb9541f	chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 (#5694 ) * chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0 * chore: Update thrift / remove parquet_format * fix: Update APIs * chore: Update lock + Run cargo hakari tasks * fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779 * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-27 12:50:54 +00:00
Carol (Nichols \|\| Goulding)	c8108f01e7	chore: Upgrade to Rust 1.64 (#5727 ) * chore: Upgrade to Rust 1.64 * fix: Use iter find instead of a for loop, thanks clippy * fix: Remove some needless borrows, thanks clippy * fix: Use then_some rather than then with a closure, thanks clippy * fix: Use iter retain rather than filter collect, thanks clippy Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-22 18:04:00 +00:00
dependabot[bot]	ea1e822e3b	chore(deps): Bump itertools from 0.10.4 to 0.10.5 (#5707 ) Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.4 to 0.10.5. - [Release notes](https://github.com/rust-itertools/itertools/releases) - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-itertools/itertools/commits) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-09-21 08:15:59 +00:00
Marco Neumann	7e00426d49	refactor: concurrent table scan for "tag values" (#5671 ) Ref #5668.	2022-09-19 14:11:51 +00:00
Marco Neumann	274bd80ecd	refactor: concurrent table scan for "tag keys" (#5670 ) * refactor: concurrent table scan for "tag keys" Ref #5668. * feat: add table name to context metadata	2022-09-19 13:27:18 +00:00
Marco Neumann	ef09573255	refactor: concurrent table scan in "field columns" (#5651 ) * refactor: concurrent table scan in "field columns" Similar to #5647 and #5649. * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-19 10:50:25 +00:00
Marco Neumann	e346433914	refactor: concurrent table scan for "table names" (#5649 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-15 15:39:00 +00:00
Marco Neumann	159250e776	refactor: concurrent table planning in InfluxRPC (#5647 ) * refactor: concurrent table planning in InfluxRPC Some InfluxRPC can scan multiple tables. Prior to this PR we were always scanning the tables in sequence, adding up potential latencies (catalog, ingester, object store). There is no reason we need to do this, "ordinary" SQL queries would not serialize this way either. So let's scan tables concurrently. This adds concurrency to: - read filter - read group - read window aggregate There are other query types that could benefit from a similar treatment. They will be changed in a follow-up. * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * test: explain `Send` assertion * refactor: change `CONCURRENT_TABLE_JOBS` to 10 Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-09-15 13:55:22 +00:00
dependabot[bot]	7e1f013346	chore(deps): Bump itertools from 0.10.3 to 0.10.4 (#5631 ) Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.3 to 0.10.4. - [Release notes](https://github.com/rust-itertools/itertools/releases) - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.3...v0.10.4) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-14 14:02:14 +00:00
Andrew Lamb	45d795055a	feat: Support calling influxql/flux selector aggregates from IOx SQL (#5628 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-14 10:37:17 +00:00
Andrew Lamb	1fd31ee3bf	chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591 ) * chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 * fix: enable dynamic comparison flag * chore: derive Eq for clippy * chore: update explain plans * chore: Update sizes for ReadBuffer encoding * chore: update more tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-12 17:45:03 +00:00
Marco Neumann	762c2af91e	refactor: do not store chunks in `Deduplicator` (#5617 ) Only store context, settings (if any) and the schema interner within the de-duplicator. Extract a new `Chunks` type that handles the chunk classification and can passed around in a somewhat clean fashion. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-12 15:29:27 +00:00
Marco Neumann	8933f47ec1	refactor: make `QueryChunk::partition_id` non-optional (#5614 ) In our data model, a chunk always belongs to a partition[^1], so let's not make this attribute optional. The optional value only leads to -- mostly surprising -- conditional behavior, ranging from "do not equalize the partition sort key" (querier) to "always consider the chunk overlapping" (iox_query when dealing with ingester chunks). [^1]: This is even true when the chunk belongs to a parquet file that is not yet added to the catalog, contrary to what a comment in the ingester stated. The catalog and data model used by the querier are two totally different things.	2022-09-12 13:52:51 +00:00
Marco Neumann	b676049358	fix: apply selection in `TestChunk::read_filter` (#5613 ) * fix: apply selection in `TestChunk::read_filter` TBH I have no idea how this worked so well before, but the chunks are expected to apply the given selection. This is because `IOxReadFilterNode::execute` will wrap the `QueryChunk::read_filter` output into a `SchemaAdapterStream` and this one expects that there are no input columns that are absent in the output schema (i.e. it will only add null columns, it won't remove any). Funnily the `SchemaAdapterStream` error will blame DataFusion for the mess. * test: make `test_storage_rpc_tag_values_grouped_by_measurement_and_tag_key` a bit harder Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-12 13:10:37 +00:00
Marco Neumann	caa0dfd1e0	refactor: query code clean ups (#5612 ) * refactor: remove dead code * refactor: `Deduplicator::build_scan_plan` consumes `self` There is no good reason to use the same `Deduplicator` twice. In contrast I'm quite sure that this would lead to nasty bugs, because `split_overlapped_chunks` exits early in some cases so the 2nd plan would have old and new chunks mixed together.	2022-09-12 13:00:56 +00:00
YIXIAO SHI	fa6c26b38d	chore: fix comment typo (#5550 ) Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-07 08:57:34 +00:00
YIXIAO SHI	52ae60bf2e	chore: fix comment typo (#5551 ) Co-authored-by: Dom <dom@itsallbroken.com>	2022-09-07 08:49:29 +00:00
Marco Neumann	adeacf416c	ci: fix (#5569 ) * ci: use same feature set in `build_dev` and `build_release` * ci: also enable unstable tokio for `build_dev` * chore: update tokio to 1.21 (to fix console-subscriber 0.1.8) * fix: "must use"	2022-09-06 14:13:28 +00:00
Andrew Lamb	6669d85fb4	chore: Update datafusion + arrow/parquet to `21.0.0` (#5519 ) * chore: Update arrow/arrow-flight/parquet to 21.0.0 * chore: Update datafusion pin * chore: Fix arrow update script * chore: Update Cargo.lock * chore: Update for new API	2022-08-31 13:30:47 +00:00
Sam Arnold	05657ea068	fix: optimizations for metadata fetch and chunk pruning (#5467 ) * fix: hoist repeated computation out of chunk creation We have hundreds of chunks per table, so it is beneficial to only do common work once. * chore: remove TableCache as it is no longer used * fix: prune chunks both before and after metadata fetch Fetching the metadata for all the chunks in a table is expensive, especially when we have a narrow time range query that only needs a few chunks. * chore: fix clippy * fix: fix up some last tests * fix: review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-29 14:59:05 +00:00
Andrew Lamb	9aac78d30b	fix: Correctly lexicographically sort `_field` and `_measurement` with upper case tag keys (#5436 ) Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-29 13:45:03 +00:00
Carol (Nichols \|\| Goulding)	d4a472f775	fix: Remove let bindings that are immediately returned As now caught by clippy. https://rust-lang.github.io/rust-clippy/master/index.html#let_and_return	2022-08-11 15:21:02 -04:00
Carol (Nichols \|\| Goulding)	3a501a4a10	fix: Remove an immediate ref to a deref Caught by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#borrow_deref_ref	2022-08-11 15:04:14 -04:00
Carol (Nichols \|\| Goulding)	b982bdaf2f	fix: Derive Eq when we derive PartialEq and members can derive Eq Allow this in generated code that we don't control, though. Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq	2022-08-11 15:04:06 -04:00
Marco Neumann	90fec1365f	feat: intern schemas during query planning (#5215 ) * feat: intern schemas during query planning Helps with #5202. * refactor: `SchemaMerger::build` shall return an `Arc` * feat: `SchemaMerger::with_interner` * refactor: hash-based schema interning	2022-08-11 12:28:51 +00:00
Andrew Lamb	b834bc630c	chore: more readability improvements to sort keys (#5366 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-10 17:59:25 +00:00
Andrew Lamb	ce3e2c3a15	chore: make terminology in iox_query::Provider consistent (remove super notation) (#5349 ) * chore: make terminology in iox_query::Provider consistent (remove super notation) * refactor: be more specific about which sort key is meant * refactor: rename another sort_key --> output_sort_key * refactor: rename additional sort_key to output_sort_key * refactor: rename sort_key --> chunk_sort_key Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-10 10:59:47 +00:00
Andrew Lamb	16ddc5efc6	chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360 ) * chore: Update datafusion and arrow * chore: Update Cargo.lock * chore: update to Decimal128 * chore: Update tonic/prost/pbjson/etc * chore: Run cargo hakari tasks * fix: doctest in generated types Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-09 17:30:44 +00:00
Marco Neumann	fc1870ff76	fix: chunk pruning stats (#5319 ) - emit a warning if we cannot even attempt to prune chunks due to an error. This is always either a missing feature or a bug (even though it does not impact correctness but _only_ performance). Also see https://github.com/influxdata/conductor/issues/1107 - change metrics to clearly differentiate between "could not prune" and "not pruned" - add new "not pruned" observer hook (this was missing for some reason, the "pruned" hook existed though)	2022-08-05 10:50:31 +00:00
Andrew Lamb	e82214ed38	chore: fix `cargo audit`, update deps to get new chrono (#5316 ) * chore: update deps to get new chrono * chore: Run cargo hakari tasks * chore: migrate away from deprecated API Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-04 20:49:28 +00:00
Marco Neumann	0d714878ca	feat: chunk pruning metrics (#5273 ) * refactor: make could-not-prune reason a static string * refactor: introduce `QuerierTableArgs` * feat: chunk pruning metrics Closes #4974. * refactor: address review comments * refactor: use static typing for not-pruned reason * refactor: pass chunk to not-pruned observer and use it for some metrics Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-04 15:29:21 +00:00
Nga Tran	34ccc9c7f5	chore: Revert "chore: Revert "refactor: bump batch size (#5251 )" (#5288 )" (#5300 ) This reverts commit `471b8be92f`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-04 13:19:46 +00:00
Nga Tran	471b8be92f	chore: Revert "refactor: bump batch size (#5251 )" (#5288 ) This reverts commit `bb172f8fa8`.	2022-08-03 14:23:45 +00:00
Marco Neumann	bb172f8fa8	refactor: bump batch size (#5251 ) This is what DataFusion uses by default and I don't see a reason why we should use such small batch sizes. The effect is probably only visible in certain filter-aggregate queries that don't focus on a single series (because there we likely end up with 1 or 2 batches only, esp. after #5250) for coarse-grained filters, esp. when the filter key is not the first sort key. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-01 13:49:58 +00:00
Andrew Lamb	9215a534d0	chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229 ) * chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` * chore: Run cargo hakari tasks * fix: Update for API changes * fix: clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 08:10:47 +00:00
Marco Neumann	9a9a1a4777	feat: limit per-table chunk data for every query (#5223 ) * feat: `QueryChunk::as_any` * feat: allow `ChunkPruner::prune_chunks` to fail * feat: limit per-table chunk data for every query Closes #5211. * fix: address review comments Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-27 13:20:05 +00:00
Andrew Lamb	fbf672015e	refactor: Reduce ceremony required to create a `Span` from `SpanContext` (#5181 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-22 11:19:38 +00:00
Marco Neumann	0561423475	refactor: enforce proper `IOxSessionContext` (#5158 ) - remove `IOxSessionContext::default()` because untracked contexts should only be created by tests - remove `Option<IOxSessionContext>` because it is a typed workaround for `IOxSessionContext::default` Tests should use `IOxSessionContext::testing` and all _normal_ users should create proper contexts. I suspect this will help tracing or at least prevent silent regressions. See #5129. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 16:25:43 +00:00
Marko Mikulicic	b8236e2b9d	fix: Fix SeriesKey sort order for special _measurement and _field (#5150 ) * fix: Fix SeriesKey sort order for special _measurement and _field * fix: Update expected test output * fix: Update more tests * fix: Re-sort tag key when using binary encoding Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-20 08:45:17 +00:00
Marco Neumann	b8d9799a26	feat: wire span all the way to `QuerierTable::chunks` (#5134 ) * feat: pass context to `QueryDatabase::chunks` * feat: wire span all the way to `QuerierTable::chunks` This is required for #5129.	2022-07-19 14:12:55 +00:00
Nga Tran	c8f4000f04	feat: Select compaction candidates (#5131 ) * feat: initial implementation for selecting compaction candidates * feat: 2 catalog functions to choose the highest-throughput partitions to compact and the selecting candidate function itself * test: tests for the new 2 queries * feat: more tests and metrics for choosing compaction candidates * chore: Apply self suggestions from self review * chore: cleanup * chore: fix doc comment * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: address review comments * fix: get the right time provider for the tests * refactor: remove the left over compaction_ * fix: typos * fix: make the param name and env name consistent * refactor: make relevant iSomething to uSomething * fix: typo Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com>	2022-07-18 18:05:13 +00:00
Andrew Lamb	e2d871b00b	chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079 ) * chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18 * chore: Run cargo hakari tasks * fix: update cargo pin Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-18 15:01:03 +00:00
Marco Neumann	9c2b6cd96c	fix: always pass proper context to `InfluxRpcPlanner` (#5144 ) There were some instances where we forgot to pass context (and therefore tracing) information to `InfluxRpcPlanner`. This removes the `Default` implementation and requires callers to always pass a context when creating `InfluxRpcPlanner` to prevent this type of bug. Ref #5129. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-18 14:45:22 +00:00
dependabot[bot]	9b67de2f43	chore(deps): Bump tokio from 1.19.2 to 1.20.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-14 01:21:43 +00:00
Marco Neumann	b1b2cb5d4a	feat: load read buffer on demand (#5091 ) * refactor: extract `select_schema` * refactor: improve `InternalLostInputField` error message * test: improve SQL runner output * feat: load read buffer on demand Closes #5032. * refactor: move `[Half]OwnedSelection` to `schema` crate	2022-07-13 08:51:40 +00:00
Marco Neumann	f1467cf4d8	refactor: try to query "cheap" chunks first (#5075 ) This should help a lot once #5032 is implemented. Currently it doesn't really make a difference. See #5037, which also proposes a more advanced but more complex system. The team however agreed to try something simple first. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-08 11:19:25 +00:00
Andrew Lamb	c46e1c6347	chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021 ) * fix: correct nullability declaration of system tables * chore: Update datafusion and arrow/parquet/arrow-flight * chore: Run cargo hakari tasks * fix: Update tests * fix: Update tests * fix: predicate pruning * fix: add some tests * fix: query_functions * fix: fix read_buffer test * fix: fix clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-07 19:22:15 +00:00


79 Commits (7202dddab6d9ede46c74664c0675fe349da2fd13)