influxdb

Commit Graph

Author	SHA1	Message	Date
Andrew Lamb	de79619e71	chore: Update datafusion (#8355 ) * chore: Update datafusion pin * fix: Update for change in API * chore: Update plan --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-07-31 15:41:00 +00:00
Joe-Blount	1bed99567c	chore: add DF metrics to compaction spans (#8270 ) * chore: add DF metrics to compaction spans * chore: update string for test verification * chore: update comment --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-07-20 15:00:22 +00:00
Marco Neumann	0173c50ba1	fix: use correct error code when querier is shutting down (#8282 ) When a long running query is in process and the querier is shutting down, it might happen that the executor (= thread pool and tokio executor responsible for the CPU-bound DataFusion execution) is shut down while the query is running. From a "systems interaction" PoV I think this is totally fine and I would like to avoid some weird ref-counting. Or in other words: if the system is shutting down, shut it down. However the error was treated as "internal" which is not useful. The client should rather be informed that its server was gone and that it is OK (and desired) to retry. So as per <https://grpc.github.io/grpc/core/md_doc_statuscodes.html> I think this should signal "unavailable". This change wires the error code in such a way that the gRPC service layer can properly inspect it and then changes the error mapping. Ref https://github.com/influxdata/idpe/issues/17917 . Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-07-20 12:08:22 +00:00
Christopher M. Wolff	33e41fc5cb	fix: improve error for malformed gap fill query (#8252) * fix: improve error for malformed gap fill query * fix: code review feedback	2023-07-17 21:20:34 +00:00
Christopher M. Wolff	b916a89159	fix: recurse through SubqueryAlias when finding gap fill time range (#8249)	2023-07-17 19:39:30 +00:00
Carol (Nichols || Goulding)	a9b788b58f	feat: Collate chunks based on their partition hash id if they have it	2023-07-17 10:34:01 -04:00
Carol (Nichols || Goulding)	313baca8b6	fix: Use sort_by rather than sort_by_key to use references These places are sorting by `PartitionId` currently, which implements `Copy`, but are about to be changed to be sorted on `PartitionHashId`, which does not implement `Copy`.	2023-07-17 09:56:55 -04:00
Carol (Nichols || Goulding)	10a0f8e3bf	fix: Remove ::default() when constructing unit structs As recommended by https://rust-lang.github.io/rust-clippy/master/index.html#default_constructed_unit_structs	2023-07-14 10:50:55 -04:00
Dom Dwyer	7f7d1f2ee7	fix(ingester): projection without time column The ingester can project arbitrary columns at query time, and has no special requirement that the "time" column be part of that projection. Because the timestamp summary generation explicitly requires the time column to exist, it panics when there's no "time" column in the projection - this is a bit of a modelling mismatch more than anything.	2023-07-13 14:22:48 +02:00
Andrew Lamb	b24f9c81ba	chore: Update DataFusion pin, updates for API changed (#8199)	2023-07-11 13:36:38 +00:00
Andrew Lamb	3ce11d8d66	chore: Update DataFusion (#8190 ) * chore: Update DataFusion * chore: Run cargo hakari tasks * fix: Update for API changes * fix: use display format * chore: Update explain plan output * fix: update plans --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-07-10 09:54:50 +00:00
Marco Neumann	0bcf85d48c	refactor: de-dup code	2023-07-03 17:24:59 +02:00
Carol (Nichols || Goulding)	8ebf390d9c	feat: Try to prune ingester partitions by partition key This is hacktastic.	2023-07-03 17:24:58 +02:00
Carol (Nichols || Goulding)	b76fdab1a4	refactor: Move querier::df_stats to iox_query::chunk_statistics so it can be shared with ingester	2023-07-03 17:24:55 +02:00
Marco Neumann	ce6a2fb613	refactor: remove `QueryChunk::column_values` (#8111 ) Similar to #8109. This was once implemented by the RUB but as it stands right now, no chunk implements this anymore. If we ever want to bring this back, we should use the output of `QueryChunk::data` instead (i.e. use a data-based implementation instead of a per-chunk one). Closes #8096.	2023-07-03 09:03:21 +00:00
Andrew Lamb	4a1f8db254	chore: Update datafusion + arrow/arrow-flight/parquet to patched version `42.0.0` (#8113 ) * Revert "Revert "chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` (#8036)" (#8049)" This reverts commit `fb0674fc01`. * chore: Update Cargo and hakari * chore: Update to patched version --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-30 12:59:31 +00:00
Marco Neumann	b982ee180e	refactor: remove `QueryChunk::column_names` (#8109 ) This interface was once specially implemented by the RUB. The only actual implementation of it is within the querier that just forwards it to a simple schema scan. Lift this semantic to `iox_query_influxrpc` instead so all the chunks can use it. If we ever want to optimize this again, we should use `QueryChunk::data` instead (i.e. instead of implementing it within the chunk it should use the data method and do something smart based on that). First half of #8096.	2023-06-29 13:43:10 +00:00
Marco Neumann	ca31c1eade	feat: hook up tokio metrics (#8050 ) * feat: metrics for main tokio runtime * feat: instrument executor tokio runtime --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-29 11:11:44 +00:00
Marco Neumann	dcb4a9bb5c	refactor: fuse `QueryChunk` and `QueryChunkMeta` (#8107) Closes #8095.	2023-06-29 11:02:48 +00:00
Marco Neumann	4638b89d93	refactor: migrate retention to proper predicates (#8092) Do not (ab)use per-chunk delete predicates for the retention policy. Instead use a per-table predicate. This makes the code way cleaner, since the scoping is correct (i.e. delete predicates are a table-wide attribute, not a chunk-based one) and it is consistent with the time predicates that the user provides (e.g. via `WHERE time > x`). It also allows us to remove delete predicates (in their current, non-scalable form) from the query path. A potential future version would likely not use per-chunk predicates (and "is processed" markers) but use the timestamp / chunk order to determine to which data the predicate should be applied. Note that the lowering of the retention policy changed slightly from `(time > (now() - retention)) AND (time < MAX)` to `time > (now() - retention)`, since the `MAX` cut is just an artifact of the lowering and was unnecessary. Closes #7409. Closes #7410.	2023-06-29 08:36:37 +00:00
dependabot[bot]	b15c6062a9	chore(deps): Bump tokio from 1.28.2 to 1.29.0 (#8100 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.2 to 1.29.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.2...tokio-1.29.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-28 13:18:08 +00:00
dependabot[bot]	990044dcb2	chore(deps): Bump indexmap from 1.9.3 to 2.0.0 (#8073 ) * chore(deps): Bump indexmap from 1.9.3 to 2.0.0 Bumps [indexmap](https://github.com/bluss/indexmap) from 1.9.3 to 2.0.0. - [Changelog](https://github.com/bluss/indexmap/blob/master/RELEASES.md) - [Commits](https://github.com/bluss/indexmap/compare/1.9.3...2.0.0) --- updated-dependencies: - dependency-name: indexmap dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore: Run cargo hakari tasks --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-26 08:52:51 +00:00
Marco Neumann	7322f238fb	docs: query processing (#8033 ) * docs: query processing Closes https://github.com/influxdata/idpe/issues/17770 . * docs: apply recommendations Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com> * docs: improve description of the flight protocol * docs: link `LogicalPlan` * docs: link `ExecutionPlan` * docs: improve wording * docs: improve query planning docs --------- Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com>	2023-06-23 09:13:14 +00:00
dependabot[bot]	74a48a8f63	chore(deps): Bump itertools from 0.10.5 to 0.11.0 (#8060 ) * chore(deps): Bump itertools from 0.10.5 to 0.11.0 Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.5 to 0.11.0. - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.5...v0.11.0) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * chore: Run cargo hakari tasks --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-23 08:11:56 +00:00
Andrew Lamb	fb0674fc01	Revert "chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` (#8036 )" (#8049 ) This reverts commit `70ffedadc7`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-22 11:03:25 +00:00
Andrew Lamb	70ffedadc7	chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` (#8036 ) * chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` * chore: Update for new APIs --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-21 16:11:36 +00:00
Stuart Carnie	e10b8c93c8	chore: Update DataFusion and other dependencies (#8014 ) * chore: Update DataFusion pin * chore: Update API changes * chore: Don't use deprecated API * chore: Run cargo hakari tasks * chore: Update tests due to changes in logical plan nodes from DF update * chore: Fix broken links in docs * chore: Adjust changes to expected output --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2023-06-16 10:39:36 +00:00
Andrew Lamb	5889c96501	chore: Update `datafusion` and other dependencies (#7981) * chore: Update DataFusion pin * chore: Update other dependencies * chore: Update hakari * fix: Update for API changes * fix: Update explain plan * fix: Update influxql plans * fix: rustdoc links --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-16 09:48:55 +00:00
Marco Neumann	8ef1b64f6a	fix: remove de-dup even if we have many partitions (#8004 ) See optimizer pipeline here: `5d0bb68c5b/iox_query/src/physical_optimizer/mod.rs (L33-L35)` After generating the naive initial plan w/ many partitions, we must consider more than 100 partitions to split the key space. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-15 15:25:25 +00:00
Andrew Lamb	17c0d837b3	chore: Update DataFusion, arrow, object_store pins (#7942 ) * chore: Update DataFusion, arrow, object_store pins * chore: Update for hakari * chore: Update for new APIs * fix: update test --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-07 17:08:31 +00:00
Andrew Lamb	f571aeb445	chore: Update DataFusion pin (#7916 ) * chore: Update DataFusion pin * chore: Update cargo * fix: update for API changes * fix: Update plans * chore: Update for new api * fix: Update plans * chore: Update for API changes more --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-05 18:38:59 +00:00
Stuart Carnie	da682d8c53	chore: clippy 🧠	2023-06-04 07:23:11 +10:00
Marco Neumann	fa5011197c	refactor: migrate `iox_query` to use DataFusion statistics (#7908 ) This is the major part of #7470. Additional clean ups (e.g. to remove the actual types from `data_types`) will follow. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-02 09:18:59 +00:00
Marco Neumann	72ff001d33	feat: aggregator for DataFusion statistics (#7904 ) * feat: aggregator for DataFusion statistics Required to implement #7470, esp. to implement the statistics folding done within `RecordBatchesExec`. * docs: improve	2023-06-01 16:11:30 +00:00
Andrew Lamb	a48f681e56	feat(parquet): reduce and limit buffering when writing parquet files (#7880 ) * feat: limit buffering when writing parquet files ("combined solution") * chore: Run cargo hakari tasks --------- Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-31 13:27:32 +00:00
Andrew Lamb	1ff76b7bf2	chore: use workspace dependencies for `object_store`	2023-05-26 07:03:42 -04:00
Marco Neumann	bc18c6dc5f	refactor: re-land #7815 . (#7852 ) * refactor: consolidate pruning code Let's have a single chunk pruning implementation in our code, not two. Also removes a bit of crust from `QueryChunk` since it is technically no longer responsible for pruning (this part has been pushed into the querier for early pruning and bits for the `iox_query_influxrpc` for some RPC shenanigans). * test: regression test for incident * fix: chunk pruning * docs: add some test notes	2023-05-24 09:46:49 +00:00
Dom Dwyer	928a4d163e	build: remove unused dependencies from crates This commit fixes loads of crates (47!) that had unused dependencies, or mis-configured dependencies (test deps as normal deps). I added the "unused_crate_dependencies" lint to all crates to help prevent this mess from growing again! https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html This has the minor downside of false-positives when specifying dev-dependencies for test/bench binaries - these are files in /test or /benches (not normal tests). This commit includes a workaround, importing them in lib.rs (gated by a feature flag). I think the trade-off of better dependency management is worth it!	2023-05-23 14:55:43 +02:00
Marco Neumann	6c0f50a473	revert: refactor: consolidate pruning code (#7815 ) (#7847 ) This reverts commit `db9fe92981`. Likely causing an incident, see https://app.incident.io/incidents/267 .	2023-05-23 08:01:53 +00:00
Marco Neumann	db9fe92981	refactor: consolidate pruning code (#7815 ) Let's have a single chunk pruning implementation in our code, not two. Also removes a bit of crust from `QueryChunk` since it is technically no longer responsible for pruning (this part has been pushed into the querier for early pruning and bits for the `iox_query_influxrpc` for some RPC shenanigans).	2023-05-22 08:42:20 +00:00
Andrew Lamb	6344fe8c3f	chore: Add rationale for `clippy::future_not_send` (#7822 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-18 16:58:56 +00:00
Marco Neumann	7e64264eef	refactor: remove `RedudantSort` optimizer pass (#7809 ) * test: add dedup test for multiple partitions and ranges * refactor: remove `RedudantSort` optimizer pass Similar to #7807 this is now covered by DataFusion, as demonstrated by the fact that all query tests (incl. explain tests) still pass. The good thing is: passes that are no longer required don't require any upstreaming, so this also closes #7411.	2023-05-17 09:30:04 +00:00
Marco Neumann	931b4488bd	refactor: remove `SortPushdown` optimizer pass (#7807 ) DataFusion is now smart enough to do that using the builtin passes. No `EXPLAIN` tests regressed. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-17 06:26:31 +00:00
Nga Tran	ca12f1c03d	fix: correctly recurse in `ParquetSortness` (#7778) * test: reproducer for idpe_17556 * fix: `ParquetSortness` and partial opt 1. correctly handle cases where `ParquetSortness` would optimize one child branch but not the other 2. handle cases where `ParquetSortness` recursion should stop a bit clearer (using `TreeNodeRewriter`) 3. rename query tests to be a bit clearer 4. add test case with many (but not too many) duplicate files and an ingester (basically a prod use case where the compactor is slightly behind) --------- Co-authored-by: Marco Neumann <marco@crepererum.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-17 06:09:23 +00:00
Marco Neumann	d3ff945117	refactor: remove output sorting from scan provider (#7798) This is somewhat a left-over from the old phys. plan construction where we tried to fold in the sorts at the right place. Now the optimizer takes care of that, so we can just express this as a standard logical node (the same as SQL and InfluxQL). This makes the plan construction a bit cleaner since the actual scan provider only performs the minimal work that is required by DataFusion and the users (SQL, InfluxQL, reorg) request what they actually need. The tests in `iox_query::frontend::reorg::tests` still pass and prove that the actual physical plans are identical w/ this approach. Closes #7785. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-17 05:59:04 +00:00
Chunchun Ye	2bb6445668	chore: update DataFusion and arrow / arrow-flight / parquet to `39.0.0` (#7793 ) * chore: update DataFusion and arrow/parquet/arrow-flight to 39.0.0 * chore: update DataFusion and arrow/parquet/arrow-flight to 39.0.0 in workspace-hack/Cargo.toml * chore: Run cargo hakari tasks * chore: fix CI test and lint * chore: update csv schema * refactor: remove type-annotate for `Arc` --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-16 13:42:26 +00:00
Andrew Lamb	7735e7c95b	chore: Update DataFusion again (#7777 ) * chore: Update datafusion again * chore: Run cargo hakari tasks --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-15 12:38:45 +00:00
Andrew Lamb	2860d87fe1	chore: Update DataFusion (#7756 ) * chore: Update DataFusion pin * chore: Update explain plans * chore: Run cargo hakari tasks --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2023-05-05 18:58:18 +00:00
Christopher M. Wolff	55b35367ac	test: add test for gap fill query missing time bounds (#7747 ) * test: add test for gap fill query missing time bounds * chore: update unit test --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-04 21:01:45 +00:00
Christopher M. Wolff	05688799c4	fix: handle aliases in gapfill aggregate columns (#7725 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-03 15:20:14 +00:00

349 Commits (c8242c74696bd849e8b296f7b255d909babd7bd5)