* refactor: move parquet chunk ID to `ChunkMeta`
* refactor: return `Arc` from `QueryChunk::summary`
This is similar to how we handle other chunk data like schemas. It
allows a chunk to change/refine its "belief" about its own payload while
it is passed around in the query stack.
Helps w/ #5032.
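Below is a minimal, hypothetical sketch of the idea; the trait and type names mirror IOx, but the exact signatures are assumptions rather than the real code:

```rust
use std::sync::Arc;

/// Trimmed-down stand-in for the real IOx `TableSummary`.
#[derive(Debug, Default)]
struct TableSummary {
    row_count: usize,
}

/// Sketch of the relevant part of the `QueryChunk` trait: the summary is
/// handed out as an `Arc`, so the chunk can later swap in a refined summary
/// without invalidating snapshots already held elsewhere in the query stack.
trait QueryChunk {
    fn summary(&self) -> Arc<TableSummary>;
}

struct ExampleChunk {
    summary: Arc<TableSummary>,
}

impl QueryChunk for ExampleChunk {
    fn summary(&self) -> Arc<TableSummary> {
        // Cheap pointer clone; callers share the same immutable snapshot.
        Arc::clone(&self.summary)
    }
}

fn main() {
    let chunk = ExampleChunk {
        summary: Arc::new(TableSummary { row_count: 42 }),
    };
    println!("{:?}", chunk.summary());
}
```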
* fix: optimize field columns for all-time predicates
Also fix timestamp range to allow selecting points at MAX_NANO_TIME
* fix: clamp end to MIN_NANO_TIME for safety
* refactor: add contains_all method to TimestampRange
Instead of using hand-rolled timestamp-based logic (or just
"unknown") all over the place, just use the logic introduced in #5017.
This requires slightly improved table summaries within the querier that
at least have min/max for the timestamp column. For that, the formerly
`IngesterChunk`-specific `calculate_summary` method was extended into
`create_basic_summary` to include that data and is now also used by
`QuerierParquetChunk`.
Note: `QuerierRBChunk` already has detailed metrics that are provided
by the read buffer implementation.
Should we ever need even better pruning for `QuerierParquetChunk` (or
`IngesterChunk`), then we _only_ need to add extra data to the table
summaries.
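For illustration, a minimal sketch of the `contains_all` idea (simplified, self-contained types; the real `TimestampRange` and the MIN/MAX nano constants live in the IOx `data_types` crate):

```rust
/// Minimal sketch of a half-open timestamp range (start inclusive, end
/// exclusive), roughly mirroring the IOx `TimestampRange`.
#[derive(Debug, Clone, Copy)]
struct TimestampRange {
    start: i64,
    end: i64,
}

impl TimestampRange {
    /// Does this range cover *every* point in `[min, max]`?
    ///
    /// Pruning code can use this instead of hand-rolled "is this an all-time
    /// query?" checks: if the predicate range contains the chunk's min/max
    /// timestamps, the time predicate cannot filter anything out.
    fn contains_all(&self, min: i64, max: i64) -> bool {
        self.start <= min && max < self.end
    }
}

fn main() {
    let range = TimestampRange { start: 0, end: 100 };
    assert!(range.contains_all(0, 99));
    assert!(!range.contains_all(0, 100)); // end is exclusive
    println!("contains_all sketch OK");
}
```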
Closes #4976.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* docs: fix comment
* test: add test for delete behaviour
* fix: tag_keys optimization for empty predicate
Also need to eliminate `true` predicates from the simplified predicate so
that `is_empty` works correctly.
* refactor: use lit instead of spelling out literal true
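A hedged sketch of the combined idea (the helper name is assumed, not the exact IOx code):

```rust
// Drop always-true conjuncts after simplification, using DataFusion's `lit`
// helper instead of spelling out the literal expression by hand.
use datafusion::prelude::{lit, Expr};

fn remove_true_exprs(exprs: Vec<Expr>) -> Vec<Expr> {
    // Once the `true` conjuncts are gone, an empty predicate really is empty,
    // so `is_empty`-style fast paths (e.g. for `tag_keys`) kick in again.
    exprs.into_iter().filter(|e| *e != lit(true)).collect()
}

fn main() {
    let simplified = vec![lit(true), lit(1i64).eq(lit(2i64))];
    assert_eq!(remove_true_exprs(simplified).len(), 1);
}
```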
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Prior to this change, background tasks that we feed into `AdapterStream`
could panic, but that would just end the stream without any user-visible
error (except for the panic message on stdout/stderr).
This was found while developing #4964. I had proposed another fix in #4966
but found that I had actually developed an existing solution a second time:
`watch_task`. But I also see a major issue with the existing API: one
can create `AdapterStream` with ordinary tokio tasks that are not
watched at all, leaving the burden on the implementor to check for that
(and we actually forgot that in `parquet_file`).
So this change takes a slightly different approach:
The `AdapterStream` does NOT accept ordinary join handles any longer but
requires that you pass a "watched task". The newly introduced
`WatchedTask` does the same as we did manually before: wrap a future
into a tokio task, watch it, and wrap the watcher into a task.
It is now way more difficult to do anything stupid (sure you can still
mix up the tasks and the channels, but we need at least some flexibility
here to allow for "split" and potential future fan-in/out constructs).
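A minimal sketch of the watched-task idea, with assumed names and channel types rather than the real implementation:

```rust
// The payload future is spawned as a tokio task, and a second task watches
// its JoinHandle so that a panic surfaces as an error on the output channel
// instead of silently ending the stream.
use tokio::{sync::mpsc, task::JoinHandle};

struct WatchedTask {
    watcher: JoinHandle<()>,
}

impl WatchedTask {
    fn new<F>(fut: F, tx: mpsc::Sender<Result<String, String>>) -> Self
    where
        F: std::future::Future<Output = ()> + Send + 'static,
    {
        let handle = tokio::spawn(fut);
        let watcher = tokio::spawn(async move {
            if let Err(e) = handle.await {
                // The payload task panicked (or was cancelled): forward a
                // user-visible error instead of just dropping the sender.
                let _ = tx.send(Err(format!("background task failed: {e}"))).await;
            }
        });
        Self { watcher }
    }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel(1);
    let task = WatchedTask::new(async { panic!("boom") }, tx);
    // The panic is reported through the channel rather than lost.
    println!("{:?}", rx.recv().await);
    let _ = task.watcher.await;
}
```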
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion pin
* fix: Update for api
* fix: Explicitly set coalesce batch size
* fix: Update batch size as well
* fix: update tests for new explain plan, and improved coercion
* refactor: change level 1 to level 2 preparing for next design changes
* fix: make level-2 consistent everywhere
* chore: remove unused comments
* refactor: rename everything from level_1 to level_2, completely replacing 1 with 2 to make everything consistent
* chore: add corresponding constants for the compaction levels in the comments
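Purely as an illustration (these constant names are hypothetical, not necessarily the ones added), the levels now read roughly like this:

```rust
// Freshly ingested files stay at level 0, while fully compacted,
// non-overlapping files are now labelled level 2 (formerly level 1),
// leaving room for an intermediate level in the upcoming design changes.
pub const INITIAL_COMPACTION_LEVEL: i16 = 0;
pub const NON_OVERLAPPING_COMPACTION_LEVEL: i16 = 2;
```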
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: simplify sort key calculation
* refactor: use schema from catalog instead of from the file
* refactor: do not request parquet file MD in compactor
* test: ensure that `QueryableParquetChunk` works correctly
* feat: split times of compacting results based on the max file size
* feat: consider the max file size while computing the split time
* test: tests for compute_split_time
* feat: first step to teach the function split_the_stream how to split data into n streams using n-1 input PhysicalExprs
* feat: make StreamSplitNode support a list of expressions
* docs: explain how StreamSplitNode works
* feat: Teach compute_split_time to split a time range into many contiguous ranges and split the compacted result into multiple non-overlapping files based on the config compaction_max_size_bytes (see the sketch after this list)
* chore: cleanup
* chore: clean up doc
* chore: address review comments
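A hedged sketch of the split-time idea (names and rounding are assumptions, not the real compactor code): the size estimate determines how many output files are needed, and the time range is cut into that many contiguous pieces whose boundaries then feed the `StreamSplitNode` expressions.

```rust
// Given the time range of the data to compact and an estimate of the
// compacted size, emit split points that cut the range into roughly equally
// sized, contiguous, non-overlapping pieces, each expected to stay below the
// configured maximum file size.
fn compute_split_time(
    min_time: i64,
    max_time: i64,
    estimated_bytes: u64,
    max_file_size_bytes: u64,
) -> Vec<i64> {
    // Everything fits into one file: no split needed.
    if estimated_bytes <= max_file_size_bytes {
        return vec![max_time];
    }

    // Number of output files, rounded up.
    let n = (estimated_bytes + max_file_size_bytes - 1) / max_file_size_bytes;
    let width = (max_time - min_time) as u64 / n;

    // One split point per file; the last one is always max_time so that the
    // union of the pieces covers the full input range.
    (1..=n)
        .map(|i| {
            if i == n {
                max_time
            } else {
                min_time + (width * i) as i64
            }
        })
        .collect()
}

fn main() {
    // Split a [0, 100] range that is ~3x larger than the max file size.
    println!("{:?}", compute_split_time(0, 100, 300, 100)); // [33, 66, 100]
}
```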
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: TEMP Update DataFusion to pre-release
* chore: update arrow et al to 16.0.0
* chore: Run cargo hakari tasks
* fix: update reader read_dictionary API
* chore: Update to real Datafusion release
* fix: Update parquet API
* fix: update test
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of strings (see the sketch after this list)
* test: add column with comma
* fix: use new protobuf field to avoid incompatibility
* fix: ensure sort_key is an empty array rather than NULL
* refactor: address review comments
* refactor: address more comments
* chore: clearer comments
* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql
* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql
* fix: Rename migration so it will be applied after
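A tiny illustration (not the migration itself) of why an array representation is needed once column names may contain commas:

```rust
fn main() {
    let columns = vec![
        "tag1".to_string(),
        "tag,with,comma".to_string(),
        "time".to_string(),
    ];

    // Flat, comma-separated encoding: impossible to tell where one column
    // name ends and the next begins.
    let flat = columns.join(",");
    assert_eq!(flat.split(',').count(), 5); // 5 pieces for 3 columns

    // Array encoding (TEXT[] in Postgres, Vec<String> on the Rust side)
    // keeps the boundaries intact.
    assert_eq!(columns.len(), 3);
}
```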
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update datafusion deps
* fix: fix for changes in ScalarValue
* fix: fix for using TableSource rather than TableProvider
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This reverts commit 00b5c1b296.
This change reverts the StreamSplitExec plan to using bounded, blocking
channels, with the possibility of deadlock added to the docs.
This is now tolerable because of the concurrent consumption of both
output partitions in the compactor.
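A small sketch of why that is tolerable, with assumed names rather than the real `StreamSplitExec` internals: the producer can only make progress if both bounded outputs are drained, which the compactor now does concurrently.

```rust
// The producer alternates between two bounded channels, so if only one
// receiver were drained the producer would block forever once the other
// channel filled up; draining both concurrently avoids the deadlock.
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx_a, mut rx_a) = mpsc::channel::<i32>(1);
    let (tx_b, mut rx_b) = mpsc::channel::<i32>(1);

    let producer = tokio::spawn(async move {
        for i in 0..10 {
            // Blocks when the respective channel is full.
            tx_a.send(i).await.unwrap();
            tx_b.send(i).await.unwrap();
        }
    });

    // Drain both output partitions concurrently, as the compactor does;
    // awaiting only `rx_a` until it closes would deadlock.
    let drain_a = async { while rx_a.recv().await.is_some() {} };
    let drain_b = async { while rx_b.recv().await.is_some() {} };
    let (_, _, _) = tokio::join!(drain_a, drain_b, producer);
}
```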
* fix: ensure that query tokio background tasks are canceled
While I am not entirely sure if this explains some of the memory leaks I
am seeing in prod, not canceling the tasks correctly certainly makes
debugging way harder and also renders certain forms of throttling (e.g.
max concurrent queries) somewhat ineffective.
Note that parquet file downloads are currently NOT canceled because
tokio's `spawn_blocking` cannot be canceled (a sketch of the cancel-on-drop
idea follows this list).
* refactor: `Vec` -> `Option`
* refactor: `spawn_blocking` creates a join handle, even though it is useless
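A minimal sketch of the cancel-on-drop idea (assumed wrapper name, not the IOx code):

```rust
// Wrap a spawned task so that dropping the wrapper aborts it, ensuring query
// background work stops when the consumer goes away. Note that
// `spawn_blocking` tasks cannot be aborted this way once they are running.
use tokio::task::JoinHandle;

struct AbortOnDrop<T>(JoinHandle<T>);

impl<T> Drop for AbortOnDrop<T> {
    fn drop(&mut self) {
        // Request cancellation of the underlying tokio task.
        self.0.abort();
    }
}

#[tokio::main]
async fn main() {
    let task = AbortOnDrop(tokio::spawn(async {
        // Stands in for query work feeding a stream.
        tokio::time::sleep(std::time::Duration::from_secs(3600)).await;
    }));
    drop(task); // the sleep is cancelled instead of running to completion
}
```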
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* ci: fix cargo deny
* chore: downgrade `socket2`, version 0.4.5 was yanked
* chore: rename `query` to `iox_query`
The name `query` is already taken (and yanked) on crates.io, and I am
getting tired of working around that.