influxdb

Commit Graph

Author	SHA1	Message	Date
Marco Neumann	7907a2bae3	fix: column summary conversion for "unknown" TS (#4379 ) * fix: column summary conversion for "unknown" TS Both IOx and DataFusion have the same data model for min/max statistics: `Option<Option<i64>>` (or any other inner type) The interpretation is: 1. `None`: Value unknown. 2. `Some(None)`: Value known to be NULL. 3. `Some(Some(x))`: Value known and non NULL. The bug was that during the conversion from the IOx statistics type to the DataFusion statistics type for timestamps, case 1 was converted into case 2. Up until now this didn't make a difference between timestamps were basically known all the time, but during the development of NG there are cases where the timestamps are unknown (this might change, but the query engine should be correct w/o assuming that). * docs: explain test Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-22 07:44:55 +00:00
Andrew Lamb	e67cc9dbce	chore: Update datafusion again (#4385 ) * chore: Update datafusion * fix: Update imports Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-21 21:05:16 +00:00
Carol (Nichols \|\| Goulding)	c7a1c496cf	fix: incorrect overlapped grouping (#4082 ) * test: Failing test for finding overlapped groups * test: Failing test for query overlap too :( * fix: Group parquet files overlapped by time correctly Inspired by https://towardsdatascience.com/overlapping-time-period-problem-b7f1719347db Not sure what the real name for this algorithm is * refactor: Group items without an intermediate hashmap needed * chore: cleanup Co-authored-by: NGA-TRAN <nga-tran@live.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-21 18:51:30 +00:00
Andrew Lamb	73bed810da	chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357 ) * chore: Update datafusion * chore: Update arrow/arrow-flight/parquet to 12 * chore: update datafusion correctly * chore: Update prost, tonic, and dependents * fix: Fixup some api changes * fix: Update test output in db * fix: Update test output in parquet_file * fix: remove old pbjson types * fix: Add "--experimental_allow_proto3_optional" flag * chore: Run cargo hakari tasks * fix: compile error * chore: Update heappy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-20 11:12:17 +00:00
Andrew Lamb	e3d83fe757	chore: update datafusion (#4342 ) * chore: update datafusion * fix: Update imports for change in datafusion organization	2022-04-19 13:38:12 +00:00
Nga Tran	2a601c3099	fix: Revert "chore: Revert "fx: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299 )" (#4303 )" (#4327 )" (#4328 ) * fix: Revert "chore: Revert "fx: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303)" (#4327)" This reverts commit `7e5d719027`. * chore: resolve merge conflict Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-18 15:27:39 +00:00
Nga Tran	8e2d158a37	test: deadlock test and add more debug log (#4319 ) * test: use Paul deadlock reproducer and add more debug log * test: remove compare many output rows * test: verify the test putput * chore: cleanup Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-14 18:06:22 +00:00
Nga Tran	7e5d719027	chore: Revert "fix: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299 )" (#4303 )" (#4327 ) This reverts commit `fe8d9948d5`.	2022-04-14 17:11:55 +00:00
Carol (Nichols \|\| Goulding)	fe8d9948d5	fix: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299 )" (#4303 ) This reverts commit `7ddbf7c025`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-14 15:42:28 +00:00
Dom Dwyer	31fdeaaabc	refactor: log split worker panics at error level When the split background worker panics, it now causes an ERROR level log to be emitted.	2022-04-14 15:39:35 +01:00
Dom Dwyer	00b5c1b296	fix: compaction deadlock This commit resolves the compaction deadlock described in #4306. The deadlock occurs during StreamSplitExec execution, where a background worker is spawned to read input record batches and partition them into two groups. This code pushes the resulting split record batches into two channels - one for records that match a given predicate, and another channel for those that do not. These channels buffer at most 2 record batches each. The compactor that executes this plan reads the resulting partitions sequentially to completion. Completion is indicated by reading until the results stream ends, which ends when the underlying channel is closed, and therefore the split worker task must have finished and closed the results channel for the partition to be successfully read. While the compactor is reading from the first partition, the worker is attempting to push record batches into the second partition and blocks due to the channel capacity being reached. The worker never drops the channel for the first partition, so the compactor never finishes reading the first partition, and nothing is reading the second partition to unblock the worker. Deadlock!	2022-04-14 15:39:35 +01:00
Nga Tran	3070d78e8c	chore: add more compactor debug info (#4310 ) * chore: add more compactor debug info * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: fix format Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-13 19:22:19 +00:00
Carol (Nichols \|\| Goulding)	7ddbf7c025	fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-13 14:11:10 +00:00
kodiakhq[bot]	21f748062e	Merge branch 'main' into cn/sort-in-compactor	2022-04-13 12:43:31 +00:00
Andrew Lamb	e96aed6949	chore: add comments and `trace` calls to query provider regarding sort keys (#4274 ) * chore: add comments and debug to query provider * docs: Update query/src/provider.rs Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-12 16:36:39 +00:00
Carol (Nichols \|\| Goulding)	55fe3b8d50	feat: Use the sort key stored in the catalog during compaction Fixes #4249.	2022-04-11 14:09:45 -04:00
Andrew Lamb	be4ebe2563	feat: Add more context to error messages (#4263 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-11 10:51:50 +00:00
Nga Tran	f838cb78a2	fix: not to add IOxReadFilterNode for empty non-duplicated chunks (#4264 ) * fix: not to add IOxReadFilterNode for no data of non-duplicated chunks if there is already scan node for overlapped/duplicated chunks * refactor: address review comments * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-08 21:03:22 +00:00
Andrew Lamb	bbbdcc75a8	feat: `QuerierDatabase::chunks` returns `Result` (#4260 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-08 18:54:17 +00:00
Andrew Lamb	34e65c23fa	fix: Update for signature change (#4252 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-08 11:21:07 +00:00
Carol (Nichols \|\| Goulding)	b16fcc284d	feat: Add new columns to the sort key during compaction Connects to #4196.	2022-04-06 09:31:42 -04:00
Carol (Nichols \|\| Goulding)	9043966443	docs: Fix some typos in comments as I noticed them	2022-03-31 16:34:47 -04:00
Andrew Lamb	22b24bdab3	chore: Update datafusion again (#4148 ) * chore: update datafusoon * refactor: Update for DataFusion API changes * chore: TEMP TEMP change df to local copy * chore: Update to datafusion again * fix: Update Cargo.lock * fix: logical conflict	2022-03-30 16:51:48 +00:00
Marco Neumann	20bbb88dc5	refactor: remove table name from `TableSummary` (#4170 ) This allows us to remove the table name from the low-level chunk representations (like `ParquetFile`, RUB, ...) since table names are already tracked by the higher-level data structures (e.g. catalog, catalog chunk) that manage the low-level chunk representations. This is similar to #4167. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 13:24:00 +00:00
Marco Neumann	2b76c31157	refactor: make statistics null counts optional (#4160 ) Min/max values and distinct counts are already optional, so let's make the null counts optional as well. This will be helpful for NG to deal w/ partial statistics (e.g. we only populate stats for the time column). Note that the total count is still mandatory, but we normally have the chunk/file-level row count at hand.	2022-03-29 17:47:57 +00:00
dependabot[bot]	17af5fcbd1	chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 (#4154 ) * chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.0 to 0.7.1. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.0...tokio-util-0.7.1) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * chore: Run cargo hakari tasks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-29 08:39:02 +00:00
Andrew Lamb	5c69a3f43b	chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119 ) * chore: update datafusion * chore(deps): Bump arrow from 10.0.0 to 11.0.0 Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0 Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow-flight dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore: update parquet to 11.0.0 * fix: error on create schema, test for same * fix: upgrade zstd * chore: Run cargo hakari tasks * fix: fix logical merge conflict * fix: hakari * fix: hakari * fix: update newly introduced dep Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-24 15:27:36 +00:00
Andrew Lamb	b83b000590	chore: Update datafusion (#4071 ) * chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc * chore: Update apis to match datafusion changes	2022-03-22 13:17:41 +00:00
Marco Neumann	c9908b260c	refactor: dyn-dispatch database in query subsystem (#4083 ) * refactor: dyn-dispatch database in query subsystem This is similar to #4080 but concerns the database itself. For #3934. * docs: improve wording Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-22 09:15:52 +00:00
Marco Neumann	d1df95df87	refactor: dyn-dispatch chunks in query subsystem - this is what DataFusion is doing as well; it's also fast enough because the number of chunks in a query is not THAT massive (it's not like we are doing row-level dyn dispatching) - it simplifies abstracting over different databases - it allows us to drop our enum-based dispatching that we have for `DbChunk` and that we would also need for the querier (e.g. depending on if a chunk is backed by a parquet file or ingester data) - it likely speeds up compile times because the `query` is no longer contains massive amounts of generic code For #3934.	2022-03-21 12:47:54 +01:00
Marco Neumann	ca152e7934	refactor: avoid generics in `QueryDatabase` A step to make this trait object-safe. Ref #3934.	2022-03-21 10:45:05 +01:00
Marco Neumann	0071b85c22	refactor: make `ExecutionContextProvider` object-safe Ref #3934.	2022-03-21 10:40:53 +01:00
Marco Neumann	169fa2fb2f	refactor: make `QueryChunk` object-safe This makes it way easier to dyn-type database implementations. The only real change is that we make `QueryChunk::Error` opaque. Nobody is going to inspect that anyways, it's just printed to the user. This is a follow-up of #4053. Ref #3934.	2022-03-18 11:40:31 +01:00
Marco Neumann	0850a93f20	refactor: make `QueryDatabase::chunks` async (#4047 ) For OG we can determine the chunks w/o any IO, for NG however this might require a few catalog queries. This is likely not the last change of this sort, i.e. the whole schema handling is currently sync as well. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-17 12:55:25 +00:00
Nga Tran	5a29d070ea	feat: Implement the compact function for NG Compactor (#4001 ) * feat: initial implementation of compact a given list of overlapped parquet files * feat: Add QueryableParquetChunk and some refactoring * feat: build queryable parquet chunks for parquet files with tombstones * feat: second half the implementation for Compactor's compact. Tests will be next * fix: comments for trait funnctions fof QueryChunkMeta * test: add tests for compactor's compact function * fix: typos * refactor: address Jake's review comments * refactor: address Andrew's comments and add one more test for files in different order in the vector Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-11 20:25:19 +00:00
Andrew Lamb	2c3d30ca32	chore: Update datafusion, arrow, flight and parquet (#4000 ) * chore: Update datafusion, arrow, flight and parquet * fix: api change * fix: fmt * fix: update test metadata size * fix: Update sizes in parquet test * fix: more metadata size update	2022-03-10 12:24:47 +00:00
Marco Neumann	77f6153f72	refactor: remove `QueryDatabase::chunk_summaries` (#3977 ) - This is not used by the query engine at all. - The query engine should not care about ALL chunks but only about the chunks it gets via `QueryDatabase::chunks` (which includes a table name and a predicate). - All other users of that API are NOT really query-related.	2022-03-08 11:34:26 +00:00
Marco Neumann	5cc1c697fc	refactor: remove `QueryDatabase::partition_addr` (#3976 ) - This was not actually used by the query engine. - The query engine doesn't have a concept of a "partition", it only cares about chunks. - Unbound access to all partitions in the database is quite expensive (esp. on NG).	2022-03-08 11:17:31 +00:00
Raphael Taylor-Davies	80fb75d90b	feat: add a flag to enable per-partition tracing (#3928 ) * feat: add a flag to enable per-partition tracing * chore: rename constant * feat: use BooleanFlag and cache result	2022-03-07 13:49:23 +00:00
Raphael Taylor-Davies	7b28fb4366	feat: improve trace naming (#3931 ) * feat: improve trace naming * test: test span description Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-07 11:49:19 +00:00
Andrew Lamb	9d8bceccbf	test: Add test to verify deduplicating is working (#3937 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-04 20:05:17 +00:00
Andrew Lamb	e09f39d6a0	chore: Update datafusion (#3943 ) * chore: Update datafusion * refactor: update for new datafusion * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-03-04 19:37:46 +00:00
Raphael Taylor-Davies	e304613546	feat: include trace ID in query log (#3912 ) (#3923 ) * feat: include trace ID in query log (#3912) * chore: fmt * chore: lint	2022-03-03 17:50:49 +00:00
Edd Robinson	de7c46c9bb	feat: add read_window_aggregate tracing	2022-03-03 14:30:27 +00:00
Edd Robinson	ea32bc366a	feat: add read_group tracing	2022-03-03 14:27:01 +00:00
Edd Robinson	32baaa1ee7	feat: add tracing to field_columns	2022-03-03 14:27:01 +00:00
Edd Robinson	787a848bf5	feat: add tracing for tag_values	2022-03-03 14:27:01 +00:00
Edd Robinson	6a6fbf73ae	feat: add tracing support tag_keys	2022-03-03 14:27:01 +00:00
Edd Robinson	998e205c2c	feat: trace table_names	2022-03-03 14:27:01 +00:00
Edd Robinson	301ae886ce	feat: add tracing down to the chunk level (#3804 ) * refactor: wire exectution context to Deduplicator * feat: example trace to chunk read_filter * refactor: make execution context required * refactor: expose metadata API * refactor: more span context for chunk read_filter * refactor: fix build * refactor: push context into result stream * refactor: make executor optional	2022-03-03 14:27:00 +00:00

1 2 3 4 5 ...

570 Commits (77d8967c8e04be7357abca6bd8b264bb8a25dd51)