Adds a test that asserts (manually triggered) persistence generates a
file, uploads it to object storage, inserts metadata into the catalog,
and emits various persistence metrics.
Implements a PersistCompletionObserver that records various attributes
of the generated and persisted Parquet file as histogram metrics to
capture the distribution of values:
* File size
* Row count
* Column count
* Time range of data (max - min timestamp)
These metrics will give us insight into the generated files instead of
relying on intuition when tuning various configuration parameters.
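A minimal sketch of such an observer (the `Histogram` stand-in and the
`CompletedPersist` fields are assumptions; the real implementation records
into the IOx `metric` crate's histograms):

```rust
// Sketch only: records per-file attributes of each completed persist
// operation into histograms to capture their distributions.
#[derive(Default)]
struct Histogram {
    samples: Vec<u64>, // stand-in for a real bucketed histogram
}

impl Histogram {
    fn record(&mut self, v: u64) {
        self.samples.push(v);
    }
}

/// Attributes of a completed persist operation (hypothetical fields).
struct CompletedPersist {
    file_size_bytes: u64,
    row_count: u64,
    column_count: u64,
    min_timestamp_ns: i64,
    max_timestamp_ns: i64,
}

#[derive(Default)]
struct ParquetFileInstrumentation {
    file_size_bytes: Histogram,
    row_count: Histogram,
    column_count: Histogram,
    time_range_ns: Histogram,
}

impl ParquetFileInstrumentation {
    /// Record the attributes of a newly persisted Parquet file.
    fn observe(&mut self, c: &CompletedPersist) {
        self.file_size_bytes.record(c.file_size_bytes);
        self.row_count.record(c.row_count);
        self.column_count.record(c.column_count);
        // Time range of the data: max - min timestamp.
        self.time_range_ns
            .record((c.max_timestamp_ns - c.min_timestamp_ns) as u64);
    }
}

fn main() {
    let mut metrics = ParquetFileInstrumentation::default();
    metrics.observe(&CompletedPersist {
        file_size_bytes: 4_096,
        row_count: 1_000,
        column_count: 12,
        min_timestamp_ns: 0,
        max_timestamp_ns: 60_000_000_000,
    });
}
```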
* chore: Upgrade to Rust 1.68
* fix: Remove unnecessary into_iter, thanks Clippy!
* fix: Use the size of the type, not a reference to the type... oops.
Thanks clippy!
* fix: Return block directly instead of creating a variable
Thanks clippy!
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit adds initial support for "soft" namespace deletion, where
the actual records & data remain, but are no longer queryable /
writeable.
Soft deletion is eventually consistent - users can expect to continue
writing to and reading from a bucket after issuing a soft delete call,
until the various components either restart, or have their caches
flushed.
The components treat soft-deleted namespaces differently:
* router: ignore soft deleted namespaces
* ingester: accept soft deleted namespaces
* compactor: accept soft deleted namespaces
* querier: ignore soft deleted namespaces
* various gRPC services: ignore soft deleted namespaces
This ensures that the ingester & compactor do not see rows "vanishing"
from the database, and continue to make forward progress.
Writes for the deleted namespace that are buffered in the ingester will
be persisted as normal, allowing us to support "un-delete" operations
where the system is restored to the state at which the delete was
issued (rather than losing the buffered data).
Follow-on work is required to ensure GC drops the orphaned parquet files
after the configured GC time, and optimisations such as not compacting
parquet from soft-deleted namespaces seem like a trivial win.
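A minimal sketch of the soft-delete idea (the `deleted_at` marker field is
an assumption; the real catalog is SQL-backed): deletion only sets a marker,
and each component decides whether to filter marked namespaces out:

```rust
use std::time::SystemTime;

struct Namespace {
    name: String,
    /// `Some(t)` once a soft delete has been issued at time `t`.
    deleted_at: Option<SystemTime>,
}

/// Router / querier view: soft-deleted namespaces are invisible.
fn visible(all: &[Namespace]) -> impl Iterator<Item = &Namespace> {
    all.iter().filter(|ns| ns.deleted_at.is_none())
}

/// Ingester / compactor view: soft-deleted namespaces are still processed,
/// so already-buffered data keeps making forward progress.
fn processable(all: &[Namespace]) -> impl Iterator<Item = &Namespace> {
    all.iter()
}

fn main() {
    let namespaces = vec![
        Namespace { name: "live".into(), deleted_at: None },
        Namespace { name: "gone".into(), deleted_at: Some(SystemTime::now()) },
    ];
    for ns in visible(&namespaces) {
        println!("visible namespace: {}", ns.name);
    }
    assert_eq!(processable(&namespaces).count(), 2);
}
```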
The maximum number of tables is part of the Namespace, which is already
loaded in its entirety. This commit copies the value into the
NamespaceSchema, making it available for the router to utilise.
Rust 1.67 now says:
warning: `#[track_caller]` on async functions is a no-op
= note: see issue #87417 <https://github.com/rust-lang/rust/issues/87417> for more information
= note: `#[warn(ungated_async_fn_track_caller)]` on by default
* feat: introduce a new way of handling max_sequence_number for ingester, compactor and querier
* chore: cleanup
* feat: new column max_l0_created_at to order files for deduplication
* chore: cleanup
* chore: debug info for changing cpu.parquet
* fix: update test parquet file
Co-authored-by: Marco Neumann <marco@crepererum.net>
* feat: function to get partition candidates from partition table
* chore: cleanup
* fix: make new_file_at the same value as created_at
* chore: cleanup
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Updating the sort key is not commutative and MUST be serialised. The
correctness of the current catalog interface relies on the caller
serialising updates globally, something it cannot reasonably assert in a
distributed system.
This change of the catalog interface pushes this responsibility to the
catalog itself where it can be effectively enforced, and allows a caller
to detect parallel updates to the sort key.
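A sketch of a compare-and-swap style interface for this (names and signature
are illustrative, not the exact catalog API): the caller passes the sort key
it observed, and the update fails if another writer changed it concurrently,
letting the caller detect the race and retry:

```rust
#[derive(Debug, Clone, PartialEq)]
struct SortKey(Vec<String>);

#[derive(Debug)]
enum CasError {
    /// The stored sort key no longer matches the observed value; carries the
    /// current value so the caller can reconcile and retry.
    ValueMismatch(SortKey),
}

struct Partition {
    sort_key: SortKey,
}

impl Partition {
    /// Update the sort key only if it still equals `observed`.
    fn cas_sort_key(&mut self, observed: &SortKey, new: SortKey) -> Result<(), CasError> {
        if &self.sort_key != observed {
            return Err(CasError::ValueMismatch(self.sort_key.clone()));
        }
        self.sort_key = new;
        Ok(())
    }
}

fn main() {
    let mut p = Partition { sort_key: SortKey(vec!["host".into(), "time".into()]) };
    let observed = p.sort_key.clone();

    // Succeeds: nobody raced us.
    p.cas_sort_key(&observed, SortKey(vec!["host".into(), "region".into(), "time".into()]))
        .unwrap();

    // Fails: the key we observed is now stale; the error carries the current
    // value for the caller to merge before retrying.
    assert!(matches!(
        p.cas_sort_key(&observed, SortKey(vec![])),
        Err(CasError::ValueMismatch(_))
    ));
}
```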
* fix: check schemas in `pretty_print_batches`
I think most users of this function (and `assert_batches_eq`) assume
that all batches have the same schema. If not, `pretty_print_batches`
may either fail to produce an actual table (some rows may have more or
fewer columns) or silently produce a table that looks "alright".
* fix: equalize schemas where it is required/desired
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Have a single global test executor w/ reasonable defaults. Also don't
require tests to join/await executor shutdowns (most tests forget this
anyway and will get a runtime warning).
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Reorder all imports in the ingester to match a consistent order:
* stdlib
* external crates
* intra-crate imports
This helps prevent merge conflicts & keeps everything tidy.
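For illustration, the grouping looks like this (crate and module names are
hypothetical, not taken from the ingester):

```rust
// Illustration only: the three groups are separated by a blank line and
// ordered as listed above.

// stdlib
use std::{collections::HashMap, sync::Arc};

// external crates
use async_trait::async_trait;
use tokio::sync::Mutex;

// intra-crate imports
use crate::data::partition::PartitionData;
```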
Splits out the nested tree of namespace -> tables -> partitions
(referred to as the "buffer tree") from the Shard which previously held
the namespace map.
This allows the BufferTree to exist without a shard, or many trees to
exist within a shard, etc.
* fix: slice flight response batches
Same as #6094 but for the Apache Flight interface.
Ref https://github.com/influxdata/idpe/issues/16073.
* refactor: use `RecordBatch::slice`
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
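A sketch of the slicing approach, assuming the `arrow` crate: a large
RecordBatch is split into row-count-bounded chunks with the zero-copy
`RecordBatch::slice` before being streamed back:

```rust
use std::sync::Arc;

use arrow::{
    array::{ArrayRef, Int64Array},
    datatypes::{DataType, Field, Schema},
    record_batch::RecordBatch,
};

/// Split `batch` into chunks of at most `max_rows` rows each.
fn chunk_batch(batch: &RecordBatch, max_rows: usize) -> Vec<RecordBatch> {
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < batch.num_rows() {
        let len = max_rows.min(batch.num_rows() - offset);
        out.push(batch.slice(offset, len)); // zero-copy view of the rows
        offset += len;
    }
    out
}

fn main() {
    let schema = Arc::new(Schema::new(vec![Field::new("v", DataType::Int64, false)]));
    let batch = RecordBatch::try_new(
        schema,
        vec![Arc::new(Int64Array::from((0..10).collect::<Vec<i64>>())) as ArrayRef],
    )
    .unwrap();

    let chunks = chunk_batch(&batch, 4);
    assert_eq!(
        chunks.iter().map(|b| b.num_rows()).collect::<Vec<_>>(),
        vec![4, 4, 2]
    );
}
```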
Moves the SequenceNumberRange type out of "data" and into the root to be
reused outside of the data module. This construct is universally useful
across all the ingester code.
Allows different DmlSink implementations to return different error
types. This allows for small, concise errors that are local to the
DmlSink implementation and specific to it. This helps avoid bloated
"kitchen sink" error types.
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidentally removed in rebase
* chore: remove mem catalog's default 1hr retention; make it settable in tests & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: reject writes that are outside the retention period
* feat: add retention validator into handler stack
* chore: Apply suggestions from code review
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: address review comments
* test: unit tests for retention validation
* chore: address review comments
* test: more unit tests and integration tests
* refactor: make the time fall inside the retention period for the emphemeral_mode test
* fix: 2 hours
Co-authored-by: Dom <dom@itsallbroken.com>
Changes the TableData within the ingester to utilise a TableNameResolver
to fetch the TableName via the catalog on demand / in the background,
instead of using the table name sent over the write.
This change causes the ingester to perform a catalog query in the
background (or on demand) to resolve the table name. This is a
pre-requisite for removing the table name from the write wire format.
Like the NamespaceNameProvider, this commit adds a TableNameProvider to
provide decoupled initialisation of a DeferredLoad<TableName> instead of
hard-coding in a catalog instance / query code, and plumbs it into
position to be used when initialising a TableName.
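A sketch of the provider pattern (simplified: the real trait hands out a
DeferredLoad<TableName> and resolves asynchronously via the catalog). The
trait decouples how a table name is obtained from the buffer tree, so tests
can inject a mock while production queries the catalog by ID:

```rust
#[derive(Debug, Clone, PartialEq)]
struct TableName(String);

#[derive(Debug, Clone, Copy)]
struct TableId(i64);

trait TableNameProvider {
    /// Return a (lazily resolved) name for the given table ID.
    fn for_table(&self, id: TableId) -> TableName;
}

/// Production impl: resolves names from the catalog (stubbed out here).
struct TableNameResolver;

impl TableNameProvider for TableNameResolver {
    fn for_table(&self, id: TableId) -> TableName {
        // In the real implementation this is a background catalog query.
        TableName(format!("table_{}", id.0))
    }
}

/// Test impl: always returns a fixed name, no catalog required.
struct MockTableNameProvider(TableName);

impl TableNameProvider for MockTableNameProvider {
    fn for_table(&self, _id: TableId) -> TableName {
        self.0.clone()
    }
}

fn main() {
    let prod = TableNameResolver;
    let mock = MockTableNameProvider(TableName("bananas".into()));
    assert_eq!(prod.for_table(TableId(42)), TableName("table_42".into()));
    assert_eq!(mock.for_table(TableId(42)), TableName("bananas".into()));
}
```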
Changes the buffer tree to address TableData by their ID only (removing
support for addressing tables by their string names). This removes the
double-reference bookkeeping / twin indexes and associated overhead.
As part of this change, the TableName is now wrapped in a DeferredLoad
in preparation for removal of the names in the DmlOperation wire format.
This commit also switches the map of TableData within the NamespaceData
(the parent node) to use the ArcMap for faster lookups and DRY
exactly-once initialisation.
Removes the need to leak the PartitionProvider outside of the ingester
crate.
This will allow the PartitionProvider to utilise a
DeferredLoad<TableName> without having to make the DeferredLoad and
TableName pub.
Removes reliance on string name identifiers for namespaces in the
ingester buffer tree, reducing the memory usage of the namespace index
and associated overhead.
The namespace name is required (though unused by IOx) in the IoxMetadata
embedded within a parquet file, and therefore the name is necessary at
persist time. For this reason, a DeferredLoad is used to query the
catalog (by ID) for the name, at some uniformly random duration of time
after initialisation of the NamespaceData, up to a maximum of 1 minute
later. This ensures the query remains off the hot ingest path, and the
jitter prevents spikes in catalog load during replay/ingester startup.
As an additional / easy optimisation, the persist code causes a
pre-fetch of the name in the background while compacting, hiding the
query latency should it not have already been resolved.
In order to keep the ingester buffer & catalog decoupled / easily
testable, this commit uses a provider/factory trait
NamespaceNameProvider and corresponding implementation
(NamespaceNameResolver) in a similar fashion to the PartitionResolver,
allowing easy mocking for tests and composition for prod code, and
enabling future optimisations such as pre-fetching / caching the "hot"
namespace names at startup.
Internal string identifier removal is a pre-requisite for removing
string identifiers from the write wire format (#4880).
Changes the ingester's buffer tree to use the deferred loading primitive
to resolve the namespace name for NamespaceData.
Note that the loader is initialised with the name in the first place -
this commit just introduces the use of the deferred loading primitive,
and doesn't change where the name is sourced from.
This lets deferred loads be used in place of a non-deferred T, such as
log context fields.
If the value has not been resolved, the display impl returns
"<unresolved>".
Allow a caller to signal to the DeferredLoad that the value it may or
may not have to materialise will be used imminently, optimistically
hiding the latency of resolving the value (typically a catalog query).
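A minimal sketch of these two DeferredLoad behaviours (names and internals
are illustrative; the real type resolves the value in a background task, for
which a Mutex<Option<T>> stands in here):

```rust
use std::{fmt, sync::Mutex};

struct DeferredLoad<T> {
    value: Mutex<Option<T>>,
}

impl<T> DeferredLoad<T> {
    fn unresolved() -> Self {
        Self { value: Mutex::new(None) }
    }

    /// Hint that the value will be used imminently, optimistically starting
    /// resolution now to hide its latency (stubbed out in this sketch).
    fn prefetch_now(&self) {
        // The real implementation wakes the background resolver here.
    }
}

impl<T: fmt::Display> fmt::Display for DeferredLoad<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.value.lock().unwrap().as_ref() {
            Some(v) => v.fmt(f),
            None => "<unresolved>".fmt(f),
        }
    }
}

fn main() {
    let name: DeferredLoad<String> = DeferredLoad::unresolved();
    name.prefetch_now(); // e.g. called when persistence is about to need it
    println!("namespace = {name}"); // prints "namespace = <unresolved>"
}
```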
* refactor: generic deferred loader helper
Splits the DeferredSortKey loader introduced in #5807 into two parts - a
generic helper type that implements deferred/background loading of
values, and SortKey specific logic for use with it.
As this will be more widely used, this implementation features improved
behaviour of the deferred loader under concurrent demand requests
(multiple calls to get() do not attempt to concurrently resolve the
value), as well as complete cancellation safety (cancelling the get()
doesn't affect the liveness of the background task).
* docs: doc-link & minor comment amendments
Fixes naming, adds missing doc-links, and expands some code comments.
* test: bound wait times to avoid hangs
Adds timeouts to all .await of the code under test, ensuring tests don't
hang if something goes wrong.
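A usage-level sketch of the loader semantics, assuming the tokio crate
(names are illustrative; the real DeferredLoad also resolves in the
background after a delay): concurrent get() calls resolve the value at most
once, and tests bound every await with a timeout:

```rust
use std::{sync::Arc, time::Duration};

use tokio::{sync::OnceCell, time::timeout};

struct DeferredLoad {
    cell: OnceCell<String>,
}

impl DeferredLoad {
    /// Demand-resolve the value, performing the (expensive) load at most
    /// once regardless of how many callers race here.
    async fn get(&self) -> String {
        self.cell
            .get_or_init(|| async {
                // Stand-in for a catalog query.
                tokio::time::sleep(Duration::from_millis(10)).await;
                "ns_name".to_string()
            })
            .await
            .clone()
    }
}

#[tokio::main]
async fn main() {
    let load = Arc::new(DeferredLoad { cell: OnceCell::new() });

    // Two concurrent demands; the loader body runs only once.
    let (a, b) = tokio::join!(load.get(), load.get());
    assert_eq!(a, b);

    // Tests bound waits to avoid hangs if resolution never completes.
    let bounded = timeout(Duration::from_secs(5), load.get()).await.unwrap();
    assert_eq!(bounded, "ns_name");
}
```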
I tracked down the source of the size difference to the difference in
`mem::size_of::<mutable_batch::column::ColumnData>`. I believe this enum
is now able to take advantage of this niche-filling optimization:
<https://github.com/rust-lang/rust/pull/94075/>
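A small demonstration of the niche-filling layout optimization (illustrative
types, not the actual ColumnData enum): when a payload type has an invalid
bit pattern, the discriminant can be stored in that niche instead of an
extra tag word:

```rust
use std::mem::size_of;

fn main() {
    // `Box<u8>` can never be null, so `Option` stores `None` in that niche
    // and no extra discriminant word is required.
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>()); // 8 bytes on 64-bit
    // `usize` has no invalid bit pattern, so the tag needs its own space.
    assert!(size_of::<Option<usize>>() > size_of::<usize>()); // 16 bytes on 64-bit
}
```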
* feat: flag partition for delete
* fix: compare the right date and time
* chore: Run cargo hakari tasks
* chore: cleanup
* fix: typos
* chore: rust style tidy ups in catalog
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
* refactor: NS+table ID (instead of name) in querier<>ingester
* feat(ingester): use IDs for query API
Changes the ingester to utilise the ID fields (instead of names) sent
over the query wire message wrapped within the Flight API.
BREAKING: this changes the "query-ingester" CLI command arguments, which
now expect the namespace & table IDs rather than their names.
* refactor(ingester): add more query logging context
Updates the log messages during query execution to include more context
fields.
* style: remove unused import
Co-authored-by: Marco Neumann <marco@crepererum.net>
Now that DML operations contain the table ID, the ingester has all necessary
data to initialise the TableData buffer node without having to query the
catalog.
This also removes the catalog from the buffer_operation() call path,
simplifying testing.
Now that DML operations contain the namespace ID, the ingester has all
necessary data to initialise the NamespaceData buffer node without
having to query the catalog.
Expose the Table and Namespace IDs encoded within the serialised DML
write (added in #6036).
This makes the IDs available for use in the consumers, ending the
transition period. This commit DOES NOT remove the strings sent over the
wire.
This commit pushes the existing table-level mutex down to the partition.
This allows the ingester to gather data from multiple partitions within
a single table in parallel, and reduces contention between ingest/query
workloads.
This moves the logic that skips operations that do not need to be
applied to a partition during shard replay from the table level, to the
partition level.
Changes the bounds on the ArcMap to accept an owned key, avoiding an
extra allocation.
Cleans up the bounds on other fn to ensure the borrowed key impl Eq and
is the ref type of K.
This commit changes the ArcMap HashBuilder to use the same instance as
the underlying HashMap hasher.
This prevents divergent hashing across threads that MAY initialise a
hasher with a different seed.
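A small illustration of why the hasher must be shared, using only std types
(not the ArcMap internals): independently created RandomState instances are
seeded differently, so the same key hashes to different values under each:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

fn hash_with(state: &RandomState, key: &str) -> u64 {
    let mut h = state.build_hasher();
    key.hash(&mut h);
    h.finish()
}

fn main() {
    let a = RandomState::new();
    let b = RandomState::new();

    // Same key, two independently seeded hashers: almost certainly different.
    println!("{} vs {}", hash_with(&a, "bananas"), hash_with(&b, "bananas"));

    // Reusing the same hasher instance keeps hashing consistent everywhere
    // it is used - hence sharing the HashBuilder with the inner HashMap.
    assert_eq!(hash_with(&a, "bananas"), hash_with(&a, "bananas"));
}
```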
It should always be clear from the context to which table a chunk
belongs.
I think having a table name bound to a chunk goes back to a time where
chunks had multiple tables.
Helps with #6049.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: delete duplicate character in metric
* fix: failure ci test case
* fix: failure ci test case
* fix: failure ci test case
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.
Like the existing IDs within the DmlWrite, these values are marked
unsafe to use in order to avoid consumers utilising them accidentally
during deployment. Unlike DmlWrite, the DmlDelete is completely unused,
so this is less of an issue.
Implements a map of K -> Arc<V> with exactly-once initialisation
semantics.
This map can be used to ensure a given key maps to singleton instances
of V; exactly what all the nodes in the ingester "buffer tree" of shard
-> namespace -> table -> partition require.
This impl contains unused funcs (silenced with an allow(dead_code))
because it was picked from a future branch.
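A simplified, single-lock sketch of the idea (the real ArcMap is optimised
for concurrent access): a map of K -> Arc<V> where the initialiser for a
given key runs exactly once, and every caller receives the same singleton
instance:

```rust
use std::{
    collections::HashMap,
    hash::Hash,
    sync::{Arc, Mutex},
};

struct ArcMap<K, V> {
    inner: Mutex<HashMap<K, Arc<V>>>,
}

impl<K: Eq + Hash, V> ArcMap<K, V> {
    fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    /// Return the value for `key`, initialising it with `init` exactly once.
    fn get_or_insert_with(&self, key: K, init: impl FnOnce() -> V) -> Arc<V> {
        let mut map = self.inner.lock().unwrap();
        Arc::clone(map.entry(key).or_insert_with(|| Arc::new(init())))
    }
}

fn main() {
    let map: ArcMap<&'static str, String> = ArcMap::new();

    let a = map.get_or_insert_with("bananas", || "first".to_string());
    // The second initialiser never runs - the singleton already exists.
    let b = map.get_or_insert_with("bananas", || "second".to_string());

    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(*a, "first");
}
```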
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.
In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
* refactor: simplify `QueryChunk` data access
We have only two types for chunks (now that the RUB is gone):
1. In-memory RecordBatches
2. Parquet files
Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
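A sketch of the shape of this change (type and variant names are
approximations, not the exact IOx API): a chunk exposes the data it holds,
and the query layer performs the filtering:

```rust
struct RecordBatch; // stand-in for arrow::record_batch::RecordBatch
struct ParquetFilePath(String); // stand-in for the file's object store path

/// The two kinds of chunk data that remain now that the RUB is gone.
enum QueryChunkData {
    /// Data already materialised in memory.
    RecordBatches(Vec<RecordBatch>),
    /// Data that still lives in a Parquet file in object storage.
    Parquet(ParquetFilePath),
}

trait QueryChunk {
    /// Return the underlying data; no per-chunk query logic.
    fn data(&self) -> QueryChunkData;
}

struct IngesterChunk;

impl QueryChunk for IngesterChunk {
    fn data(&self) -> QueryChunkData {
        QueryChunkData::RecordBatches(vec![RecordBatch])
    }
}

fn main() {
    let chunk = IngesterChunk;
    match chunk.data() {
        QueryChunkData::RecordBatches(batches) => {
            println!("{} in-memory batch(es)", batches.len())
        }
        QueryChunkData::Parquet(path) => println!("parquet file at {}", path.0),
    }
}
```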
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈-style `.expect(...)` calls.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses into `iox_query` and lets
the compactor and ingester use the same mechanism.
While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.
This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (i.e. it was
never "None"), panicking at runtime when attempting to enqueue a write
without one.
It is now possible to encode this invariant in the type system, which is
what this change does.
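A sketch of encoding the invariant in the type system (the constructor is
heavily simplified; the real DmlWrite carries tables, sequence numbers,
etc.): the partition key is a required argument, so the previous runtime
panic becomes a compile-time error at the call site:

```rust
#[derive(Debug, Clone)]
struct PartitionKey(String);

struct DmlWrite {
    partition_key: PartitionKey,
}

impl DmlWrite {
    /// Before: `partition_key: Option<PartitionKey>` plus a runtime panic
    /// when enqueueing a write with `None`. Now "no partition key" is simply
    /// unrepresentable.
    fn new(partition_key: PartitionKey) -> Self {
        Self { partition_key }
    }

    fn partition_key(&self) -> &PartitionKey {
        &self.partition_key
    }
}

fn main() {
    let w = DmlWrite::new(PartitionKey("2022-11-01".to_string()));
    println!("write for partition {:?}", w.partition_key());
}
```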
This commit makes use of the partition buffer state machine introduced
in https://github.com/influxdata/influxdb_iox/pull/5943.
This commit significantly changes the buffering, and querying, of data
from a partition, swapping out the existing "DataBuffer" for the new
state machine implementation (itself simplified due to temporary lack of
incremental snapshot generation, see #5944).
This commit simplifies the query path, removing multiple types that
wrapped one-another to pass around various state necessary to perform a
query, with various query functions needing different types or
combinations of types. The query path now operates using a single type
(named "QueryAdaptor") that provides a queryable interface over the set
of RecordBatch returned from a partition.
There is significantly increased testing of the PartitionData itself,
covering data in various states and the ordering of returned RecordBatch
(to ensure correct materialisation of updates). There are also
invariants upheld by the type system / compiler to minimise the
complexities of working with empty batches & states, and many asserts
that ensure (mostly existing!) invariants are upheld.
This reverts commit c63312ce12.
The reverted change fixed a low-priority alert that fired when there was
no traffic flowing through the system, but the loss in TTBR value
fidelity due to bucketing is a greater concern: it affects live,
high-volume clusters and hinders operational insight.
This commit removes the on-demand, incremental snapshot generation
driven by queries.
This functionality is "on hold" due to concerns documented in:
https://github.com/influxdata/influxdb_iox/issues/5805
Incremental snapshots will be introduced alongside incremental
compactions of those same snapshots.
This commit introduces code that is intended to replace the current
implicit state machine used by PartitionData. The existing code is still
in use, the new code is NOT used in this commit. A follow-up commit will
switch over to minimise the diff.
This change has two main goals;
* encapsulation & simplification for callers
* robust implementation so developing correct additions is easier
This is a significant refactor of the partition buffering logic to
encapsulate the various states of data (buffering, snapshot, persisting
and the mixed states between them) within the Partition. This relieves
the rest of the system of having to be concerned with the differences
between "buffering" data, "unpersisted" data, "snapshot" data,
"persisting" data, "persisting with snapshots", etc. - callers now
invoke a method called get_query_data() and are provided with all the
relevant data for a partition. This abstraction change alone
significantly reduces code and test complexity in the rest of the
ingester.
For the second goal, the new implementation leverages an explicit state
machine, encoded using typestates. Typestate ensures compile-time
correctness of transitions and method calls, and the explicit FSM itself
helps ensure the system progresses in the desired manner - this fixes
and helps prevent bugs caused by implicit states such as:
https://github.com/influxdata/influxdb_iox/issues/5805
This state machine makes the system states explicit and
self-descriptive, helping to reduce the cost of developer on-boarding
(no prior knowledge of "how this bit works" is required) and the ongoing
developer burden. This explicit nature also de-risks adding new
functionality - it should be relatively easy to add concurrent snapshot
generation or incremental compaction without introducing bugs. The state
transition logic is abstracted away from callers, minimising the
overhead of this strategy.
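A sketch of the typestate pattern used here (states and methods are
illustrative, not the actual PartitionData FSM): each state is its own type
and a transition consumes the old state, so calling a method in the wrong
state fails to compile rather than failing at runtime:

```rust
struct Buffering {
    rows: Vec<String>,
}

struct Persisting {
    snapshot: Vec<String>,
}

struct Persisted;

impl Buffering {
    fn new() -> Self {
        Self { rows: Vec::new() }
    }

    fn write(&mut self, row: impl Into<String>) {
        self.rows.push(row.into());
    }

    /// Transition: buffering -> persisting. The buffer is consumed, so no
    /// caller can keep writing to a partition that is being persisted.
    fn begin_persist(self) -> Persisting {
        Persisting { snapshot: self.rows }
    }
}

impl Persisting {
    /// Transition: persisting -> persisted.
    fn complete(self) -> Persisted {
        println!("persisted {} rows", self.snapshot.len());
        Persisted
    }
}

fn main() {
    let mut state = Buffering::new();
    state.write("cpu,host=a usage=1");

    let persisting = state.begin_persist();
    // `state.write(...)` here would no longer compile - the value was moved.
    let _done = persisting.complete();
}
```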
Asserts write buffer seeking behaviour, including:
* Seeking past already persisted data correctly
* Skipping to next available op in non-contiguous offset stream
* Skipping to next available op for dropped ops due to retention
* Panicking when seeking beyond available data (into the future)
Removes a pair of tests that covered some of the above due to their
tight coupling with ingester internals.
This commit adds a new test that exercises all major external APIs of
the ingester:
* Writing data via the write buffer
* Waiting for data to be readable via the progress API
* Querying data and asserting the contents
This should provide basic integration coverage for the Ingester
internals. This commit also removes a similar test (though with less
coverage) that was tightly coupled to the existing buffering structures.
Adds a test helper type that maintains the in-memory state for a single
ingester integration test, and provides easy-to-use methods to
manipulate and inspect the ingester instance.
A function to map the complex IngesterQueryResponse type to a simple set
of RecordBatch already existed in test code - this has been lifted onto
an inherent method on the response type itself for reuse.
* fix: only emit ttbr metric for applied ops
* fix: move DmlApplyAction to s/w accessible
* chore: test for skipped ingest; comments and log improvements
* fix: fixed ingester test re skipping write
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: Avoid some allocations by collecting instead of inserting into a vec
* refactor: Encode that adding columns is for one table at a time
* test: Add another test of column limits
* test: Add below/above limit tests for create_or_get_many
* fix: Explicitly DO NOT check column limits when inserting many columns
* feat: Cache the max_columns_per_table on the NamespaceSchema
* feat: Add a function to validate column limits in-memory
* fix: Provide more useful information when over column limits
* fix: Swap types to remove intermediate allocation
* docs: Explain the interactions of the cache and the column limits
* test: Actually set up test that showcases column limit race condition
* fix: Allow writing to existing columns even if table is over column limit
Co-authored-by: Dom <dom@itsallbroken.com>
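A sketch of the in-memory check described above (field and function names
are illustrative): the cached max_columns_per_table is used to reject writes
that would add new columns over the limit, while writes touching only
existing columns are always allowed, even if the table is already over it:

```rust
use std::collections::HashSet;

struct TableSchema {
    columns: HashSet<String>,
}

struct NamespaceSchema {
    max_columns_per_table: usize,
}

#[derive(Debug)]
struct OverColumnLimit {
    existing: usize,
    attempted_new: usize,
    max: usize,
}

fn validate_column_limit(
    ns: &NamespaceSchema,
    table: &TableSchema,
    write_columns: &[&str],
) -> Result<(), OverColumnLimit> {
    let new: Vec<_> = write_columns
        .iter()
        .filter(|c| !table.columns.contains(**c))
        .collect();

    // Writes to existing columns never fail this check.
    if !new.is_empty() && table.columns.len() + new.len() > ns.max_columns_per_table {
        return Err(OverColumnLimit {
            existing: table.columns.len(),
            attempted_new: new.len(),
            max: ns.max_columns_per_table,
        });
    }
    Ok(())
}

fn main() {
    let ns = NamespaceSchema { max_columns_per_table: 3 };
    let table = TableSchema {
        columns: ["time", "host", "usage"].iter().map(|s| s.to_string()).collect(),
    };

    // Existing columns only: accepted, despite the table being at the limit.
    assert!(validate_column_limit(&ns, &table, &["time", "usage"]).is_ok());
    // A new column would exceed the limit: rejected with a useful error.
    assert!(validate_column_limit(&ns, &table, &["region"]).is_err());
}
```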