During initialisation, the ingester connects to the Kafka brokers; this
involves per-partition leader discovery and connection establishment.
These connections are then retained for the lifetime of the process.
Prior to this commit, the ingester would establish a connection to all
partition leaders for a given topic. After this commit, the ingester
connects only to the partition leaders it is going to consume from
(i.e. those for the shards it is assigned).
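A minimal sketch of the new start-up behaviour, assuming a set of
assigned shard indexes (the types and names here are illustrative, not
the IOx definitions):

```rust
use std::collections::BTreeSet;

// Illustrative stand-in for an rskafka-style partition client.
struct PartitionClient;

async fn connect(_topic: &str, _partition: i32) -> PartitionClient {
    // Leader discovery + connection establishment happen here.
    PartitionClient
}

// Connect only to the partition leaders for the assigned shards,
// rather than to every partition leader of the topic.
async fn init_clients(
    topic: &str,
    assigned_shards: &BTreeSet<i32>,
) -> Vec<PartitionClient> {
    let mut clients = Vec::with_capacity(assigned_shards.len());
    for &shard in assigned_shards {
        clients.push(connect(topic, shard).await);
    }
    clients
}
```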
* ci: use same feature set in `build_dev` and `build_release`
* ci: also enable unstable tokio for `build_dev`
* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8)
* fix: "must use"
Adds instrumentation to the low-level (post-aggregation) Kafka client,
capturing the approximate, uncompressed message size (calculated as the
sum of all `Record::approximate_size()` return values, ignoring the
largely static framing overhead).
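A sketch of that computation; the `Record` type below is an
illustrative stand-in for rskafka's record, not its real definition:

```rust
// Illustrative stand-in for rskafka's record type.
struct Record {
    key: Option<Vec<u8>>,
    value: Option<Vec<u8>>,
}

impl Record {
    // Approximate per-record payload size, mirroring the intent of the
    // real Record::approximate_size() referenced above.
    fn approximate_size(&self) -> usize {
        self.key.as_ref().map_or(0, Vec::len)
            + self.value.as_ref().map_or(0, Vec::len)
    }
}

// The captured metric: the sum of per-record sizes, ignoring the
// largely static framing overhead.
fn approximate_batch_size(records: &[Record]) -> usize {
    records.iter().map(Record::approximate_size).sum()
}
```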
Previously, aggregated writes were merged into a single Kafka Record -
this meant that all merged ops would be placed into the same Record, and
therefore receive the same sequence number once published to Kafka.
The new aggregator batches at the Record level, therefore aggregated
writes now get their own distinct sequence number. This commit updates
the batching tests to reflect this new sequence number assignment
behaviour.
The previous aggregator impl would assert that writes had been
partitioned before aggregating them (or rather, that the DML write had a
partition key assigned).
This should be true for all writes passing through the write buffer,
irrespective of which aggregator is used, therefore this assert is moved
"up" into the write buffer itself.
Replaces the DmlAggregator with the simpler RecordAggregator.
Metrics gathered as part of #5323 show there is practically no benefit
to the additional complexity of the DmlAggregator over the simpler
RecordAggregator impl.
This commit adds a new write buffer aggregator used by rskafka to
increase the size of Kafka messages on the wire. The Kafka write buffer
impl is the only impl to perform aggregation.
This Aggregator impl maps IOx-specific DML operations to rskafka Records
with no additional processing - it can be thought of as an IOx-specific
adaptor over rskafka's RecordAggregator.
By delegating batching of Record instances to rskafka's simple
RecordAggregator, we minimise code complexity / bug surface area / LoC.
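A rough sketch of the adaptor idea with placeholder types (the real
code targets rskafka's `Record` and leaves all batching to its
`RecordAggregator`):

```rust
// Illustrative placeholders; not the rskafka/IOx definitions.
struct Record {
    key: Option<Vec<u8>>,
    value: Option<Vec<u8>>,
}
struct DmlOperation;

// The adaptor does exactly one thing: serialise an IOx DML op into a
// Kafka record. Batching decisions stay in RecordAggregator.
fn to_record(op: &DmlOperation) -> Record {
    Record {
        key: None,
        value: Some(encode(op)),
    }
}

fn encode(_op: &DmlOperation) -> Vec<u8> {
    // In IOx this is the protobuf wire encoding; elided here.
    Vec::new()
}
```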
Changes the Kafka write buffer impl to parallelise initialisation of the
PartitionClient instances.
Now that the PartitionClient constructor also performs leader discovery
(using cached metadata, influxdata/rskafka#164) and establishes a broker
connection (influxdata/rskafka#166), executing these constructors in
parallel causes a proportional decrease in the time taken to bring IOx
up.
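A sketch of the concurrent initialisation, assuming an async,
rskafka-style per-partition constructor (the stand-ins below are
illustrative):

```rust
use futures::future::join_all;

// Illustrative stand-in for rskafka's PartitionClient.
struct PartitionClient;

async fn partition_client(_topic: String, _partition: i32) -> PartitionClient {
    // Leader discovery + broker connection happen in here.
    PartitionClient
}

// Initialise all clients concurrently instead of awaiting them one
// after another.
async fn init_all(topic: &str, partitions: &[i32]) -> Vec<PartitionClient> {
    join_all(
        partitions
            .iter()
            .map(|&p| partition_client(topic.to_owned(), p)),
    )
    .await
}
```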
* fix: do not lose data when Kafka reports that offset is above watermark
This can happen in certain cluster rebalance settings.
This is also linked to https://github.com/influxdata/rskafka/issues/147
but for the upstream issue I currently have no idea how to fix it, so
let's at least harden IOx against it.
Fixes #5128.
* refactor: panic for `SequenceNumberAfterWatermark`
Previously IOx mapped a single database to a single Kafka topic - this
is no longer the case, so referring to the Kafka topic name as the
"database name" is confusing.
Adds a decorator over the underlying Kafka client to capture the
latency distribution of the low-level Kafka writes, independent of the
aggregation/DML batching framework that sits "above" this client.
The latency measurements include the serialisation overhead, protocol
overhead, and actual network I/O.
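A sketch of the decorator shape; the histogram type is an illustrative
stand-in:

```rust
use std::time::{Duration, Instant};

// Illustrative histogram stand-in.
struct DurationHistogram;
impl DurationHistogram {
    fn record(&self, _d: Duration) {}
}

// Wrap the low-level produce call and record its wall-clock latency.
// The measurement covers serialisation, protocol overhead and the
// actual network I/O, independent of the batching layer above.
async fn instrumented_produce<F, Fut, T, E>(
    hist: &DurationHistogram,
    produce: F,
) -> Result<T, E>
where
    F: FnOnce() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let started = Instant::now();
    let res = produce().await;
    hist.record(started.elapsed());
    res
}
```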
The Kafka write buffer implementation (and only the Kafka impl) merges
together successive DML writes for the same namespace & partition within
a window of time.
This commit records the number of DML writes that have been merged
together to form a single batched op before it is dispatched to Kafka.
* test: add test for timestamps in kafka write buffer
* refactor: move timestamp batching test to generic tests
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Fixes interaction of `maybe_skip_kafka_integration!` and `should_panic`
by ensuring that `maybe_skip_kafka_integration!` panics to skip
`should_panic` tests.
Without that it is not possible to just run `cargo test -p write_buffer`.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
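A sketch of the mechanism (the env var and message handling are
assumptions): inside a `#[should_panic]` test, returning early would
fail the test for not panicking, so the skip path must panic with the
expected message instead.

```rust
// Illustrative version of the panic-to-skip behaviour.
macro_rules! maybe_skip_kafka_integration {
    ($panic_msg:expr) => {
        if std::env::var("KAFKA_CONNECT").is_err() {
            eprintln!("skipping Kafka integration test");
            // Satisfies #[should_panic(expected = ...)], so the
            // skipped test still "passes".
            panic!("{}", $panic_msg);
        }
    };
}
```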
Changes the Kafka message wire format to include the partition key for
serialised DML writes on the wire.
After this commit, the Kafka messages will contain the partition key for
each op, but this information will go unused in the ingester - this
enables us to roll out the producer side before making the value's
presence necessary on the consumer side.
A follow-up PR will change the ingester to utilise this embedded
partition key.
This has the unfortunate side effect of making the partition key part of
the public gRPC write API:
https://github.com/influxdata/influxdb_iox/issues/4866
This commit changes the Kafka write aggregator to only merge DML ops
destined for the same partition.
Prior to this commit the aggregator was merging DML ops that had
different partition keys, causing data to be persisted in incorrect
partitions:
https://github.com/influxdata/influxdb_iox/issues/4787
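A sketch of the fix: the key under which ops are grouped for merging
now includes the partition key, so ops destined for different
partitions can never be merged (the exact field set is illustrative):

```rust
// Ops are only merged when the entire key matches; including the
// partition key rules out cross-partition merging.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct AggregateKey {
    namespace: String,
    sequencer_id: u32,
    partition_key: String, // newly part of the key
}
```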
The default behavior of the ingester is to panic if the min unpersisted
sequence number in the catalog is unknown to the write buffer due to the
retention policies having evicted that sequence number.
Specifying `--skip-to-oldest-available` changes this behavior to skip to
the oldest sequence number the write buffer does have available and go
from there.
Fixes #4624.
Derive the `Debug` impl so it prints all the fields (specifically, the
"number of sequencers configured" is pretty helpful in a test).
Manual impls drift over time and are more effort than the derive!
Fixes symlinking in `maybe_auto_create_directories()` - previously it
would create a symlink specifying the target path relative to the
working dir, and not relative to the symlink.
If the working dir != the path in the WriteBufferCreationConfig,
subsequent calls would get stuck in an infinite loop attempting to
resolve the bad symlink.
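A minimal sketch of the corrected call (Unix-only; paths are
illustrative): the stored target must be relative to the symlink's own
parent directory, not to the process working directory.

```rust
use std::os::unix::fs::symlink;
use std::path::Path;

fn link_sibling(dir: &Path) -> std::io::Result<()> {
    // Correct: a bare name resolves relative to the link's directory.
    // Prefixing it with a CWD-relative path produced a dangling,
    // self-referential link whenever the CWD != dir.
    symlink("target_entry", dir.join("link_entry"))
}
```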
* chore: upgrade rskafka
* refactor: less cloning
* fix: defined behaviour when seeking to an unknown sequence number
The new, defined behaviour is: "return an error once and then end the
stream".
Co-authored-by: Edd Robinson <me@edd.io>
For sparse data the PB-encoded data (our Kafka wire format) is way
smaller than the MutableBatch (up to a factor of 20), so let's use the
former to estimate the size during batching.
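A sketch of that estimate using prost's `encoded_len()` (the generic
message bound stands in for the concrete IOx protobuf type):

```rust
use prost::Message;

// The protobuf wire size can be computed without actually encoding
// the message, which makes it cheap enough for batching decisions.
fn estimated_wire_size<M: Message>(msg: &M) -> usize {
    msg.encoded_len()
}
```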
Prod has a larger max message size for Kafka (10MB instead of 1MB), but
currently we're unable to wire all the write buffer configs through. As
a quick fix, let's hard-code the config. This however breaks the write
buffer when running under a default Kafka setup (1MB), so we should
revert this (tracked under #3723).
When creating a new aggregation span, you MUST NOT just create a new
random span context and put its child span into a span recorder, because
then only the child will be reported to the trace collector. Instead,
directly create a new root span without any parent.
This makes Jaeger slightly happier; it won't complain about broken
spans anymore.
* refactor: improve write buffer consumer interface
The change looks huge but is actually rather simple. To
understand the interface change, let me first explain what we want:
- be able to fetch watermarks for any sequencer
- have streams:
  - each stream tracks a sequencer and has its own offset state (no
    read multiplexing)
  - we can seek a stream
  - seeking and streaming cannot be done at the same time (that would
    be weird and would likely lead to many bugs, both in the write
    buffer and in user code)
  - ideally we don't need to create streams for all sequencers but can
    choose a subset
Before this change we had one mutable consumer struct from which you
could get all streams and watermark functions (this mutable-borrows the
consumer) or seek a single stream (this also mutable-borrows the
consumer). This is a bit weird for multiple reasons:
- you cannot seek a single stream without dropping all of them
- the mutable-borrow construct makes it really difficult to pass the
  streams into separate threads
- the consumer is boxed (because it's mutable), which makes it more
  difficult to handle in a large-scale application
What this change does is the following:
- you have an immutable consumer (similar to the producer)
- the consumer offers the following methods:
  - get the set of sequencer IDs
  - get the watermark for any sequencer
  - get a stream handler (see next point) for any sequencer
- the stream handler captures the stream state (offset) and provides
  you with a standard `Stream<_>` interface as well as a seek function.
  Mutable borrows ensure that you cannot use both at the same time.
The stream handler provides you the stream via `handler.stream()`. It
doesn't implement `Stream<_>` itself because of the way boxing, dynamic
dispatch, and pinning interact (i.e. I couldn't get it to work without
the indirection).
As a bonus (which we don't use, however), you can now create multiple
streams for the same sequencer, each with its own offset.
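A sketch of the resulting interface with placeholder types; the trait
names resemble the IOx ones, but the exact signatures here are
assumptions:

```rust
use async_trait::async_trait;
use futures::stream::BoxStream;

// Illustrative placeholders.
type DmlOperation = ();
type WriteBufferError = Box<dyn std::error::Error + Send + Sync>;

#[async_trait]
trait WriteBufferReading: Send + Sync {
    /// The set of sequencer IDs in this write buffer.
    fn sequencer_ids(&self) -> Vec<u32>;

    /// Watermark for any sequencer; only needs `&self`, so it can be
    /// called at any time.
    async fn fetch_high_watermark(
        &self,
        sequencer_id: u32,
    ) -> Result<u64, WriteBufferError>;

    /// An independent handler owning the offset state for one
    /// sequencer; multiple handlers per sequencer are possible.
    async fn stream_handler(
        &self,
        sequencer_id: u32,
    ) -> Result<Box<dyn WriteBufferStreamHandler>, WriteBufferError>;
}

#[async_trait]
trait WriteBufferStreamHandler: Send {
    /// The stream itself; the mutable borrow prevents seeking while a
    /// stream is in use.
    async fn stream(
        &mut self,
    ) -> BoxStream<'_, Result<DmlOperation, WriteBufferError>>;

    /// Reposition this handler; mutably borrows for the same reason.
    async fn seek(&mut self, sequence_number: u64) -> Result<(), WriteBufferError>;
}
```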
* fix: review comments
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
All features are now covered by rskafka. This also removes the need to
specify a server ID for write buffer consumers; that was only needed
for rdkafka, which required a consumer group even though we did not use
any transactions.
* chore: upgrade rskafka + enable snappy support
* test: ensure that rskafka and rdkafka work together
Before removing rdkafka ensure that:
- rskafka can consume existing messages produced by rdkafka so we do not
need to drain existing topics
- rdkafka can consume new messages produced by rskafka so we can roll
back
I ran the whole `write_buffer` test suite (including the newly added
tests) using Apache Kafka as well as Redpanda.
* test: ensure we handle consumer offset in error case correctly
* docs: explain test setup
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Add db_name/namespace to DmlWrite and DmlDelete
This is required for the new ingester to be able to work with the write buffer. The protobuf that gets serialized over Kafka already includes the database name; it just wasn't being carried through to the marshaled DML operation.
* fix: database != namespace, propagation through write buffer
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- Use a more standard way to set up the tracing subsystem (as described
in the tracing-subscriber docs)
- Also capture content from `log` crate
- Play nice w/ Rust's libtest message capture
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Sequencer wrapper
This type wraps an underlying WriteBufferWriter implementation, tagging
it with the sequencer ID it should use when enqueuing operations to the
buffer.
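A sketch of the wrapper shape; the trait below is an illustrative
stand-in for the real write buffer interface:

```rust
use std::sync::Arc;

// Illustrative stand-in for the write buffer trait.
trait WriteBufferWriting: Send + Sync {
    fn store(&self, sequencer_id: u32, payload: &[u8]);
}

// Ties a shared write buffer handle to the sequencer ID it should
// use, so callers no longer pass the ID on every enqueue.
struct Sequencer {
    id: u32,
    inner: Arc<dyn WriteBufferWriting>,
}

impl Sequencer {
    fn enqueue(&self, payload: &[u8]) {
        self.inner.store(self.id, payload);
    }
}
```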
* feat: mock sharder
Implements a mock Sharder impl that returns pre-configured responses to
shard(), and captures the input to the call.
* feat: sharded write buffer
Implements sharding of ops into an underlying WriteBuffer.
Writes are sharded by some abstract Sharder impl, collated per shard to
maximise the size of each op (and therefore compression efficiency),
converted into a DML operation, and then enqueued in parallel to the
underlying WriteBuffer implementation (see the sketch at the end of
this entry).
Deletes are modelled as being mapped to a single write buffer shard,
which is the case while we support sharding based on the table &
namespace only. Deletes will be extended to support (potentially)
multiple shards when column overrides are implemented.
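A sketch of the per-shard collation step referenced above, with
illustrative types:

```rust
use std::collections::HashMap;

// Illustrative placeholders.
type ShardId = u32;
type TableName = String;
struct TableBatch;

// Collect all (table, batch) pairs routed to the same shard into one
// group, maximising op size and therefore compression efficiency; the
// collated groups are then enqueued to their shards in parallel.
fn collate(
    sharded: Vec<(ShardId, TableName, TableBatch)>,
) -> HashMap<ShardId, Vec<(TableName, TableBatch)>> {
    let mut per_shard: HashMap<ShardId, Vec<(TableName, TableBatch)>> =
        HashMap::new();
    for (shard, table, batch) in sharded {
        per_shard.entry(shard).or_default().push((table, batch));
    }
    per_shard
}
```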
* refactor: runtime write buffers
Switch from static dispatch to a runtime-specified WriteBufferWriting
implementation.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
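A sketch of the dispatch change; the trait and impls below are
illustrative stand-ins:

```rust
use std::sync::Arc;

// Illustrative stand-ins.
trait WriteBufferWriting: Send + Sync {}
struct KafkaProducer;
struct MockProducer;
impl WriteBufferWriting for KafkaProducer {}
impl WriteBufferWriting for MockProducer {}

// Before: the writer type was a compile-time parameter (static
// dispatch). After: the concrete implementation is chosen at runtime
// behind a trait object.
fn make_writer(use_kafka: bool) -> Arc<dyn WriteBufferWriting> {
    if use_kafka {
        Arc::new(KafkaProducer)
    } else {
        Arc::new(MockProducer)
    }
}
```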