During initialisation, the ingester connects to the Kafka brokers - this
involves per-partition leadership discovery & connection establishment.
These connections are then retained for the lifetime of the process.
Prior to this commit, the ingester would establish a connection to all
partition leaders for a given topic. After this commit, the ingester
connects to only the partition leaders it is going to consume from
(for those shards that it is assigned.)
* ci: use same feature set in `build_dev` and `build_release`
* ci: also enable unstable tokio for `build_dev`
* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8
* fix: "must use"
Adds instrumentation to the low-level (post-aggregation) Kafka client,
capturing the uncompressed, approximate message size (calculated as the
sum of all Record::approximate_size() returns, ignoring largely static
framing overhead).
Previously aggregated writes were merged into a single Kafka Record -
this meant that all merged ops would be placed into the same Record, and
therefore receive the same sequence number once published to Kafka.
The new aggregator batches at the Record level, therefore aggregated
writes now get their own distinct sequence number. This commit updates
the batching tests to reflect this new sequence number assignment
behaviour.
The previous aggregator impl would assert that writes had been
partitioned before aggregating them (or rather, that the DML write had a
partition key assigned).
This should be true for all writes passing through the write buffer,
irrespective of which aggregator is used, therefore this assert is moved
"up" into the write buffer itself.
Replaces the DmlAggregator with the simpler RecordAggregator.
Metrics gathered as part of #5323 shows there is practically no benefit
to the additional complexity of the DmlAggregator over the simpler
RecordAggregator impl.
This commit adds a new write buffer aggregator used by rskafka to
increase the size of Kafka messages on the wire. The Kafka write buffer
impl is the only impl to perform aggregation.
This Aggregator impl maps IOx-specific DML operations to rskafka Records
with no additional processing - it can be thought of as an IOx-specific
adaptor over rskafka's RecordAggregator.
By delegating batching of Record instances to rskakfa's simple
RecordAggregator, we minimise code complexity / bug surface area / LoC.
Changes the Kafka write buffer impl to parallelise initialisation of the
PartitionClient instances.
Now that the PartitionClient constructor also performs leader discovery
(using cached metadata, influxdata/rskafka#164) and establishes a broker
connection (influxdata/rskafka#166) executing them in parallel will
cause a proportional decrease in the time taken to bring IOx up.
* fix: do not loose data when Kafka reports that offset is above watermark
This can happen in certain cluster rebalance settings.
This is also linked to https://github.com/influxdata/rskafka/issues/147
but for the upstream issue I currently have no idea how to fix it, so
let's at least harden IOx against it.
Fixes#5128.
* refactor: panic for `SequenceNumberAfterWatermark`
Previously IOx mapped a single database to a single kafka topic - this
is no longer the case, so referring to the kafka topic name as the
"database name" name is confusing.
Adds a decorator over the underlying kafka client to capture the latency
distribution of the low-level kafka writes, independent of the
aggregation/DML batching framework that sits "above" this client.
The latency measurements include the serialisation overhead, protocol
overhead, and actual network I/O.
The Kafka write buffer implementation (and only the Kafka impl) merges
together successive DML writes for the same namespace & partition within
a window of time.
This commit records the number of DML writes that have been merged
together to form a single batched op before it is dispatched to Kafka.
* test: add test for timestamps in kafka write buffer
* refactor: move timestamp batching test to generic tests
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Fixes interaction of `maybe_skip_kafka_integration!` and `should_panic`
by ensuring that `maybe_skip_kafka_integration!` panics to skip
`should_panic` tests.
Without that it is not possible to just run `cargo test -p write_buffer`.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes the kafka message wire format to include the partition key for
serialised DML writes on the wire.
After this commit, the kafka messages will contain the partition key for
each op, but this information will go unused in the ingester - this
enables us to roll out the producer side, before making the value's
presence necessary on the consumer side.
A follow-up PR will change the ingester to utilise this embedded
partition key.
This has the unfortunate side effect of making the partition key part of
the public gRPC write API:
https://github.com/influxdata/influxdb_iox/issues/4866
This commit changes the kafka write aggregator to only merge DML ops
destined for the same partition.
Prior to this commit the aggregator was merging DML ops that had
different partition keys, causing data to be persisted in incorrect
partitions:
https://github.com/influxdata/influxdb_iox/issues/4787
The default behavior of the ingester is to panic if the min unpersisted
sequence number in the catalog is unknown to the write buffer due to the
retention policies having evicted that sequence number.
Specifying `--skip-to-oldest-available` changes this behavior to skip to
the oldest sequence number the write buffer does have available and go
from there.
Fixes#4624.