Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.
Like the existing IDs within the DmlWrite, these values are marked
unsafe to use, to avoid consumers accidentally utilising them
during deployment. Unlike DmlWrite, the DmlDelete is completely unused,
so this is less of an issue.
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.
In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
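
As a rough illustration of the shape of the change - the names below
(`NamespaceId`, `DmlDelete`, the `_unchecked` accessor) are a simplified
sketch, not the exact IOx types:

```rust
// Hypothetical, simplified sketch - not the actual IOx definitions.

/// Catalog-assigned namespace ID, now carried by the delete and
/// propagated over the wire.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NamespaceId(i64);

/// A delete operation scoped to a single namespace.
#[derive(Debug)]
pub struct DmlDelete {
    namespace_id: NamespaceId,
    // ... predicate, table name, sequence metadata, etc.
}

impl DmlDelete {
    /// Like the IDs on DmlWrite, this accessor is deliberately flagged as
    /// unsafe-to-use so consumers do not rely on it mid-rollout.
    pub fn namespace_id_unchecked(&self) -> NamespaceId {
        self.namespace_id
    }
}
```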
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.
This requirement was already in place in practice - the write buffer
upheld an invariant that every write contained a partition key (i.e. it
was not None), panicking at runtime when attempting to enqueue a write
without one.
It is now possible to encode this invariant in the type system, which is
what this change does.
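
A minimal sketch of moving that invariant into the type system; the
names here are illustrative rather than the exact IOx signatures:

```rust
// Hypothetical, simplified sketch of the before/after shape.

#[derive(Debug, Clone)]
pub struct PartitionKey(String);

pub struct DmlWrite {
    // Before: partition_key: Option<PartitionKey>, checked (and panicked
    // on) at enqueue time.
    //
    // After: the key is required at construction, so an un-partitioned
    // write cannot be represented at all.
    partition_key: PartitionKey,
}

impl DmlWrite {
    /// The caller must partition the write before constructing it.
    pub fn new(partition_key: PartitionKey) -> Self {
        Self { partition_key }
    }

    pub fn partition_key(&self) -> &PartitionKey {
        &self.partition_key
    }
}
```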
Since we log trace IDs to allow easier correlation of logs no matter
what the `sampled` flag says, we should also parse these IDs even if we
don't have a trace collector at all.
In practice, this won't make a difference since we always deploy with a
trace collector, but it also makes the code easier to reason about.
Helps with #5975.
* revert: "revert: rdkafka/rskafka swapping (#5800)"
This reverts commit b77c3540e1.
* test: Verify write buffer connection_config is parsed as expected
* test: Failing test reproducing the error seen when deploying rdkafka
* fix: Translate k8s-idpe configs to rdkafka configs
* feat: Add back rdkafka dependency
* feat: Remove RSKafkaProducer
* feat: Remove write buffer RecordAggregator
* feat: Add back rdkafka producer
Using code from 58a2a0b9c8311303c796495db4f167c99a2ea3aa, then getting
it to compile with the latest code
* feat: Add a metric around enqueue
* fix: Remove unused imports
* fix: Increase Kafka timeout to 20s
* docs: Clarify that Kafka topics should only be created in test/dev envs
* fix: Remove metrics that aren't needed for this experiment
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
In https://github.com/influxdata/influxdb_iox/pull/5754 I added code at
seek() time to check if the offset exists, and refuse to seek if that's
not the case, effectively making this check redundant - I left it in on
the assumption that the previously added cases would still work!
Unfortunately this doesn't seem to be the case -
performing a read-ahead-of-data and read-behind-data seems to cause the
high_watermark to be returned as -1, meaning this code never worked?!
This new read-ahead-of-data match arm took priority over the
SequenceNumberNoLongerExists arm, effectively preventing the ingester
from taking the desired remediation (skipping to most recent write, or
erroring, depending on configuration).
Moves the "you've tried to seek into the future!" error to the point at
which the seek attempt was made.
This makes more sense than deferring the error until read time, where
the read response error contains an invalid high_watermark value of -1,
making it impossible to conclusively determine what has happened.
In staging we observed an ingester panic due to the write buffer stream
yielding a WriteBufferErrorKind::SequenceNumberAfterWatermark,
suggesting the ingester was attempting to read from an offset that
exceeds the current max write offset in Kafka (high watermark offset).
This turned out not to be the case - the partition had a single write at
offset 2, and the ingester was attempting to seek to offset 1. The first
read would fail (offset 1 does not exist) and the error handling did not
account for the high watermark not being correctly set (-1 in the
response).
I have no idea why rskafka returns this watermark / doesn't retry / etc.,
but this change will allow the ingesters to recover.
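
As a hedged sketch of the seek-time check described above (the error and
handle types here are illustrative stand-ins, not the real write buffer
API):

```rust
// Hypothetical sketch - not the real write buffer error or stream types.
use thiserror::Error;

#[derive(Debug, Error)]
pub enum SeekError {
    #[error("offset {offset} is beyond the high watermark {watermark}")]
    OffsetAfterWatermark { offset: i64, watermark: i64 },
}

pub struct StreamHandle {
    high_watermark: i64,
    offset: i64,
}

impl StreamHandle {
    /// Refuse the seek up-front if the requested offset does not exist
    /// yet, instead of surfacing an ambiguous error on the next read
    /// (where the broker may report a high watermark of -1).
    pub fn seek(&mut self, offset: i64) -> Result<(), SeekError> {
        if offset > self.high_watermark {
            return Err(SeekError::OffsetAfterWatermark {
                offset,
                watermark: self.high_watermark,
            });
        }
        self.offset = offset;
        Ok(())
    }
}
```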
* chore: Upgrade to Rust 1.64
* fix: Use iter find instead of a for loop, thanks clippy
* fix: Remove some needless borrows, thanks clippy
* fix: Use then_some rather than then with a closure, thanks clippy
* fix: Use iter retain rather than filter collect, thanks clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
During initialisation, the ingester connects to the Kafka brokers - this
involves per-partition leadership discovery & connection establishment.
These connections are then retained for the lifetime of the process.
Prior to this commit, the ingester would establish a connection to all
partition leaders for a given topic. After this commit, the ingester
connects to only the partition leaders it is going to consume from
(for the shards it is assigned).
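
A simplified sketch of the idea, with hypothetical stand-ins for the
connection type and the leader-discovery call:

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in for an established per-partition connection.
pub struct PartitionConnection {
    pub partition: i32,
}

// Hypothetical stand-in for leader discovery + connection establishment.
async fn connect_to_leader(_topic: &str, partition: i32) -> PartitionConnection {
    PartitionConnection { partition }
}

/// Connect only to the leaders of the shards this ingester is assigned,
/// rather than to every partition leader in the topic.
async fn connect_assigned(
    topic: &str,
    assigned_shards: &BTreeSet<i32>,
) -> Vec<PartitionConnection> {
    let mut connections = Vec::with_capacity(assigned_shards.len());
    for &shard in assigned_shards {
        connections.push(connect_to_leader(topic, shard).await);
    }
    connections
}
```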
* ci: use same feature set in `build_dev` and `build_release`
* ci: also enable unstable tokio for `build_dev`
* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8)
* fix: "must use"
Adds instrumentation to the low-level (post-aggregation) Kafka client,
capturing the uncompressed, approximate message size (calculated as the
sum of all Record::approximate_size() returns, ignoring largely static
framing overhead).
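
In essence the captured value is the per-batch sum of
Record::approximate_size(); a simplified sketch follows, where the
Record type and the metric sink are stand-ins rather than the real
rskafka/metric types:

```rust
// Hypothetical, simplified stand-ins - not the real rskafka Record type.
pub struct Record {
    pub key: Option<Vec<u8>>,
    pub value: Option<Vec<u8>>,
}

impl Record {
    /// Approximate uncompressed payload size, ignoring framing overhead.
    pub fn approximate_size(&self) -> usize {
        self.key.as_ref().map_or(0, |k| k.len())
            + self.value.as_ref().map_or(0, |v| v.len())
    }
}

/// Observe the approximate, uncompressed size of a batch before
/// publishing, feeding it into whatever metric sink is in use.
fn observe_batch_size(batch: &[Record], record_metric: &mut dyn FnMut(usize)) {
    let total: usize = batch.iter().map(Record::approximate_size).sum();
    record_metric(total);
}
```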
Previously aggregated writes were merged into a single Kafka Record -
this meant that all merged ops would be placed into the same Record, and
therefore receive the same sequence number once published to Kafka.
The new aggregator batches at the Record level, therefore aggregated
writes now get their own distinct sequence number. This commit updates
the batching tests to reflect this new sequence number assignment
behaviour.
The previous aggregator impl would assert that writes had been
partitioned before aggregating them (or rather, that the DML write had a
partition key assigned).
This should be true for all writes passing through the write buffer,
irrespective of which aggregator is used, therefore this assert is moved
"up" into the write buffer itself.
Replaces the DmlAggregator with the simpler RecordAggregator.
Metrics gathered as part of #5323 show there is practically no benefit
to the additional complexity of the DmlAggregator over the simpler
RecordAggregator impl.
This commit adds a new write buffer aggregator used by rskafka to
increase the size of Kafka messages on the wire. The Kafka write buffer
impl is the only impl to perform aggregation.
This Aggregator impl maps IOx-specific DML operations to rskafka Records
with no additional processing - it can be thought of as an IOx-specific
adaptor over rskafka's RecordAggregator.
By delegating batching of Record instances to rskafka's simple
RecordAggregator, we minimise code complexity / bug surface area / LoC.
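
Conceptually the adaptor is just a mapping layer; below is a heavily
simplified, hypothetical sketch (the real implementation targets
rskafka's Aggregator trait, which is not reproduced here):

```rust
// Hypothetical, heavily simplified sketch - not the real rskafka trait.

pub struct Record {
    pub value: Vec<u8>,
}

pub enum DmlOperation {
    Write(Vec<u8>),  // pre-encoded write payload
    Delete(Vec<u8>), // pre-encoded delete payload
}

/// Maps IOx DML operations to Kafka Records and delegates batching to an
/// inner record-level aggregator (a Vec here, standing in for rskafka's
/// RecordAggregator).
#[derive(Default)]
pub struct DmlRecordAdaptor {
    inner: Vec<Record>,
}

impl DmlRecordAdaptor {
    /// Encode the DML op as a Record and push it to the inner aggregator.
    pub fn push(&mut self, op: DmlOperation) {
        let value = match op {
            DmlOperation::Write(payload) | DmlOperation::Delete(payload) => payload,
        };
        self.inner.push(Record { value });
    }

    /// Flush the accumulated Records for publishing as a single batch.
    pub fn flush(&mut self) -> Vec<Record> {
        std::mem::take(&mut self.inner)
    }
}
```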
Changes the Kafka write buffer impl to parallelise initialisation of the
PartitionClient instances.
Now that the PartitionClient constructor also performs leader discovery
(using cached metadata, influxdata/rskafka#164) and establishes a broker
connection (influxdata/rskafka#166), executing them in parallel will
cause a proportional decrease in the time taken to bring IOx up.
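
A hedged sketch of the parallel initialisation, using
futures::future::join_all and a hypothetical stand-in for the client
constructor:

```rust
use futures::future;

// Hypothetical stand-in: the real constructor performs leader discovery
// and establishes the broker connection.
pub struct PartitionClient {
    pub partition: i32,
}

async fn new_partition_client(_topic: String, partition: i32) -> PartitionClient {
    PartitionClient { partition }
}

/// Initialise all per-partition clients concurrently, so start-up time is
/// bounded by the slowest connection rather than the sum of all of them.
async fn init_partition_clients(topic: &str, partitions: Vec<i32>) -> Vec<PartitionClient> {
    let futs = partitions
        .into_iter()
        .map(|p| new_partition_client(topic.to_string(), p));
    future::join_all(futs).await
}
```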