When a Kafka broker pod is recreated (for whatever reason) and gets a
new IP in the process, the following happens:
1. Old broker pod gets terminated, but is still reachable via DNS and
TCP.
2. rdkafka loses its connection and re-creates it using the old IP. The
   TCP connection can be established (this heavily depends on the K8s
   network setup), but it won't be able to send any messages because the
   old broker is already shutting down / dead.
3. New broker gets created w/ new IP (but same DNS name).
4. Somewhat in parallel to step 3: rdkafka gets informed by other
brokers that the topic lost its leader and then that the topic has
the new leader (which has the same identity as the old one). Since
leader changes in Kafka can also happen when brokers are totally
healthy, it doesn't conclude that its TCP connection might be broken
and tries to send messages to the new broker via the old TCP
connection.
5. It takes very long (~130s on my test setup) for the old
   rdkafka->broker TCP connection to break. Since
   `message.send.max.retries` has a default of `2147483647`, rdkafka will
   not give up on the application level (see the config sketch after this
   list).
6. rdkafka re-connects, resolves the new broker IP via DNS while doing
   so, and is happy.
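For reference, a minimal sketch of the producer settings involved, using
rust-rdkafka's `ClientConfig` (the bootstrap address matches the Strimzi
setup below; the explicit value just spells out the default mentioned in
step 5, and the helper name is illustrative):
```rust
use rdkafka::config::ClientConfig;
use rdkafka::producer::FutureProducer;

fn build_producer() -> FutureProducer {
    ClientConfig::new()
        // Bootstrap address as used in the Strimzi test setup below.
        .set("bootstrap.servers", "my-cluster-kafka-bootstrap:9092")
        // Spelled out explicitly: this default is why rdkafka keeps retrying
        // over the stale TCP connection instead of surfacing an error.
        .set("message.send.max.retries", "2147483647")
        .create()
        .expect("producer creation failed")
}
```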
An alternative fix that was tried: use the `connect` rdkafka callback to
hook into the place where it issues the UNIX `connect` call. There we
can manipulate the socket. Setting `TCP_USER_TIMEOUT` to 5000ms also
solves the issue somewhat, but might have different implications (it
also then takes around 5s to kill the connection). Since this is a more
hackish implementation and somewhat an unofficial way to configure
rdkafka, I decided against it.
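For illustration, this is roughly what that socket manipulation would
look like on Linux, assuming the connect callback hands us the raw
socket fd (the helper name here is made up):
```rust
use std::os::unix::io::RawFd;

/// Set `TCP_USER_TIMEOUT` (in milliseconds) on an already-created socket so
/// the kernel gives up on unacknowledged data much sooner than the default.
fn set_tcp_user_timeout(fd: RawFd, timeout_ms: u32) -> std::io::Result<()> {
    let ret = unsafe {
        libc::setsockopt(
            fd,
            libc::IPPROTO_TCP,
            libc::TCP_USER_TIMEOUT,
            &timeout_ms as *const u32 as *const libc::c_void,
            std::mem::size_of::<u32>() as libc::socklen_t,
        )
    };
    if ret != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
```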
Test Setup
==========
```rust
#[tokio::test]
async fn write_forever() {
    maybe_start_logging();
    let conn = maybe_skip_kafka_integration!();
    let adapter = KafkaTestAdapter::new(conn);
    let ctx = adapter.new_context(NonZeroU32::new(1).unwrap()).await;
    let writer = ctx.writing(true).await.unwrap();

    let lp = "upc user=1 100";
    let sequencer_id = set_pop_first(&mut writer.sequencer_ids()).unwrap();

    // Write one line-protocol row per second, forever, printing the result
    // of every store operation.
    for i in 1.. {
        println!("{}", i);

        let tables = mutable_batch_lp::lines_to_batches(lp, 0).unwrap();
        let write = DmlWrite::new(tables, DmlMeta::unsequenced(None));
        let operation = DmlOperation::Write(write);

        let res = writer.store_operation(sequencer_id, &operation).await;
        dbg!(res);

        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```
Make sure to set the rdkafka `log` config to `all`. Then use KinD,
set up a 3-node Strimzi cluster, and start the test binary within the K8s
cluster. You need to start a debug container that is close enough to
your developer system (e.g. an old Debian DOES NOT work if you run
bleeding-edge Arch):
```console
$(host) kubectl run -i --tty --rm debug --image=archlinux --restart=Never -n kafka -- bash
```
Then copy the test binary over to the container using [cargo-with](https://github.com/cbourjau/cargo-with):
```console
$(host) cargo with 'kubectl cp {bin} kafka/debug:/foo' -- test -p write_buffer
```
Within the container shell that you've just created, start the
forever-running test (make sure to set `KAFKA_CONNECT` according to your
Strimzi setup!):
```console
$(container) TEST_INTEGRATION=1 KAFKA_CONNECT=my-cluster-kafka-bootstrap:9092 RUST_BACKTRACE=1 RUST_LOG=debug ./foo write_forever --nocapture
```
The test should run and tell you that it is delivering messages. The
debug logs also tell you which broker the messages are sent to.
Now kill that broker (in my example it was `my-cluster-kafka-1`):
```console
$(host) kubectl -n kafka delete pod my-cluster-kafka-1
```
The test should now stop delivering messages and start erroring. Without
this patch it can take over 100s to recover even after the deleted pod
has been re-created. With this patch it is quickly able to deliver data
again after the broker comes back online.
Fixes #3030.
* fix: Add tokio rt-multi-thread feature so cargo test -p client_util compiles
* fix: Alphabetize dependencies
* fix: Add the data_types_conversions feature to get tests passing
* fix: Remove dev dependencies already listed under normal dependencies
* fix: Make sure the workspace is using the new resolver
* chore: update one-shot Dockerfile to not depend on rust:ci
* chore: update Debian to bullseye
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This will break `perf_image` until the new CI image is built due to the
newly required `--all-tags` parameter to `docker push` that isn't
available for the docker version we run on buster.
The std `DefaultHasher` is NOT guaranteed to stay the same, so let's use
`SipHasher13` directly, which at the moment (2021-11-15) is what the
standard library uses.
Fixes #3063.
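For illustration, a stable hash helper along those lines might look like
this, assuming the `siphasher` crate (the fixed keys and helper name are
illustrative):
```rust
use siphasher::sip::SipHasher13;
use std::hash::{Hash, Hasher};

/// Hash a value with SipHash-1-3 using fixed keys, so the result stays
/// stable across Rust releases (std's `DefaultHasher` makes no such promise).
fn stable_hash<T: Hash>(value: &T) -> u64 {
    let mut hasher = SipHasher13::new_with_keys(0, 0);
    value.hash(&mut hasher);
    hasher.finish()
}
```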