Commit Graph

137 Commits (d289ba6d03a33a59c6918add4497228fc06cad9c)

Author SHA1 Message Date
Carol (Nichols || Goulding) 16d8ae5e04
fix: Match tokio features to what's actually in use in each crate
Some crates listed features they don't use; other crates were relying on
feature flags enabled by something else. I tested these changes by
disabling the workspace hack crate and testing each crate.
2021-12-06 09:37:16 -05:00
Carol (Nichols || Goulding) 02c297e850
fix: Always specify the parking_lot feature of tokio to get potential perf boost 2021-12-06 09:37:15 -05:00
Carol (Nichols || Goulding) 1a899b939e
fix: Remove redundant closures identified by clippy
https://rust-lang.github.io/rust-clippy/master/index.html#redundant_closure
2021-12-02 11:52:02 -05:00
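
For context, the clippy lint referenced above flags closures that merely forward their arguments; a minimal standalone sketch of the pattern and its fix (not code from this repository):

```rust
fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    let xs = vec![1, 2, 3];

    // Flagged by clippy::redundant_closure: the closure only forwards its argument.
    let doubled: Vec<i32> = xs.iter().copied().map(|x| double(x)).collect();

    // Preferred: pass the function directly.
    let doubled_direct: Vec<i32> = xs.iter().copied().map(double).collect();

    assert_eq!(doubled, doubled_direct);
}
```
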
Marco Neumann 332485d2c9 fix: use correct `bytes_read` in `DmlMeta`
- for file-based write buffers: Use headers + payload
- for Kafka-based write buffers: Use the estimation that we also use for
  other metrics
- as a side effect we can now just use `PartialEq` for more types

Fixes #3186.
2021-12-01 15:57:21 +01:00
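
As a rough illustration of the accounting change described above for the file-based write buffer, counting headers plus payload rather than payload alone; all names below are hypothetical, not the actual `DmlMeta` API:

```rust
// Hypothetical entry type for illustration; the real file-based write buffer
// uses its own on-disk format.
struct FileEntry {
    headers: Vec<(String, String)>,
    payload: Vec<u8>,
}

/// Count headers and payload so the reported `bytes_read` reflects the whole
/// entry, not just the payload.
fn bytes_read(entry: &FileEntry) -> usize {
    let header_bytes: usize = entry
        .headers
        .iter()
        .map(|(key, value)| key.len() + value.len())
        .sum();
    header_bytes + entry.payload.len()
}

fn main() {
    let entry = FileEntry {
        headers: vec![("content-type".to_string(), "application/x-protobuf".to_string())],
        payload: vec![0u8; 128],
    };
    assert_eq!(bytes_read(&entry), 12 + 22 + 128);
}
```
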
Marco Neumann 459c14035c test: ensure that Kafka producers also generate sane metrics 2021-11-29 10:55:08 +01:00
Marco Neumann b7952c15a6 refactor: improve Kafka client usage
With the new rust-rdkafka release (merged in #3234), managing multiple
consumer streams becomes a bit easier. We can also reuse consumer
clients for multiple metadata requests. In total, this provides:

- use only a single client connection for consumers (we had multiple
  connection attempts during startup and one client per stream)
- use only two clients for producers (sadly we need a consumer client to
  probe the partitions during startup)
- consumers no longer need to poll the stream to receive statistics
2021-11-29 10:39:09 +01:00
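
A minimal sketch of the client-reuse idea above, assuming rdkafka's `BaseConsumer`/`fetch_metadata` API; this is not the code from the commit, and the broker address and topics are placeholders:

```rust
use std::time::Duration;

use rdkafka::config::ClientConfig;
use rdkafka::consumer::{BaseConsumer, Consumer};
use rdkafka::error::KafkaResult;

fn main() -> KafkaResult<()> {
    // One consumer client, reused for every metadata probe.
    let consumer: BaseConsumer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .create()?;

    for topic in ["topic_a", "topic_b"] {
        let metadata = consumer.fetch_metadata(Some(topic), Duration::from_secs(5))?;
        for t in metadata.topics() {
            println!("{}: {} partitions", t.name(), t.partitions().len());
        }
    }
    Ok(())
}
```
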
dependabot[bot] c729fc6a25
chore(deps): bump rdkafka from 0.27.0 to 0.28.0
Bumps [rdkafka](https://github.com/fede1024/rust-rdkafka) from 0.27.0 to 0.28.0.
- [Release notes](https://github.com/fede1024/rust-rdkafka/releases)
- [Changelog](https://github.com/fede1024/rust-rdkafka/blob/master/changelog.md)
- [Commits](https://github.com/fede1024/rust-rdkafka/compare/v0.27.0...v0.28.0)

---
updated-dependencies:
- dependency-name: rdkafka
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-29 08:18:02 +00:00
Marco Neumann 7f2e4f4342 refactor: remove write buffer direction
The direction was required when a database could both read from and write to a
write buffer. Now the usage context of a write buffer makes clear which of
the two applications is meant (databases read, routers write), so the
direction flag is no longer required.
2021-11-26 12:38:40 +01:00
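
To illustrate the shape of the change described above with hypothetical types (not the actual IOx traits): instead of a flag selecting a direction, the reading and writing roles get distinct types, so each usage context picks exactly one:

```rust
// Before (sketch): a single handle whose role was selected by a flag.
#[allow(dead_code)]
enum Direction {
    Read,
    Write,
}

// After (sketch): distinct types per role; databases hold a reader,
// routers hold a writer, and no flag is needed.
struct WriteBufferReader;
struct WriteBufferWriter;

impl WriteBufferReader {
    fn next_operation(&self) -> Option<String> {
        // Databases consume operations from the write buffer.
        None
    }
}

impl WriteBufferWriter {
    fn store_operation(&self, op: &str) {
        // Routers append operations to the write buffer.
        println!("stored: {}", op);
    }
}

fn main() {
    let writer = WriteBufferWriter;
    writer.store_operation("upc user=1 100");
    let reader = WriteBufferReader;
    assert!(reader.next_operation().is_none());
}
```
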
Marco Neumann edf7becd20 fix: address review comments 2021-11-24 12:09:52 +01:00
Marco Neumann f75c12351d fix: do not forget outputs of file-based write buffer
The existing channel construction could lead to cases where a stream
would consume messages and put them into the channel, but when the
stream got dropped those messages would be gone forever. So let's move from
a channel-based implementation to directly invoking the generator future,
so this buffering doesn't occur.

Fixes #3179.
2021-11-24 11:38:41 +01:00
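
A minimal sketch of the design choice above, using `futures::stream::unfold` rather than the actual write buffer code: a lazily polled generator only produces an entry when the consumer asks for it, so there is no intermediate channel whose buffered messages could be lost when the stream is dropped:

```rust
use futures::stream::{self, StreamExt};

#[tokio::main]
async fn main() {
    // Generate entries lazily, only when the consumer polls for the next one;
    // nothing sits in an intermediate channel that could be lost on drop.
    let entries = stream::unfold(0u64, |offset| async move {
        let entry = format!("entry {}", offset);
        Some((entry, offset + 1))
    });
    futures::pin_mut!(entries);

    assert_eq!(entries.next().await.as_deref(), Some("entry 0"));
    assert_eq!(entries.next().await.as_deref(), Some("entry 1"));
    // Dropping `entries` here leaves no pre-fetched, unread data behind.
}
```
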
kodiakhq[bot] a6a0eda142
Merge branch 'main' into crepererum/issue3030 2021-11-23 08:08:34 +00:00
Carol (Nichols || Goulding) 9fd4a560f5
feat: Results of running cargo hakari manage-deps 2021-11-19 09:21:57 -05:00
Raphael Taylor-Davies e32d367e85
feat: flush delete mailbox on persist (#3126) (#3147)
* feat: flush delete mailbox on persist (#3126)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-19 09:45:29 +00:00
Marco Neumann 7c72a993a3 fix: don't retry "forever" sending Kafka messages
When a Kafka broker pod is recreated (for whatever reason) and gets a
new IP while doing so, the following happens:

1. Old broker pod gets terminated, but is still reachable via DNS and
   TCP.
2. rdkafka loses its connection, re-creates it using the old IP. The
   TCP connection can be established (this heavily depends on the K8s
   network setup), but won't be able to send any messages because the
   old broker is already shutting down / dead.
3. New broker gets created w/ new IP (but same DNS name).
4. Somewhat in parallel to step 3: rdkafka gets informed by other
   brokers that the topic lost its leader and then that the topic has
   the new leader (which has the same identity as the old one). Since
   leader changes in Kafka can also happen when brokers are totally
   healthy, it doesn't conclude that its TCP connection might be broken
   and tries to send messages to the new broker via the old TCP
   connection.
5. It takes very long (~130s on my test setup) for the old
   rdkafka->broker TCP connection to break. Since
   `message.send.max.retries` has a default of `2147483647`, rdkafka will
   not give up at the application level.
6. rdkafka re-connects; while doing so it resolves the new broker IP via
   DNS and is happy.

An alternative fix that was tried: Use the `connect` rdkafka callback to
hook into the place where it would issue the UNIX `connect` call. There
we can manipulate the socket. Setting `TCP_USER_TIMEOUT` to 5000ms also
solves the issue somewhat, but might have different implications (also
it then takes around 5s to kill the connection). Since this is a more
hackish implementation and somewhat an unofficial way to configure
rdkafka, I decided against it.

Test Setup
==========

```rust
#[tokio::test]
async fn write_forever() {
    maybe_start_logging();
    let conn = maybe_skip_kafka_integration!();
    let adapter = KafkaTestAdapter::new(conn);
    let ctx = adapter.new_context(NonZeroU32::new(1).unwrap()).await;

    let writer = ctx.writing(true).await.unwrap();
    let lp = "upc user=1 100";
    let sequencer_id = set_pop_first(&mut writer.sequencer_ids()).unwrap();

    for i in 1.. {
        println!("{}", i);

        let tables = mutable_batch_lp::lines_to_batches(lp, 0).unwrap();
        let write = DmlWrite::new(tables, DmlMeta::unsequenced(None));
        let operation = DmlOperation::Write(write);
        let res = writer.store_operation(sequencer_id, &operation).await;
        dbg!(res);

        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```

Make sure to set the rdkafka `log` config to `all`. Then use KinD, set up
a 3-node Strimzi cluster, and start the test binary within the K8s
cluster. You need to start a debug container that is close enough to
your developer system (e.g. an old Debian DOES NOT work if you run
bleeding-edge Arch):

```console
$(host) kubectl run -i --tty --rm debug --image=archlinux --restart=Never -n kafka -- bash
```

Then you copy the test binary over to the container using [cargo-with](https://github.com/cbourjau/cargo-with):

```console
$(host) cargo with 'kubectl cp {bin} kafka/debug:/foo' -- test -p write_buffer
```

Within the container shell that you've just created, start the
forever-running test (make sure to set `KAFKA_CONNECT` according to your
Strimzi setup!):

```console
$(container) TEST_INTEGRATION=1 KAFKA_CONNECT=my-cluster-kafka-bootstrap:9092 RUST_BACKTRACE=1 RUST_LOG=debug ./foo write_forever --nocapture
```

The test should run and tell you that it is delivering messages. It also
tells you within the debug logs which broker it sends the messages to.
Now you need to kill the broker (in my example it was `my-cluster-kafka-1`):

```console
$(host) kubectl -n kafka delete pod my-cluster-kafka-1
```

The test should now stop delivering messages and should error. Without
this patch it might take over 100s to recover even after the
deleted pod was re-created. With this patch it is quickly able to
deliver data again after the broker comes back online.

Fixes #3030.
2021-11-19 09:53:57 +01:00
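
The commit message does not show the concrete settings applied by the patch, but bounding producer delivery attempts via librdkafka client configuration looks roughly like the sketch below; the values are illustrative only:

```rust
use rdkafka::config::ClientConfig;
use rdkafka::error::KafkaError;
use rdkafka::producer::FutureProducer;

fn main() -> Result<(), KafkaError> {
    // Illustrative values only: bound how long librdkafka keeps retrying a
    // message instead of relying on the effectively infinite defaults.
    let _producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        // Give up on a message after this many milliseconds.
        .set("message.timeout.ms", "5000")
        // Cap per-message retries (the librdkafka default is 2147483647).
        .set("message.send.max.retries", "3")
        .create()?;
    Ok(())
}
```
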
Carol (Nichols || Goulding) a2454b542d
fix: Small cleanups in Cargo.tomls (#3160)
* fix: Add tokio rt-multi-thread feature so cargo test -p client_util compiles

* fix: Alphabetize dependencies

* fix: Add the data_types_conversions feature to get tests passing

* fix: Remove dev dependencies already listed under normal dependencies

* fix: Make sure the workspace is using the new resolver
2021-11-18 22:26:33 +00:00
Raphael Taylor-Davies 8155747735
feat: add write buffer delete encoding (#2731) (#3127)
* feat: add write buffer delete encoding (#2731)

* chore: fix doc

* chore: review feedback

* chore: review feedback

* chore: fmt

* chore: review feedback
2021-11-17 16:12:19 +00:00
Andrew Lamb b5a7bf03da
feat: Add kafka write buffer consumer metrics (#3129)
* feat: Add kafka write buffer consumer metrics

* refactor: use unwrap_or_else

* fix: Update bucket boundaries
2021-11-17 14:35:40 +00:00
Andrew Lamb d6c6e9a6c7
fix: Default kafka timeout to be shorter than gRPC timeout (60 sec --> 10 sec) (#3131)
* fix: Default kafka timeout to be shorter than gRPC timeout

* docs: fix link style
2021-11-17 12:19:53 +00:00
Marco Neumann 79929c8cf4 feat: add more Kafka metrics 2021-11-16 17:18:41 +01:00
Marco Neumann 9ee004946e fix: do not overload rdkafka w/ statistics 2021-11-16 17:18:41 +01:00
Marco Neumann e6fdd79a0f feat: emit Kafka stats as metrics instead of logs
This maps a subset of Kafka stats to metrics. The set can, of course,
be changed in the future depending on our needs.

Fixes #3100.
2021-11-16 17:18:41 +01:00
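
A rough sketch of the mechanism described above (not the IOx implementation; the `record_gauge` helper is hypothetical): librdkafka delivers periodic statistics through the `ClientContext::stats` callback, and a subset of the fields can be forwarded as metrics:

```rust
use rdkafka::client::ClientContext;
use rdkafka::statistics::Statistics;

struct MetricsContext;

impl ClientContext for MetricsContext {
    fn stats(&self, stats: Statistics) {
        // Forward a subset of the librdkafka statistics to the metrics system.
        record_gauge("kafka_messages_in_queue", stats.msg_cnt);
        record_gauge("kafka_message_bytes_in_queue", stats.msg_size);
    }
}

// Hypothetical helper standing in for the real metrics registry.
fn record_gauge(name: &str, value: u64) {
    println!("{} = {}", name, value);
}

fn main() {
    // In real code the context is passed via `ClientConfig::create_with_context`,
    // with `statistics.interval.ms` set so the callback actually fires.
    let _ctx = MetricsContext;
}
```
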
Raphael Taylor-Davies 553e412226
refactor: DMLOperation write path (#2731) (#3121)
* refactor: DMLOperation write path (#2731)

* chore: fmt

* chore: review feedback
2021-11-16 12:42:19 +00:00
Raphael Taylor-Davies a6d83a3026
feat: WriteBufferReader use DmlOperation (#2731) (#3096)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-15 10:19:54 +00:00
Raphael Taylor-Davies 6f268f8260
refactor: extract DML types (#2731) (#3084)
* refactor: extract DML types (#2731)

* chore: fmt
2021-11-11 12:34:07 +00:00
Marco Neumann 11f4a1dee8 feat: add connection management for router 2021-11-08 11:11:52 +01:00
Raphael Taylor-Davies 60f0deaf1e
feat: remove flatbuffer entry (#3045) 2021-11-05 20:19:24 +00:00
Raphael Taylor-Davies 898567e221
feat: migrate server to DbWrite (#2724) (#3035)
* feat: migrate server to DbWrite (#2724)

* chore: print perf log output

* fix: don't suppress CI status code

* chore: review feedback

* fix: don't error on empty line protocol write payloads

* fix: test

* fix: test
2021-11-05 11:09:33 +00:00
Raphael Taylor-Davies 07ba629e2b
feat: migrate write buffer producer to MutableBatch and pbdata (#2743) (#3021)
* feat: migrate write buffer producer to MutableBatch and pbdata (#2743)

* fix: Kafka message content type header

* chore: fix doc
2021-11-04 10:20:40 +00:00
Raphael Taylor-Davies 08fcd87337
feat: use protobuf encoding in write buffer (#2724) (#3018) 2021-11-03 15:19:05 +00:00
Raphael Taylor-Davies 51c6348e54
refactor: extract codec module for write buffer (#2724) (#3017) 2021-11-03 14:07:33 +00:00
Marco Neumann 0d0c0cb42b refactor: move write buffer configs to new home
Write buffer configs will partially be shared by database and router
nodes, so let's move them into a shared home.
2021-11-02 10:17:01 +01:00
Raphael Taylor-Davies f1a6468e7b
feat: migrate write buffer consumer to use DbWrite (#2724) (#3003)
* feat: migrate write buffer consumer to use DbWrite (#2724)

* fix: doc

* chore: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-01 16:38:48 +00:00
dependabot[bot] c540b40f05
chore(deps): bump tokio from 1.12.0 to 1.13.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.12.0...tokio-1.13.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-01 11:21:59 +00:00
Raphael Taylor-Davies 6ceab054ab
refactor: move DbWrite to mutable_batch (#2986)
* refactor: move DbWrite to mutable_batch

* chore: fix doc

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-29 15:13:05 +00:00
Raphael Taylor-Davies 8a2410e161
feat: mutable batch write entry (#2724) (#2973)
* feat: mutable batch write entry (#2724)

* chore: lint

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-28 20:15:28 +00:00
Marco Neumann 3af1504ed2 fix: race condition in file-based write buffer 2021-10-26 10:09:34 +02:00
Marco Neumann 93f6519c34 fix: address review comments 2021-10-26 10:09:34 +02:00
Marco Neumann d527775aec feat: allow gaps in file-based WB + improve error handling 2021-10-26 10:09:34 +02:00
Marco Neumann bc7244c48e chore: use Rust edition 2021 2021-10-25 10:58:20 +02:00
Marco Neumann a5f15e6e76 refactor: improve directory names 2021-10-19 16:15:22 +02:00
Marco Neumann 6ec0bd5bab feat: file-based write buffer
Closes #2849.
2021-10-19 15:26:43 +02:00
Marco Neumann b2698cca44 feat: add `format_jaeger_trace_context` 2021-10-19 15:26:05 +02:00
dependabot[bot] 32e18b6436
chore(deps): bump rdkafka from 0.26.0 to 0.27.0
Bumps [rdkafka](https://github.com/fede1024/rust-rdkafka) from 0.26.0 to 0.27.0.
- [Release notes](https://github.com/fede1024/rust-rdkafka/releases)
- [Changelog](https://github.com/fede1024/rust-rdkafka/blob/master/changelog.md)
- [Commits](https://github.com/fede1024/rust-rdkafka/compare/v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: rdkafka
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 11:49:34 +00:00
kodiakhq[bot] 45c2c26168
Merge branch 'main' into crepererum/write_buffer_optional_span_ctx 2021-10-15 07:25:21 +00:00
Marco Neumann 85be39de40 test: make `headers_case_handling` test easier to understand 2021-10-15 09:20:41 +02:00
Marco Neumann 2850487877 feat: make trace collector in Kafka consumer optional
The whole application might not have a trace collector configured, in
which case we don't want to produce any spans.
2021-10-15 09:20:40 +02:00
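
The idea, sketched with hypothetical types rather than the real IOx trace API: span recording is gated on an optional collector, so nothing is produced when none is configured:

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the real trace types.
trait TraceCollector: Send + Sync {
    fn export(&self, span_name: &str);
}

struct ConsumerContext {
    collector: Option<Arc<dyn TraceCollector>>,
}

impl ConsumerContext {
    fn record_span(&self, name: &str) {
        // Only produce spans when a collector is actually configured.
        if let Some(collector) = &self.collector {
            collector.export(name);
        }
    }
}

fn main() {
    let ctx = ConsumerContext { collector: None };
    ctx.record_span("kafka read"); // no-op without a collector
}
```
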
Marco Neumann d6da68b762 fix: do not panic when writing to an unknown sequencer 2021-10-14 17:27:19 +02:00
kodiakhq[bot] 61ec559eee
Merge branch 'main' into crepererum/write_buffer_span_ctx 2021-10-14 11:50:07 +00:00
Raphael Taylor-Davies e911cf9ac1
refactor: make WriteBufferConfigFactory interior mutable (#2829)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-14 10:30:59 +00:00
Marco Neumann 5e06519afb feat: propagate trace information through write buffer 2021-10-14 11:07:41 +02:00