Commit Graph

46 Commits (5d835d5047c583d97a42feecfa68d39198c19bbb)

Author SHA1 Message Date
Dom Dwyer 5d835d5047 revert: rdkafka/rskafka swapping (#5844)
This reverts commit 442a7ff2a4.

This commit restores rskafka as the producer Kafka client, effectively
undoing the change made (and follow-up PRs) in:

    https://github.com/influxdata/influxdb_iox/pull/5800
2022-10-17 12:34:28 +02:00
Carol (Nichols || Goulding) 442a7ff2a4
revert: "revert: rdkafka/rskafka swapping (#5800)" (#5844)
* revert: "revert: rdkafka/rskafka swapping (#5800)"

This reverts commit b77c3540e1.

* test: Verify write buffer connection_config is parsed as expected

* test: Failing test reproducing the error seen when deploying rdkafka

* fix: Translate k8s-idpe configs to rdkafka configs
2022-10-13 09:33:06 +00:00
Dom Dwyer b77c3540e1 revert: rdkafka/rskafka swapping (#5800)
This reverts commit 33391af973.
2022-10-11 13:01:10 +02:00
Carol (Nichols || Goulding) 33391af973
feat: Swap Kafka Producer implementation back to rdkafka as diagnosis of latency problem (#5800)
* feat: Add back rdkafka dependency

* feat: Remove RSKafkaProducer

* feat: Remove write buffer RecordAggregator

* feat: Add back rdkafka producer

Using code from 58a2a0b9c8311303c796495db4f167c99a2ea3aa then getting it
to compile with the latest

* feat: Add a metric around enqueue

* fix: Remove unused imports

* fix: Increase Kafka timeout to 20s

* docs: Clarify that Kafka topics should only be created in test/dev envs

* fix: Remove metrics that aren't needed for this experiment

Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-11 09:14:45 +00:00
Dom Dwyer d1ca29c029 fix(ingester): connect to assigned Kafka partition
During initialisation, the ingester connects to the Kafka brokers - this
involves per-partition leadership discovery & connection establishment.
These connections are then retained for the lifetime of the process.

Prior to this commit, the ingester would establish a connection to all
partition leaders for a given topic. After this commit, the ingester
connects to only the partition leaders it is going to consume from
(for those shards that it is assigned.)
2022-09-07 13:21:06 +02:00
Carol (Nichols || Goulding) 74c9529062
fix: Rename KafkaPartition to ShardIndex 2022-08-29 14:07:18 -04:00
Dom Dwyer a66d16576d refactor: use dyn TimeProvider in RecordAggregator
For ease of integration with the existing tests, use dyn TimeProvider in
the RecordAggregator.
2022-08-22 12:50:50 +02:00
Dom Dwyer 59c2d84d1e refactor: use RecordAggregator
Replaces the DmlAggregator with the simpler RecordAggregator.

Metrics gathered as part of #5323 shows there is practically no benefit
to the additional complexity of the DmlAggregator over the simpler
RecordAggregator impl.
2022-08-18 17:12:23 +02:00
Dom Dwyer 77fd967517 feat: instrument kafka aggregated DML batch size
The Kafka write buffer implementation (and only the Kafka impl) merges
together successive DML writes for the same namespace & partition within
a window of time.

This commit records the number of DML writes that have been merged
together to form a single batched op before it is dispatched to Kafka.
2022-08-04 16:48:56 +02:00
Carol (Nichols || Goulding) 068096e7e1
fix: Rename data_types2 to data_types 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) 44209faa8e
fix: Move write buffer data types to write_buffer crate 2022-05-06 14:45:38 -04:00
Carol (Nichols || Goulding) afdff2b1db
fix: Move DatabaseName to data_types2 2022-05-06 14:45:37 -04:00
二手掉包工程师 4b47d723b1
refactor: Rename time to iox_time (#4416)
Signed-off-by: hi-rustin <rustin.liu@gmail.com>

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 00:19:59 +00:00
Raphael Taylor-Davies ca331503a5
feat: add WriteBufferErrorKind (#3664)
* feat: add WriteBufferErrorKind

* fix: test_offset_after_broken_message

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 15:34:05 +00:00
Marco Neumann e2db1df11f
refactor: improve writer buffer consumer interface (#3631)
* refactor: improve writer buffer consumer interface

The change looks huge but is actually rather simple. To
understand the interface change, let me first explain what we want:

- be able to fetch watermarks for any sequencer
- have streams:
  - each streams tracks a sequencer and has an offset state (no read
    multiplexing)
  - we can seek a stream
  - seeking and streaming cannot be done at the same time (that would be
    weird and likely leads to many bugs both in write buffer and in the
    user code)
- ideally we don't need to create streams of all sequencers but can
  choose a subset

Before this change we had one mutable consumer struct where you can get
all streams and watermark functions (this mutable-borrows the consumer)
or you can seek a single stream (this also mutable-borrows the
consumer). This is a bit weird for multiple reasons:

- you cannot seek a single stream without dropping all of them
- the mutable-borrow construct makes it really difficult to pass the
  streams into separate threads
- the consumer is boxed (because its mutable) which makes it more
  difficult to handle in a large-scale application

What this change does is the following:

- you have an immutable consumer (similar to the producer)
- the consumer offers the following methods:
  - get the set of sequencer IDs
  - get watermark for any sequencer
  - get a stream handler (see next point) for any sequencer
- the stream handler captures the stream state (offset) and provides you
  a standard `Stream<_>` interface as well as a seek function.
  Mutable-borrows ensure that you cannot use both at the same time.

The stream handler provides you the stream via `handler.stream()`. It
doesn't implement `Stream<_>` itself because the way boxing, dynamic
dispatch work, and pinning interact (i.e. I couldn't get it to work
without the indirection).

As a bonus point (which we don't use however) you can now create
multiple streams for the same sequencer and they all have their own
offset.

* fix: review comments

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 12:24:17 +00:00
Marco Neumann 50cff27b01
chore: remove rdkafka dependency (#3625)
All features are now covered by rskafka. This also removes the need to
specify a server ID for write buffer consumers. This was only used for
rdkafka since there we needed to specify a consumer group, even though
we did not use any transactions.
2022-02-03 13:33:56 +00:00
Marco Neumann 9567acd621
feat: expose all relevant configs for rskafka write buffers (#3599)
* feat: expose all relevant configs for rskafka write buffers

* refactor: `CreationConfig` => `TopicCreationConfig`
2022-02-02 09:35:54 +00:00
Marco Neumann b326b62b44
feat: buffer writes when writing to RSKafka (#3520) 2022-02-01 10:07:52 +00:00
Marco Neumann 76dd62a6c2 feat: RSKafka-driven write buffer 2022-01-20 12:36:10 +01:00
Carol (Nichols || Goulding) 87d8f4a85f
fix: Return error instead of panicking if Kafka support is requested but not included
Also add some tests around this behavior.
2021-12-09 10:04:27 -05:00
Carol (Nichols || Goulding) 8c7b3966de
fix: Organize imports 2021-12-09 08:49:34 -05:00
Carol (Nichols || Goulding) 403dcae93c
feat: Put kafka write_buffer code behind a feature flag
Which is off by default. This makes rdkafka optional to minimize
build-time dependencies for users that don't plan on using a Kafka write
buffer.
2021-12-09 08:49:34 -05:00
Marco Neumann 7f2e4f4342 refactor: remove write buffer direction
The direction was required when a database could read or write from/to a
write buffer. Now it is clear from the usage context of a write buffer
context which of the two applications is meant (databases read, routers
write) so the direction flag is no longer required.
2021-11-26 12:38:40 +01:00
Marco Neumann e6fdd79a0f feat: emit Kafka stats as metrics instead of logs
This maps a subset of Kafka stats as metrics. The set can -- of course
-- be changed in the future depending on our needs.

Fixes #3100.
2021-11-16 17:18:41 +01:00
Marco Neumann 0d0c0cb42b refactor: move write buffer configs to new home
Write buffer configs will partially be shared by database and router
nodes, so lets move them into a shared home.
2021-11-02 10:17:01 +01:00
Marco Neumann 6ec0bd5bab feat: file-based write write_buffer
Closes #2849.
2021-10-19 15:26:43 +02:00
Marco Neumann 2850487877 feat: make trace collector in Kafka consumer optional
The whole application might not have a trace collector configured in
which case we don't wanna produce any spans.
2021-10-15 09:20:40 +02:00
kodiakhq[bot] 61ec559eee
Merge branch 'main' into crepererum/write_buffer_span_ctx 2021-10-14 11:50:07 +00:00
Raphael Taylor-Davies e911cf9ac1
refactor: make WriteBufferConfigFactory interior mutable (#2829)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-14 10:30:59 +00:00
Marco Neumann 5e06519afb feat: propagate trace information through write buffer 2021-10-14 11:07:41 +02:00
Raphael Taylor-Davies 0554173684
feat: migrate write buffer to TimeProvider (#2722) (#2804)
* feat: migrate write buffer to TimeProvider (#2722)

* chore: review feedback

Co-authored-by: Marco Neumann <marco@crepererum.net>

Co-authored-by: Marco Neumann <marco@crepererum.net>
2021-10-12 10:32:34 +00:00
Raphael Taylor-Davies c33e5c22e6
feat: pull WriteBuffer consumer out of Db and onto Database (#2243) (#2525)
* feat: pull WriteBuffer consumer out of Db and onto Database (#2243)

* chore: restore WritingOnlyAllowedThroughWriteBuffer error

* refactor: remove WriteBufferConfig

* chore: fix docs

* chore: move WriteBufferConsumer tests out of db.rs

* chore: document WriteBufferFactory member functions

* chore: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-14 16:04:58 +00:00
Marco Neumann bbb8898d36 refactor: make writer buffer auto-creation types nicer to read 2021-09-08 11:13:48 +02:00
Marco Neumann 801cf08be7 feat: auto-creation of sequencers by write buffer
For Kafka, that basically means that we create a topic if it doesn't
exist yet.

Closed #2455.
Fixes #2189.
2021-09-07 18:24:57 +02:00
Marco Neumann 924e460bf7 feat: sequencer auto-creation for mocked write buffer 2021-09-07 18:18:20 +02:00
Marco Neumann d5662328b0 refactor: `n_sequencers` should be non-zero 2021-09-07 18:18:20 +02:00
Marco Neumann a63eb53ac5 feat: forward connection config to Kafka write buffer 2021-09-02 16:53:31 +02:00
Marco Neumann ecf1f99ddb refactor: more flexible writer buffer config
This allows:

- different types (instead of guessing through the connection URL)
- sequencer counts (not used yet but will be by #2455)
- extensible configs (e.g. to configure Kafka in a more granular way,
  not wired up yet)
- future extensions (since we use a message now instead of a single
  string)

**BREAKING: This requires changes for deployed systems / existing DBs!**
2021-09-02 16:41:35 +02:00
Marco Neumann 4a3fe01743 test: don't overdo it 2021-08-16 18:31:45 +02:00
Marco Neumann 5caa2ad8ec fix: typo 2021-08-16 18:31:45 +02:00
Marco Neumann 1a7293015b test: allow write buffer mocks that always fail 2021-08-16 18:27:09 +02:00
Marco Neumann a72bacae67 test: use proper ignore instead of commenting out 2021-08-12 11:38:02 +02:00
Marco Neumann a5c74f2798 feat: ability to inject mocked write buffers into server/database 2021-08-12 10:46:16 +02:00
Marco Neumann ec7ebdff29 refactor: use lifetimes to ensure single stream / no seek while streaming 2021-07-20 13:52:33 +02:00
Marco Neumann 592424c896 refactor: use one stream per sequencer/partition
Advantages are:

- for large DBs w/ many partitions we can ingest data in-parallel
- on top of this change we can implement per-sequencer seeking, which is
  required for replay
2021-07-19 12:26:58 +02:00
Marco Neumann 9cb9ae0874 chore: move write buffer into its own crate 2021-07-14 14:09:18 +02:00