RoutingRules such as RoutingConfig and ShardConfig use a sink to decide where to write
the entries.
The write buffer is currently implemented in the `db` and is accessed by using the `write_local_entry`
code path. This PR simply invokes that legacy code path whenever a "kafka" sink is selected.
This allows us immediately to benefit from the ability of the ShardingConfig to select or reject
tables and send some to kafka, some to devnull.
This PR does not allow us yet to split an input batch into mulitiple shards and send each
to a different kafka topic. For that, we'll need to pull out the write buffer code path out of
the `db` and do something similar to a ConnectionManager but for write buffers. TODO
Advantages are:
- for large DBs w/ many partitions we can ingest data in-parallel
- on top of this change we can implement per-sequencer seeking, which is
required for replay
This refactors the write buffer a bit for:
- **Testing:** Add generic tests for the Kafka and the mocking
implementation. The same interface can be used easily add new
implementations (e.g. via Redis, filesystem, ...).
- **Partition on Write:** The caller of the writer operation must now
specify the partition/sequencer ID. The implicit partitioning of the
Kafka writer would have lead to broken data since we must never spill
entries w/ the same primary key over multiple partitions. At the
moment we will only use partition 0 but we can easily implement
better logic in the future.
- **Improved Mocking:** The mocked implementation now simulates a system
that feels more real. Especially the handling around multiple streams
and "write while read" has been improved. This will be helpful for
testing and for new features like seeking (during replay). A solid
realistic mock also helps us to ensure that the tests using the mock
do not rely on unrealistic behavior too much.
When reading from the Kafka write buffer, subscribe to all partitions in
a topic and start from the smallest offset available, instead of
assuming there will only be 1 partition per topic.
A database on one IOx server can, exclusively:
- Not interact with Kafka at all
- Send writes to Kafka
- Read writes from Kafka
Notably, a database on a particular server will never write *and* read from Kafka at the same time.