The default behavior of the ingester is to panic if the minimum
unpersisted sequence number in the catalog is unknown to the write
buffer because the retention policy has already evicted that sequence
number.
Specifying `--skip-to-oldest-available` changes this behavior to skip to
the oldest sequence number the write buffer still has available and
resume from there.
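A minimal sketch of the resulting startup decision (names and shape are
assumptions for illustration, not the actual ingester code):

```rust
/// Decide where to start consuming, given the catalog's minimum
/// unpersisted sequence number and the oldest offset the write buffer
/// still retains.
fn resolve_start_offset(
    min_unpersisted: u64,
    oldest_available: u64,
    skip_to_oldest_available: bool,
) -> u64 {
    if min_unpersisted >= oldest_available {
        // Still retained: resume exactly where we left off.
        min_unpersisted
    } else if skip_to_oldest_available {
        // Evicted by retention: start at what actually exists.
        oldest_available
    } else {
        // The default behavior described above.
        panic!("sequence number {min_unpersisted} evicted; oldest available is {oldest_available}")
    }
}
```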
Fixes #4624.
Derive the Debug impl so it prints all the fields (the "number of
sequencers configured" in particular is pretty helpful in a test).
Manual impls drift out of sync over time and are more effort than the
derive!
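For example (hypothetical struct; the point is that the derived impl
stays in sync with the fields automatically):

```rust
#[derive(Debug)]
struct ConsumerState {
    database_name: String,
    number_of_sequencers: usize,
}

// `{:?}` now prints every field, e.g.:
// ConsumerState { database_name: "db", number_of_sequencers: 10 }
```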
Fixes symlinking in maybe_auto_create_directories() - previously it
created a symlink whose target path was relative to the working dir
rather than relative to the symlink itself.
If the working dir != the path in the WriteBufferCreationConfig,
subsequent calls would get stuck in an infinite loop attempting to
resolve the bad symlink.
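A sketch of the bug with illustrative paths (not the actual
maybe_auto_create_directories() code):

```rust
use std::os::unix::fs::symlink;
use std::path::Path;

fn link_active(root: &Path) -> std::io::Result<()> {
    // WRONG (old behavior): with a relative `root` such as "wb/db", the
    // target "wb/db/0" is resolved relative to the symlink's own
    // directory, i.e. as "wb/db/wb/db/0" -- a dangling link that later
    // calls then fail to resolve.
    // symlink(root.join("0"), root.join("active"))

    // RIGHT: specify the target relative to the symlink itself.
    symlink("0", root.join("active"))
}
```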
* chore: upgrade rskafka
* refactor: less cloning
* fix: defined behaviour when seeking to an unknown sequence number
The new, defined behavior is: "return an error once and then end the
stream".
Co-authored-by: Edd Robinson <me@edd.io>
For sparse data, the PB-encoded form (our Kafka wire format) is much
smaller than the MutableBatch (up to a factor of 20), so let's use it to
estimate the size during batching.
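A sketch of the idea using prost (the actual batching code may differ):

```rust
use prost::Message;

/// Estimate a batch's size by its protobuf wire size, which for sparse
/// data tracks the actual Kafka message size far better than the
/// in-memory MutableBatch size does.
fn estimated_wire_size<M: Message>(msg: &M) -> usize {
    // prost computes the exact encoded length without serializing.
    msg.encoded_len()
}
```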
Prod has a larger max message size for Kafka (10MB instead of 1MB), but
currently we're unable to wire all the write buffer configs through. As
a quick fix, let's hard-code the config. This however breaks the write
buffer when running against a default Kafka setup (1MB), so we should
revert it (tracked under #3723).
When creating a new aggregation span, you MUST NOT just create a new
random span context and put its child span into a span recorder, because
then only the child will be reported to the trace collector. Instead,
create a new root span without any parent directly.
This makes Jaeger slightly happier, and it won't complain about broken
spans anymore.
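A toy model of the failure mode (stand-in types, not the real trace
crate): collectors stitch traces together via parent IDs, so a reported
child of an unreported parent shows up as broken.

```rust
#[derive(Debug)]
struct Span {
    name: &'static str,
    parent: Option<u64>, // ID of a span the collector must also receive
}

fn main() {
    // WRONG: the random parent context is never reported; only this
    // child reaches the collector, pointing at a span that never arrives.
    let broken = Span { name: "aggregate", parent: Some(42) };

    // RIGHT: a root span has no parent, so the exported trace is complete.
    let root = Span { name: "aggregate", parent: None };

    println!("{broken:?} vs {root:?}");
}
```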
* refactor: improve write buffer consumer interface
The change looks huge but is actually rather simple. To
understand the interface change, let me first explain what we want:
- be able to fetch watermarks for any sequencer
- have streams:
- each stream tracks a sequencer and has an offset state (no read
multiplexing)
- we can seek a stream
- seeking and streaming cannot be done at the same time (that would be
weird and would likely lead to many bugs both in the write buffer and
in the user code)
- ideally we don't need to create streams of all sequencers but can
choose a subset
Before this change we had one mutable consumer struct from which you
could get all streams and watermark functions (this mutable-borrows the
consumer) or seek a single stream (this also mutable-borrows the
consumer). This is a bit weird for multiple reasons:
- you cannot seek a single stream without dropping all of them
- the mutable-borrow construct makes it really difficult to pass the
streams into separate threads
- the consumer is boxed (because it's mutable), which makes it more
difficult to handle in a large-scale application
What this change does is the following:
- you have an immutable consumer (similar to the producer)
- the consumer offers the following methods:
- get the set of sequencer IDs
- get watermark for any sequencer
- get a stream handler (see next point) for any sequencer
- the stream handler captures the stream state (offset) and provides you
with a standard `Stream<_>` interface as well as a seek function.
Mutable borrows ensure that you cannot use both at the same time.
The stream handler provides the stream via `handler.stream()`. It
doesn't implement `Stream<_>` itself because of the way boxing, dynamic
dispatch, and pinning interact (i.e. I couldn't get it to work without
the indirection).
As a bonus (which we don't use yet), you can now create multiple
streams for the same sequencer, and they all have their own offset.
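A condensed sketch of the resulting shape (trait and method names
approximate the write_buffer crate; treat the exact signatures as
assumptions):

```rust
use futures::stream::BoxStream;

type SequencerId = u32;
type Offset = u64;
type WriteBufferError = Box<dyn std::error::Error + Send + Sync>;

/// Placeholder for a decoded write-buffer entry.
struct DmlOperation;

#[async_trait::async_trait]
trait WriteBufferReading: Send + Sync {
    /// The set of sequencer IDs this write buffer knows about.
    fn sequencer_ids(&self) -> Vec<SequencerId>;

    /// Watermark for any sequencer; takes `&self`, so no mutable borrow
    /// of the consumer is required.
    async fn fetch_high_watermark(
        &self,
        sequencer: SequencerId,
    ) -> Result<Offset, WriteBufferError>;

    /// Create an independent stream handler for one sequencer; multiple
    /// handlers for the same sequencer each track their own offset.
    async fn stream_handler(
        &self,
        sequencer: SequencerId,
    ) -> Result<Box<dyn WriteBufferStreamHandler>, WriteBufferError>;
}

#[async_trait::async_trait]
trait WriteBufferStreamHandler: Send {
    /// The stream itself; `&mut self` is what makes streaming and
    /// seeking mutually exclusive.
    fn stream(&mut self) -> BoxStream<'_, Result<DmlOperation, WriteBufferError>>;

    /// Reposition this handler; conflicts with an active `stream()` borrow.
    async fn seek(&mut self, offset: Offset) -> Result<(), WriteBufferError>;
}
```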
* fix: review comments
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
All features are now covered by rskafka. This also removes the need to
specify a server ID for write buffer consumers. That was only needed for
rdkafka, where we had to specify a consumer group even though we did not
use any transactions.
* chore: upgrade rskafka + enable snappy support
* test: ensure that rskafka and rdkafka work together
Before removing rdkafka ensure that:
- rskafka can consume existing messages produced by rdkafka so we do not
need to drain existing topics
- rdkafka can consume new messages produced by rskafka so we can roll
back
I ran the whole `write_buffer` test suite (including the newly added
tests) using Apache Kafka as well as Redpanda.
* test: ensure we handle consumer offset in error case correctly
* docs: explain test setup
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>