* refactor: make deduplication work without chunk statistics
* test: more tests for duplicate data across different combinations of record batches
* refactor: address review comments
* feat: Sequencer wrapper
This type wraps an underlying WriteBufferWriter implementation, tagging
it with a sequencer ID it should use when enqueuing operations to the
buffer.
* feat: mock sharder
Implements a mock Sharder impl that returns pre-configured responses to
shard(), and captures the input to the call.
* feat: sharded write buffer
Implements sharding of ops into an underlying WriteBuffer.
Writes are sharded by some abstract Sharder impl, collated per shard to
maximise the size of each op (and therefore compression efficiency),
converted into a DML operation and then enqueued in parallel to the
underlying WriteBuffer implementation.
Deletes are modelled as being mapped to a single write buffer shard,
which is the case while we support sharding based on the table &
namespace only. Deletes will be extended to support (potentially)
multiple shards when column overrides are implemented.
* refactor: runtime write buffers
Switch from static dispatch to a runtime-specified
WriteBufferWriting implementation.
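A rough sketch of the write path described in the bullets above: a wrapper that tags a runtime-selected writer with a sequencer ID, and a function that collates writes per shard before enqueuing one op per shard. Everything here is a simplified assumption for illustration: the `store` method, the `MemWriter` type, the `write_sharded` function and the string-based ops are hypothetical, and ops are enqueued sequentially rather than in parallel.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the write buffer producer trait.
pub trait WriteBufferWriting: Send + Sync {
    fn store(&self, sequencer_id: u32, op: String);
}

/// Pairs a runtime-selected writer (dynamic dispatch) with the sequencer ID
/// it should use when enqueuing operations.
pub struct Sequencer {
    id: u32,
    inner: Arc<dyn WriteBufferWriting>,
}

impl Sequencer {
    pub fn new(id: u32, inner: Arc<dyn WriteBufferWriting>) -> Self {
        Self { id, inner }
    }

    pub fn enqueue(&self, op: String) {
        self.inner.store(self.id, op)
    }
}

/// In-memory writer that records every op; illustration only.
#[derive(Default)]
pub struct MemWriter {
    pub ops: Mutex<Vec<(u32, String)>>,
}

impl WriteBufferWriting for MemWriter {
    fn store(&self, sequencer_id: u32, op: String) {
        self.ops.lock().unwrap().push((sequencer_id, op));
    }
}

/// Collates `(table, line)` writes per shard, then enqueues a single combined
/// op per shard. `shard_for` must return a valid index into `shards`.
pub fn write_sharded(
    shards: &[Sequencer],
    shard_for: impl Fn(&str) -> usize,
    writes: &[(&str, &str)],
) {
    let mut collated: BTreeMap<usize, Vec<String>> = BTreeMap::new();
    for &(table, line) in writes {
        collated
            .entry(shard_for(table))
            .or_default()
            .push(format!("{table} {line}"));
    }
    for (idx, lines) in collated {
        shards[idx].enqueue(lines.join("\n"));
    }
}
```

Because `Sequencer` holds an `Arc<dyn WriteBufferWriting>`, the concrete writer can be chosen at startup from configuration without making the calling code generic over it.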
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: support line protocol precision parameter (#3522)
* chore: format imports
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This updates the catalog API to make it easier for consumers to work with. While refactoring the tests to work with the new API definition, I also found a bug in the MemCatalog implementation. Consumers will now be able to wrap the catalog in an Arc and use it across awaits.
Linking of span contexts was introduced in #2803 but the high-level
interface was never used. This adds the missing bits to allow links to
be used with `Span` and `SpanRecorder`.
This commit defines the Sharder trait that should allow us to implement
multiple sharding strategies over a defined set of input types (such as
a MutableBatch for writes, DeletePredicate for deletes, etc).
This commit also includes a jump hash implementation that consistently
shards (table name, namespace) tuples to a given shard for all input
types.
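As an illustration of the idea, here is a minimal sketch of a sharder trait plus a jump consistent hash over (table name, namespace). The trait shape, the `JumpHash` struct and the use of `DefaultHasher` are assumptions made for this example and not necessarily what the commit contains. `DefaultHasher` is not guaranteed to be stable across Rust releases, so a real implementation would want a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical trait shape: map an input of type `P` to one of the shards.
pub trait Sharder<P> {
    type Item;
    fn shard(&self, table: &str, namespace: &str, payload: &P) -> &Self::Item;
}

/// Jump consistent hash (Lamping & Veach): maps `key` to a bucket in
/// `0..num_buckets`, moving few keys when the bucket count grows.
pub fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "at least one bucket is required");
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < num_buckets as i64 {
        b = j;
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}

/// Shards purely on (table, namespace), ignoring the payload, so every
/// input type for the same table lands on the same shard.
pub struct JumpHash<T> {
    shards: Vec<T>,
}

impl<T> JumpHash<T> {
    pub fn new(shards: Vec<T>) -> Self {
        assert!(!shards.is_empty(), "at least one shard is required");
        Self { shards }
    }

    fn pick(&self, table: &str, namespace: &str) -> &T {
        let mut hasher = DefaultHasher::new();
        table.hash(&mut hasher);
        namespace.hash(&mut hasher);
        let idx = jump_hash(hasher.finish(), self.shards.len() as u32);
        &self.shards[idx as usize]
    }
}

impl<T, P> Sharder<P> for JumpHash<T> {
    type Item = T;

    fn shard(&self, table: &str, namespace: &str, _payload: &P) -> &T {
        self.pick(table, namespace)
    }
}
```

Implementing the trait once for all payload types `P` is what keeps writes and deletes for the same (table, namespace) pair on the same shard.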
Changes the DmlHandler::delete() trait method to accept required params,
and accept a DeletePredicate instead of a HttpDeleteRequest so that it
can be re-used in the gRPC handler.
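A simplified sketch of why taking a predicate type instead of an HTTP request type helps: the HTTP layer converts its request at the edge, and any other transport can build the predicate directly. All types and fields below are hypothetical stand-ins, and the handler is synchronous for brevity.

```rust
use std::cell::RefCell;

/// Hypothetical transport-agnostic description of a delete.
#[derive(Debug, Clone, PartialEq)]
pub struct DeletePredicate {
    pub start: i64,
    pub stop: i64,
    pub exprs: Vec<String>,
}

/// Hypothetical HTTP request body; only the HTTP layer knows about it.
pub struct HttpDeleteRequest {
    pub start: i64,
    pub stop: i64,
    pub predicate: String,
}

impl From<HttpDeleteRequest> for DeletePredicate {
    fn from(req: HttpDeleteRequest) -> Self {
        let exprs = req
            .predicate
            .split(" AND ")
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect();
        Self { start: req.start, stop: req.stop, exprs }
    }
}

/// Handler trait that takes its required parameters explicitly.
pub trait DmlHandler {
    type Error;
    fn delete(
        &self,
        namespace: &str,
        table: &str,
        predicate: DeletePredicate,
    ) -> Result<(), Self::Error>;
}

/// Test double that records each delete call it receives.
#[derive(Default)]
pub struct RecordingHandler {
    pub calls: RefCell<Vec<(String, String, DeletePredicate)>>,
}

impl DmlHandler for RecordingHandler {
    type Error = String;

    fn delete(
        &self,
        namespace: &str,
        table: &str,
        predicate: DeletePredicate,
    ) -> Result<(), String> {
        if table.is_empty() {
            return Err("table name is required".to_string());
        }
        self.calls
            .borrow_mut()
            .push((namespace.to_string(), table.to_string(), predicate));
        Ok(())
    }
}
```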