* fix(REPL): Don't buffer lines until a trailing semicolon is found
The REPL would silently buffer all lines until a trailing semicolon was found, which
resulted in some very confusing error messages: I would input an invalid command followed
by a command I thought was valid, yet still get an error because the previous command was buffered along with it.
This uses rustyline's helper feature to detect incomplete input (no trailing semicolon) and makes
it accept multiline input until the input is completed.
I also included some of rustyline's default hinting and highlighting while I was at it.
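A minimal sketch of the validator piece, assuming rustyline's `Validator` trait; the struct name is made up, and the real helper also implements `Completer`, `Hinter` and `Highlighter` to form a full rustyline `Helper`:

```rust
use rustyline::validate::{ValidationContext, ValidationResult, Validator};

/// Hypothetical helper piece: treat input without a trailing semicolon as
/// incomplete so rustyline keeps reading lines instead of submitting it.
struct SemicolonValidator;

impl Validator for SemicolonValidator {
    fn validate(&self, ctx: &mut ValidationContext<'_>) -> rustyline::Result<ValidationResult> {
        let input = ctx.input().trim_end();
        if input.is_empty() || input.ends_with(';') {
            // Complete statement: let the REPL execute it.
            Ok(ValidationResult::Valid(None))
        } else {
            // No trailing semicolon yet: keep accepting lines.
            Ok(ValidationResult::Incomplete)
        }
    }
}
```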
* chore: cargo clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
All features are now covered by rskafka. This also removes the need to
specify a server ID for write buffer consumers. This was only used for
rdkafka, where we had to specify a consumer group even though we did not use any transactions.
* chore: upgrade rskafka + enable snappy support
* test: ensure that rskafka and rdkafka work together
Before removing rdkafka ensure that:
- rskafka can consume existing messages produced by rdkafka so we do not
need to drain existing topics
- rdkafka can consume new messages produced by rskafka so we can roll
back
I ran the whole `write_buffer` test suite (including the newly added
tests) using Apache Kafka as well as Redpanda.
* test: ensure we handle consumer offset in error case correctly
* docs: explain test setup
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This adds the scaffolding for the ingester server to consume data from Kafka. It ingests data into an in-memory structure while creating records in the catalog for any partitions that don't yet exist.
I've removed catalog_update.rs in ingester for now. That was mostly a placeholder and will be folded into a combination of handler.rs and data.rs in my next PR, which will have some primitive lifecycle wired up.
There's one ugly bit here where the DML write is cloned because it's being borrowed to output spans and metrics. I'll need to follow up with a refactor so that the DML write's tables can be consumed without gumming up the metrics code.
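A rough sketch of the "create catalog records for partitions on first sight" shape described above; every type and method name here is hypothetical, not the real ingester or catalog API:

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory buffer: data grouped by partition key, with the
/// catalog-assigned id recorded the first time a partition key is seen.
#[derive(Default)]
struct SequencerData {
    partitions: BTreeMap<String, PartitionData>,
}

struct PartitionData {
    partition_id: i64,
    // buffered rows for this partition would live here
}

/// Stand-in for the catalog; the real interface is async and richer.
trait Catalog {
    fn get_or_create_partition(&mut self, partition_key: &str) -> i64;
}

impl SequencerData {
    /// Buffer a write, creating the catalog partition record on first use.
    fn partition_for_write(
        &mut self,
        partition_key: &str,
        catalog: &mut dyn Catalog,
    ) -> &mut PartitionData {
        self.partitions
            .entry(partition_key.to_string())
            .or_insert_with(|| PartitionData {
                partition_id: catalog.get_or_create_partition(partition_key),
            })
    }
}
```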
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds an in-memory cache of table schemas to the SchemaValidator DML
handler.
The cache pulls from the global catalog when observing a column for the
first time, and pushes the column type to set it for subsequent requests
if it does not exist (this pull & push is done atomically by the
catalog in an "upsert" call).
The in-memory cache is sharded by namespace, with each shard guarded by
an individual lock to minimise contention between readers (the expected
average case) and writers (only when adding new columns/tables).
Relies on the catalog to serialise new column creation and validate
parallel creation requests.
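A minimal sketch of the sharding and locking shape described above; the type names and `ColumnType` variants are illustrative, and the catalog "upsert" interaction is left out:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Illustrative column type; the real catalog types differ.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ColumnType { Tag, F64, I64, U64, Bool, String, Time }

/// Per-namespace shard: table name -> (column name -> column type).
#[derive(Default)]
struct NamespaceSchema {
    tables: HashMap<String, HashMap<String, ColumnType>>,
}

/// Cache sharded by namespace; readers of different namespaces take
/// different inner locks and therefore do not contend with each other.
#[derive(Default)]
struct SchemaCache {
    shards: RwLock<HashMap<String, Arc<RwLock<NamespaceSchema>>>>,
}

impl SchemaCache {
    /// Return the shard for `namespace`, creating an empty one on first use.
    fn namespace(&self, namespace: &str) -> Arc<RwLock<NamespaceSchema>> {
        if let Some(shard) = self.shards.read().unwrap().get(namespace) {
            return Arc::clone(shard); // expected fast path: read lock only
        }
        let mut shards = self.shards.write().unwrap();
        Arc::clone(shards.entry(namespace.to_string()).or_default())
    }
}
```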
Implements a write schema validation DML handler, denying requests that
conflict with the schema within the global catalog. Additive schema
changes are accepted, incrementally updating the global catalog schema.
Deletes are passed through unchanged and unvalidated.
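A compact sketch of the per-column accept/deny decision; the names are hypothetical, and the real handler compares catalog column types derived from the write rather than strings:

```rust
/// Outcome of validating one incoming column against the known schema.
enum ColumnValidation {
    /// Column already exists with the same type: nothing to do.
    Ok,
    /// Column is new: an additive change, upsert it into the catalog.
    Upsert,
    /// Column exists with a different type: reject the whole write.
    Conflict { existing: String, wanted: String },
}

/// Hypothetical per-column check.
fn validate_column(existing: Option<&str>, wanted: &str) -> ColumnValidation {
    match existing {
        None => ColumnValidation::Upsert,
        Some(t) if t == wanted => ColumnValidation::Ok,
        Some(t) => ColumnValidation::Conflict {
            existing: t.to_string(),
            wanted: wanted.to_string(),
        },
    }
}
```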
* refactor: Debug bounds on Catalog trait
* feat: validate MutableBatch schema
Changes the schema validation code to validate MutableBatch instances
(coming from pre-parsed LP writes as well as non-LP-based writes) instead of
parsed LP lines.
* refactor: Send bound on boxed errors
* refactor: clippy
Allow assert_eq!(bool, bool) for readability.
* refactor: no PartialEq<MB Column> for ColumnSchema
Remove the PartialEq<mutable_buffer::Column> for ColumnSchema - it's
definitely more readable as a method call.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* build: don't pull in all of tokio
We already specify the tokio features we need so "full" (all features)
is not necessary.
* build: remove chrono dependency
Appears unused.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Implement a snapshot method on DataBuffer
Fixes #3510.
* test: Add a test snapshotting batches with different but compatible schemas
* fix: Simplify min/max sequencer number collection
The first batch should always have the min sequencer number. The last
batch should always have the max sequencer number. The min should always
be less than (or equal to, in case there's only one batch) the max.
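A small sketch of that simplification, assuming batches are stored in sequence order (field and type names are illustrative, not the real DataBuffer types):

```rust
/// Illustrative batch metadata; real batches carry the data as well.
struct BatchMeta {
    min_sequence_number: u64,
    max_sequence_number: u64,
}

/// Batches are appended in sequence order, so the overall range is simply
/// the first batch's min through the last batch's max.
fn sequence_range(batches: &[BatchMeta]) -> Option<(u64, u64)> {
    let (min, max) = (
        batches.first()?.min_sequence_number,
        batches.last()?.max_sequence_number,
    );
    // With a single batch min == max is possible; min can never exceed max.
    debug_assert!(min <= max);
    Some((min, max))
}
```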
- Use a more standard way to set up the tracing subsystem (as described
  in tracing-subscriber docs)
- Also capture content from `log` crate
- Play nice w/ Rust's libtest message capture (a minimal sketch of this setup follows)
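A minimal sketch of such a setup, assuming `tracing-subscriber` with its `env-filter` and `tracing-log` features enabled (not necessarily the exact code in this change):

```rust
use tracing_subscriber::{fmt, EnvFilter};

/// Install a subscriber for tests: respects RUST_LOG, forwards `log`
/// records (via the `tracing-log` feature), and writes through the test
/// writer so libtest's output capture still works.
fn init_test_logging() {
    let _ = fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .with_test_writer()
        .try_init(); // ignore the error if a subscriber is already set
}
```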
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Sequencer wrapper
This type wraps an underlying WriteBufferWriter implementation, tagging
it with a sequencer ID it should use when enqueuing operations to the
buffer.
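A sketch of the wrapper idea; the trait here is a simplified synchronous stand-in for the real async write buffer API:

```rust
use std::sync::Arc;

/// Illustrative DML operation placeholder.
struct DmlOperation;

/// Stand-in for the write buffer writer trait.
trait WriteBufferWriting: Send + Sync {
    fn store_operation(&self, sequencer_id: u32, op: DmlOperation);
}

/// Wraps a write buffer together with the sequencer (shard) it targets, so
/// callers can enqueue operations without carrying the sequencer ID around.
struct Sequencer {
    id: u32,
    inner: Arc<dyn WriteBufferWriting>,
}

impl Sequencer {
    fn enqueue(&self, op: DmlOperation) {
        self.inner.store_operation(self.id, op)
    }
}
```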
* feat: mock sharder
Implements a mock Sharder impl that returns pre-configured responses to
shard(), and captures the input to the call.
* feat: sharded write buffer
Implements sharding of ops into an underlying WriteBuffer.
Writes are sharded by some abstract Sharder impl, collated per shard to
maximise the size of each op (and therefore compression efficiency),
converted into a DML operation and then enqueued in parallel to the
underlying WriteBuffer implementation.
Deletes are modelled as being mapped to a single write buffer shard,
which is the case while we support sharding based on the table &
namespace only. Deletes will be extended to support (potentially)
multiple shards when column overrides are implemented.
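A sketch of the per-shard collation step; the types and the `shard_of` callback are stand-ins for the real Sharder and MutableBatch types:

```rust
use std::collections::HashMap;

type ShardId = u32;

/// Illustrative per-table payload of a single write.
struct TableBatch {
    table: String,
    rows: Vec<String>,
}

/// Group one write's table batches by the shard each table maps to, so one
/// DML operation per shard can be built and then enqueued in parallel.
fn collate_by_shard(
    namespace: &str,
    tables: Vec<TableBatch>,
    shard_of: impl Fn(&str, &str) -> ShardId, // hypothetical Sharder::shard
) -> HashMap<ShardId, Vec<TableBatch>> {
    let mut collated: HashMap<ShardId, Vec<TableBatch>> = HashMap::new();
    for batch in tables {
        let shard = shard_of(&batch.table, namespace);
        collated.entry(shard).or_default().push(batch);
    }
    collated
}
```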
* refactor: runtime write buffers
Switch from using static dispatch, to using a runtime specified
WriteBufferWriting implementation.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit defines the Sharder trait that should allow us to implement
multiple sharding strategies over a defined set of input types (such as
a MutableBatch for writes, DeletePredicate for deletes, etc).
This commit also includes a jump hash implementation that consistently
shards (table name, namespace) tuples to a given shard for all input
types.
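A sketch of jump consistent hashing (Lamping & Veach) applied to a (table name, namespace) key. The function names are illustrative, and `DefaultHasher` is used only for brevity; it is not guaranteed stable across Rust releases, so a fixed-key hasher would be needed for sharding that must stay consistent:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Jump consistent hash: maps a 64-bit key onto `buckets` shards with
/// minimal remapping when the number of shards grows.
fn jump_hash(mut key: u64, buckets: u32) -> u32 {
    assert!(buckets > 0);
    let (mut b, mut j): (i64, i64) = (-1, 0);
    while j < i64::from(buckets) {
        b = j;
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}

/// Consistently map a (table name, namespace) pair to one of `num_shards`.
fn shard_for(table: &str, namespace: &str, num_shards: u32) -> u32 {
    let mut hasher = DefaultHasher::new(); // illustration only; not stable across releases
    (table, namespace).hash(&mut hasher);
    jump_hash(hasher.finish(), num_shards)
}
```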