Now that partition cache entries are smaller, the number of entries held
in memory can be increased. This now uses ~2MiB of memory and drains the
cache during execution, amortising to 0.
* feat: Add a feature flag to switch to the router RPC write path
Fixes #6242.
* refactor: Remove a weird arc clone/rename that's not needed
I'm sure this was needed at some point, but it doesn't make much sense.
I wasn't going to change this, but I'm now trying to minimize the
differences between this function and the write path init function, so
make this one better too.
* fix: Add the namespace autocreation to the RPC write path too
The topic/query pool don't really apply to this case, but use them
anyway to be able to use the existing catalog methods.
Also add a bunch of comments pointing out where the RPC write path
initializer and the old router's initializer are the same and where
they're different, so that perhaps it'll be easier to keep them in sync
while they both exist.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit removes the invariant asserts of monotonicity carried over
from the "ingester" crate - ingester2 does not define any ordering of
writes within the system.
This commit also removes the SequenceNumberRange as it is no longer
useful to indirectly check the equality of two sets of ops -
non-monotonic writes mean overlapping ranges do not guarantee a full
set of overlapping operations (gaps may be present). Likewise, bounding
values (such as "min persisted") provide no useful meaning in an
out-of-order system.
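
As a rough illustration of why a min/max range stops being a useful proxy
for set equality, the sketch below compares two sets of sequence numbers
that share the same bounds but not the same members. The `Range` type and
`bounds` function are hypothetical stand-ins, not the removed
SequenceNumberRange itself.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a min/max range over sequence numbers.
#[derive(Debug, PartialEq, Eq)]
pub struct Range {
    pub min: u64,
    pub max: u64,
}

/// Compute the inclusive bounds of a set of sequence numbers, or `None`
/// if the set is empty.
pub fn bounds(ops: &BTreeSet<u64>) -> Option<Range> {
    Some(Range {
        min: *ops.iter().next()?,
        max: *ops.iter().next_back()?,
    })
}
```

With sets `{1, 2, 3, 4}` and `{1, 4}`, `bounds` returns the same range for
both, even though the second set is missing ops 2 and 3.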
Document the arbitrary reordering of concurrent writes in an ingester,
and the potential divergence of WAL entries / buffered state.
Also documents that in ingester2, sequence numbers only identify writes,
not their ordering.
Rather than naming WAL files with a UUID, give them a number that
indicates the order they were created in so that they can be read back
in order.
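
A minimal sketch of how numbered files could be put back into creation
order for replay. The `<number>.wal` naming scheme and the `segment_id` /
`replay_order` helpers are assumptions made for illustration, not the
actual on-disk format.

```rust
use std::path::{Path, PathBuf};

/// Parse the numeric id from a hypothetical file name such as `42.wal`.
/// Returns `None` for files that do not follow this assumed scheme.
pub fn segment_id(path: &Path) -> Option<u64> {
    if path.extension()?.to_str()? != "wal" {
        return None;
    }
    path.file_stem()?.to_str()?.parse().ok()
}

/// Return the WAL file paths sorted by ascending id, skipping any path
/// that does not parse as a numbered WAL file.
pub fn replay_order(paths: Vec<PathBuf>) -> Vec<PathBuf> {
    let mut segments: Vec<(u64, PathBuf)> = paths
        .into_iter()
        .filter_map(|p| segment_id(&p).map(|id| (id, p)))
        .collect();
    // Sort numerically: a plain string sort would put "10.wal" before "2.wal".
    segments.sort_by_key(|(id, _)| *id);
    segments.into_iter().map(|(_, p)| p).collect()
}
```

Sorting on the parsed number rather than the file name avoids the
lexicographic ordering problem unless the names are zero-padded.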
Fixes #6227.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
`RecordBatch` offers zero-copy slicing, so there is no need to store the
row range manually. This makes #6216 simpler.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds WalSink, an implementation of the DmlSink trait that commits DML
operations to the write-ahead log before passing the DML op into the
decorated inner DmlSink chain.
By structuring the WAL logic as a decorator, the chain of inner DmlSink
handlers within it is only ever invoked after the WAL commit has
successfully completed, which keeps all the WAL commit code in a
singly-responsible component. This also lets us layer on the WAL commit
logic to the DML sink chain after replaying any existing WAL files,
avoiding a circular WAL mess.
The application-logic level WAL code abstracts over the underlying WAL
implementation & codec through the WalAppender trait. This decouples the
business logic from the WAL implementation both for testing purposes,
and for trying different WAL implementations in the future.
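
The decorator shape described above can be sketched roughly as follows.
This is a simplified, synchronous approximation: the trait signatures,
the `DmlOp` type, the `String` error type, and the `MemoryWal` /
`BufferSink` test doubles are all assumptions for illustration and do not
reflect the real trait definitions.

```rust
/// Hypothetical, simplified DML operation.
#[derive(Debug, Clone, PartialEq)]
pub struct DmlOp(pub String);

/// Simplified stand-in for the DmlSink trait.
pub trait DmlSink {
    fn apply(&mut self, op: DmlOp) -> Result<(), String>;
}

/// Simplified stand-in for the WalAppender trait.
pub trait WalAppender {
    fn append(&mut self, op: &DmlOp) -> Result<(), String>;
}

/// Decorator: commits each op to the WAL, then delegates to `inner`.
pub struct WalSink<T, W> {
    inner: T,
    wal: W,
}

impl<T, W> WalSink<T, W> {
    pub fn new(inner: T, wal: W) -> Self {
        Self { inner, wal }
    }
}

impl<T: DmlSink, W: WalAppender> DmlSink for WalSink<T, W> {
    fn apply(&mut self, op: DmlOp) -> Result<(), String> {
        // The inner chain is reached only if the WAL append succeeds.
        self.wal.append(&op)?;
        self.inner.apply(op)
    }
}

/// In-memory WAL used to exercise the sketch; `fail` simulates an error.
#[derive(Default)]
pub struct MemoryWal {
    pub entries: Vec<DmlOp>,
    pub fail: bool,
}

impl WalAppender for MemoryWal {
    fn append(&mut self, op: &DmlOp) -> Result<(), String> {
        if self.fail {
            return Err("wal unavailable".to_string());
        }
        self.entries.push(op.clone());
        Ok(())
    }
}

/// Terminal sink that records every op it is given.
#[derive(Default)]
pub struct BufferSink {
    pub applied: Vec<DmlOp>,
}

impl DmlSink for BufferSink {
    fn apply(&mut self, op: DmlOp) -> Result<(), String> {
        self.applied.push(op);
        Ok(())
    }
}
```

Because `WalSink` is generic over `WalAppender`, a test can swap in
`MemoryWal` without touching the business logic, and replay can drive the
inner sink directly before the `WalSink` wrapper is layered on.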
* chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0`
* chore: Update thrift to 0.17
* fix: use workspace arrow-flight in ingester2
* chore: Update for API changes
* fix: test
* chore: Update hakari
* chore: Update hakari again
* chore: Update trace_exporters to latest thrift
* fix: update test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: remove unused/moved ns_autocreation dml handler
* feat(router): expose new ns retention as config
* fix: forgot to set default value for router retention arg
* chore: make new namespace retention param an option
Sequence all gRPC write requests to (internally) order the resulting DML
operations.
These sequence numbers are assigned from a timestamp oracle and passed
through to the downstream DmlSink implementers.
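
One possible shape for such an oracle is an atomic counter, sketched
below. The `TimestampOracle` name, its methods, and the choice of
`Ordering::Relaxed` are assumptions for illustration; the actual
implementation may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical oracle handing out unique, increasing sequence numbers.
pub struct TimestampOracle(AtomicU64);

impl TimestampOracle {
    /// Create an oracle whose first issued value is `last + 1`.
    pub fn new(last: u64) -> Self {
        Self(AtomicU64::new(last))
    }

    /// Return the next sequence number. `fetch_add` is atomic, so
    /// concurrent callers never observe the same value.
    pub fn next(&self) -> u64 {
        self.0.fetch_add(1, Ordering::Relaxed) + 1
    }
}
```

A shared `&TimestampOracle` can be called from many request handlers at
once; each call to `next` yields a distinct number.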
`None` was only used for testing, and even then we should probably have a
proper executor instead of panicking for some methods.
Found while working on #6216.