`RecordBatch` offers zero-copy slicing, so there is no need to store the
row range manually. This makes #6216 simpler.
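As a minimal sketch of that zero-copy behaviour (the column name and values here are purely illustrative):

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), arrow::error::ArrowError> {
    let schema = Arc::new(Schema::new(vec![Field::new("v", DataType::Int64, false)]));
    let col = Arc::new(Int64Array::from(vec![1, 2, 3, 4, 5])) as ArrayRef;
    let batch = RecordBatch::try_new(schema, vec![col])?;

    // slice() returns a new RecordBatch that shares the underlying buffers;
    // no row data is copied, only the offset/length bookkeeping changes.
    let rows_1_to_3 = batch.slice(1, 3);
    assert_eq!(rows_1_to_3.num_rows(), 3);
    Ok(())
}
```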
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds WalSink, an implementation of the DmlSink trait that commits DML
operations to the write-ahead log before passing the DML op into the
decorated inner DmlSink chain.
By structuring the WAL logic as a decorator, the chain of inner DmlSink
handlers within it is only ever invoked after the WAL commit has
successfully completed, keeping all of the WAL commit code in a
single-responsibility component. This also lets us layer the WAL commit
logic onto the DML sink chain after replaying any existing WAL files,
avoiding a circular WAL dependency.
The application-logic level WAL code abstracts over the underlying WAL
implementation & codec through the WalAppender trait. This decouples the
business logic from the WAL implementation both for testing purposes,
and for trying different WAL implementations in the future.
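A rough sketch of that decorator shape, with placeholder `DmlOperation` / `Error` types and simplified trait signatures standing in for the real ingester definitions:

```rust
use async_trait::async_trait;

/// Placeholder stand-ins for the real ingester types.
#[derive(Debug, Clone)]
pub struct DmlOperation;
#[derive(Debug)]
pub struct Error;

/// The decorated handler chain.
#[async_trait]
pub trait DmlSink: Send + Sync {
    async fn apply(&self, op: DmlOperation) -> Result<(), Error>;
}

/// Abstracts over the underlying WAL implementation & codec.
#[async_trait]
pub trait WalAppender: Send + Sync {
    async fn append(&self, op: &DmlOperation) -> Result<(), Error>;
}

/// Decorator: commit the op to the WAL, then invoke the inner chain.
pub struct WalSink<W, T> {
    wal: W,
    inner: T,
}

#[async_trait]
impl<W, T> DmlSink for WalSink<W, T>
where
    W: WalAppender,
    T: DmlSink,
{
    async fn apply(&self, op: DmlOperation) -> Result<(), Error> {
        // The inner DmlSink chain is only ever reached after the WAL
        // commit has completed successfully.
        self.wal.append(&op).await?;
        self.inner.apply(op).await
    }
}
```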
* chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0`
* chore: Update thrift to 0.17
* fix: use workspace arrow-flight in ingester2
* chore: Update for API changes
* fix: test
* chore: Update hakari
* chore: Update hakari again
* chore: Update trace_exporters to latest thrift
* fix: update test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: remove unused/moved ns_autocreation dml handler
* feat(router): expose new ns retention as config
* fix: forgot to set default value for router retention arg
* chore: make new namespace retention param an option
Sequence all gRPC write requests to (internally) order the resulting DML
operations.
These sequence numbers are assigned from a timestamp oracle and passed
through to the downstream DmlSink implementers.
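A hedged illustration of the idea; `TimestampOracle`, `SequencedWrite` and their shapes below are stand-ins rather than the exact crate types:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the timestamp oracle: hands out strictly
/// increasing sequence numbers used to totally order incoming writes.
#[derive(Debug, Default)]
pub struct TimestampOracle(AtomicU64);

impl TimestampOracle {
    pub fn next(&self) -> u64 {
        // Uniqueness and monotonicity come from the atomic fetch_add.
        self.0.fetch_add(1, Ordering::Relaxed) + 1
    }
}

/// A write stamped with its sequence number before being pushed into the
/// DmlSink chain, so downstream sinks observe a total order.
#[derive(Debug)]
pub struct SequencedWrite {
    pub sequence_number: u64,
    // ...the DML payload itself would follow
}

pub fn sequence(oracle: &TimestampOracle) -> SequencedWrite {
    SequencedWrite {
        sequence_number: oracle.next(),
    }
}
```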
`None` was only used for testing, and even then we should probably have a
proper executor instead of panicking in some methods.
Found while working on #6216.
This commit implements the QueryExec trait for the BufferTree, allowing it
to be queried for the partition data it contains. With this change, the
BufferTree now provides "read your writes" functionality.
Notably, the implementation streams the contents of individual partitions
to the caller on demand (pull-based execution), deferring acquisition of
the partition lock until it is actually needed and shortening the time a
strong reference to a specific RecordBatch is held, which keeps the
memory overhead to a minimum.
During query execution a client sees a consistent snapshot of
partitions: once a client begins streaming the query response, incoming
writes that create new partitions do not become visible. However,
incoming writes to an existing partition that forms part of the snapshot
set become visible iff they are ordered before acquisition of the
partition lock when that partition's data is streamed to the client.
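A toy sketch of those pull-based, snapshot-consistent semantics, using placeholder types rather than the actual `BufferTree` internals:

```rust
use std::sync::{Arc, Mutex};

use futures::{stream, Stream, StreamExt};

/// Toy partition; stands in for the buffered, ref-counted RecordBatch data.
#[derive(Debug)]
pub struct PartitionData {
    rows: Vec<String>,
}

pub struct BufferTree {
    partitions: Mutex<Vec<Arc<Mutex<PartitionData>>>>,
}

impl BufferTree {
    /// Pull-based execution: the set of partitions is snapshotted up front
    /// (partitions created after this point stay invisible to the query),
    /// but each partition lock is only taken when the caller polls for
    /// that partition's data, so writes ordered before that acquisition
    /// are included in the response.
    pub fn query_exec(&self) -> impl Stream<Item = Vec<String>> {
        let snapshot: Vec<_> = self.partitions.lock().unwrap().clone();
        stream::iter(snapshot).map(|p| p.lock().unwrap().rows.clone())
    }
}
```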
Allow the return type of the QueryExec trait's query_exec() method to be
parametrised by the implementer.
This allows the trait to be reused across different data sources that
return differing concrete types.
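Sketched roughly (the parameter list here is illustrative, not the real signature), the trait might take the following shape:

```rust
use async_trait::async_trait;

/// Sketch of a query trait whose response type is chosen by the
/// implementer; parameter names and types are illustrative only.
#[async_trait]
pub trait QueryExec: Send + Sync {
    /// The concrete response type produced by this data source.
    type Response: Send;

    async fn query_exec(&self, namespace_id: i64, table_id: i64) -> Self::Response;
}
```

Each data source can then return its own concrete response type without boxing everything down to a single shared one.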
Changes the query code (taken from the ingester crate) to stream data
for query execution, tidying up unnecessary Result types and removing
unnecessary indirection/boxing.
Previously, the query data sourcing would collect the full set of
RecordBatches for a query response during execution, prior to sending
the data to the caller. Any data that was dropped or modified during
this time meant the
underlying ref-counted data could not be released from memory until all
outstanding queries referencing it completed. When faced with multiple
concurrent queries and ongoing ingest, this meant multiple copies of
data could be held in memory at any one time.
After this commit, data is streamed to the caller, minimising how long a
reference to specific partition data is held and therefore eliminating
the overhead of keeping all the data needed for a query in memory for as
long as the client takes to read it.
When combined with an upcoming PR to stream RecordBatch out of the
BufferTree, this should provide performant query execution with minimal
memory overhead, even for a maliciously slow reading client.
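To illustrate the collect-vs-stream distinction with toy types (not the actual ingester code):

```rust
use futures::{stream, Stream, StreamExt};

/// Toy partition data; stands in for ref-counted RecordBatch buffers.
pub struct Partition {
    data: Vec<u8>,
}

/// Before: every partition's data is materialised up front and stays
/// referenced for the whole lifetime of the (possibly slow) response.
pub fn collect_all(partitions: &[Partition]) -> Vec<Vec<u8>> {
    partitions.iter().map(|p| p.data.clone()).collect()
}

/// After: each partition's data is only touched when the client polls
/// for it, so strong references are short-lived even for a slow reader.
pub fn stream_all(partitions: Vec<Partition>) -> impl Stream<Item = Vec<u8>> {
    stream::iter(partitions).map(|p| p.data)
}
```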
Useful because it updates `zstd` to 0.12. With the upcoming `parquet`
update, we can then drop `zstd` 0.11.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: avoid channels to create a one-element stream (see the sketch after this list)
* refactor: move `StreamWithPermit` into its own module
* refactor: make `QueryCompletedToken` handling stream-based
For #6216.
* docs: improve
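For the first bullet, a tiny example of building a one-element stream directly via `futures::stream::once` instead of plumbing a channel:

```rust
use futures::{executor::block_on, future, stream, StreamExt};

fn main() {
    block_on(async {
        // Rather than creating a channel and sending a single value into
        // it, a one-element stream can be constructed directly.
        let mut s = stream::once(future::ready(42));
        assert_eq!(s.next().await, Some(42));
        assert_eq!(s.next().await, None);
    });
}
```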
Co-authored-by: Andrew Lamb <alamb@influxdata.com>