* fix: check schemas in `pretty_print_batches`
I think most users of this function (and `assert_batches_eq`) assume
that all batches have the same schema. If not, `pretty_print_batches`
may either fail to produce a proper table (some rows may have more or
fewer columns) or silently produce a table that looks "alright".
* fix: equalize schemas where it is required/desired
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
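A minimal sketch of the schema check described in the `pretty_print_batches` fix. It assumes a hypothetical, simplified `Batch` type that carries only column names, standing in for Arrow's `RecordBatch` and `Schema`. The function name `check_same_schema` is illustrative and is not the actual API.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`: only the column
/// names (the "schema") matter for this check.
#[derive(Debug, Clone)]
struct Batch {
    schema: Vec<String>,
}

/// Returns an error if any batch's schema differs from the first batch's,
/// so a mismatched table is rejected instead of being printed misaligned.
fn check_same_schema(batches: &[Batch]) -> Result<(), String> {
    let Some(first) = batches.first() else {
        return Ok(()); // no batches: nothing to compare
    };
    for (i, b) in batches.iter().enumerate().skip(1) {
        if b.schema != first.schema {
            return Err(format!(
                "batch {i} has schema {:?}, expected {:?}",
                b.schema, first.schema
            ));
        }
    }
    Ok(())
}
```

Failing loudly up front is the point here. Without the check, the printer has no way to tell whether a row with too few cells is a bug or simply a different schema.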
Use a single global test executor with reasonable defaults. Also, don't
require tests to join/await executor shutdowns (most tests forget this
anyway and would then get a runtime warning).
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
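A sketch of the "single global test executor" idea, using `std::sync::OnceLock` for lazy, process-wide initialisation. The `TestExecutor` type, its `num_threads` field, and the `global_test_executor` function are hypothetical placeholders. The commit message does not describe the real executor's type or defaults.

```rust
use std::sync::OnceLock;

/// Hypothetical executor configuration (the real type is not shown here).
#[derive(Debug)]
struct TestExecutor {
    num_threads: usize,
}

/// Returns the process-wide test executor, creating it on first use.
/// Every test shares this instance, and no test has to shut it down.
fn global_test_executor() -> &'static TestExecutor {
    static EXECUTOR: OnceLock<TestExecutor> = OnceLock::new();
    EXECUTOR.get_or_init(|| TestExecutor { num_threads: 1 })
}
```

Because the executor lives for the whole test process, there is no per-test shutdown to forget.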
Reorder all imports in the ingester to match a consistent order:
* stdlib
* external crates
* intra-crate imports
This helps prevent merge conflicts & keeps everything tidy.
Splits out the nested tree of namespace -> tables -> partitions
(referred to as the "buffer tree") from the Shard which previously held
the namespace map.
This allows the BufferTree to exist without a shard, or many trees to
exist within a shard, etc.
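A sketch of the resulting ownership, assuming hypothetical `i64` ID aliases and empty placeholder node types. The tree owns the namespace -> table map and does not depend on a shard. The shard simply holds a tree, and could hold several.

```rust
use std::collections::HashMap;

// Hypothetical ID aliases and node types standing in for the ingester's own.
type NamespaceId = i64;
type TableId = i64;

#[derive(Debug, Default)]
struct TableData {} // partitions would live here

#[derive(Debug, Default)]
struct NamespaceData {
    tables: HashMap<TableId, TableData>,
}

/// The namespace -> table -> partition "buffer tree", independent of shards.
#[derive(Debug, Default)]
struct BufferTree {
    namespaces: HashMap<NamespaceId, NamespaceData>,
}

impl BufferTree {
    /// Returns the namespace node for `id`, creating it if absent.
    fn namespace_mut(&mut self, id: NamespaceId) -> &mut NamespaceData {
        self.namespaces.entry(id).or_default()
    }
}

/// A shard now just owns a tree rather than the namespace map itself.
#[derive(Debug, Default)]
struct Shard {
    buffer: BufferTree,
}
```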
* fix: slice flight response batches
Same as #6094 but for the Apache Flight interface.
Ref https://github.com/influxdata/idpe/issues/16073.
* refactor: use `RecordBatch::slice`
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Moves the SequenceNumberRange type out of "data" and into the root to be
reused outside of the data module. This construct is universally useful
across all the ingester code.
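An illustrative sketch of what a `SequenceNumberRange` can look like: an inclusive min/max over every sequence number observed so far. The field layout, the `u64` representation, and the method names are assumptions made for this example. They are not taken from the ingester.

```rust
/// Hypothetical range tracking the smallest and largest sequence numbers
/// observed so far (both bounds inclusive); `None` until something is seen.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct SequenceNumberRange {
    range: Option<(u64, u64)>,
}

impl SequenceNumberRange {
    /// Widens the range so that it includes `n`.
    fn observe(&mut self, n: u64) {
        self.range = Some(match self.range {
            None => (n, n),
            Some((min, max)) => (min.min(n), max.max(n)),
        });
    }

    fn inclusive_min(&self) -> Option<u64> {
        self.range.map(|(min, _)| min)
    }

    fn inclusive_max(&self) -> Option<u64> {
        self.range.map(|(_, max)| max)
    }
}
```

A type like this carries no knowledge of the data module, which is why it can sit at the crate root and be shared.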
Allows different DmlSink implementations to return different error
types. This allows for small, concise errors that are local to the
DmlSink implementation and specific to it. This helps avoid bloated
"kitchen sink" error types.
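A sketch of the associated-error-type pattern described above. It assumes a synchronous, simplified `DmlSink` that takes a `&str` operation; the real trait's signature is not shown in this commit. `RejectEmpty` and `EmptyOpError` are hypothetical examples of one implementation and its local error type.

```rust
use std::fmt;

/// Hypothetical, simplified sink trait: each implementation chooses its
/// own error type instead of sharing one "kitchen sink" enum.
trait DmlSink {
    type Error: std::error::Error;
    fn apply(&self, op: &str) -> Result<(), Self::Error>;
}

/// A small error that only this implementation can return.
#[derive(Debug, PartialEq)]
struct EmptyOpError;

impl fmt::Display for EmptyOpError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "empty DML operation")
    }
}

impl std::error::Error for EmptyOpError {}

/// Example sink that rejects empty operations.
struct RejectEmpty;

impl DmlSink for RejectEmpty {
    type Error = EmptyOpError;

    fn apply(&self, op: &str) -> Result<(), Self::Error> {
        if op.is_empty() {
            Err(EmptyOpError)
        } else {
            Ok(())
        }
    }
}
```

Callers that are generic over `DmlSink` can still handle any error through the `std::error::Error` bound. Callers that know the concrete sink get the precise type.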
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidentally removed in rebase
* chore: remove mem catalog's default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
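Because retention is stored in nanoseconds but entered in hours on the CLI, the unit conversion needs to guard against overflow. A minimal sketch, assuming an `i64` nanosecond representation and a hypothetical helper named `retention_hours_to_ns`. The actual overflow fix is not shown in these messages.

```rust
/// Nanoseconds in one hour: 60 * 60 * 1_000_000_000.
const NANOS_PER_HOUR: i64 = 60 * 60 * 1_000_000_000;

/// Converts an hour-based retention period (as a CLI might accept it)
/// into nanoseconds, returning `None` instead of silently overflowing.
fn retention_hours_to_ns(hours: i64) -> Option<i64> {
    hours.checked_mul(NANOS_PER_HOUR)
}
```

`checked_mul` makes the overflow an explicit, reportable case rather than a wrapped or panicking value.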
* feat: reject writes that are outside the retention period
* feat: add retention validator into handler stack
* chore: Apply suggestions from code review
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: address review comments
* test: unit tests for retention validation
* chore: address review comments
* test: more unit tests and integration tests
* refactor: make time inside retention period for ephemeral_mode test
* fix: 2 hours
Co-authored-by: Dom <dom@itsallbroken.com>
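A sketch of the retention check, under stated assumptions. Timestamps and retention are both nanoseconds in `i64`. A retention of `None` means "keep forever"; that convention is an assumption here. `validate_retention` is a hypothetical name, not the validator's real API.

```rust
/// Rejects a write whose timestamp (ns since the Unix epoch) is older than
/// `now_ns - retention_ns`. `None` is assumed to mean infinite retention.
fn validate_retention(
    timestamp_ns: i64,
    now_ns: i64,
    retention_ns: Option<i64>,
) -> Result<(), String> {
    let Some(retention) = retention_ns else {
        return Ok(()); // assumed infinite retention: accept everything
    };
    // Saturate rather than overflow for very long retention periods.
    let min_allowed = now_ns.saturating_sub(retention);
    if timestamp_ns < min_allowed {
        Err(format!(
            "data at {timestamp_ns} is before retention cutoff {min_allowed}"
        ))
    } else {
        Ok(())
    }
}
```

With a 2 hour retention, a point written 1 hour in the past is accepted. A point 3 hours in the past is rejected.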
Changes the TableData within the ingester to utilise a TableNameResolver
to fetch the TableName via the catalog on demand / in the background,
instead of using the table name sent over the write.
This change causes the ingester to perform a catalog query in the
background (or on demand) to resolve the table name. This is a
prerequisite for removing the table name from the write wire format.
Like the NamespaceNameProvider, this commit adds a TableNameProvider to
provide decoupled initialisation of a DeferredLoad<TableName> instead of
hard-coding in a catalog instance / query code, and plumbs it into
position to be used when initialising a TableName.
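To illustrate the on-demand half of `DeferredLoad<TableName>`, here is a synchronous sketch built on `OnceLock`. The value is produced by a loader closure (standing in for a catalog query) on first access and cached afterwards. The real type also supports background resolution, which this hypothetical simplification omits.

```rust
use std::sync::{Mutex, OnceLock};

/// Hypothetical, synchronous sketch of a deferred loader: `loader` runs at
/// most once, on the first call to `get()`, and its result is cached.
struct DeferredLoad<T> {
    value: OnceLock<T>,
    loader: Mutex<Option<Box<dyn FnOnce() -> T + Send>>>,
}

impl<T> DeferredLoad<T> {
    fn new(loader: impl FnOnce() -> T + Send + 'static) -> Self {
        Self {
            value: OnceLock::new(),
            loader: Mutex::new(Some(Box::new(loader))),
        }
    }

    /// Returns the value, running the loader if it has not run yet.
    fn get(&self) -> &T {
        self.value.get_or_init(|| {
            // OnceLock runs this closure at most once, so the loader is
            // always still present here.
            let loader = self.loader.lock().unwrap().take().expect("loader already used");
            loader()
        })
    }
}
```

Passing a provider that builds such loaders, instead of a catalog handle, keeps the data types ignorant of how names are fetched.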
Changes the buffer tree to address TableData by their ID only (removing
support for addressing tables by their string names). This removes the
double-reference bookkeeping (twin indexes) and its associated overhead.
As part of this change, the TableName is now wrapped in a DeferredLoad
in preparation for removal of the names in the DmlOperation wire format.
This commit also switches the map of TableData within the NamespaceData
(the parent node) to use the ArcMap for faster lookups and DRY
exactly-once initialisation.
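A sketch of an ID-keyed map with exactly-once initialisation. Callers asking for the same key receive the same `Arc`, and the init closure runs at most once per key. This is a hypothetical simplification guarded by a single `Mutex`. The ingester's `ArcMap` may use a different locking strategy, for example a read-optimised fast path.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex};

/// Hypothetical map of shared values keyed by ID, initialised exactly once.
struct ArcMap<K, V> {
    inner: Mutex<HashMap<K, Arc<V>>>,
}

impl<K: Eq + Hash, V> ArcMap<K, V> {
    fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    /// Returns the value for `key`, running `init` only if it is absent.
    /// Holding the lock across `init` guarantees a single initialisation.
    fn get_or_insert_with(&self, key: K, init: impl FnOnce() -> V) -> Arc<V> {
        let mut map = self.inner.lock().unwrap();
        Arc::clone(map.entry(key).or_insert_with(|| Arc::new(init())))
    }

    /// Looks up an existing value without initialising anything.
    fn get(&self, key: &K) -> Option<Arc<V>> {
        self.inner.lock().unwrap().get(key).cloned()
    }
}
```

Keying only by ID means one map and one lookup per table, instead of keeping a name index and an ID index in sync.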
Removes the need to leak the PartitionProvider outside of the ingester
crate.
This will allow the PartitionProvider to utilise a
DeferredLoad<TableName> without having to make the DeferredLoad and
TableName pub.