Reorder all imports in the ingester to match a consistent order:
* stdlib
* external crates
* intra-crate imports
This helps prevent merge conflicts & keeps everything tidy.
Splits out the nested tree of namespace -> tables -> partitions
(referred to as the "buffer tree") from the Shard which previously held
the namespace map.
This allows the BufferTree to exist without a shard, or many trees to
exist within a shard, etc.
* fix: slice flight response batches
Same as #6094 but for the Apache Flight interface.
Ref https://github.com/influxdata/idpe/issues/16073.
* refactor: use `RecordBatch::slice`
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Moves the SequenceNumberRange type out of "data" and into the root to be
reused outside of the data module. This construct is universally useful
across all the ingester code.
Allows different DmlSink implementations to return different error
types. This allows for small, concise errors that are local to the
DmlSink implementation and specific to it. This helps avoid bloated
"kitchen sink" error types.
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidently removed in rebase
* chore: remove mem catalogs default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: reject writes that are outside the retention period
* feat: add retention validator into handler stack
* chore: Apply suggestions from code review
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: address review comments
* test: unit tests fot retention validation
* chore: address review comments
* test: more unit tests and integration tests
* refactor: make time inside retention period for emphemeral_mode test
* fix: 2 hours
Co-authored-by: Dom <dom@itsallbroken.com>
Changes the TableData within the ingester to utilise a TableNameResolver
to fetch the TableName via the catalog on demand / in the background,
instead of using the table name sent over the write.
This change causes the ingester to perform a catalog query in the
background (or on demand) to resolve the table name. This is a
pre-requisite for removing the table name from the write wire format.
Like the NamespaceNameProvider, this commit adds a TableNameProvider to
provide decoupled initialisation of a DeferredLoad<TableName> instead of
hard-coding in a catalog instance / query code, and plumbs it into
position to be used when initialising a TableName.
Changes the buffer tree to address TableData by their ID only (removing
support for addressing tables by their string names). This removes the
double reference book keeping / twin indexes and associated overhead.
As part of this change, the TableName is now wrapped in a DeferredLoad
in preparation for removal of the names in the DmlOperation wire format.
This commit also switches the map of TableData within the NamespaceData
(the parent node) to use the ArcMap for faster lookups and DRY
exactly-once initialisation.
Removes the need to leak the PartitionProvider outside of the ingester
crate.
This will allow the PartitionProvider to utilise a
DeferredLoad<TableName> without having to make the DeferredLoad and
TableName pub.
Removes reliance on string name identifiers for namespaces in the
ingester buffer tree, reducing the memory usage of the namespace index
and associated overhead.
The namespace name is required (though unused by IOx) in the IoxMetadata
embedded within a parquet file, and therefore the name is necessary at
persist time. For this reason, a DeferredLoad is used to query the
catalog (by ID) for the name, at some uniformly random duration of time
after initialisation of the NamespaceData, up to a maximum of 1 minute
later. This ensures the query remains off the hot ingest path, and the
jitter prevents spikes in catalog load during replay/ingester startup.
As an additional / easy optimisation, the persist code causes a
pre-fetch of the name in the background while compacting, hiding the
query latency should it not have already been resolved.
In order to keep the the ingester buffer & catalog decoupled / easily
testable, this commit uses a provider/factory trait
NamespaceNameProvider and corresponding implementation
(NamespaceNameResolver) in a similar fashion to the PartitionResolver,
allowing easy mocking for tests, and composition for prod code, allowing
future optimisations such as pre-fetching / caching the "hot" namespace
names at startup.
Internal string identifier removal is a pre-requisite for removing
string identifiers from the write wire format (#4880).