This commit introduces a new (composable) trait; a NamespaceResolver is
an abstraction responsible for taking a string namespace from a user
request, and mapping it to its catalog ID.
This allows the NamespaceId to be injected through the DmlHandler chain
in addition to the namespace name.
As part of this change, the NamespaceAutocreation layer was changed from
an implementer of the DmlHandler trait to a NamespaceResolver, as it
is a more appropriate abstraction for the functionality it provides.
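
As a rough illustration of the composable-resolver idea described above,
here is a minimal sketch. It is not the code from this commit: the method
name `get_namespace_id`, the `MapResolver` and `Autocreate` types, and the
synchronous signature are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::convert::Infallible;

/// Hypothetical stand-in for the catalog's namespace ID type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NamespaceId(pub i64);

/// Maps a namespace name taken from a user request to its catalog ID.
/// (Assumed shape; the real trait may be async and differ in detail.)
pub trait NamespaceResolver {
    type Error;
    fn get_namespace_id(&mut self, namespace: &str) -> Result<NamespaceId, Self::Error>;
}

/// Error returned when a name has no known ID.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownNamespace(pub String);

/// Hypothetical base resolver backed by an in-memory map.
#[derive(Debug, Default)]
pub struct MapResolver {
    ids: HashMap<String, NamespaceId>,
}

impl MapResolver {
    pub fn insert(&mut self, name: &str, id: NamespaceId) {
        self.ids.insert(name.to_string(), id);
    }
}

impl NamespaceResolver for MapResolver {
    type Error = UnknownNamespace;
    fn get_namespace_id(&mut self, namespace: &str) -> Result<NamespaceId, Self::Error> {
        self.ids
            .get(namespace)
            .copied()
            .ok_or_else(|| UnknownNamespace(namespace.to_string()))
    }
}

/// Hypothetical autocreation layer: wraps another resolver and allocates
/// an ID when the inner resolver does not know the namespace.
pub struct Autocreate<R> {
    inner: R,
    created: HashMap<String, NamespaceId>,
    next_id: i64,
}

impl<R> Autocreate<R> {
    pub fn new(inner: R, first_id: i64) -> Self {
        Self { inner, created: HashMap::new(), next_id: first_id }
    }
}

impl<R: NamespaceResolver> NamespaceResolver for Autocreate<R> {
    type Error = Infallible;
    fn get_namespace_id(&mut self, namespace: &str) -> Result<NamespaceId, Self::Error> {
        // Prefer the wrapped resolver's answer.
        if let Ok(id) = self.inner.get_namespace_id(namespace) {
            return Ok(id);
        }
        // Reuse an ID this layer created earlier for the same name.
        if let Some(id) = self.created.get(namespace) {
            return Ok(*id);
        }
        // Otherwise "create" the namespace by allocating a fresh ID.
        let id = NamespaceId(self.next_id);
        self.next_id += 1;
        self.created.insert(namespace.to_string(), id);
        Ok(id)
    }
}
```

Because the autocreation layer is itself a resolver, it can be stacked on
top of any other resolver without the DmlHandler chain knowing about it.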
* feat: Only get files that aren't already on disk with the reported size
* feat: Stream Parquet file bytes to file on disk
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses to `iox_query` and lets
the compactor and ingester use the same mechanism.
While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Keep types in their respective modules
Also adds required documentation now that the individual modules are
public.
* chore: Fix incomplete docs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.
This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (was not
"None") or it panicked at runtime when attempting to enqueue the write.
It is now possible to encode this invariant in the type system, which is
what this change does.
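
The idea of moving the invariant from a runtime panic into the type
system can be sketched as follows. This is a simplified illustration,
not the real types: the fields, constructors, and the `OldWrite` /
`enqueue_old` names are assumptions made for the example.

```rust
/// Hypothetical, simplified partition key.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartitionKey(String);

impl From<&str> for PartitionKey {
    fn from(s: &str) -> Self {
        Self(s.to_string())
    }
}

/// Before (illustrative): the key is optional, so the invariant can only
/// be checked when the write is enqueued.
pub struct OldWrite {
    pub partition_key: Option<PartitionKey>,
}

pub fn enqueue_old(write: &OldWrite) -> &PartitionKey {
    // Panics at runtime if the caller forgot the key.
    write
        .partition_key
        .as_ref()
        .expect("write must have a partition key")
}

/// After (illustrative): a write cannot be constructed without a key, so
/// the "missing key" state is unrepresentable and no runtime check is needed.
pub struct DmlWrite {
    namespace: String,
    partition_key: PartitionKey,
}

impl DmlWrite {
    pub fn new(namespace: impl Into<String>, partition_key: PartitionKey) -> Self {
        Self { namespace: namespace.into(), partition_key }
    }

    pub fn namespace(&self) -> &str {
        &self.namespace
    }

    /// Always returns a key; there is no `Option` to unwrap.
    pub fn partition_key(&self) -> &PartitionKey {
        &self.partition_key
    }
}
```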
* refactor: enforce name of the one-and-only time column
We currently only support a single time dimension and some parts of
our stack rely on the name of the time column. So let's enforce the
name (note that `schema::try_from_arrow` already checks for duplicate
columns, so we are now left with a single dimension).
* refactor: mark a few errors as "internal"
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit makes use of the partition buffer state machine introduced
in https://github.com/influxdata/influxdb_iox/pull/5943.
This commit significantly changes the buffering, and querying, of data
from a partition, swapping out the existing "DataBuffer" for the new
state machine implementation (itself simplified due to temporary lack of
incremental snapshot generation, see #5944).
This commit simplifies the query path, removing multiple types that
wrapped one-another to pass around various state necessary to perform a
query, with various query functions needing different types or
combinations of types. The query path now operates using a single type
(named "QueryAdaptor") that provides a queryable interface over the set
of RecordBatch returned from a partition.
There is significantly increased testing of the PartitionData itself,
covering data in various states and the ordering of returned RecordBatch
(to ensure correct materialisation of updates). There are also
invariants upheld by the type system / compiler to minimise the
complexities of working with empty batches & states, and many asserts
that ensure (mostly existing!) invariants are upheld.
This will be helpful to see when the querier or router is too slow and
we time out. In contrast to the existing metrics, this also helps w/ log
correlation (i.e. "when did we get stuck").
Closes #5975.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
We basically assume everywhere that a column falls into one of the three
known categories (time, tag, field), so let's encode this in our type
system instead of defining "unknown" as "undefined behavior, may or may
not crash".
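
A minimal sketch of what "encode this in our type system" can look like,
assuming a hypothetical `ColumnKind` enum and hypothetical metadata
strings ("time", "tag", "field"); the real schema types may differ.
Unknown values are rejected once at the boundary, and every later `match`
is exhaustive without a catch-all arm.

```rust
use std::convert::TryFrom;

/// The three known column categories. There is deliberately no
/// "unknown" variant (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnKind {
    Time,
    Tag,
    Field,
}

/// Returned when metadata names a category we do not know.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownColumnKind(pub String);

impl TryFrom<&str> for ColumnKind {
    type Error = UnknownColumnKind;

    // Parsing is the only place where an unknown category can appear,
    // and it surfaces as an error instead of undefined behaviour.
    fn try_from(s: &str) -> Result<Self, Self::Error> {
        match s {
            "time" => Ok(ColumnKind::Time),
            "tag" => Ok(ColumnKind::Tag),
            "field" => Ok(ColumnKind::Field),
            other => Err(UnknownColumnKind(other.to_string())),
        }
    }
}

impl ColumnKind {
    /// Exhaustive match: adding a variant later forces this code to be
    /// revisited by the compiler.
    pub fn as_str(self) -> &'static str {
        match self {
            ColumnKind::Time => "time",
            ColumnKind::Tag => "tag",
            ColumnKind::Field => "field",
        }
    }
}
```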
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Since we log trace IDs to allow easier correlation of logs no matter
what the `sampled` flag says, we should also parse these logs if we
don't have a trace collector at all.
In practice, this won't make a difference since we always deploy with a
trace collector, but it also makes the code easier to reason about.
Helps with #5975.
* feat: have a logical plan that is aware of no-deduplication
* feat: build physical scan plan that does not do deduplication
* chore: cleanup
* test: logical plans for scan with and without deduplication
* chore: clean up and a small refactor
* refactor: remove asserts on plan and make enable_deduplication the default
* refactor: rename disable_deduplication to enable_deduplication
* chore: temporarily disable circle filter to build & push PRs
* chore: allow build & push of container image for branches using param
* chore: indentation fix in circle config
* chore: rename build_perf to release_branch
We don't support non-null tags, non-null fields, or nullable timestamps. Let's
just remove this from `schema` so that this never happens by accident.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>