Adds benchmarks that exercise partition pruning during query execution
within the ingester, for varying partition counts within a table, and
varying row counts within each partition.
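A minimal sketch of the benchmark shape using criterion, with hypothetical `ingester_with_partitions`/`run_query` stand-ins for the real setup and query path:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

/// Stand-ins for the real setup/query helpers (hypothetical).
struct TestIngester;
fn ingester_with_partitions(_partitions: usize, _rows_per_partition: usize) -> TestIngester {
    TestIngester
}
fn run_query(_ingester: &TestIngester, _sql: &str) -> usize {
    0 // would return the matching row/batch count in the real benchmark
}

fn partition_pruning(c: &mut Criterion) {
    let mut group = c.benchmark_group("partition_pruning");

    // Sweep the partition count per table and the row count per partition.
    for partitions in [1usize, 10, 100, 1_000] {
        for rows in [1usize, 1_000, 100_000] {
            group.bench_with_input(
                BenchmarkId::from_parameter(format!("{partitions}p_{rows}r")),
                &(partitions, rows),
                |b, &(partitions, rows)| {
                    let ingester = ingester_with_partitions(partitions, rows);
                    b.iter(|| run_query(&ingester, "SELECT * FROM t WHERE time > now() - 1h"));
                },
            );
        }
    }
    group.finish();
}

criterion_group!(benches, partition_pruning);
criterion_main!(benches);
```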
This adds an integration test that writes some data to the ingester,
waits for the WAL to be rotated and then ensures that the segment file
has been dropped.
* fix(ingester): re-transmit schema over flight if it changes
Fixes https://github.com/influxdata/idpe/issues/17408.
So a `[Sendable]RecordBatchStream` contains `RecordBatch`es of the SAME
schema. When the ingester crafts a response for a specific partition,
this is also almost always the case; however, when there's a persist job
running (I think), it may have multiple snapshots for a partition. These
snapshots may have different schemas (since the ingester only creates
columns if they contain any data). The current implementation merges
all these snapshots into a single stream and hands them over to Arrow
Flight, which has a high-perf encode routine (i.e. it does not re-check
every single schema) so it sends the schema once and then sends the data
for every batch (the data only, schema data is NOT repeated). On the
receiver side (= querier) we decode that data and get confused why on
earth some batches have a different column count compared to the schema.
For the OG ingester I carefully crafted the response to ensure that we
do not run into this problem, but apparently a number of rewrites and
refactors broke that. So here is the fix:
- remove the stream that isn't really a stream (and cannot error)
- for each partition go over the `RecordBatch`es and chunk them
according to the schema (because this check is likely cheaper than
re-transmitting the schema for every `RecordBatch`); see the sketch
after this list
- adjust a bunch of testing code to cope with this
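A rough sketch of that chunking step using plain `arrow` types (the real code lives in the ingester's Flight response assembly); the idea is simply to split the batch sequence wherever the schema changes, so each chunk can be encoded with a single schema message:

```rust
use std::sync::Arc;

use arrow::{datatypes::Schema, record_batch::RecordBatch};

/// Split `batches` into runs of consecutive batches that share a schema.
/// Each chunk can then be encoded once: schema first, then only the data
/// for every batch in the chunk.
fn chunk_by_schema(batches: Vec<RecordBatch>) -> Vec<(Arc<Schema>, Vec<RecordBatch>)> {
    let mut chunks: Vec<(Arc<Schema>, Vec<RecordBatch>)> = vec![];

    for batch in batches {
        let same_schema = chunks
            .last()
            .map(|(schema, _)| schema.as_ref() == batch.schema().as_ref())
            .unwrap_or(false);

        if same_schema {
            // Same schema as the previous batch => extend the current chunk.
            chunks.last_mut().expect("chunk exists").1.push(batch);
        } else {
            // First batch, or the schema changed => start a new chunk.
            chunks.push((batch.schema(), vec![batch]));
        }
    }

    chunks
}
```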
* refactor: nicify code
* test: adjust test
Adds a test that asserts (manually triggered) persistence generates a
file, uploads it to object storage, inserts metadata into the catalog,
and emits various persistence metrics.
This commit adds initial support for "soft" namespace deletion, where
the actual records & data remain, but are no longer queryable /
writeable.
Soft deletion is eventually consistent - users can expect to continue
writing to and reading from a bucket after issuing a soft delete call,
until the various components either restart, or have their caches
flushed.
The components treat soft-deleted namespaces differently:
* router: ignore soft deleted namespaces
* ingester: accept soft deleted namespaces
* compactor: accept soft deleted namespaces
* querier: ignore soft deleted namespaces
* various gRPC services: ignore soft deleted namespaces
This ensures that the ingester & compactor do not see rows "vanishing"
from the database, and continue to make forward progress.
Writes for the deleted namespace that are buffered in the ingester will
be persisted as normal, allowing us to support "un-delete" operations
where the system is restored to the state at which the delete was
issued (rather than losing the buffered data).
Follow-on work is required to ensure GC drops the orphaned parquet files
after the configured GC time, and optimisations such as not compacting
parquet from soft-deleted namespaces seem like a trivial win.
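One way to picture the catalog side of this, assuming a hypothetical `deleted_at` marker on the namespace record (actual field/column names may differ):

```rust
use std::time::SystemTime;

/// Hypothetical shape of the catalog's namespace record with the
/// soft-delete marker.
struct Namespace {
    id: i64,
    name: String,
    /// Set once a soft delete has been issued for this namespace.
    deleted_at: Option<SystemTime>,
}

impl Namespace {
    /// Router & querier: treat soft-deleted namespaces as gone.
    fn visible_to_frontends(&self) -> bool {
        self.deleted_at.is_none()
    }

    /// Ingester & compactor: keep accepting the namespace so buffered
    /// writes are still persisted and rows never "vanish" mid-stream.
    fn visible_to_backends(&self) -> bool {
        true
    }
}
```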
The maximum number of tables is part of the Namespace, which is already
loaded in its entirety. This commit copies the value into the
NamespaceSchema, making it available for the router to utilise.
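Schematically (field names are illustrative), the limit simply rides along with the schema the router already caches:

```rust
use std::collections::BTreeMap;

/// Catalog record, already loaded in its entirety by the router.
struct Namespace {
    id: i64,
    max_tables: i32,
    // ...other columns elided
}

/// In-memory schema the router caches on the hot write path.
struct NamespaceSchema {
    id: i64,
    tables: BTreeMap<String, i64>,
    /// Copied from `Namespace` so the table limit can be enforced
    /// without another catalog lookup per write.
    max_tables: i32,
}

impl NamespaceSchema {
    fn new(ns: &Namespace) -> Self {
        Self {
            id: ns.id,
            tables: BTreeMap::new(),
            max_tables: ns.max_tables,
        }
    }
}
```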
Rust 1.67 now says:
warning: `#[track_caller]` on async functions is a no-op
= note: see issue #87417 <https://github.com/rust-lang/rust/issues/87417> for more information
= note: `#[warn(ungated_async_fn_track_caller)]` on by default
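The warning fires on code like the following; since the attribute currently has no effect on `async fn`, the fix is simply to drop it (illustrative example, not the actual offending function):

```rust
// Before: Rust 1.67 warns that the attribute is a no-op, because the panic
// location inside the generated future is not propagated to the caller.
#[track_caller]
async fn assert_answer(value: i64) {
    assert_eq!(value, 42);
}

// After: drop the attribute (or keep the panicking code in a synchronous
// helper, where `#[track_caller]` does take effect).
async fn assert_answer_fixed(value: i64) {
    assert_eq!(value, 42);
}
```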
Have a single global test executor w/ reasonable defaults. Also don't
require tests to join/await executor shutdowns (most tests forget this
anyway and will get a runtime warning).
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
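The general pattern is a lazily-initialised, process-wide executor shared by all tests; a sketch using `once_cell` with a stand-in `Executor` type rather than the real one:

```rust
use std::sync::Arc;

use once_cell::sync::Lazy;

/// Stand-in for the real query executor type.
struct Executor {
    num_threads: usize,
}

impl Executor {
    fn new(num_threads: usize) -> Self {
        Self { num_threads }
    }
}

/// One executor with reasonable defaults, shared by every test in the
/// process; individual tests no longer join or await a shutdown.
static TEST_EXECUTOR: Lazy<Arc<Executor>> = Lazy::new(|| Arc::new(Executor::new(1)));

fn exec() -> Arc<Executor> {
    Arc::clone(&TEST_EXECUTOR)
}
```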
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidentally removed in rebase
* chore: remove mem catalog's default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: reject writes that are outside the retention period
* feat: add retention validator into handler stack
* chore: Apply suggestions from code review
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: address review comments
* test: unit tests for retention validation
* chore: address review comments
* test: more unit tests and integration tests
* refactor: make time inside retention period for ephemeral_mode test
* fix: 2 hours
Co-authored-by: Dom <dom@itsallbroken.com>
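The retention validation added above boils down to a per-row timestamp comparison; a minimal sketch (timestamps as nanoseconds since the epoch, helper name hypothetical):

```rust
use std::time::Duration;

/// Accept a row only if its timestamp falls within the namespace's
/// retention period.
fn within_retention(row_ts_ns: i64, now_ns: i64, retention: Option<Duration>) -> bool {
    match retention {
        // No retention configured => everything is accepted.
        None => true,
        Some(period) => {
            // Nanosecond maths needs a wide integer type (see the overflow fix above).
            let cutoff = now_ns - period.as_nanos() as i64;
            row_ts_ns >= cutoff
        }
    }
}

#[test]
fn rejects_rows_older_than_retention() {
    let now: i64 = 1_000_000_000_000;
    let two_hours = Some(Duration::from_secs(2 * 60 * 60));
    assert!(within_retention(now - 1, now, two_hours));
    assert!(!within_retention(now - 3 * 3_600_000_000_000, now, two_hours));
}
```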
* refactor: NS+table ID (instead of name) in querier<>ingester
* feat(ingester): use IDs for query API
Changes the ingester to utilise the ID fields (instead of names) sent
over the query wire message wrapped within the Flight API.
BREAKING: this changes the "query-ingester" CLI command arguments which
now expects the namespace & table IDs, rather than their names.
* refactor(ingester): add more query logging context
Updates the log messages during query execution to include more context
fields.
* style: remove unused import
Co-authored-by: Marco Neumann <marco@crepererum.net>
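Conceptually, the query request changes from name-based to ID-based lookup; schematically (illustrative field names, not the exact Flight/proto message):

```rust
/// Old request shape: the ingester resolved names against the catalog.
struct QueryRequestByName {
    namespace: String,
    table: String,
    columns: Vec<String>,
}

/// New request shape: the querier already holds the catalog IDs, so the
/// ingester skips the name resolution entirely. The `query-ingester` CLI
/// now has to be given these IDs as well.
struct QueryRequestById {
    namespace_id: i64,
    table_id: i64,
    columns: Vec<String>,
}
```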
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half: changing the producer to send the IDs.
In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
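The compatibility argument rests on the new fields being purely additive, so an old (or rolled-back) consumer simply ignores them; schematically (illustrative names, not the real wire types):

```rust
/// Write buffer payload as seen by the producer after this change.
struct WriteOperation {
    namespace: String,
    /// New: written by the producer but not yet read by any consumer, so
    /// older consumers decode the message exactly as before and a
    /// rolled-back consumer can still replay newer messages.
    namespace_id: Option<i64>,
    tables: Vec<TableBatch>,
}

struct TableBatch {
    table_name: String,
    /// New and likewise additive.
    table_id: Option<i64>,
    // ...row data elided
}
```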
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.
This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (i.e. was not
"None"), or it panicked at runtime when attempting to enqueue the write.
It is now possible to encode this invariant in the type system, which is
what this change does.
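Schematically, the change moves the check from a runtime panic to the type system (types are illustrative, not the real `DmlWrite` definition):

```rust
struct PartitionKey(String);

/// Before: the invariant lived in a runtime check.
struct DmlWriteBefore {
    partition_key: Option<PartitionKey>,
    // ...
}

impl DmlWriteBefore {
    fn enqueue(&self) {
        // Panicked at runtime whenever the invariant was violated.
        let _key = self
            .partition_key
            .as_ref()
            .expect("enqueuing a write requires a partition key");
    }
}

/// After: a write without a partition key no longer type-checks, so the
/// panic path disappears.
struct DmlWrite {
    partition_key: PartitionKey,
    // ...
}
```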
Asserts write buffer seeking behaviour, including:
* Correctly seeking past already-persisted data
* Skipping to the next available op in a non-contiguous offset stream
* Skipping to the next available op when ops have been dropped due to retention
* Panicking when seeking beyond the available data (into the future)
Removes a pair of tests that covered some of the above due to their
tight coupling with ingester internals.
This commit adds a new test that exercises all major external APIs of
the ingester:
* Writing data via the write buffer
* Waiting for data to be readable via the progress API
* Querying data and asserting the contents
This should provide basic integration coverage for the Ingester
internals. This commit also removes a similar test (though with less
coverage) that was tightly coupled to the existing buffering structures.
Adds a test helper type that maintains the in-memory state for a single
ingester integration test, and provides easy-to-use methods to
manipulate and inspect the ingester instance.
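A rough sketch of what such a fixture might look like, with hypothetical method names mirroring the steps above:

```rust
/// Hypothetical fixture owning the in-memory state for a single ingester
/// integration test (write buffer, catalog, ingester instance, ...).
struct TestContext {}

impl TestContext {
    /// Spin up an in-memory catalog, write buffer and ingester.
    async fn new() -> Self {
        Self {}
    }

    /// Enqueue line protocol via the write buffer, as a production write would be.
    async fn write_lp(&self, namespace: &str, lp: &str) {
        let _ = (namespace, lp);
    }

    /// Block until the progress API reports the write as readable.
    async fn wait_for_readable(&self, namespace: &str) {
        let _ = namespace;
    }

    /// Query the ingester through its public query API and assert on the result.
    async fn query_and_assert(&self, namespace: &str, table: &str, expected: &[&str]) {
        let _ = (namespace, table, expected);
    }
}
```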