* fix: support InfluxRPC OR-chains w/ arbitrary child nodes
Also convert another assertion regarding child nodes of Eq-nodes into a
proper error.
See https://github.com/influxdata/idpe/issues/16582.
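A minimal sketch of the shape of the fix, using stand-in types rather than the real InfluxRPC predicate nodes: OR chains are flattened recursively regardless of arity, and the former assertion on Eq-node children becomes an error the caller can surface:

```rust
/// Minimal stand-in for the InfluxRPC predicate node type.
enum Node {
    Or(Vec<Node>),
    Eq(Vec<Node>),
    Tag(String),
}

/// Flatten an OR-chain of arbitrary arity into a flat list of children,
/// and turn a malformed Eq node into a proper error instead of an
/// assertion failure.
fn flatten_or(node: Node, out: &mut Vec<Node>) -> Result<(), String> {
    match node {
        // Recurse into OR nodes, however many children they carry.
        Node::Or(children) => {
            for child in children {
                flatten_or(child, out)?;
            }
            Ok(())
        }
        // Previously an assertion; now an error the caller can handle.
        Node::Eq(children) if children.len() != 2 => Err(format!(
            "expected Eq node with 2 children, got {}",
            children.len()
        )),
        other => {
            out.push(other);
            Ok(())
        }
    }
}
```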
* test: more tests
* fix: gRPC errors regarding group cols
- a missing group column previously produced an "internal error" but should be
  "invalid argument"
- duplicate group columns produced a panic but should also be "invalid
  argument"
* docs: clarify
* refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics
Closes #6310.
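In practice this means handing DataFusion a memory pool that operators draw from at run time, rather than estimating sizes ahead of execution. A minimal sketch, assuming the `RuntimeConfig::with_memory_limit` API from DataFusion of that era:

```rust
use datafusion::error::Result;
use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};

/// Build a runtime whose operators allocate from a shared pool on
/// demand, spilling or erroring when the limit is reached, instead of
/// relying on ahead-of-time heuristics.
fn make_runtime() -> Result<RuntimeEnv> {
    let config = RuntimeConfig::new()
        // 8 GiB pool; the second argument is the fraction of it that
        // may actually be handed out to operators.
        .with_memory_limit(8 * 1024 * 1024 * 1024, 1.0);
    RuntimeEnv::new(config)
}
```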
* refactor: rename and tune default exec mem limits
* fix: ingester2 bits after rebase
The ingester will replay the ops, and then immediately trigger a
persist! This commit is done :)
Outstanding is actually deleting the replayed files, which will come
with the ref-counted WAL segment change later.
Changes the WAL rotation task to drop the WAL segment after the
partition data has completely persisted.
This logic contains a race condition outlined (at length) in the code
comments, along with a plan to resolve it once I'm back from holiday.
For now, the race is likely acceptable for testing purposes, given its
extremely low likelihood and minor impact.
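A sketch of the intended happy-path sequencing, with stub types standing in for the real WAL and persist handle:

```rust
use std::io;

// Stub types standing in for the real WAL and persist handle; the
// point of this sketch is only the sequencing below.
struct SegmentId(u64);
struct Wal;
impl Wal {
    fn rotate(&self) -> io::Result<SegmentId> {
        Ok(SegmentId(0))
    }
    fn delete(&self, _id: SegmentId) -> io::Result<()> {
        Ok(())
    }
}
struct PersistHandle;
impl PersistHandle {
    async fn persist_all(&self) {}
}

/// Rotate first so no new writes land in the closed segment, persist
/// everything buffered so far, and only then delete the segment file,
/// by which point its contents are safely in object storage.
async fn rotate_and_drop(wal: &Wal, persist: &PersistHandle) -> io::Result<()> {
    let closed = wal.rotate()?;
    persist.persist_all().await;
    wal.delete(closed)
}
```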
Changes the WAL rotation code to cause the current set of partitions to
be submitted for persistence.
Because rotating the WAL segment and triggering persistence is not
atomic (not done under an exclusive lock preventing writes), the
persisted buffers MAY contain writes that also appear in the new
segment file.
In the happy/non-crash path this has no effect; however, if the
ingester crashes and replays the WAL files, these writes will be
duplicated into object storage (where compaction will resolve the
issue). This seems like a good trade-off, allowing us to avoid blocking
write requests for as long as it takes to rotate the WAL and mark all
partitions as persisting.
(though currently WAL files are not dropped and thus everything is
replayed all the time.)
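A sketch of the non-atomic window described above (all types are illustrative stand-ins):

```rust
// Stand-in types; only the ordering, and the gap between the two
// steps, matters here.
struct Wal;
impl Wal {
    fn rotate(&self) {}
}
struct Partition;
impl Partition {
    fn mark_persisting(&self) {}
}
struct PersistHandle;
impl PersistHandle {
    async fn enqueue(&self, _p: &Partition) {}
}

/// Rotation and persist submission are two separate steps with no
/// exclusive write lock held across them: a write arriving in the gap
/// is recorded in the new segment AND captured by the persisting
/// buffer, so crash replay can duplicate it (compaction later
/// deduplicates).
async fn rotate_and_persist(wal: &Wal, partitions: &[Partition], persist: &PersistHandle) {
    wal.rotate(); // step 1: writes now go to a fresh segment
    for p in partitions {
        p.mark_persisting(); // step 2: snapshot each buffer...
        persist.enqueue(p).await; // ...and queue it for persistence
    }
}
```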
Implements actor-based, parallel persistence in ingester2 with
controllable fan-out parallelism and queue depths.
This implementation encapsulates the complexity of persistence, queuing
and parallelism - the caller simply uses the handle to persist a
partition, while the actor handles fan-out to a set of persistence
workers, compaction in a separate thread-pool, and optional completion
notifications.
By consistently hashing persist jobs onto workers, parallelism is
achieved across partitions, but serialisation of partition persists is
enforced so that the sort key update is correctly serialised.
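A sketch of the routing idea, assuming tokio mpsc channels as the worker queues; `PersistJob` is an illustrative stand-in:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

use tokio::sync::mpsc;

/// Illustrative persist job; the real type carries the partition data.
struct PersistJob {
    partition_id: u64,
}

/// Route a job to a worker by hashing its partition ID: jobs for the
/// same partition always land on the same worker (serialised, keeping
/// sort key updates ordered), while distinct partitions fan out across
/// workers in parallel.
async fn submit(job: PersistJob, workers: &[mpsc::Sender<PersistJob>]) {
    let mut hasher = DefaultHasher::new();
    job.partition_id.hash(&mut hasher);
    let idx = (hasher.finish() as usize) % workers.len();
    workers[idx].send(job).await.expect("persist worker hung up");
}
```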
This commit swaps the existing single "persisting batch" slot (field) in
a PartitionData for an ordered queue of outstanding partitions.
This decouples marking a partition buffer for persistence from the
actual persistence operation, allowing them to proceed at different
rates. This not only reduces the complexity of persistence management
but also allows us to gracefully handle "hot" partitions; for example,
this problematic scenario in the ingester(1) implementation during
recovery:
* Writes come into a partition, reaching a size/row/hotness limit
* Partition is enqueued for persistence
* More writes come into the new buffer, exceeding the same limit
* Cannot persist the hot buffer because of outstanding persist op
Without this change the only possibilities in this situation are:
* stop ingest for the partition and error all writes that
(partially!) write to the partition, or
* continue accepting writes, allowing the partition to exceed the
limit that marked it for persistence in the first place
The latter is what the ingester(1) implementation does today, which
results in partitions exceeding their row/size/age limits, which exist
to limit the cost of generating the Parquet file from the buffer - this
is a significant contributor to instability during recovery.
This strategy enforces the configured limits on the partition buffer,
but does not block or slow down recovery while persistence completes.
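Illustratively, the change swaps a single slot for a FIFO; these are stand-in types, not the actual `PartitionData` definition:

```rust
use std::collections::VecDeque;

/// Stand-in for the buffered-write type.
#[derive(Default)]
struct DataBuffer;

/// Shape of the change: the single "persisting batch" slot becomes an
/// ordered queue, so a hot partition can be marked for persistence
/// repeatedly without blocking ingest or breaching its limits.
struct PartitionData {
    /// Buffer currently accepting writes.
    buffer: DataBuffer,
    /// FIFO of buffers marked for persistence but not yet persisted.
    persisting: VecDeque<DataBuffer>,
}

impl PartitionData {
    /// Close the active buffer and queue it; a fresh buffer immediately
    /// starts accepting writes, subject to the same configured limits.
    fn mark_persisting(&mut self) {
        let full = std::mem::take(&mut self.buffer);
        self.persisting.push_back(full);
    }
}
```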
* fix: check schemas in `pretty_print_batches`
I think most users of this function (and `assert_batches_eq`) assume
that all batches have the same schema. If not, `pretty_print_batches`
may either fail to produce an actual table (some rows may have more or
fewer columns) or silently produce a table that looks "alright".
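A sketch of the kind of guard this adds, using the arrow crate's public types:

```rust
use arrow::error::{ArrowError, Result};
use arrow::record_batch::RecordBatch;

/// Reject batches with differing schemas up front instead of failing
/// mid-render or silently printing a misleading table.
fn check_batch_schemas(batches: &[RecordBatch]) -> Result<()> {
    if let Some(first) = batches.first() {
        for batch in &batches[1..] {
            if batch.schema() != first.schema() {
                return Err(ArrowError::InvalidArgumentError(
                    "all record batches must share the same schema".to_string(),
                ));
            }
        }
    }
    Ok(())
}
```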
* fix: equalize schemas where it is required/desired
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Have a single global test executor w/ reasonable defaults. Also don't
require tests to join/await executor shutdowns (most tests forget this
anyway and will get a runtime warning).
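A sketch of the pattern, with a stand-in `Executor` type and `once_cell` for lazy one-time initialisation:

```rust
use std::sync::Arc;

use once_cell::sync::Lazy;

/// Stand-in for the real executor type.
struct Executor;
impl Executor {
    fn new_testing() -> Self {
        Executor
    }
}

/// One lazily-initialised executor shared by every test; tests clone
/// the Arc and never have to join/await a shutdown themselves.
static TEST_EXEC: Lazy<Arc<Executor>> = Lazy::new(|| Arc::new(Executor::new_testing()));

fn test_executor() -> Arc<Executor> {
    Arc::clone(&TEST_EXEC)
}
```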
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>