* feat: introduce a new way of handling max_sequence_number for the ingester, compactor, and querier
* chore: cleanup
* feat: new column max_l0_created_at to order files for deduplication
* chore: cleanup
* chore: debug info for changing cpu.parquet
* fix: update test parquet file
Co-authored-by: Marco Neumann <marco@crepererum.net>
Record latency histograms for DmlSink::apply() calls, configuring
ingester2 to report the overall write path latency, and separately the
buffer apply latency.
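As a rough illustration, one way to record such histograms is a decorator that wraps a sink and times each call. This is only a minimal synchronous sketch: the real DmlSink trait is async, and `DmlOp`, the error type, and `record_histogram` below are hypothetical placeholders.

```rust
use std::time::Instant;

/// Hypothetical stand-in for a DML operation.
struct DmlOp;

/// Simplified, synchronous stand-in for the async DmlSink trait.
trait DmlSink {
    fn apply(&self, op: DmlOp) -> Result<(), String>;
}

/// Hypothetical histogram recorder; real code would write into the process
/// metric registry instead of printing.
fn record_histogram(name: &str, seconds: f64) {
    println!("{name}: {seconds:.6}s");
}

/// Decorator recording the latency of every apply() call on the inner sink.
struct InstrumentedSink<T> {
    inner: T,
    metric_name: &'static str,
}

impl<T: DmlSink> DmlSink for InstrumentedSink<T> {
    fn apply(&self, op: DmlOp) -> Result<(), String> {
        let started = Instant::now();
        let res = self.inner.apply(op);
        record_histogram(self.metric_name, started.elapsed().as_secs_f64());
        res
    }
}
```

Wrapping both the outermost sink (the whole write path) and the inner buffer sink with the same decorator yields the two separate histograms described above.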
Adds metrics to track the distribution of the duration spent actively
persisting a batch of partition data (compacting, generating parquet,
uploading, making DB entries, etc.) and another tracking the duration an
entry spent in the persist queue.
Together these provide a measurement of the latency of persist requests,
and as they contain event counters, they also provide the throughput and
number of outstanding jobs.
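A minimal sketch of the measurement approach, assuming a hypothetical `PersistJob` type and placeholder histogram helpers: the enqueue timestamp travels with the job so the queue wait and the active persist time can be recorded separately.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-ins for the two histograms described above.
fn record_queue_duration(d: Duration) {
    println!("time spent in persist queue: {d:?}");
}
fn record_active_duration(d: Duration) {
    println!("time spent actively persisting: {d:?}");
}

/// A persist job tagged with the instant it was enqueued, so the time it
/// waited in the queue can be derived when a worker dequeues it.
struct PersistJob {
    enqueued_at: Instant,
    // ... partition data to persist ...
}

fn run_persist(job: PersistJob) {
    // Duration the job spent waiting in the persist queue.
    record_queue_duration(job.enqueued_at.elapsed());

    let started = Instant::now();
    // ... compact, generate parquet, upload, make catalog entries ...
    record_active_duration(started.elapsed());
}
```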
Changes the persist system to call into an abstract
PersistCompletionObserver after the persist task has completed, but
before releasing the job permit / notifying the enqueuer.
This call happens synchronously and is driven to completion by the persist
worker. A synchronous construct can easily be made asynchronous (by
enqueuing work into a channel), but not the other way around, so this gives
the most flexibility.
This trait allows pluggable logic to be inserted into the persist
system, without tightly coupling it to the implementer's logic (for
example, replication). One or more observers may be chained together to
construct an arbitrary sequence of actors.
This commit uses a no-op observer, causing no functional change to the
system.
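A sketch of what such an observer abstraction could look like; the trait, method, and type names here are illustrative only (the real trait may be async and pass a richer completion summary).

```rust
/// Illustrative summary of a finished persist job.
struct CompletedPersist;

trait PersistCompletionObserver: Send + Sync {
    /// Invoked synchronously by the persist worker after the persist task
    /// completes, before the job permit is released / the enqueuer notified.
    fn persist_complete(&self, note: &CompletedPersist);
}

/// The no-op observer used by this commit: no functional change.
struct NopObserver;
impl PersistCompletionObserver for NopObserver {
    fn persist_complete(&self, _note: &CompletedPersist) {}
}

/// Observers can be chained to form an arbitrary sequence of actors.
struct Chain<A, B>(A, B);
impl<A, B> PersistCompletionObserver for Chain<A, B>
where
    A: PersistCompletionObserver,
    B: PersistCompletionObserver,
{
    fn persist_complete(&self, note: &CompletedPersist) {
        self.0.persist_complete(note);
        self.1.persist_complete(note);
    }
}
```

Because the worker drives the call synchronously, an implementation that needs async behaviour can simply enqueue the note into a channel and return.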
Adds an integration test of the persist system, covering:
* Node A starts a persist operation
* Node B starts a persist operation for the same partition
* Node A completes, setting the catalog sort key to a new value
* Node B attempts to update the catalog, observing the new sort key
* Node B re-compacts the data, re-uploads, and drives to completion
This scenario is/was tracked in:
https://github.com/influxdata/influxdb_iox/issues/6439
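For illustration, the sort-key race at the heart of this scenario can be sketched as a compare-and-swap retry loop. The catalog trait and types below are simplified placeholders, not the real catalog API.

```rust
/// Simplified sort key: an ordered list of column names.
#[derive(Clone, PartialEq, Debug)]
struct SortKey(Vec<String>);

enum CasError {
    /// Another node updated the sort key first; carries the observed value.
    ValueMismatch(SortKey),
}

trait Catalog {
    /// Compare-and-swap: apply `new` only if the stored key still matches
    /// `observed`.
    fn cas_sort_key(&self, observed: &Option<SortKey>, new: SortKey) -> Result<(), CasError>;
}

/// Node B's path through the scenario above: its CAS fails because node A
/// already updated the key, so it adopts the observed key, re-compacts and
/// re-uploads, then retries the catalog update.
fn update_sort_key(catalog: &dyn Catalog, mut observed: Option<SortKey>, mut wanted: SortKey) {
    loop {
        match catalog.cas_sort_key(&observed, wanted.clone()) {
            Ok(()) => return,
            Err(CasError::ValueMismatch(current)) => {
                // ... re-compact & re-upload the data with `current` merged in ...
                wanted = current.clone();
                observed = Some(current);
            }
        }
    }
}
```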
The persist::Context struct carries the data to be persisted, a
reference to the partition from which it came, and various cached fields
to avoid re-acquiring the partition read lock all the time.
Prior to this commit, the Context also carried the full persist logic as
methods invoked by the persist worker. This tightly coupled the data &
logic - it's fairly clear a worker should implement the work and operate on
the data, not commingle the two. I knew the mess I was making when I wrote
it, but effectively copy-pasted it from ingester1 because of deadlines.
This commit decouples the persist logic from the Context.
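The resulting shape can be sketched as follows (illustrative names only): the Context is a plain data carrier, and the persist logic lives in a free-standing worker function that operates on it.

```rust
/// Illustrative identifier type.
struct PartitionId(i64);

/// The Context is plain data: what to persist, where it came from, and
/// cached fields so the partition read lock need not be re-acquired.
struct Context {
    partition_id: PartitionId,
    cached_sort_key: Option<Vec<String>>,
    // ... the buffered data to persist ...
}

/// The logic lives with the worker, which operates on the data.
fn persist_worker(ctx: Context) {
    // compact(&ctx); upload_parquet(&ctx); update_catalog(&ctx); ...
    let _ = (ctx.partition_id, ctx.cached_sort_key);
}
```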
The query API exposes a unique-per-instance UUID to allow callers to
detect a crash of the ingester process - this was initialised directly
in the query RPC handler.
This commit turns the bare UUID into a type, and initialises it in the
top-level initialisation of the ingester, plumbing it down into the
query RPC handler.
This allows the UUID to be reused by other components/handlers.
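A minimal sketch of such a newtype, assuming the `uuid` crate (with the `v4` feature); the actual type name in the ingester may differ.

```rust
use uuid::Uuid;

/// Per-process instance identifier (illustrative name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IngesterUuid(Uuid);

impl IngesterUuid {
    /// Generated once during top-level ingester initialisation and plumbed
    /// into any handler that needs it (currently the query RPC handler).
    pub fn new() -> Self {
        Self(Uuid::new_v4())
    }
}
```

Callers compare the UUID across responses: a changed value indicates the ingester process restarted.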
The ingester no longer needs to access a specific PartitionData by ID
(they are addressed either via an iterator over the BufferTree, or
shared by Arc reference).
This allows us to remove the extra map maintaining ID -> PartitionData
references, and the shared access lock protecting it.
Prior to this commit, the (happy path) shutdown sequence of an IOx
process was hard-coded to:
1. Stop gRPC & HTTP servers
2. Stop backend server (i.e. ingester2)
After this commit, the execution of step 1 is delegated to the handler
for step 2; the server implementation (router / ingester / querier /
etc) now chooses when to shut down the RPC & HTTP servers.
This allows the server shutdown delegate to correctly sequence the
shutdown of all components of the IOx server. In particular, ingester2 can
now correctly order the shutdown of the query RPC server w.r.t. the
graceful stop & persist, ensuring queries continue to be serviced.
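One way to express this delegation is to hand the backend a callback that stops the frontends and let it decide when to invoke it; the trait below is a conceptual sketch, not the actual IOx interface.

```rust
/// Conceptual sketch of the delegation: the backend receives a closure that
/// stops the frontend (gRPC & HTTP) servers and chooses when to call it.
trait ServerType {
    fn shutdown(&self, stop_frontend: Box<dyn FnOnce() + Send>);
}

struct Ingester2;

impl ServerType for Ingester2 {
    fn shutdown(&self, stop_frontend: Box<dyn FnOnce() + Send>) {
        // 1. Block new writes and persist all buffered data, while the query
        //    RPC server keeps servicing reads...
        // 2. ...then, and only then, stop the RPC & HTTP frontends.
        stop_frontend();
    }
}
```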
Persist all buffered data when gracefully stopping an ingester2
instance.
This implementation accounts for both late-arriving writes and
concurrent persist tasks - it's carefully constructed so that it can
discover the presence of, and wait for, outstanding persist tasks
started by other code without having to know about all the possible
places a persist task can be started (currently WAL rotation & hot
partition persistence, but later also an RPC endpoint).
There is a small race that seems so incredibly unlikely to occur that I
didn't cover it off (doing so would incur an RPC write cost for little
gain); it is documented in the code comments.
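A simplified sketch of how outstanding persist tasks can be discovered without enumerating every place they are started: a shared counter that every task increments on start and decrements on completion, which the graceful-stop path drains before shutting down. All names here are illustrative, and the real implementation is async and handles late-arriving writes more carefully.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Shared tracker of in-flight persist tasks, regardless of who started them
/// (WAL rotation, hot partition persistence, ...).
#[derive(Clone, Default)]
struct PersistTracker(Arc<AtomicUsize>);

impl PersistTracker {
    fn task_started(&self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
    fn task_completed(&self) {
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
    fn outstanding(&self) -> usize {
        self.0.load(Ordering::SeqCst)
    }
}

/// Graceful stop: enqueue persistence of everything currently buffered, then
/// wait for all outstanding persist tasks to drain before shutting down.
fn graceful_stop(tracker: &PersistTracker, persist_all_buffered: impl Fn()) {
    persist_all_buffered();
    while tracker.outstanding() > 0 {
        thread::sleep(Duration::from_millis(50));
    }
}
```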
Prior to this commit, initialising the persist system returned a
PersistState instance used to communicate the saturation status of the
persistence system. The RPC write path used this information to accept or
deny write requests accordingly.
This was unfortunate in that it tightly coupled the ingest handler to
the persist system - in order to initialise the RPC handler, you had to
provide a PersistState; this required us to initialise a persist system
when testing only the RPC handler (which had nothing to do with
persisting). This smells!
This commit inverts the dependency and decouples the subsystems via a
shared type (IngestState). Instead of the persist system telling ingest
to stop, the ingest system provides a means to be told to stop - this
subtle difference decouples the ingest handler from all components that
need to block ingest. It allows a fast O(1) error-state read for N
components and prevents us from having to start N components just to test
an RPC handler.
Additionally, this commit introduces a currently unused ingest error state
(GracefulStop) as part of figuring out the API (it will be used shortly).
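An illustrative sketch of the shared-type shape described here, assuming a small atomic state; the real IngestState may differ in representation and variant names.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Reasons ingest may be blocked; variant names follow the description above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IngestStateError {
    Ok = 0,
    /// The persist system is saturated; shed load.
    PersistSaturated = 1,
    /// The ingester is gracefully stopping; reject new writes.
    GracefulStop = 2,
}

/// Shared between the RPC write path and any component that needs to block
/// ingest; reading the state is a cheap O(1) atomic load per request.
#[derive(Default)]
struct IngestState(AtomicU8);

impl IngestState {
    /// Called by whichever component needs to block ingest.
    fn set(&self, state: IngestStateError) {
        self.0.store(state as u8, Ordering::Relaxed);
    }

    /// Called by the RPC write path before accepting a write.
    fn read(&self) -> IngestStateError {
        match self.0.load(Ordering::Relaxed) {
            1 => IngestStateError::PersistSaturated,
            2 => IngestStateError::GracefulStop,
            _ => IngestStateError::Ok,
        }
    }
}
```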
Include the number of DML operations applied to the persisted buffer
in the "persisted partition" message.
Partly because I'm intrigued / it's useful information, and partly to
ensure LLVM doesn't get snazzy and dead-code-eliminate the sequence number
tracking because it was never read.
Changes the ingester2 buffer FSM to track the sequence numbers that have
been applied to it.
This is a pre-requisite for replication & correct WAL segment dropping.
Previously the ingester (ingester1) required writes to be applied in
order; this requirement has been relaxed, and the corresponding asserts
have been removed in ingester2.
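As a rough sketch of why this tracking matters once ordering is relaxed: a set of applied sequence numbers (rather than a single high-water mark) can answer questions such as "is this WAL segment fully persisted?". The names and representation below are illustrative only; the real tracker likely uses a more compact encoding.

```rust
use std::collections::BTreeSet;

/// Set of sequence numbers that have been applied to a buffer.
#[derive(Debug, Default)]
struct SequenceNumberSet(BTreeSet<u64>);

impl SequenceNumberSet {
    /// Record that the write with this sequence number was applied.
    fn observe(&mut self, seq: u64) {
        self.0.insert(seq);
    }

    /// True if every sequence number in `other` has been applied here - e.g.
    /// to decide whether a WAL segment's contents are fully covered and the
    /// segment can safely be dropped.
    fn contains_all(&self, other: &SequenceNumberSet) -> bool {
        other.0.is_subset(&self.0)
    }
}
```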
* feat: function to get partition candidates from the partition table
* chore: cleanup
* fix: make new_file_at the same value as created_at
* chore: cleanup
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>